I have implemented the same function in both C++ and Fortran:
#include <math.h>
void dbl_sqrt_c(double *x, double *y){
    *y = sqrt(*x - 1.0);
    return;
}
pure subroutine my_dbl_sqrt(x,y) bind(c, name="dbl_sqrt_fort")
    USE, INTRINSIC :: ISO_C_BINDING
    implicit none
    real(kind=c_double), intent(in) :: x
    real(kind=c_double), intent(out) :: y
    y = sqrt(x - 1d0)
end subroutine my_dbl_sqrt
I compared them in Compiler Explorer:
Fortran: https://godbolt.org/z/froz4rx97
C++: https://godbolt.org/z/45aex99Yz
As far as I can read the assembly, they do essentially the same thing, except that the C++ version checks whether the argument of sqrt is negative, which the Fortran version does not. I compared their performance using Google Benchmark, and they are pretty evenly matched:
--------------------------------------------------------
Benchmark                 Time           CPU  Iterations
--------------------------------------------------------
bm_dbl_c/8             2.07 ns       2.07 ns   335965892
bm_dbl_fort/8          2.06 ns       2.06 ns   338643106
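A note on the negative-argument check mentioned above: my understanding is that it comes from the C library's error reporting rather than from the square root itself. On many platforms sqrt is expected to set errno to EDOM for a negative argument, so the compiler keeps a path that calls the library function in that case. The sketch below only illustrates that behaviour. The function name is made up for this example, and whether errno is set at all depends on the platform (see math_errhandling) and on compiler flags such as GCC's -fno-math-errno.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Hypothetical helper (not part of the code in the question).
// Returns true if sqrt of a negative value sets errno to EDOM on this platform.
bool sqrt_reports_domain_error() {
    volatile double negative = -1.0;  // volatile keeps the call from being folded at compile time
    errno = 0;
    double result = std::sqrt(negative);
    assert(std::isnan(result));       // the result itself is NaN either way
    return errno == EDOM;
}
```

If errno reporting is not needed, compiling with -fno-math-errno is one way to check whether the extra branch disappears from the generated code.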
Here is the interesting part. If I turn these into integer-based functions:
void int_sqrt_c(int *x, int *y){
    *y = floor(sqrt(*x - 1.0));
    return;
}
and
pure subroutine my_int_root(x,y) bind(c, name="int_sqrt_fort")
    USE, INTRINSIC :: ISO_C_BINDING
    implicit none
    integer(kind=c_int), intent(in) :: x
    integer(kind=c_int), intent(out) :: y
    y = floor(sqrt(x - 1d0))
end subroutine my_int_root
then they start to diverge:
--------------------------------------------------------
Benchmark                 Time           CPU  Iterations
--------------------------------------------------------
bm_int_c/8             3.05 ns       3.05 ns   229239198
bm_int_fort/8          2.13 ns       2.13 ns   328933185
The Fortran code is not significantly slower after this change, but the C++ code slows down by about 50% (from 2.07 ns to 3.05 ns). That seems like a lot. These are the assembly listings:
Fortran: https://godbolt.org/z/axqqrc5E1
C++: https://godbolt.org/z/h7K75oKbn
The Fortran assembly seems pretty straightforward. It just adds a conversion between double and int and not much else, but the C++ version seems to do a lot more, which I don't fully understand.
Why is the C++ assembly so much more complicated? How could I improve the C++ code to achieve matching performance?
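One thing that might be worth trying, as a minimal sketch and not a confirmed fix: in C++, floor returns a double, so the compiler has to round in floating point first and only then convert to int, whereas Fortran's floor produces an integer directly. Assuming x >= 1, so that the argument of sqrt is never negative, a plain truncating conversion gives the same result as floor and the floor call can be dropped. The names int_sqrt_trunc and check_variants_agree below are hypothetical and only used for this illustration.

```cpp
#include <cassert>
#include <cmath>

// The formulation from the question: std::floor returns a double,
// which is then converted to int.
void int_sqrt_c(int *x, int *y) {
    *y = static_cast<int>(std::floor(std::sqrt(*x - 1.0)));
}

// Hypothetical variant: converting a double to int truncates toward zero,
// which equals floor for non-negative values. Assumes *x >= 1.
void int_sqrt_trunc(int *x, int *y) {
    *y = static_cast<int>(std::sqrt(*x - 1.0));
}

// Checks that both variants agree for every x in [1, max_x].
void check_variants_agree(int max_x) {
    for (int x = 1; x <= max_x; ++x) {
        int with_floor = 0;
        int with_trunc = 0;
        int_sqrt_c(&x, &with_floor);
        int_sqrt_trunc(&x, &with_trunc);
        assert(with_floor == with_trunc);
    }
}
```

For x < 1 the square root is NaN and converting NaN to int is undefined behaviour in both variants, so that case needs separate handling. Whether the truncating version actually closes the gap to the Fortran timing would have to be measured with the same benchmark.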
question from:
https://stackoverflow.com/questions/67046739/comparing-fortran-c-assembler-for-int-floorsqrt