Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

performance - Comparing Fortran & C++ assembler for int = floor(sqrt(...))

I have implemented a function in Fortran and C++ each:

#include <math.h>

void dbl_sqrt_c(double *x, double *y){
   *y = sqrt(*x - 1.0);
   return;
}

and

pure subroutine my_dbl_sqrt(x,y) bind(c, name="dbl_sqrt_fort")
   USE, INTRINSIC :: ISO_C_BINDING
   implicit none
   real(kind=c_double), intent(in)  :: x
   real(kind=c_double), intent(out) :: y

   y = sqrt(x - 1d0)
end subroutine my_dbl_sqrt

I compared them in the compiler explorer:

Fortran: https://godbolt.org/z/froz4rx97
C++: https://godbolt.org/z/45aex99Yz

The way I read the assembly, they do basically the same thing, except that the C++ version checks whether the argument of sqrt is negative, which the Fortran version doesn't. I compared their performance using Google Benchmark, and they are pretty evenly matched:

--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
bm_dbl_c/8          2.07 ns         2.07 ns    335965892
bm_dbl_fort/8       2.06 ns         2.06 ns    338643106

Here is the interesting part. If I turn this into integer based functions:

void int_sqrt_c(int *x, int *y){
   *y = floor(sqrt(*x - 1.0));
   return;
}

and

pure subroutine my_int_root(x,y) bind(c, name="int_sqrt_fort")
   USE, INTRINSIC :: ISO_C_BINDING
   implicit none
   integer(kind=c_int), intent(in)  :: x
   integer(kind=c_int), intent(out) :: y

   y = floor(sqrt(x - 1d0))
end subroutine my_int_root

Then this is where they start to diverge:

--------------------------------------------------------
Benchmark              Time             CPU   Iterations
--------------------------------------------------------
bm_int_c/8          3.05 ns         3.05 ns    229239198
bm_int_fort/8       2.13 ns         2.13 ns    328933185

The Fortran code is not significantly slower after this change (2.06 ns to 2.13 ns), but the C++ code slowed down by roughly 50% (2.07 ns to 3.05 ns). This seems quite large. These are the assemblies:

Fortran: https://godbolt.org/z/axqqrc5E1
C++: https://godbolt.org/z/h7K75oKbn

The Fortran assembly seems pretty straightforward. It just adds the conversion between double and int and not much else, but the C++ assembly seems to do a lot more, which I don't fully understand.

Why is the C++ assembly so much more complicated? How could I improve the C++ code to achieve matching performance?

Question from: https://stackoverflow.com/questions/67046739/comparing-fortran-c-assembler-for-int-floorsqrt


1 Answer


TL;DR: You're hobbled by a bad default and by compatibility with obsolete machines. The bad default is gcc setting errno for floating-point computations (even though the C language does not require this). The compatibility issue is support for x86 machines that have no SSE instructions beyond SSE2. If you want decent code generation, add -fno-math-errno -msse4 to the compiler flags.

Modern processor architectures with floating-point hardware typically offer square root as a primitive instruction, which sets an error flag in the floating-point environment if the operand is out of range (negative). Older instruction set architectures may have had no floating-point instructions at all, or no hardware-accelerated square root, so the C language permits an implementation to set errno on an out-of-range argument instead. But errno is a thread-local memory location, which practically prevents any sane architecture from setting it directly from the square root instruction.

To get decent performance, gcc inlines the square root as the hardware instruction (sqrtsd), but in order to set errno it separately checks the sign of the argument and calls the library function only when the argument is negative, so that the library function can set errno. Yes, this is crazy, but that in turn is par for the course. You can avoid this brain damage, which nobody ever needs or wants, by adding -fno-math-errno to the compiler flags.
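To see the errno behavior for yourself, here is a minimal sketch. The helper name sqrt_and_report and the SqrtReport struct are made up for illustration and are not part of the question's code. It assumes a C library that reports math errors through errno, which you can check with math_errhandling & MATH_ERRNO. On other libraries, or when building with -fno-math-errno, errno may simply stay 0.

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper (not from the original post): records what std::sqrt
// returned and what errno held immediately after the call.
struct SqrtReport {
    double value;     // result of std::sqrt
    int errno_after;  // errno right after the call
};

SqrtReport sqrt_and_report(double x) {
    // volatile keeps the compiler from constant-folding the call away
    volatile double input = x;
    errno = 0;
    double result = std::sqrt(input);
    int saved = errno;  // capture before anything else can touch errno
    return {result, saved};
}
```

Calling sqrt_and_report(-1.0) should give a NaN value. Whether errno_after equals EDOM depends on the library and on the compiler flags, and supporting that possibility is exactly what the extra sign check in the generated code is for.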

Reasonably recent x86-64 processors have more instructions than the original x86-64 as first developed by AMD, which included only SSE2 vector/floating-point instructions. Among the added instructions are rounding instructions (such as roundsd, from SSE4.1) that allow controlled rounding/truncation, so floor doesn't have to be implemented in software. You can get gcc to use these newer instructions by specifying a target that supports them, for example with the -msse4 compiler flag. Note that the generated program will then fault if it is run on a machine that doesn't support these instructions, so the binary will be less portable (though this obviously doesn't reduce the portability of the source code).
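Independently of the compiler flags, one source-level option may be worth trying. It is not part of the answer above, so treat it as a sketch. The result of sqrt is never negative, so truncating toward zero gives the same integer as floor whenever the input is valid (here: *x >= 1). The function names below are hypothetical. For *x < 1 the square root is NaN and the conversion to int is undefined in both variants.

```cpp
#include <cmath>

// The question's approach, kept for comparison: floor, then convert to int.
void int_sqrt_floor(const int *x, int *y) {
    *y = static_cast<int>(std::floor(std::sqrt(*x - 1.0)));
}

// Assumed alternative: skip floor and let the double-to-int conversion
// truncate toward zero. Equivalent to the version above only for *x >= 1,
// where the square root is a non-negative finite value.
void int_sqrt_trunc(const int *x, int *y) {
    *y = static_cast<int>(std::sqrt(*x - 1.0));
}
```

Both variants can still be built with -fno-math-errno -msse4 as suggested above. Whether the truncating version actually closes the gap to Fortran would need to be confirmed by benchmarking and by reading the generated assembly.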

