Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
281 views
in Technique[技术] by (71.8m points)

gcc - Why is fetestexcept in C++ compiled to a function call rather than inlined

I am evaluating the usage (clearing and querying) of Floating-Point Exceptions in performance-critical/"hot" code. Looking at the binary produced I noticed that neither GCC nor Clang expand the call to an inline sequence of instructions that I would expect; instead they seem to generate a call to the runtime library. This is prohibitively expensive for my application.

Consider the following minimal example:

#include <fenv.h>
#pragma STDC FENV_ACCESS on

inline int fetestexcept_inline(int e)
{
  unsigned int mxcsr;
  asm volatile ("vstmxcsr" " %0" : "=m" (*&mxcsr));
  return mxcsr & e & FE_ALL_EXCEPT;
}

double f1(double a)
{
    double r = a * a;
    if(r == 0 || fetestexcept_inline(FE_OVERFLOW)) return -1;
    else return r;
}

double f2(double a)
{
    double r = a * a;
    if(r == 0 || fetestexcept(FE_OVERFLOW)) return -1;
    else return r;
}

And the output as produced by GCC: https://godbolt.org/z/jxjzYY

The compiler seems to know that he can use the CPU-family-dependent AVX-instructions for the target (it uses "vmulsd" for the multiplication). However, no matter which optimization flags I try, it will always produce the much more expensive function call to glibc rather than the assembly that (as far as I understand) should do what the corresponding glibc function does.

This is not intended as a complaint, I am OK with adding the inline assembly. I just wonder whether there might be a subtle difference that I am overlooking that could be a bug in the inline-assembly-version.

question from:https://stackoverflow.com/questions/65891873/why-is-fetestexcept-in-c-compiled-to-a-function-call-rather-than-inlined

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

It's required to support long double arithmetic. fetestexcept needs to merge the SSE and FPU states because long double operations only update the FPU state, but not the MXSCR register. Therefore, the benefit from inlining is somewhat reduced.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...