gcc - Why is fetestexcept in C++ compiled to a function call rather than inlined

Question

Welcome To Ask or Share your Answers For Others

gcc - Why is fetestexcept in C++ compiled to a function call rather than inlined

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

gcc - Why is fetestexcept in C++ compiled to a function call rather than inlined

I am evaluating the usage (clearing and querying) of Floating-Point Exceptions in performance-critical/"hot" code. Looking at the binary produced I noticed that neither GCC nor Clang expand the call to an inline sequence of instructions that I would expect; instead they seem to generate a call to the runtime library. This is prohibitively expensive for my application.

Consider the following minimal example:

#include <fenv.h>
#pragma STDC FENV_ACCESS on

inline int fetestexcept_inline(int e)
{
  unsigned int mxcsr;
  asm volatile ("vstmxcsr" " %0" : "=m" (*&mxcsr));
  return mxcsr & e & FE_ALL_EXCEPT;
}

double f1(double a)
{
    double r = a * a;
    if(r == 0 || fetestexcept_inline(FE_OVERFLOW)) return -1;
    else return r;
}

double f2(double a)
{
    double r = a * a;
    if(r == 0 || fetestexcept(FE_OVERFLOW)) return -1;
    else return r;
}

And the output as produced by GCC: https://godbolt.org/z/jxjzYY

The compiler seems to know that he can use the CPU-family-dependent AVX-instructions for the target (it uses "vmulsd" for the multiplication). However, no matter which optimization flags I try, it will always produce the much more expensive function call to glibc rather than the assembly that (as far as I understand) should do what the corresponding glibc function does.

This is not intended as a complaint, I am OK with adding the inline assembly. I just wonder whether there might be a subtle difference that I am overlooking that could be a bug in the inline-assembly-version.

question from:https://stackoverflow.com/questions/65891873/why-is-fetestexcept-in-c-compiled-to-a-function-call-rather-than-inlined

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:18:48+0000

It's required to support long double arithmetic. fetestexcept needs to merge the SSE and FPU states because long double operations only update the FPU state, but not the MXSCR register. Therefore, the benefit from inlining is somewhat reduced.

Categories

gcc - Why is fetestexcept in C++ compiled to a function call rather than inlined

gcc - Why is fetestexcept in C++ compiled to a function call rather than inlined

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags