c++ - Loss of precision for int to float conversion

Question

Welcome To Ask or Share your Answers For Others

c++ - Loss of precision for int to float conversion

asked Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

c++ - Loss of precision for int to float conversion

In C++, the conversion of an integer value of type I to a floating point type F will be exact — as static_cast<I>(static_cast<F>(i)) == i — if the range of I is a part of the range of integral values of F.

Is it possible, and if yes how, to calculate the loss of precision of static_cast<F>(i) (without using another floating point type with a wider range)?

As a start, I tried to code a function that would return if a conversion is safe or not (safe, meaning no loss of precision), but I must admit I am not so sure about its correctness.

template <class F, class I>
bool is_cast_safe(I value)
{
    return std::abs(alue) < std::numeric_limits<F>::digits;
}

std::cout << is_cast_safe<float>(4) << std::endl; // true
std::cout << is_cast_safe<float>(0x1000001) << std::endl; // false

Thanks in advance.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:05:47+0000

is_cast_safe can be implemented with:

static const F One = 1;
F ULP = std::scalbn(One, std::ilogb(value) - std::numeric_limits<F>::digits + 1);
I U = std::max(ULP, One);
return value % U;

This sets ULP to the value of the least digit position in the result of converting value to F. ilogb returns the position (as an exponent of the floating-point radix) for the highest digit position, and subtracting one less than the number of digits adjusts to the lowest digit position. Then scalbn gives us the value of that position, which is the ULP.

Then value can be represented exactly in F if and only if it is a multiple of the ULP. To test that, we convert the ULP to I (but substitute 1 if it is less than 1), and then take the remainder of value divided by the ULP (or 1).

Also, if one is concerned the conversion to F might overflow, code can be inserted to handle this as well.

Calculating the actual amount of the change is trickier. The conversion to floating-point could round up or down, and the rule for choosing is implementation-defined, although round-to-nearest-ties-to-even is common. So the actual change cannot be calculated from the floating-point properties we are given in numeric_limits. It must involve performing the conversion and doing some work in floating-point. This definitely can be done, but it is a nuisance. I think an approach that should work is:

Assume value is non-negative. (Negative values can be handled similarly but are omitted for now for simplicity.)
First, test for overflow in conversion to F. This in itself is tricky, as the behavior is undefined if the value is too large. Some similar considerations were addressed in this answer to a question about safely converting from floating-point to integer (in C).
If the value does not overflow, then convert it. Let the result be x. Divide x by the floating-point radix r, producing y. If y is not an integer (which can be tested using fmod or trunc) the conversion was exact.
Otherwise, convert y to I, producing z. This is safe because y is less than the original value, so it must fit in I.
Then the error due to conversion is (z-value/r)*r + value%r.

Categories

c++ - Loss of precision for int to float conversion

c++ - Loss of precision for int to float conversion

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags