
c# - Parallel.For() with Interlocked.CompareExchange(): poorer performance and slightly different results than the serial version

I experimented with calculating the mean of a list using Parallel.For(). I decided against it as it is about four times slower than a simple serial version. Yet I am intrigued by the fact that it does not yield exactly the same result as the serial one and I thought it would be instructive to learn why.

My code is:

public static double Mean(this IList<double> list)
{
    double sum = 0.0;

    Parallel.For(0, list.Count, i =>
    {
        double initialSum;
        double incrementedSum;
        SpinWait spinWait = new SpinWait();

        // Try incrementing the sum until the loop finds the initial sum unchanged,
        // so that it can safely replace it with the incremented one.
        while (true)
        {
            initialSum = sum;
            incrementedSum = initialSum + list[i];
            if (initialSum == Interlocked.CompareExchange(ref sum, incrementedSum, initialSum)) break;
            spinWait.SpinOnce();
        }
    });

    return sum / list.Count;
}

When I run the code on a random sequence of 2,000,000 points, I get results that differ from the serial mean in the last two digits.
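For reference, the serial version I am comparing against is essentially just a plain left-to-right accumulation (simplified sketch; the exact code may differ in detail):

public static double SerialMean(this IList<double> list)
{
    double sum = 0.0;
    for (int i = 0; i < list.Count; i++)
    {
        sum += list[i];    // accumulates strictly in index order
    }
    return sum / list.Count;
}

Because this loop always adds the elements in the same order, its rounding behaviour is reproducible from run to run.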

I searched Stack Overflow and found this: VB.NET running sum in nested loop inside Parallel.for Synclock loses information. My case, however, is different from the one described there. In that question a thread-local variable temp is the cause of the inaccuracy, whereas I use a single sum that is updated (I hope) according to the textbook Interlocked.CompareExchange() pattern. The question is of course moot because of the poor performance (which surprises me, although I am aware of the overhead), yet I am curious whether there is something to be learnt from this case.

Your thoughts are appreciated.


1 Answer


Using double is the underlying problem. You can convince yourself that the synchronization is not the cause by using long instead. The results you got are in fact correct, but that never makes a programmer happy.
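One quick way to check that (a sketch only, assuming the inputs can be represented as long values, e.g. by scaling fixed-precision data to integers) is to run the same kind of parallel sum over long. Integer addition is associative, so every run matches the serial total exactly:

public static long ParallelSum(this IList<long> list)
{
    long sum = 0;

    Parallel.For(0, list.Count, i =>
    {
        // Atomic add; the final total does not depend on thread ordering,
        // because integer addition is associative.
        Interlocked.Add(ref sum, list[i]);
    });

    return sum;
}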

You discovered that floating point math is commutative but not associative. Or in other words, a + b == b + a, but (a + b) + c is not necessarily equal to a + (b + c), so reordering a chain of additions can change the result. Implicit in your code is that the order in which the numbers are added is effectively random.
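A tiny self-contained illustration of that order dependence (values chosen purely for demonstration):

double a = 1e20, b = -1e20, c = 1.0;

Console.WriteLine((a + b) + c);    // prints 1: a and b cancel exactly, then c is added
Console.WriteLine(a + (b + c));    // prints 0: c is lost to rounding when added to b
Console.WriteLine((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3));    // prints False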

This C++ question talks about it as well.
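For what it's worth, the usual way to speed up a reduction like this is the Parallel.For overload that takes localInit/localFinally delegates for per-thread partial sums; a sketch (illustrative only) is below. It avoids contending on the shared sum for every element, though it still merges partial results in a non-deterministic order, so the last digits can still differ from the serial value.

public static double MeanWithThreadLocals(this IList<double> list)
{
    double sum = 0.0;
    object gate = new object();

    Parallel.For(0, list.Count,
        () => 0.0,                                // localInit: each worker starts its own partial sum at 0
        (i, state, local) => local + list[i],     // body: accumulate locally without any synchronization
        local =>
        {
            lock (gate) { sum += local; }         // localFinally: merge each partial sum once
        });

    return sum / list.Count;
}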

