Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
175 views
in Technique[技术] by (71.8m points)

c++ - Why is this C++11 code containing rand() slower with multiple threads than with one?

I'm trying around on the new C++11 threads, but my simple test has abysmal multicore performance. As a simple example, this program adds up some squared random numbers.

#include <iostream>
#include <thread>
#include <vector>
#include <cstdlib>
#include <chrono>
#include <cmath>

double add_single(int N) {
    double sum=0;
    for (int i = 0; i < N; ++i){
        sum+= sqrt(1.0*rand()/RAND_MAX);
    }
    return sum/N;
}

void add_multi(int N, double& result) {
    double sum=0;
    for (int i = 0; i < N; ++i){
        sum+= sqrt(1.0*rand()/RAND_MAX);
    }
    result = sum/N;
}

int main() {
    srand (time(NULL));
    int N = 1000000;

    // single-threaded
    auto t1 = std::chrono::high_resolution_clock::now();
    double result1 = add_single(N);
    auto t2 = std::chrono::high_resolution_clock::now();
    auto time_elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count();
    std::cout << "time single: " << time_elapsed << std::endl;

    // multi-threaded
    std::vector<std::thread> th;
    int nr_threads = 3;
    double partual_results[] = {0,0,0};
    t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < nr_threads; ++i) 
        th.push_back(std::thread(add_multi, N/nr_threads, std::ref(partual_results[i]) ));
    for(auto &a : th)
        a.join();
    double result_multicore = 0;
    for(double result:partual_results)
        result_multicore += result;
    result_multicore /= nr_threads;
    t2 = std::chrono::high_resolution_clock::now();
    time_elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count();
    std::cout << "time multi: " << time_elapsed << std::endl;

    return 0;
}

Compiled with 'g++ -std=c++11 -pthread test.cpp' on Linux and a 3core machine, a typical result is

time single: 33
time multi: 565

So the multi threaded version is more than an order of magnitude slower. I've used random numbers and a sqrt to make the example less trivial and prone to compiler optimizations, so I'm out of ideas.

edit:

  1. This problem scales for larger N, so the problem is not the short runtime
  2. The time for creating the threads is not the problem. Excluding it does not change the result significantly

Wow I found the problem. It was indeed rand(). I replaced it with an C++11 equivalent and now the runtime scales perfectly. Thanks everyone!

question from:https://stackoverflow.com/questions/16716005/why-is-this-c11-code-containing-rand-slower-with-multiple-threads-than-with

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

On my system the behavior is same, but as Maxim mentioned, rand is not thread safe. When I change rand to rand_r, then the multi threaded code is faster as expected.

void add_multi(int N, double& result) {
double sum=0;
unsigned int seed = time(NULL);
for (int i = 0; i < N; ++i){
    sum+= sqrt(1.0*rand_r(&seed)/RAND_MAX);
}
result = sum/N;
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...