Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - OpenBLAS will only use 4 threads, though 32 are available

I recently installed OpenBLAS under the Windows Subsystem for Linux on Windows 10 so that I could run optimised matrix calculations in C++. However, I don't think the library is making full use of the hardware I am running it on.

For example, a simple dgemm call that multiplies two 10,000x10,000 matrices takes roughly 10-11 seconds, while numpy takes only 4-5 seconds on matrices of exactly the same size and datatype (double/float64). Task Manager shows numpy using roughly 16 of my 32 threads, while OpenBLAS uses only 4, which a call to openblas_get_num_threads() confirmed.

Even after explicitly telling OpenBLAS to use more threads, only 4 are used, as the code below shows:

openblas_set_num_threads(8); // This should set the number of OpenBLAS threads to 8
goto_set_num_threads(8); // This should also set the number of OpenBLAS threads to 8

std::cout << "OpenBLAS number of threads: " << openblas_get_num_threads() << "\n"; // Always gives 4
std::cout << "Number of cores: " << openblas_get_num_procs() << "\n"; // 32 (correct)
std::cout << "Parallel type: " << openblas_get_parallel() << "\n"; // 1 -- Default parallel type -- i.e. no OpenMP

My question is: is there a hard-coded limit of 4 threads in the libopenblas.lib file or elsewhere, or is there something I can do to make the dgemm call run on more threads and boost performance, ideally matching or beating numpy's time?

Thanks in advance

=========== EDIT ===========

I have played around with this some more and found that there is, in fact, a limit of 4 threads being set, but I can't find a way to change it. I tried setting it in the make configuration like this: make MAX_THREADS=32 ...... but this hasn't changed anything. Is there some way of fixing this?

Here is how I found that there is a set limit of 4:

std::cout << "Config type: " << openblas_get_config() << "\n"; // ... MAX_THREADS=4
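The same check can be done in code rather than by reading the printed string. The sketch below is a minimal illustration, not OpenBLAS code: parse_max_threads and effective_threads are hypothetical helpers, and the capping rule in effective_threads is only an assumed model that matches the outputs reported above (8 requested, 4 obtained, MAX_THREADS=4).

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: pull N out of a "MAX_THREADS=N" token in a build
// configuration string. Returns nullopt if the token is missing or has no digits.
inline std::optional<int> parse_max_threads(const std::string& config) {
    const std::string key = "MAX_THREADS=";
    const std::size_t pos = config.find(key);
    if (pos == std::string::npos) return std::nullopt;

    std::size_t i = pos + key.size();
    int value = 0;
    bool any_digit = false;
    while (i < config.size() &&
           std::isdigit(static_cast<unsigned char>(config[i]))) {
        value = value * 10 + (config[i] - '0');
        any_digit = true;
        ++i;
    }
    if (!any_digit) return std::nullopt;
    return value;
}

// Assumed model of the observed behaviour: a request above the compiled-in
// maximum is capped at that maximum.
inline int effective_threads(int requested, int compiled_max) {
    return std::min(requested, compiled_max);
}
```

In a program linked against OpenBLAS, this could be called as parse_max_threads(openblas_get_config()). Note that OpenBLAS's Makefile.rule documents a NUM_THREADS build variable rather than MAX_THREADS; whether rebuilding with NUM_THREADS=32 raises the reported limit in this setup is untested here.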


1 Answer

Waiting for an expert to answer.
