Recently, I installed OpenBLAS under the Windows Subsystem for Linux on Windows 10 so that I could run optimised matrix calculations in C++, but I don't think the library is making full use of the hardware I am running it on.
For example, a simple dgemm call to multiply two 10,000x10,000 matrices takes roughly 10-11 seconds, while numpy, on a matrix of exactly the same size and the same datatype (double/float64), takes only 4-5 seconds. Looking in Task Manager, it appears that numpy is able to use roughly 16 of my 32 threads, while OpenBLAS only uses 4 (confirmed by calling openblas_get_num_threads()).
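For context, the benchmark is essentially a minimal sketch like the one below (the all-ones initial values, row-major layout and alpha = 1 / beta = 0 are just placeholders for illustration; only the cblas_dgemm call and the timing matter):

#include <chrono>
#include <iostream>
#include <vector>
#include <cblas.h>   // OpenBLAS CBLAS interface

int main() {
    const int n = 10000;                                   // 10,000 x 10,000 matrices
    std::vector<double> A(static_cast<size_t>(n) * n, 1.0);
    std::vector<double> B(static_cast<size_t>(n) * n, 1.0);
    std::vector<double> C(static_cast<size_t>(n) * n, 0.0);

    auto t0 = std::chrono::steady_clock::now();
    // C = 1.0 * A * B + 0.0 * C
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A.data(), n, B.data(), n,
                0.0, C.data(), n);
    auto t1 = std::chrono::steady_clock::now();

    std::cout << "dgemm: "
              << std::chrono::duration<double>(t1 - t0).count()
              << " s\n";                                   // ~10-11 s here vs ~4-5 s for numpy
}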
Even after explicitly telling OpenBLAS to use more threads, I still only get 4, as shown in the code below:
#include <iostream>
#include <cblas.h>   // OpenBLAS header declaring the openblas_*/goto_* helpers

int main() {
    openblas_set_num_threads(8); // This should set the number of OpenBLAS threads to 8
    goto_set_num_threads(8);     // This should also set the number of OpenBLAS threads to 8

    std::cout << "OpenBLAS number of threads: " << openblas_get_num_threads() << "\n"; // Always gives 4
    std::cout << "Number of cores: " << openblas_get_num_procs() << "\n";              // 32 (correct)
    std::cout << "Parallel type: " << openblas_get_parallel() << "\n";                 // 1 -- default parallel type, i.e. no OpenMP
}
My question is: is there a hard-coded limit of 4 threads set in the libopenblas.lib file or elsewhere, or is there something I can do to make the dgemm call run on more threads and boost performance, ideally reaching or exceeding numpy's time?
Thanks in advance.
=========== EDIT ===========
I have played around with this some more and found that there is, in fact, a limit of 4 threads being set, but I can't find a way to change it. I tried setting it in the make configuration like this:
make MAX_THREADS=32 ......
but this hasn't changed anything. Is there some way of fixing this?
Here is how I found that there is a set limit of 4:
std::cout << "Config type: " << openblas_get_config() << "
"; // ... MAX_THREADS=4