General information
A lot of functions have implicit multi-threading built-in, making a parfor
loop not more efficient, when using these functions, than a serial for
loop, since all cores are already being used. parfor
will actually be a detriment in this case, since it has the allocation overhead, whilst being as parallel as the function you are trying to use.
When not using one of the implicitly multithreaded functions parfor
is basically recommended in two cases: lots of iterations in your loop (i.e., like 1e10
), or if each iteration takes a very long time (e.g., eig(magic(1e4))
). In the second case you might want to consider using spmd
(slower than parfor
in my experience). The reason parfor
is slower than a for
loop for short ranges or fast iterations is the overhead needed to manage all workers correctly, as opposed to just doing the calculation.
Check this question for information on splitting data between separate workers.
Benchmarking
Code
Consider the following example to see the behaviour of for
as opposed to that of parfor
. First open the parallel pool if you've not already done so:
gcp; % Opens a parallel pool using your current settings
Then execute a couple of large loops:
n = 1000; % Iteration number
EigenValues = cell(n,1); % Prepare to store the data
Time = zeros(n,1);
for ii = 1:n
tic
EigenValues{ii,1} = eig(magic(1e3)); % Might want to lower the magic if it takes too long
Time(ii,1) = toc; % Collect time after each iteration
end
figure; % Create a plot of results
plot(1:n,t)
title 'Time per iteration'
ylabel 'Time [s]'
xlabel 'Iteration number[-]';
Then do the same with parfor
instead of for
. You will notice that the average time per iteration goes up (0.27s to 0.39s for my case). Do realise however that the parfor
used all available workers, thus the total time (sum(Time)
) has to be divided by the number of cores in your computer. So for my case the total time went down from around 270s to 49s, since I have an octacore processor.
So, whilst the time to do each separate iteration goes up using parfor
with respect to using for
, the total time goes down considerably.
Results
This picture shows the results of the test as I just ran it on my home PC. I used n=1000
and eig(500)
; my computer has an I5-750 2.66GHz processor with four cores and runs MATLAB R2012a. As you can see the average of the parallel test hovers around 0.29s with a lot of spread, whilst the serial code is quite steady around 0.24s. The total time, however, went down from 234s to 72s, which is a speed up of 3.25 times. The reason that this is not exactly 4 is the memory overhead, as expressed in the extra time each iteration takes. The memory overhead is due to MATLAB having to check what each core is doing and making sure that each loop iteration is performed only once and that the data is put into the correct storage location.