python - Numpy multiplication much slower after reversal

Question

asked Feb 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

I was multiplying two numpy arrays:

import numpy as np
X = np.random.randn(4500,3500)
v = np.random.randn(3500,200)

Both of them are C_CONTIGUOUS by default:

X.flags
# C_CONTIGUOUS : True
v.flags
# C_CONTIGUOUS : True

And the multiplication is fast:

%timeit X @ v
# 41 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

However, if I reverse the X array, then something weird happens:

%timeit X[::-1,::-1] @ v
# 3.97 s ± 54.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Questions:

This post says that the reversal operation creates a view. The resulting view is neither C_CONTIGUOUS nor F_CONTIGUOUS. What does that mean?

X[::-1,::-1].flags
# C_CONTIGUOUS : False
# F_CONTIGUOUS : False

Why does the reversal slow down the multiplication so badly?

1 Answer

深蓝 · Answer 1 · 2021-02-06T00:21:35+0000

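A C_CONTIGUOUS array is one whose elements are laid out as a row-major scan over a contiguous buffer. When you create a reversed view of the array, this is no longer the case: the view walks the same buffer backwards, using negative strides, so it is no longer C_CONTIGUOUS. It is not F_CONTIGUOUS either, since that flag is the column-major equivalent.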
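To make the layout point concrete, here is a minimal sketch that inspects the strides of a small stand-in array (the 6x4 shape and the name `R` are illustrative choices, not from the question). It assumes the default float64 dtype, which is 8 bytes per element.

```python
import numpy as np

# Small stand-in for the 4500x3500 array in the question.
X = np.random.randn(6, 4)

# Reversing both axes gives a view onto the same buffer, not a copy.
R = X[::-1, ::-1]

# Strides are the byte steps taken to move one element along each axis.
print(X.strides)   # (32, 8): one row = 4 * 8 bytes, one column = 8 bytes
print(R.strides)   # (-32, -8): same buffer, walked backwards

# The view shares memory with X and is flagged as non-contiguous.
print(np.shares_memory(X, R))      # True
print(R.flags['C_CONTIGUOUS'])     # False
print(R.flags['F_CONTIGUOUS'])     # False
```

The negative strides are what make the view neither row-major nor column-major from the point of view of the contiguity flags.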

As for why the operation is slower over a reversed array, computational details like this will generally vary depending on your system's BLAS/LAPACK installation. In this case, I suspect your BLAS installation has optimized code paths for the common case of matrix products over contiguous buffers, and does not have optimized code paths for operations over non-contiguous buffers, which are less common.

Indeed, running this on a machine with numpy built against Ubuntu's libblas gives the following:

%timeit X @ v
# 1 loop, best of 3: 200 ms per loop
%timeit X[::-1,::-1] @ v
# 1 loop, best of 3: 4.64 s per loop

while running on a machine with numpy built against MKL shows different behavior:

%timeit X @ v
# 92.6 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit X[::-1,::-1] @ v
# 128 ms ± 2.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

(different IPython versions account for the different %timeit outputs)
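If your installation does show the slow behavior, one possible workaround is to copy the reversed view into a fresh contiguous buffer before multiplying. The sketch below is only an illustration under assumptions: it uses smaller arrays than the question, the name `X_rev` is made up here, and whether the extra copy pays off depends on your BLAS build and on having enough memory for the copy.

```python
import numpy as np

# Smaller arrays than in the question, so the example runs quickly.
rng = np.random.default_rng(0)
X = rng.standard_normal((450, 350))
v = rng.standard_normal((350, 20))

# np.ascontiguousarray copies the reversed view into a new
# row-major buffer (this costs one extra array of X's size).
X_rev = np.ascontiguousarray(X[::-1, ::-1])
print(X_rev.flags['C_CONTIGUOUS'])   # True

# Both products give the same result; only the memory layout
# handed to the matrix multiplication differs.
from_copy = X_rev @ v
from_view = X[::-1, ::-1] @ v
print(np.allclose(from_copy, from_view))   # True
```

Timing `X_rev @ v` against `X[::-1,::-1] @ v` with `%timeit` on your own machine is the only reliable way to tell whether the copy helps there.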
