Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

in Technique by (71.8m points)

python - Numpy multiplication much slower after reversal

I was multiplying two numpy arrays:

import numpy as np
X = np.random.randn(4500,3500)
v = np.random.randn(3500,200)

Both of them are C_CONTIGUOUS by default:

X.flags
# C_CONTIGUOUS : True
v.flags
# C_CONTIGUOUS : True

And the multiplication is fast:

%timeit X @ v
# 41 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

However, if I reverse the X array, something weird happens:

%timeit X[::-1,::-1] @ v
# 3.97 s ± 54.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Questions:

  1. This post says that the reversal operation creates a view. The resulting view is neither C_CONTIGUOUS nor F_CONTIGUOUS. What does that mean?
X[::-1,::-1].flags
# C_CONTIGUOUS : False
# F_CONTIGUOUS : False
  2. Why does the reversal slow down the multiplication so badly?

Whoever fights dragons too long becomes a dragon; whoever gazes into the abyss too long, the abyss gazes back…

1 Answer

by (71.8m points)

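A C-contiguous array is one whose elements are stored in row-major order in a single contiguous buffer (F-contiguous is the column-major equivalent). When you create a reversed view of the array, this is no longer the case: the view reuses the same buffer but steps through it with negative strides, so the array is neither C-contiguous nor F-contiguous.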
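To see what the contiguity flags mean in practice, the short sketch below uses a small 3x4 float64 array as a stand-in for X (the size is an arbitrary choice for readability). It prints the strides, i.e. how many bytes NumPy steps in memory to move one row or one column. The reversed view shares the original buffer but walks it backwards, and a copy made with np.ascontiguousarray restores the ordinary row-major layout.

```python
import numpy as np

# Small stand-in for X; float64 elements are 8 bytes each.
X = np.arange(12, dtype=np.float64).reshape(3, 4)
R = X[::-1, ::-1]  # reversed view: no data is copied

print(X.strides)  # (32, 8): next row is 32 bytes ahead, next column 8 bytes
print(R.strides)  # (-32, -8): same buffer, traversed backwards
print(R.flags["C_CONTIGUOUS"], R.flags["F_CONTIGUOUS"])  # False False
print(np.shares_memory(X, R))  # True: R is a view into X's buffer

# A contiguous copy of the view has the usual row-major strides again.
C = np.ascontiguousarray(R)
print(C.strides, C.flags["C_CONTIGUOUS"])  # (32, 8) True
```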

As for why the operation is slower on a reversed array, computational details like this generally depend on your system's BLAS/LAPACK installation. In this case, I suspect your BLAS installation has optimized code paths for the common case of matrix products over contiguous buffers, but lacks optimized code paths for the less common case of non-contiguous buffers.

Indeed, running this on a machine with NumPy built against Ubuntu's libblas gives the following:

%timeit X @ v
# 1 loop, best of 3: 200 ms per loop
%timeit X[::-1,::-1] @ v
# 1 loop, best of 3: 4.64 s per loop

whereas running it on a machine with NumPy built against MKL shows different behavior:

%timeit X @ v
# 92.6 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit X[::-1,::-1] @ v
# 128 ms ± 2.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

(Different IPython versions account for the different %timeit output formats.)
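Because the answer attributes the gap to the BLAS build, a practical next step is to check which library your NumPy is linked against and to time the variants side by side, including a contiguous copy of the reversed view made with np.ascontiguousarray (a common workaround, not one the answer benchmarked). The sketch below assumes only NumPy and the standard timeit module; compare is a hypothetical helper, and its default sizes are deliberately smaller than the question's so that a slow code path still finishes quickly. np.show_config() prints build information whose exact format varies between NumPy versions.

```python
import timeit

import numpy as np


def compare(n=1000, m=800, k=200, repeat=3):
    """Best-of-`repeat` time (seconds) for the baseline X @ v and two ways
    to compute X[::-1, ::-1] @ v.

    Hypothetical benchmark helper; the sizes are arbitrary defaults.
    """
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n, m))
    v = rng.standard_normal((m, k))

    # The copied variant gives the same numbers as the view, up to rounding.
    assert np.allclose(np.ascontiguousarray(X[::-1, ::-1]) @ v, X[::-1, ::-1] @ v)

    variants = {
        "contiguous": lambda: X @ v,
        "reversed view": lambda: X[::-1, ::-1] @ v,
        # One O(n*m) copy, then an ordinary row-major product.
        "reversed copy": lambda: np.ascontiguousarray(X[::-1, ::-1]) @ v,
    }
    return {
        name: min(timeit.repeat(fn, number=1, repeat=repeat))
        for name, fn in variants.items()
    }


if __name__ == "__main__":
    np.show_config()  # the BLAS/LAPACK section names the library in use
    for name, seconds in compare().items():
        print(f"{name:>14}: {seconds * 1000:.1f} ms")
```

On a build where the reversed view is slow, the copied variant would be expected to land close to the contiguous baseline, since the copy is a single pass over X while the product performs k multiply-adds per element of X.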



...