A C-contiguous array is one whose data is laid out as a row-major scan over a contiguous buffer. When you create a reversed view of the array, this is no longer the case, and so the view is no longer C-contiguous.
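You can check this directly via the array's `flags` attribute:

```python
import numpy as np

X = np.ones((3, 3))
print(X.flags['C_CONTIGUOUS'])  # True: freshly created arrays are row-major

# A reversed view shares the same buffer but scans it with negative strides,
# so it cannot be a row-major scan over a contiguous buffer.
Y = X[::-1, ::-1]
print(Y.flags['C_CONTIGUOUS'])  # False
```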
As for why the operation is slower over a reversed array: performance details like this generally depend on your system's BLAS/LAPACK installation. In this case, I suspect your BLAS has optimized code paths for the common case of matrix products over contiguous buffers, but lacks optimized code paths for the less common case of operations over non-contiguous buffers.
Indeed, running this on a machine with numpy built against ubuntu's libblas gives the following:
%timeit X @ v
# 1 loop, best of 3: 200 ms per loop
%timeit X[::-1,::-1] @ v
# 1 loop, best of 3: 4.64 s per loop
while running on a machine with numpy built against MKL shows different behavior:
%timeit X @ v
# 92.6 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit X[::-1,::-1] @ v
# 128 ms ± 2.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
(different IPython versions account for the different %timeit outputs)
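If you are stuck with a BLAS that handles non-contiguous inputs slowly, one workaround is to copy the reversed view into a fresh contiguous buffer with `np.ascontiguousarray` before the product; this is a sketch of that idea, using small arbitrary array sizes for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 500))
v = rng.random(500)

Xr = X[::-1, ::-1]              # non-contiguous reversed view
Xc = np.ascontiguousarray(Xr)   # explicit copy into a C-contiguous buffer

print(Xr.flags['C_CONTIGUOUS'])  # False
print(Xc.flags['C_CONTIGUOUS'])  # True

# Both compute the same product; Xc @ v can take the fast contiguous path.
print(np.allclose(Xr @ v, Xc @ v))  # True
```

Whether the copy pays for itself depends on the array size and on how slow your BLAS's non-contiguous path is, so it's worth timing both variants on your own machine.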