cuda - pycuda shared memory error "pycuda._driver.LogicError: cuLaunchKernel failed: invalid value"

Question

asked Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

I have a strange problem whose origin I cannot determine:

I have a working kernel for a specialized matrix-vector multiplication that I want to speed up. The big matrix (10^6 times 10^6) is constructed from a few small matrices, so I want to put that data in shared memory. However, when I add the shared memory, I only get the error:

pycuda._driver.LogicError: cuLaunchKernel failed: invalid value

So my working kernel is:

#define FIELD_SIZE {field}
#define BLOCK_SIZE {block}

__global__ void MatrixMulKernel(double *gpu_matrix, double *gpu_b, double *gpu_y)
{
    int tx = ... + threadIdx.x;

    if(tx < FIELD_SIZE*FIELD_SIZE*BLOCK_SIZE) 
    { ... multiplication ... }
}

And if I try to add the shared memory part it looks like

#define FIELD_SIZE {field}
#define BLOCK_SIZE {block}

__global__ void MatrixMulKernel(double *gpu_matrix_ptr, double *gpu_b, double *gpu_y)
{
    __shared__ double gpu_matrix[BLOCK_SIZE*BLOCK_SIZE*13];

    int tx = ... + threadIdx.x;
    if(tx < BLOCK_SIZE*BLOCK_SIZE*13) {  gpu_matrix[tx] = gpu_matrix_ptr[tx];  }
    __syncthreads();

    if(tx < FIELD_SIZE*FIELD_SIZE*BLOCK_SIZE) 
    { ... multiplication ... }
}

This is the only part I changed, so it has to be the gpu_matrix[tx] = gpu_matrix_ptr[tx] statement, doesn't it? But I fail to see why that would be. I tried to follow the tiled matrix-multiplication example from the PyCUDA examples: http://wiki.tiker.net/PyCuda/Examples/MatrixmulTiled

The invocation is:

self.kernel.prepare([np.intp, np.intp, np.intp])
self.kernel.prepared_call(grid_shape,
              block_shape,
              self.matrix_gpu.gpudata,
              b_gpu.gpudata,
              y_gpu.gpudata)

where matrix_gpu, b_gpu and y_gpu are pycuda.gpuarray instances.

Hope that you can clear up some of my confusion...

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:05:51+0000

According to your description, the shared memory you allocate is too big:

__shared__ double gpu_matrix[BLOCK_SIZE*BLOCK_SIZE*13];

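Shared memory is a limited hardware resource of a CUDA GPU. A block can statically allocate about 48 KB of it, and you cannot increase that limit. The array above holds BLOCK_SIZE*BLOCK_SIZE*13 doubles at 8 bytes each, so it no longer fits in 48 KB once BLOCK_SIZE is larger than 21.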
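As a rough check, the footprint of that array can be worked out on the host before launching anything. The sketch below is plain Python and assumes the 48 KB per-block figure mentioned above and 8-byte doubles. The names shared_bytes and max_block_size are made up for illustration and are not part of PyCUDA.

```python
# Rough size check for: __shared__ double gpu_matrix[BLOCK_SIZE*BLOCK_SIZE*13];
SIZEOF_DOUBLE = 8                  # bytes per double in CUDA
STATIC_SHARED_LIMIT = 48 * 1024    # assumed per-block limit (the 48 KB figure above)


def shared_bytes(block_size, factor=13):
    """Bytes needed by a double array of block_size*block_size*factor elements."""
    return block_size * block_size * factor * SIZEOF_DOUBLE


def max_block_size(limit=STATIC_SHARED_LIMIT, factor=13):
    """Largest block_size whose array still fits within `limit` bytes."""
    n = 0
    while shared_bytes(n + 1, factor) <= limit:
        n += 1
    return n


for bs in (8, 16, 21, 22, 32):
    need = shared_bytes(bs)
    verdict = "fits" if need <= STATIC_SHARED_LIMIT else "too big"
    print(f"BLOCK_SIZE={bs:2d}: {need:6d} bytes ({verdict})")

print("largest BLOCK_SIZE that fits:", max_block_size())
```

If the value substituted for {block} in your kernel template is above what this reports, the array cannot fit, whatever else the kernel does.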

The CUDA toolkit also provides a spreadsheet, at the following path, to help you calculate the hardware resources you can use:

$CUDA_ROOT/tools/CUDA_Occupancy_Calculator.xls

On the other hand, the shared memory required by mat-vec-mul-like kernels can usually be reduced from O(BLOCK_SIZE^2) to O(BLOCK_SIZE). You may want to read the code of some established mat-vec-mul kernels, such as those in MAGMA, before implementing your own.
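To illustrate the O(BLOCK_SIZE) idea, here is a small plain-Python emulation of a tiled matrix-vector product in which the only staged buffer is a slice of the input vector of length at most block_size. This is a sketch under my own assumptions, not MAGMA's implementation and not your kernel: the function name matvec_tiled is hypothetical, the matrix is an ordinary dense list of rows, and the `tile` list merely stands in for a `__shared__` array.

```python
def matvec_tiled(matrix, x, block_size):
    """Compute y = matrix @ x while staging only block_size entries of x at a time."""
    n_rows, n_cols = len(matrix), len(x)
    y = [0.0] * n_rows
    for start in range(0, n_cols, block_size):
        # Stand-in for the shared-memory buffer: at most block_size values.
        tile = x[start:start + block_size]
        for row in range(n_rows):  # on a GPU this loop would be one thread per row
            acc = 0.0
            for k, xk in enumerate(tile):
                acc += matrix[row][start + k] * xk
            y[row] += acc
    return y


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
v = [1.0, 0.0, -1.0]
print(matvec_tiled(A, v, block_size=2))  # prints [-2.0, -2.0]
```

The point of the layout is that the staged buffer grows with block_size rather than with block_size squared, so its size stays well under the shared memory limit for any realistic block size.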
