
python - CUDA_ERROR_OUT_OF_MEMORY: out of memory (NOT DURING TRAINING)

I have been using TensorFlow 2.3.0 for quite some time with CUDA 10.1 and cuDNN 7.6.5 on Windows 10.

Driver API (nvidia-smi):
Thu Jan  7 15:50:14 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.09       Driver Version: 461.09       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   57C    P8     8W /  N/A |     92MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Runtime API (nvcc -V):
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:12:52_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.1, V10.1.243
GPU: NVIDIA GeForce GTX 1060 with Max-Q Design 
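
For what it's worth, a quick sanity check along these lines (just a minimal sketch, not my actual inference script) is what I use to see whether this TensorFlow build was compiled against CUDA and can list the GPU:

import tensorflow as tf

# Report the TensorFlow version and whether this build was compiled with CUDA.
print("TF version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())

# List the GPUs TensorFlow can actually see at runtime.
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))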

I have been able to train TensorFlow models and run inference just fine. For the past few days, however, I have been getting a "CUDA_ERROR_OUT_OF_MEMORY: out of memory" error just from running inference on models that I could run inference on before. The code that runs inference has not changed either. Could some other process now be filling the CUDA memory? I have already tried removing CUDA and cuDNN and reinstalling them.
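
In case it is relevant: the standard way to keep TensorFlow from reserving almost the whole 6 GiB up front is to enable memory growth before the GPU is first touched. A minimal sketch of that (using tf.config.experimental.set_memory_growth, which is available in TF 2.3; this is not my actual inference code):

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving nearly all of it at
# startup. This must run before the GPU is initialized, i.e. before any op
# or model is placed on it.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

I realize this is a workaround rather than an explanation for why the behaviour changed.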

Here is the log of the error when I run inference:

I also ran cuda-memcheck to check whether there were any leaks.

Here are the logs from cuda-memcheck --leak-check full:
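
Separately, to rule out another process quietly grabbing GPU memory at the exact moment inference starts, one check I am considering is querying NVML from Python right before loading the model. A rough sketch, assuming the separate pynvml package (nvidia-ml-py3) is installed, which is not part of my setup above:

import pynvml  # assumed extra dependency: pip install nvidia-ml-py3

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Overall memory on GPU 0, in MiB.
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print("used:", mem.used // 2**20, "MiB,", "free:", mem.free // 2**20, "MiB")

# Per-process usage; usedGpuMemory can be None under WDDM on Windows.
for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
    print("pid", proc.pid, "uses", proc.usedGpuMemory)

pynvml.nvmlShutdown()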

Any help is much appreciated!



1 Answer

Waiting for an expert to answer.

