[Error] GPU is installed but not detected / RuntimeError: No CUDA GPUs are available & torch.OutOfMemoryError: CUDA out of memory. Tried to allocate

도도걸만단 2025. 3. 21. 06:57

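Conclusion: I was blind with my eyes wide open

# export CUDA_VISIBLE_DEVICES=6

This was sitting on the very first line of the code, and I missed it while staring at the terminal the whole time. With the variable set to 6, the process may only use the GPU with index 6, and a machine that has no such GPU ends up with none visible.

Changing the value to 0 (or commenting the line out) fixed the problem.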
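
The effect of that first line can be modeled in a few lines of Python. This is a simplified sketch, not CUDA's or PyTorch's actual code: the helper name visible_gpu_indices and the physical_gpu_count parameter are made up for illustration, only integer indices are handled (GPU UUIDs are not), and the single-GPU machine in the example calls is an assumption.

```python
def visible_gpu_indices(cuda_visible_devices, physical_gpu_count):
    """Return the physical GPU indices a process would be allowed to see.

    cuda_visible_devices: the variable's value, or None when it is unset.
    physical_gpu_count: hypothetical number of GPUs installed in the machine.
    """
    if cuda_visible_devices is None:
        # Unset variable: every installed GPU is visible.
        return list(range(physical_gpu_count))
    visible = []
    for entry in cuda_visible_devices.split(","):
        entry = entry.strip()
        # Enumeration stops at the first entry that is not a valid index.
        if not entry.isdigit() or int(entry) >= physical_gpu_count:
            break
        visible.append(int(entry))
    return visible


print(visible_gpu_indices("6", physical_gpu_count=1))   # []
print(visible_gpu_indices("0", physical_gpu_count=1))   # [0]
print(visible_gpu_indices(None, physical_gpu_count=1))  # [0]
```

An empty list corresponds to the "No CUDA GPUs are available" situation: the GPU exists, but the process has been told not to look at it.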

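My situation:

I got the error RuntimeError: No CUDA GPUs are available.

1) nvidia-smi reported CUDA 12.4, so I installed the matching PyTorch build from the official PyTorch documentation. The error persisted. (Note that the CUDA version shown by nvidia-smi is the highest version the driver supports, not the version of the installed CUDA Toolkit.)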

# CUDA 11.8
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=11.8 -c pytorch -c nvidia
# CUDA 12.1
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.1 -c pytorch -c nvidia
# CUDA 12.4
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.4 -c pytorch -c nvidia
# CPU Only
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 cpuonly -c pytorch

Official documentation:

https://pytorch.org/get-started/previous-versions/
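
After installing one of the builds above, it can help to ask PyTorch itself which CUDA version it was built with and whether it sees a device. The following is a minimal sketch; the function name torch_cuda_report is made up for illustration, and it returns early when torch is not installed in the current environment.

```python
import importlib.util


def torch_cuda_report():
    """Collect the facts that matter when PyTorch cannot see a GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the binary was built with; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


print(torch_cuda_report())
```

If built_with_cuda is not None but device_count is 0, the build itself is probably fine and the cause is more likely in the environment (driver, or a variable such as CUDA_VISIBLE_DEVICES).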

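2) It turned out that the installed CUDA Toolkit was 11.8, so I upgraded it to 12.4 to match, using the steps written below. Still the same error.

3) In the meantime, diffusers and huggingface_hub conflicted, producing a different error. huggingface_hub used to have something called cached_download, which has been removed in the current version, so the code importing it requires an old version, while diffusers requires a recent version for KL~~. I concluded that no single library version satisfies both.

4) So I patched the huggingface-related code that required the old version --> the new error was resolved.

5) Looking through the code with my eyes finally open, I belatedly spotted the CUDA_VISIBLE_DEVICES setting at the very top. Oops. --> Commenting it out also resolved RuntimeError: No CUDA GPUs are available.

6) Fundamentally, though, my VRAM (the GPU's own RAM) does not match the hardware used in the paper, so I could not run the model ...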

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 88.00 MiB. GPU 0 has a total capacity of 21.95 GiB of which 36.12 MiB is free. Including non-PyTorch memory, this process has 21.91 GiB memory in use. Of the allocated memory 21.28 GiB is allocated by PyTorch, and 406.44 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
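
A quick calculation on the figures in this message shows whether fragmentation is a plausible cause. This is only arithmetic on the numbers quoted above, with GiB converted to MiB; the variable names are my own.

```python
# Figures copied from the error message above.
MIB_PER_GIB = 1024
total_mib = 21.95 * MIB_PER_GIB           # total GPU capacity
allocated_mib = 21.28 * MIB_PER_GIB       # allocated by PyTorch
reserved_unallocated_mib = 406.44         # reserved by PyTorch but unallocated
free_mib = 36.12                          # free on the device
requested_mib = 88.00                     # size of the failed allocation

allocated_share = allocated_mib / total_mib
fragmentation_share = reserved_unallocated_mib / total_mib

print(f"allocated by PyTorch:     {allocated_share:.1%} of the GPU")
print(f"reserved but unallocated: {fragmentation_share:.1%} of the GPU")
print(f"request fits in free memory: {requested_mib <= free_mib}")
```

Almost all of the card is taken by live tensors, and the reserved-but-unallocated part is a small fraction, which suggests the model simply does not fit rather than that memory is fragmented. The PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True hint from the message can still be tried by setting it in the environment before the process starts, but with these numbers it seems unlikely to be enough on its own.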

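Anyway, that is how it went. This has been a troubleshooting log.

Below are two ways to upgrade to CUDA Toolkit 12.4 (the apt repository method and the runfile method), given step by step as terminal commands.

※ Caution:

• The GPU driver is already installed, so do not remove the driver; remove and upgrade only the CUDA Toolkit.

• The commands below assume Debian 11 (or an Ubuntu-based system); they may differ slightly between distributions.

• First remove the existing CUDA Toolkit (11.8), then install the new version.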

Method 1: Install CUDA Toolkit 12.4 from the apt repository

1. Remove the existing CUDA Toolkit 11.8

(※ The installed package name may differ; check with dpkg -l | grep cuda-toolkit before proceeding)

sudo apt-get purge cuda-toolkit-11-8
sudo apt-get autoremove

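2. Download and install the NVIDIA CUDA keyring package

(The keyring package authenticates the repository; check the NVIDIA repository for the current keyring version, since the file name below may be outdated)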

wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb

3. Update the apt package list

sudo apt-get update

4. Install CUDA Toolkit 12.4

sudo apt-get install cuda-toolkit-12-4

5. Set the environment variables

Add the following lines to ~/.bashrc:

export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH

Apply the changes:

source ~/.bashrc

6. Verify the installation

nvcc --version

If the output shows CUDA 12.4, the update succeeded.
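
If nvcc still reports the old version after these steps, the order of the PATH entries is worth checking, because the shell runs the first nvcc it finds. The sketch below uses a made-up helper, first_cuda_bin, and an invented example PATH in which an old 11.8 entry is still present; it only inspects strings and does not touch the system.

```python
import os


def first_cuda_bin(path_value):
    """Return the first PATH entry that looks like a CUDA bin directory, or None."""
    for entry in path_value.split(":"):
        if "cuda" in entry and entry.rstrip("/").endswith("bin"):
            return entry
    return None


# Hypothetical PATH: the 12.4 entry was prepended, so it wins over 11.8.
example_path = "/usr/local/cuda-12.4/bin:/usr/local/cuda-11.8/bin:/usr/bin"
print(first_cuda_bin(example_path))  # /usr/local/cuda-12.4/bin

# The same check against the PATH of the current process.
print(first_cuda_bin(os.environ.get("PATH", "")))
```

This is why the export lines put the new directory in front of $PATH rather than after it.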

Method 2: Install CUDA Toolkit 12.4 with the runfile installer

1. Remove the existing CUDA Toolkit 11.8

(Same removal as in the apt method)

sudo apt-get purge cuda-toolkit-11-8
sudo apt-get autoremove

2. Download the CUDA Toolkit 12.4 runfile

Look up the current runfile link on the NVIDIA CUDA 12.4 download page, then download it as in the example below.

wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_linux.run

※ Check the NVIDIA website for the actual file name and link

3. Run the runfile installer

The driver is already installed, so choose the option that skips driver installation.

sudo sh cuda_12.4.0_linux.run --override

During installation, select only the CUDA Toolkit.

4. Set the environment variables

Add the following to ~/.bashrc and apply it:

export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH
source ~/.bashrc

5. Verify the installation

nvcc --version

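If CUDA 12.4 is shown, the update succeeded.

Summary:

• The GPU driver already supports CUDA 12.4, so only the CUDA Toolkit needs to be updated to 12.4.

• Pick one of the two methods above: remove the existing CUDA Toolkit 11.8, install CUDA Toolkit 12.4, and update the environment variables.

• After installation, run nvcc --version to confirm that the right version is in place.

This brings the system in line with PyTorch 2.4.1 (CUDA 12.4 build), which may resolve the CUDA-related RuntimeError ("No CUDA GPUs are available"). In my case, as described above, the error persisted until I fixed CUDA_VISIBLE_DEVICES.

You can check your own version with nvcc --version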

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
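
When this check has to be repeated on several machines or inside a script, the release number can be pulled out of the output with a regular expression. A minimal sketch, assuming the output keeps the "release X.Y" wording shown above; the function name parse_nvcc_release is my own, and the sample text is the output pasted above.

```python
import re

SAMPLE_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0"""


def parse_nvcc_release(output):
    """Return (major, minor) from nvcc --version output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_nvcc_release(SAMPLE_OUTPUT))  # (12, 4)
```

To use it on a live system, the text could come from subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout instead of the sample string.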

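huggingface cached_download error

# Changed: cached_download was removed from huggingface_hub, so it is dropped from the import
from huggingface_hub import HfFolder, hf_hub_download, model_info

# Original import (fails on current huggingface_hub):
# from huggingface_hub import HfFolder, cached_download, hf_hub_download, model_info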
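
Before editing imports by hand, it may be useful to confirm which names the installed huggingface_hub actually exposes. The helper below, has_symbol, is a generic name of my own and not part of any library; it returns False both when the module is missing and when the name is absent.

```python
import importlib


def has_symbol(module_name, symbol):
    """Return True if module_name can be imported and exposes symbol."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, symbol)


for name in ("cached_download", "hf_hub_download"):
    print(name, has_symbol("huggingface_hub", name))
```

If cached_download comes back False while hf_hub_download is True, the installed version matches the situation described here, and the import has to be patched as shown above.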
