Development Artist

[Issue, GPU] Unable to determine the device handle for GPU0000:01:00.0: Unknown Error


JMcunst 2025. 2. 12. 13:17

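Problem

Running the nvidia-smi command fails with the error "Unable to determine the device handle for GPU0000:01:00.0: Unknown Error".

Initial checks

1. lspci output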

# lspci -vv -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
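  • Unknown header type 7f is reported. Together with rev ff, this typically means that reads from the device's PCI configuration space are returning all ones.
  • Kernel driver in use shows nvidia, but the device itself may still be in a bad state.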
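
The two symptoms in the output above (rev ff and Unknown header type 7f) can be checked for mechanically. The following is a minimal sketch, not part of the original investigation: the function name pci_dropped is made up for this post, and it only pattern-matches lspci text; it does not talk to the hardware.

```shell
# pci_dropped: read lspci output on stdin and print the lines that suggest
# the device's config space reads back as all ones (0xff).
# Hypothetical helper; exit status 0 means at least one symptom was found.
pci_dropped() {
    grep -E '\(rev ff\)|Unknown header type 7f'
}
```

Possible usage: lspci -vv -s 01:00.0 | pci_dropped && echo "device looks like it dropped off the bus"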

2. Check for NVIDIA devices

# lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev ff)
01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev ff)
01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev ff)
01:00.3 Serial bus controller: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev ff)
  • The GPU is detected on the PCI bus, but every function reports rev ff, so the device state may not be healthy.

3. Check the kernel version

# uname -r
5.15.0-118-generic
  • Running kernel version: 5.15.0-118-generic

4. Check the installed NVIDIA driver packages

# dpkg -l | grep nvidia 
ii  libnvidia-cfg1-535:amd64                   535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-535                       535.183.01-0ubuntu0.20.04.1          all          Shared files used by the NVIDIA libraries
rc  libnvidia-compute-470:amd64                470.256.02-0ubuntu0.20.04.1          amd64        NVIDIA libcompute package
rc  libnvidia-compute-515:amd64                515.65.01-0ubuntu0.20.04.1           amd64        NVIDIA libcompute package
ii  libnvidia-compute-535:amd64                535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA libcompute package
ii  libnvidia-compute-535:i386                 535.183.01-0ubuntu0.20.04.1          i386         NVIDIA libcompute package
ii  libnvidia-container-tools                  1.13.5-1                             amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                 1.13.5-1                             amd64        NVIDIA container runtime library
ii  libnvidia-decode-535:amd64                 535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-535:i386                  535.183.01-0ubuntu0.20.04.1          i386         NVIDIA Video Decoding runtime libraries
ii  libnvidia-encode-535:amd64                 535.183.01-0ubuntu0.20.04.1          amd64        NVENC Video Encoding runtime library
ii  libnvidia-encode-535:i386                  535.183.01-0ubuntu0.20.04.1          i386         NVENC Video Encoding runtime library
ii  libnvidia-extra-535:amd64                  535.183.01-0ubuntu0.20.04.1          amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-535:amd64                   535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-fbc1-535:i386                    535.183.01-0ubuntu0.20.04.1          i386         NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-535:amd64                     535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-gl-535:i386                      535.183.01-0ubuntu0.20.04.1          i386         NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
rc  nvidia-compute-utils-470                   470.256.02-0ubuntu0.20.04.1          amd64        NVIDIA compute utilities
rc  nvidia-compute-utils-515                   515.65.01-0ubuntu0.20.04.1           amd64        NVIDIA compute utilities
ii  nvidia-compute-utils-535                   535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA compute utilities
ii  nvidia-container-runtime                   3.13.0-1                             all          NVIDIA container runtime
ii  nvidia-container-toolkit                   1.13.5-1                             amd64        NVIDIA Container toolkit
ii  nvidia-container-toolkit-base              1.13.5-1                             amd64        NVIDIA Container Toolkit Base
rc  nvidia-dkms-470                            470.256.02-0ubuntu0.20.04.1          amd64        NVIDIA DKMS package
rc  nvidia-dkms-515                            515.65.01-0ubuntu0.20.04.1           amd64        NVIDIA DKMS package
ii  nvidia-dkms-535                            535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA DKMS package
ii  nvidia-docker2                             2.13.0-1                             all          nvidia-docker CLI wrapper
ii  nvidia-driver-535                          535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA driver metapackage
ii  nvidia-firmware-535-535.183.01             535.183.01-0ubuntu0.20.04.1          amd64        Firmware files used by the kernel module
rc  nvidia-kernel-common-470                   470.256.02-0ubuntu0.20.04.1          amd64        Shared files used with the kernel module
rc  nvidia-kernel-common-515                   515.65.01-0ubuntu0.20.04.1           amd64        Shared files used with the kernel module
ii  nvidia-kernel-common-535                   535.183.01-0ubuntu0.20.04.1          amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-535                   535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA kernel source package
ii  nvidia-prime                               0.8.16~0.20.04.2                     all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                            470.57.01-0ubuntu0.20.04.3           amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-535                           535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA driver support binaries
ii  screen-resolution-extra                    0.18build1                           all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-535              535.183.01-0ubuntu0.20.04.1          amd64        NVIDIA binary Xorg driver
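  • nvidia-driver-535 is installed.
  • Several versions of the libnvidia-compute and nvidia-dkms packages are listed. The 470 and 515 entries are in the rc state (removed, configuration files remaining); only the 535 entries are installed (ii).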
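
To pull the leftover entries out of a long listing like the one above, something along these lines may help. It is a sketch under the assumption that the input is plain dpkg -l output; list_rc_packages is a name invented here.

```shell
# list_rc_packages: read `dpkg -l` output on stdin and print the names of
# packages in the "rc" state (removed, configuration files still present).
# Hypothetical helper; it only filters text and changes nothing on the system.
list_rc_packages() {
    awk '$1 == "rc" { print $2 }'
}
```

Possible usage: dpkg -l | grep nvidia | list_rc_packages. Review the list before passing any of the names to sudo dpkg --purge.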

5. Check the NVIDIA kernel modules

# lsmod | grep nvidia
nvidia_uvm           1540096  4
nvidia_drm             77824  3
nvidia_modeset       1306624  5 nvidia_drm
nvidia              56737792  214 nvidia_uvm,nvidia_modeset
drm_kms_helper        307200  1 nvidia_drm
drm                   618496  7 drm_kms_helper,nvidia,nvidia_drm
i2c_nvidia_gpu         16384  0
  • The NVIDIA kernel modules are loaded.

6. Check the dmesg log for errors

# dmesg | grep -i nvidia
[4809942.908085] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4809943.908591] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4809947.424849] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4810008.600146] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4810009.600287] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4810013.350475] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4810252.953380] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4810253.953973] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4810257.701462] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4810317.171089] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4810318.171298] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4810321.217548] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4810556.922156] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4810557.922658] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4810561.669848] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4810632.107968] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4810633.108160] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4810636.858012] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4810863.910889] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4810864.910924] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4810868.290537] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4810941.092071] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4810942.092266] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4810945.438309] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4811174.912553] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4811175.912789] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4811179.663415] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4811244.089473] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4811245.090040] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4811248.839594] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4811478.967512] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4811479.967569] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4811483.661890] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4811549.093731] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4811550.094033] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4811553.764579] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4811781.921284] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4811782.921525] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4811786.669273] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4811855.087080] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4811856.087508] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4811859.772672] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4812088.925126] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4812089.925505] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4812093.674332] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4812160.152019] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4812161.152097] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4812164.782915] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4812398.952287] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4812399.952391] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4812403.691816] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4812467.192293] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4812468.192620] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4812471.942073] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4812699.975482] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4812700.975713] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4812704.694288] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4812772.154529] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4812773.154572] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4812776.905148] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4813005.950457] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4813006.950513] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4813010.700962] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4813080.106275] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4813081.106466] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4813084.852484] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4813315.961125] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4813316.961359] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4813320.709066] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4813392.127117] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4813393.127264] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4813396.731004] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4813616.913903] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4813617.914369] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4813621.656412] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4813694.055075] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4813695.055115] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4813698.762733] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4813922.917793] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4813923.917826] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4813927.653545] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4814006.128028] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4814007.128149] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4814010.878082] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4814223.943132] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4814224.943497] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4814228.691523] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4814309.055861] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4814310.056068] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4814313.718526] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4814533.252391] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4814534.252512] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4814537.999447] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4814621.126041] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4814622.126501] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4814625.875790] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4814843.963289] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4814844.963773] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4814848.712534] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4814934.355821] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4814935.356202] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4814939.095788] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4815158.960062] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4815159.960517] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4815163.708988] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4815245.063099] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4815246.063526] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4815249.801758] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4815465.959473] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4815466.959848] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4815470.077218] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4815551.083621] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4815552.084116] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4815555.834053] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4815767.958885] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4815768.958981] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4815772.708825] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4815863.128577] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4815864.128884] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4815867.879314] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4816070.921337] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4816071.921462] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4816075.672070] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4816174.134352] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4816175.134858] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4816178.884201] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4816371.968913] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4816372.969146] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4816376.719896] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4816489.115358] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4816490.115845] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4816493.866600] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4816684.923978] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4816685.924056] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4816689.674673] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4816790.104675] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4816791.104721] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4816794.191215] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4816987.978782] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4816988.978917] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4816992.728199] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817098.319348] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817099.319692] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817103.063672] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817293.981859] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817294.982321] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817298.673509] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817407.083862] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817408.084192] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817411.729942] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817597.925227] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817598.925643] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817602.675317] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817665.195836] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817666.196358] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817669.946405] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817833.819397] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817834.819777] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817838.566964] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817868.545926] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817869.546463] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817873.296183] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817883.673716] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817884.673982] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817888.035690] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817901.257542] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817902.257735] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817905.996897] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817911.620756] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817912.621220] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817916.239390] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4817954.340617] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4817955.340701] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4817959.089093] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4818037.726052] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4818038.726130] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4818042.476219] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4818209.929858] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4818210.930208] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4818214.032825] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4818512.931655] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4818513.931932] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4818517.682314] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4818820.937133] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4818821.937286] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4818825.684412] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4818827.162034] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4818830.792381] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4819129.973753] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4819130.974106] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4819134.723695] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4819443.164332] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4819444.164391] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4819447.811069] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4819747.103518] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4819748.103691] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4819751.773338] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4819753.989052] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4819754.989201] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4819758.737645] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4820056.116637] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4820057.116691] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4820060.814923] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4820358.098589] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4820359.098655] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4820362.849098] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4820367.977555] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4820368.977636] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4820372.428547] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4820663.140551] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4820664.141068] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4820667.890897] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4820679.951144] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4820680.951436] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4820684.107720] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4820973.195568] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4820974.195688] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4820977.946473] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4820992.954658] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4820993.955209] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4820997.705972] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4821280.204394] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4821281.204804] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4821284.919380] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4821302.987820] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4821303.988003] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4821307.734960] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4821593.138326] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4821594.138755] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4821597.837878] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4821615.965047] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4821616.965240] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4821620.596874] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4821899.112885] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4821900.112942] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4821903.387923] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4821918.969799] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4821919.969909] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4821923.720658] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
[4822133.233844] nvidia-gpu 0000:01:00.3: can't change power state from D3cold to D0 (config space inaccessible)
[4822134.239631] nvidia-gpu 0000:01:00.3: i2c timeout error ffffffff
[4822137.963844] nvidia-gpu 0000:01:00.3: invalid power transition (from D3cold to D3hot)
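  • The device cannot move from the D3cold power state to D0. The log shows this for function 0000:01:00.3 (the USB Type-C UCSI controller on the same card), with its config space inaccessible.
  • The i2c timeout error recurs many times, so the GPU is very likely not operating normally.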
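
With a log this repetitive, a per-message count is easier to read than the raw lines. The sketch below is one way to get it; summarize_pm_errors and the three short labels are made up for this post, and the patterns are taken from the messages shown above.

```shell
# summarize_pm_errors: read dmesg text on stdin and count, per PCI address,
# how often each of the three messages from the log above occurs.
# Hypothetical helper; the labels in the output are invented for this sketch.
summarize_pm_errors() {
    awk '
        # Extract the first PCI address (dddd:bb:dd.f) found on the line.
        function addr() {
            if (match($0, /[0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7]/))
                return substr($0, RSTART, RLENGTH)
            return "unknown"
        }
        /can.t change power state from D3cold to D0/ { n[addr() " d3cold_to_d0"]++ }
        /i2c timeout error/                          { n[addr() " i2c_timeout"]++ }
        /invalid power transition/                   { n[addr() " invalid_transition"]++ }
        END { for (k in n) print k, n[k] }
    ' | sort
}
```

Possible usage: dmesg | summarize_pm_errors. Each output line has the form "address label count".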

7. Check the NVIDIA driver version

# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.183.01  Sun May 12 19:39:15 UTC 2024
GCC version:  gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)
  • Driver version: 535.183.01

Resolution

1. Remove the existing driver before reinstalling

sudo apt remove --purge '^nvidia.*'
sudo apt autoremove
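
Before running the purge, it can be worth previewing what the anchored pattern ^nvidia.* will catch. In the package list from step 4 it also matches nvidia-container-toolkit, nvidia-docker2 and similar packages, while the libnvidia-* names are not matched directly. The sketch below approximates the match against dpkg -l output; preview_purge is a made-up name and this is not how apt itself resolves the pattern.

```shell
# preview_purge: read `dpkg -l` output on stdin and print the installed (ii)
# or leftover (rc) package names that start with "nvidia".
# Hypothetical helper that only approximates apt's regex matching.
preview_purge() {
    awk '$1 ~ /^(ii|rc)$/ { sub(/:.*/, "", $2); print $2 }' | grep -E '^nvidia'
}
```

Possible usage: dpkg -l | preview_purge. For apt's own view without changing anything, apt-get -s remove --purge '^nvidia.*' runs the same command in simulation mode.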

2. Blacklist the Nouveau driver

echo -e "blacklist nouveau
options nouveau modeset=0" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u
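
A quick way to confirm the blacklist entry actually landed is to search the modprobe configuration directory for it. This is a minimal sketch; nouveau_blacklisted is a name invented here, and the default directory is the one used in the command above.

```shell
# nouveau_blacklisted: succeed if any *.conf file in the given directory
# (default /etc/modprobe.d) contains a "blacklist nouveau" line.
# Hypothetical helper; it reads files only and changes nothing.
nouveau_blacklisted() {
    dir=${1:-/etc/modprobe.d}
    grep -qsE '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$dir"/*.conf
}
```

Possible usage: nouveau_blacklisted && echo "nouveau is blacklisted". After the reboot, lsmod | grep nouveau printing nothing confirms that the module stayed unloaded.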

3. Reinstall the NVIDIA driver

sudo apt install -y nvidia-driver-535

4. Configure GPU initialization and power management

echo "options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_EnableGpuFirmware=1 NVreg_RegistryDwords=RMUseSwI2C=0x1" | sudo tee /etc/modprobe.d/nvidia-power.conf
sudo update-initramfs -u
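
The options line above packs three settings into one string, which is easy to mistype. The sketch below prints them one per line for review; show_nvidia_options is a made-up helper and it only reads the file it is given.

```shell
# show_nvidia_options: print each option from "options nvidia ..." lines of a
# modprobe config file, one option per line.
# Hypothetical helper; the file path is passed as the first argument.
show_nvidia_options() {
    awk '$1 == "options" && $2 == "nvidia" { for (i = 3; i <= NF; i++) print $i }' "$1"
}
```

Possible usage: show_nvidia_options /etc/modprobe.d/nvidia-power.conf. Once the module is loaded after the reboot, the values the driver picked up can usually be read from /proc/driver/nvidia/params; treat that path as an assumption if your driver build differs.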

5. Reboot and verify with nvidia-smi

sudo reboot
# once the system is back up:
nvidia-smi

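Conclusion

  • The nvidia-smi error is most likely related to the power-state (D3cold) problem.
  • It can be resolved by reinstalling nvidia-driver-535 and disabling the Nouveau driver.
  • Setting nvidia kernel module parameters to manage the GPU power state can also resolve the problem.
  • Keep monitoring the dmesg log to check whether further problems occur.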
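
For the ongoing monitoring, a live filter on the kernel log is one option. This is a sketch only: pm_error_filter is a made-up name, the patterns are the three messages seen in step 6, and --line-buffered assumes GNU grep.

```shell
# pm_error_filter: pass through only the nvidia-gpu power-state and i2c
# messages from a stream of kernel log lines on stdin.
# Hypothetical helper; --line-buffered keeps output prompt when following a log.
pm_error_filter() {
    grep --line-buffered -E "nvidia-gpu .*(can't change power state|i2c timeout error|invalid power transition)"
}
```

Possible usage: sudo dmesg --follow | pm_error_filter. If nothing is printed over time, the errors have not come back.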