PVE 8 with NVIDIA graphics: does it work or not?

Hi,

I had a 3060 running successfully on PVE 7 (with a 6.x kernel), used purely as a compute card. After upgrading to 8, the NVIDIA driver unfortunately no longer works. The drivers are installed, but I get:
root@pve-hv-01:~# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I did see a note about NVIDIA in the upgrade guide, but is that my problem?

Code:
# lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GA106 High Definition Audio Controller (rev a1)

# dpkg -l | grep nvidia
ii glx-alternative-nvidia 1.2.2 amd64 allows the selection of NVIDIA as GLX provider
ii libegl-nvidia0:amd64 535.54.03-1 amd64 NVIDIA binary EGL library
ii libgl1-nvidia-glvnd-glx:amd64 535.54.03-1 amd64 NVIDIA binary OpenGL/GLX library (GLVND variant)
ii libgles-nvidia1:amd64 535.54.03-1 amd64 NVIDIA binary OpenGL|ES 1.x library
ii libgles-nvidia2:amd64 535.54.03-1 amd64 NVIDIA binary OpenGL|ES 2.x library
ii libglx-nvidia0:amd64 535.54.03-1 amd64 NVIDIA binary GLX library
ii libnvidia-allocator1:amd64 535.54.03-1 amd64 NVIDIA allocator runtime library
ii libnvidia-cfg1:amd64 535.54.03-1 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-eglcore:amd64 535.54.03-1 amd64 NVIDIA binary EGL core libraries
ii libnvidia-encode1:amd64 535.54.03-1 amd64 NVENC Video Encoding runtime library
ii libnvidia-fbc1:amd64 535.54.03-1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-glcore:amd64 535.54.03-1 amd64 NVIDIA binary OpenGL/GLX core libraries
ii libnvidia-glvkspirv:amd64 535.54.03-1 amd64 NVIDIA binary Vulkan Spir-V compiler library
ii libnvidia-ml1:amd64 535.54.03-1 amd64 NVIDIA Management Library (NVML) runtime library
ii libnvidia-nvvm4:amd64 535.54.03-1 amd64 NVIDIA NVVM
ii libnvidia-opticalflow1:amd64 535.54.03-1 amd64 NVIDIA Optical Flow runtime library
ii libnvidia-pkcs11:amd64 535.54.03-1 amd64 NVIDIA pkcs runtime library
ii libnvidia-ptxjitcompiler1:amd64 535.54.03-1 amd64 NVIDIA PTX JIT Compiler
ii libnvidia-rtcore:amd64 535.54.03-1 amd64 NVIDIA binary Vulkan ray tracing (rtcore) library
ii libnvidia-wayland-client:amd64 535.54.03-1 amd64 NVIDIA client for wayland library
ii nvidia-alternative 535.54.03-1 amd64 allows the selection of NVIDIA as GLX provider
ii nvidia-cuda-mps 535.54.03-1 amd64 NVIDIA CUDA Multi Process Service (MPS)
rc nvidia-cuda-toolkit 11.2.2-3+deb11u3 amd64 NVIDIA CUDA development toolkit
ii nvidia-detect 535.54.03-1 amd64 NVIDIA GPU detection utility
ii nvidia-driver 535.54.03-1 amd64 NVIDIA metapackage
ii nvidia-driver-bin 535.54.03-1 amd64 NVIDIA driver support binaries
ii nvidia-driver-libs:amd64 535.54.03-1 amd64 NVIDIA metapackage (OpenGL/GLX/EGL/GLES libraries)
ii nvidia-egl-common 535.54.03-1 amd64 NVIDIA binary EGL driver - common files
ii nvidia-egl-icd:amd64 535.54.03-1 amd64 NVIDIA EGL installable client driver (ICD)
ii nvidia-installer-cleanup 20220217+3~deb12u1 amd64 cleanup after driver installation with the nvidia-installer
ii nvidia-kernel-common 20220217+3~deb12u1 amd64 NVIDIA binary kernel module support files
ii nvidia-kernel-dkms 535.54.03-1 amd64 NVIDIA binary kernel module DKMS source
ii nvidia-kernel-support 535.54.03-1 amd64 NVIDIA binary kernel module support files
rc nvidia-legacy-390xx-alternative 390.157-1~deb11u1 amd64 allows the selection of NVIDIA as GLX provider (390xx legacy version)
ii nvidia-legacy-check 535.54.03-1 amd64 check for NVIDIA GPUs requiring a legacy driver
ii nvidia-libopencl1:amd64 535.54.03-1 amd64 NVIDIA OpenCL ICD Loader library
ii nvidia-modprobe 535.54.03-1 amd64 utility to load NVIDIA kernel modules and create device nodes
ii nvidia-opencl-common 535.54.03-1 amd64 NVIDIA OpenCL driver - common files
ii nvidia-opencl-icd:amd64 535.54.03-1 amd64 NVIDIA OpenCL installable client driver (ICD)
ii nvidia-persistenced 535.54.03-1 amd64 daemon to maintain persistent software state in the NVIDIA driver
ii nvidia-settings 535.54.03-1 amd64 tool for configuring the NVIDIA graphics driver
ii nvidia-smi 535.54.03-1 amd64 NVIDIA System Management Interface
ii nvidia-support 20220217+3~deb12u1 amd64 NVIDIA binary graphics driver support files
ii nvidia-vdpau-driver:amd64 535.54.03-1 amd64 Video Decode and Presentation API for Unix - NVIDIA driver
ii nvidia-vulkan-common 535.54.03-1 amd64 NVIDIA Vulkan driver - common files
ii nvidia-vulkan-icd:amd64 535.54.03-1 amd64 NVIDIA Vulkan installable client driver (ICD)
ii nvidia-xconfig 535.54.03-1 amd64 deprecated X configuration tool for non-free NVIDIA drivers
ii xserver-xorg-video-nvidia 535.54.03-1 amd64 NVIDIA binary Xorg driver

Code:
# uname -a
Linux pve-hv-01 6.2.16-5-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-6 (2023-07-25T15:33Z) x86_64 GNU/Linux
 
Code:
# systemctl status nvidia-persistenced.service
× nvidia-persistenced.service - NVIDIA Persistence Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Sun 2023-07-30 18:15:49 CEST; 14s ago
    Process: 110761 ExecStart=/usr/bin/nvidia-persistenced --user nvpd (code=exited, status=1/FAILURE)
    Process: 110767 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced (code=exited, status=0/SUCCESS)
        CPU: 7ms


Jul 30 18:15:49 pve-hv-01 systemd[1]: Starting nvidia-persistenced.service - NVIDIA Persistence Daemon...
Jul 30 18:15:49 pve-hv-01 nvidia-persistenced[110762]: Started (110762)
Jul 30 18:15:49 pve-hv-01 nvidia-persistenced[110762]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 114 has read and write permissions for those files.
Jul 30 18:15:49 pve-hv-01 nvidia-persistenced[110762]: Shutdown (110762)
Jul 30 18:15:49 pve-hv-01 nvidia-persistenced[110761]: nvidia-persistenced failed to initialize. Check syslog for more details.
Jul 30 18:15:49 pve-hv-01 systemd[1]: nvidia-persistenced.service: Control process exited, code=exited, status=1/FAILURE
Jul 30 18:15:49 pve-hv-01 systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
Jul 30 18:15:49 pve-hv-01 systemd[1]: Failed to start nvidia-persistenced.service - NVIDIA Persistence Daemon.

# ls /dev/nv
nvme0 nvme0n1 nvme0n1p1 nvme1 nvme1n1 nvme-fabrics nvram
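
The outputs above suggest the kernel module is not loaded: there are no /dev/nvidia* nodes, which is what nvidia-persistenced complains about. Here is a minimal sketch for checking both conditions in one go. The helper names `nvidia_module_loaded` and `nvidia_nodes_present` are made up for illustration; `lsmod` and `dkms status` are the standard tools, and the sketch only assumes they are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsmod` output on stdin and succeeds only if
# a module named exactly "nvidia" appears in the first column.
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Hypothetical helper: succeeds if at least one nvidia* entry exists
# in the given directory (defaults to /dev).
nvidia_nodes_present() {
    for node in "${1:-/dev}"/nvidia*; do
        [ -e "$node" ] && return 0
    done
    return 1
}

if lsmod 2>/dev/null | nvidia_module_loaded; then
    echo "nvidia kernel module: loaded"
else
    echo "nvidia kernel module: not loaded"
fi

if nvidia_nodes_present /dev; then
    echo "device nodes: present"
else
    echo "device nodes: missing"
fi

# If DKMS is installed, show whether a module was built for this kernel.
if command -v dkms >/dev/null 2>&1; then
    dkms status
fi
```

If the module is reported as not loaded and `dkms status` shows nothing built for the running kernel, the packages are installed but the module was probably never compiled for the new kernel.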
 
Code:
# nvidia-detect
Detected NVIDIA GPUs:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] [10de:2504] (rev a1)


Checking card:  NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
Uh oh. Your card is not supported by any driver version up to 535.54.03.
A newer driver may add support for your card.
Newer driver releases may be available in backports, unstable or experimental.
 
OK, I'll answer this one myself.
Remove the old junk:
Code:
apt remove 'nvidia*'

Download the driver installer from NVIDIA:
Code:
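# Download the NVIDIA .run installer and make it executable
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/535.86.05/NVIDIA-Linux-x86_64-535.86.05.run
chmod +x NVIDIA-Linux-x86_64-535.86.05.run
# Install the kernel headers the installer compiles the module against
apt install pve-headers
# Build and install the module against the headers of the running kernel
./NVIDIA-Linux-x86_64-535.86.05.run --kernel-source-path /usr/src/linux-headers-6.2.16-5-pve/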

Answer the prompts that come up (the defaults are usually fine). Done.
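
The header path in the installer call is hard-coded to 6.2.16-5-pve, so it will not match after the next kernel update. A small sketch for deriving it from the running kernel instead, assuming the headers live under /usr/src/linux-headers-<release>/ as in the command above. The function name `kernel_headers_dir` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the header directory for a kernel release.
# Without an argument it uses the running kernel as reported by `uname -r`.
kernel_headers_dir() {
    printf '/usr/src/linux-headers-%s/\n' "${1:-$(uname -r)}"
}

dir=$(kernel_headers_dir)
if [ -d "$dir" ]; then
    echo "headers found: $dir"
else
    echo "headers missing: $dir (install the matching headers package first)"
fi
```

The installer call could then use `--kernel-source-path "$(kernel_headers_dir)"` in place of the fixed path. Since the module is built for one specific kernel, the installer most likely has to be run again after every kernel update.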

nvidia-smi works:
Code:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        Off | 00000000:01:00.0 Off |                  N/A |
| 30%   39C    P0             ERR! / 170W |      1MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+


+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
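
To check the result from a script instead of reading the table, nvidia-smi can also print selected fields as CSV. A minimal sketch, assuming only the `name` and `driver_version` query fields are needed. The helper `driver_version_from_csv` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads CSV lines of the form "name, driver_version"
# on stdin and prints only the driver version, trimmed of spaces.
driver_version_from_csv() {
    awk -F',' '{ gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # One CSV line per GPU, without the header row.
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
        | driver_version_from_csv
else
    echo "nvidia-smi not found"
fi
```

On the host above this should print the version that the table header shows as "Driver Version".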
 