Proxmox VE 8.1.3 Nvidia Drivers not working

Hiload

New Member
Jun 21, 2024
Hi everyone,

I am still a Proxmox novice and am currently struggling to get my "NVIDIA Corporation GA106 [RTX A2000 12GB] (rev a1)" working on Proxmox VE 8.1.3. I added the "non-free" and "non-free-firmware" components to my apt sources and installed the driver with
Bash:
apt install nvidia-driver

I followed the Debian tutorial (https://wiki.debian.org/NvidiaGraphicsDrivers#Version_525.105.17-1) for installing the NVIDIA drivers, but apt aborted with the warning below when I tried to install "firmware-misc-nonfree", so I only installed "nvidia-driver":
Bash:
W: (pve-apt-hook) !! WARNING !!
W: (pve-apt-hook) You are attempting to remove the meta-package 'proxmox-ve'!
W: (pve-apt-hook)
W: (pve-apt-hook) If you really want to permanently remove 'proxmox-ve' from your system, run the following command
W: (pve-apt-hook)       touch '/please-remove-proxmox-ve'
W: (pve-apt-hook) run apt purge proxmox-ve to remove the meta-package
W: (pve-apt-hook) and repeat your apt invocation.
W: (pve-apt-hook)
W: (pve-apt-hook) If you are unsure why 'proxmox-ve' would be removed, please verify
W: (pve-apt-hook)       - your APT repository settings
W: (pve-apt-hook)       - that you are using 'apt full-upgrade' to upgrade your system
E: Sub-process /usr/share/proxmox-ve/pve-apt-hook returned an error code (1)
E: Failure running script /usr/share/proxmox-ve/pve-apt-hook
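
The hook aborts because apt's solution for this install would remove the proxmox-ve meta-package. One way to see what apt intends to remove before committing to anything is a simulated run. The following is only a sketch: it assumes the usual "Remv" lines that `apt-get -s` prints, and the function name `would_remove` is made up for illustration.

```shell
#!/bin/sh
# Print the names of packages that a simulated apt run says it would remove.
# Reads `apt-get -s ...` output on stdin; simulated removals start with "Remv".
would_remove() {
    awk '$1 == "Remv" { print $2 }'
}

# Intended use on the host (-s only simulates, nothing is changed):
#   apt-get -s install firmware-misc-nonfree | would_remove
```

If proxmox-ve shows up in that list, the repository settings are the first thing to check, as the hook message itself suggests.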

After that I tried to use "nvidia-smi" but it only gave me this error:
Bash:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

When I looked at "journalctl", I also noticed that the NVIDIA kernel modules aren't loaded properly:
Code:
May 23 14:34:23 pve systemd-modules-load[2762]: Error running install command 'modprobe nvidia-modeset ; modprobe -i nvidia-current-drm ' for module nvidia_drm: retcode 1
May 23 14:34:23 pve systemd-modules-load[2762]: Failed to insert module 'nvidia_drm': Invalid argument
May 23 14:34:23 pve systemd-modules-load[2776]: modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/6.5.11-4-pve
May 23 14:34:23 pve systemd-modules-load[2772]: modprobe: ERROR: ../libkmod/libkmod-module.c:1047 command_do() Error running install command 'modprobe -i nvidia-current ' for module nvidia: retcode 1
May 23 14:34:23 pve systemd-modules-load[2772]: modprobe: ERROR: could not insert 'nvidia': Invalid argument
May 23 14:34:23 pve systemd-modules-load[2777]: modprobe: FATAL: Module nvidia-current-modeset not found in directory /lib/modules/6.5.11-4-pve
May 23 14:34:23 pve systemd-modules-load[2767]: modprobe: ERROR: ../libkmod/libkmod-module.c:1047 command_do() Error running install command 'modprobe nvidia ; modprobe -i nvidia-current-modeset ' for module nvidia_modeset: retcode 1
May 23 14:34:23 pve systemd-modules-load[2767]: modprobe: ERROR: could not insert 'nvidia_modeset': Invalid argument
May 23 14:34:23 pve systemd-modules-load[2778]: modprobe: FATAL: Module nvidia-current-drm not found in directory /lib/modules/6.5.11-4-pve
May 23 14:34:25 pve systemd-udevd[2949]: modprobe: FATAL: Module nvidia-current not found in directory /lib/modules/6.5.11-4-pve
May 23 14:34:25 pve (udev-worker)[2869]: Error running install command 'modprobe -i nvidia-current ' for module nvidia: retcode 1
May 23 14:34:26 pve audit[3407]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=3407 comm="apparmor_parser"
May 23 14:34:26 pve audit[3407]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=3407 comm="apparmor_parser"
May 23 14:34:26 pve kernel: audit: type=1400 audit(1716467666.198:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=3407 comm="apparmor_parser"
May 23 14:34:26 pve kernel: audit: type=1400 audit(1716467666.198:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=3407 comm="apparmor_parser"
May 23 14:34:26 pve systemd[1]: Starting nvidia-persistenced.service - NVIDIA Persistence Daemon...
May 23 14:34:26 pve nvidia-persistenced[3466]: Started (3466)
May 23 14:34:26 pve nvidia-persistenced[3466]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 110 has read and write permissions for those files.
May 23 14:34:26 pve nvidia-persistenced[3459]: nvidia-persistenced failed to initialize. Check syslog for more details.
May 23 14:34:26 pve nvidia-persistenced[3466]: Shutdown (3466)
May 23 14:34:26 pve systemd[1]: nvidia-persistenced.service: Control process exited, code=exited, status=1/FAILURE
May 23 14:34:26 pve systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
May 23 14:34:26 pve systemd[1]: Failed to start nvidia-persistenced.service - NVIDIA Persistence Daemon
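
The decisive entries in this log are the "FATAL: Module ... not found in directory /lib/modules/6.5.11-4-pve" lines: no NVIDIA module exists for the running kernel, and everything after that follows from it. As a rough sketch, the missing modules can be pulled out of a long journal like this (the helper name `missing_modules` is hypothetical, and the pattern assumes modprobe's message format as shown above):

```shell
#!/bin/sh
# List each kernel module that modprobe reported as missing, together with
# the modules directory it searched. Reads journal text on stdin.
missing_modules() {
    sed -n 's/.*FATAL: Module \([^ ]*\) not found in directory \([^ ]*\).*/\1 \2/p' \
        | LC_ALL=C sort -u
}

# Intended use on the host (current boot only):
#   journalctl -b | missing_modules
```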

I also tried to install "nvidia-kernel-dkms", but it couldn't be found in my package sources, and I couldn't find the proper backports repository for apt.
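
A note on DKMS: it compiles the module against the headers of the running kernel, and on Proxmox those headers come as a separate package that may not be installed by default. Installed headers normally show up as a "build" entry under the kernel's modules directory, so a quick check could look like the sketch below. The function name `has_kernel_headers` is an assumption for illustration, and the exact header package name on a given Proxmox release is best looked up with apt rather than guessed.

```shell
#!/bin/sh
# Succeed if the given modules directory has a resolvable "build" entry,
# which the kernel headers package normally provides and DKMS compiles against.
has_kernel_headers() {
    [ -e "$1/build" ]
}

# Intended use on the host:
#   has_kernel_headers "/lib/modules/$(uname -r)" || echo "kernel headers missing"
```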

I want to use the GPU to render the desktops of my VMs via VirGL. It's a similar problem to https://forum.proxmox.com/threads/nvidia-driver-to-use-virtio-gl.144117/#post-666140, but in my case even "nvidia-smi" doesn't work.

I hope you can help me with my problem. I appreciate any help I can get.
 
Late reply, but yes, you were on the right path: you really need nvidia-kernel-dkms for the kernel module to be built.

The problem is that this package is provided by the official NVIDIA repository, so you have to add that repository with:
Code:
wget https://developer.download.nvidia.com/compute/cuda/keys/nvidia-gpgkey.pub

sudo apt-key add nvidia-gpgkey.pub

echo "deb https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda.list
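
Since apt-key is deprecated on Debian 12, a common alternative is to store the key in its own keyring file and reference it with signed-by in the source entry. The sketch below only generates that entry. The keyring path is a hypothetical location, the helper name `cuda_source_line` is made up, and the gpg step in the comment assumes the downloaded key is ASCII-armored.

```shell
#!/bin/sh
# Print an apt source entry for the repository above, pinned to one keyring.
# The keyring path passed in is a hypothetical location chosen by the admin.
cuda_source_line() {
    printf 'deb [signed-by=%s] https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ /\n' "$1"
}

# Intended use as root, with the key downloaded as in the post:
#   gpg --dearmor -o /usr/share/keyrings/nvidia-cuda.gpg nvidia-gpgkey.pub
#   cuda_source_line /usr/share/keyrings/nvidia-cuda.gpg > /etc/apt/sources.list.d/cuda.list
```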
 
Thanks for your reply, I appreciate your help.

On one of my test systems (same Proxmox version, but with an NVIDIA GeForce GTX 1060 6GB) I downloaded the newest driver at the time, NVIDIA-Linux-x86_64-550.100.run, directly from NVIDIA (Linux x64 (AMD64/EM64T) Display Driver) and could use nvidia-smi to communicate with the graphics card.

On the test system I also reproduced the error I got from installing only nvidia-driver with apt. However, I haven't yet found a way to remove the packages installed by apt and reinstall the drivers from the NVIDIA site without damaging the OS.

So, is it possible to install the nvidia-kernel-dkms package after installing the drivers via the Proxmox repositories, or do I have to follow certain steps and be careful about the order in which I install and remove packages?
 
Hi,

I'm also late to the party, but I just ran into the same problem. I didn't have it on a headless Debian machine that used SeaBIOS. So are you using UEFI? If so, you probably have to configure or disable Secure Boot. With its default settings, Secure Boot will not allow the kernel module to load...
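
For anyone wanting to check this quickly: if the mokutil tool is installed, `mokutil --sb-state` reports the Secure Boot state. The sketch below just interprets that output. It assumes the tool's usual "SecureBoot enabled" wording, and the function name `secure_boot_enabled` is made up for illustration.

```shell
#!/bin/sh
# Succeed if the text on stdin reports Secure Boot as enabled.
# Expects the output of `mokutil --sb-state`.
secure_boot_enabled() {
    grep -q '^SecureBoot enabled'
}

# Intended use on the host (mokutil may need to be installed first):
#   mokutil --sb-state | secure_boot_enabled && echo "unsigned modules may be rejected"
```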

I didn't have to use NVIDIA's official repo, by the way; apt install nvidia-driver is fine...