I installed an RTX 5060 in one of my Proxmox nodes and I am trying to install the driver on the host so that LXCs can use the GPU. The install keeps failing with a kernel module error. I have read that there are issues with the 6.17 kernel. Since I am running Proxmox VE 9.1, I reinstalled the latest 6.14 kernel (6.14.11-5) and, following the documentation, removed proxmox-default-headers. The 6.17 kernel itself could not be removed because of the proxmox-default-kernel dependency, so only the 6.17 headers were removed.
Code:
root@pve02:~# uname -r
6.14.11-5-pve
root@pve02:~# apt list --installed | grep -E '(pve|proxmox)-kernel'
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
proxmox-kernel-6.14.11-5-pve-signed/stable,now 6.14.11-5 amd64 [installed,automatic]
proxmox-kernel-6.14/stable,now 6.14.11-5 all [installed]
proxmox-kernel-6.17.4-2-pve-signed/stable,now 6.17.4-2 amd64 [installed,automatic]
proxmox-kernel-6.17/stable,now 6.17.4-2 all [installed,automatic]
proxmox-kernel-helper/stable,now 9.0.4 all [installed]
root@pve02:~# apt list --installed | grep -E 'proxmox-headers|pve-headers'
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
proxmox-headers-6.14.11-5-pve/stable,now 6.14.11-5 amd64 [installed]
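A quick way to double-check that the headers for the running kernel are present, since the module has to be built against them. This is only a minimal sketch: it assumes the `proxmox-headers-<kernel release>` naming pattern visible in the output above, and `headers_pkg_for` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the headers package name from a kernel release
# string, assuming the "proxmox-headers-<release>" pattern seen above.
headers_pkg_for() {
    printf 'proxmox-headers-%s\n' "$1"
}

# Usage on the host (commented out so the snippet has no side effects):
#   pkg=$(headers_pkg_for "$(uname -r)")
#   dpkg-query -W -f='${Status}\n' "$pkg"   # "install ok installed" if present
```

If the package is reported as installed, missing headers can probably be ruled out as the cause.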
Everything else follows the typical install process.
First, I blacklisted nouveau and rebuilt the initramfs:
Code:
root@pve02:/etc/modprobe.d# cat blacklist-nouveau.conf
blacklist nouveau
blacklist nvidiafb
blacklist snd_hda_intel
options nouveau modeset=0
root@pve02:~# update-initramfs -u -k $(uname -r)
update-initramfs: Generating /boot/initrd.img-6.14.11-5-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
root@pve02:~# reboot
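After the reboot, whether the blacklist took effect can be checked against `/proc/modules`. A minimal sketch; `module_loaded` is a hypothetical helper name, and it only looks at the first column (the module name) of the input it is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if module NAME appears in /proc/modules-style
# input on stdin (the first column of each line is the module name).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on the host:
#   for m in nouveau nvidiafb; do
#       if module_loaded "$m" < /proc/modules; then echo "$m is loaded"; fi
#   done
```

No output from the loop would suggest that neither module is loaded.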
After rebooting, I verified that no kernel driver was in use on the device (lspci shows no "Kernel driver in use" line for either function):
Code:
root@pve02:~# lspci -k
...
01:00.0 VGA compatible controller: NVIDIA Corporation GB206 [GeForce RTX 5060] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device 41a2
        Kernel modules: nvidiafb, nouveau
01:00.1 Audio device: NVIDIA Corporation Device 22eb (rev a1)
        Subsystem: NVIDIA Corporation Device 0000
        Kernel modules: snd_hda_intel
...
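To make the same check less error-prone than reading the full listing, the bound driver can be pulled out per device. A minimal sketch; `driver_in_use` is a hypothetical helper name, and the slot addresses are the ones from the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the bound driver from `lspci -k` output on stdin,
# or nothing if no "Kernel driver in use" line is present.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Usage on the host:
#   lspci -k -s 01:00.0 | driver_in_use
#   lspci -k -s 01:00.1 | driver_in_use
```

Empty output for both functions matches what the listing above shows: candidate modules exist, but none is bound.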
The documentation shows NVIDIA driver 580.105.06, but when I search NVIDIA's website and view more versions, the latest 580.105 release listed is 580.105.08, so that is what I downloaded and ran.
Code:
root@pve02:~# ./NVIDIA-Linux-x86_64-580.105.08.run --dkms
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 580.105.08......................................
During the install I chose the MIT option when prompted. The installer ran to 100% and then failed with an error saying it was unable to load the kernel module, pointing me to this log:
Code:
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'. This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.
Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[ 822.078889] NVRM: request_mem_region failed for 64M @ 0xd0000000. This can
NVRM: occur when a driver such as rivatv is loaded and claims
NVRM: ownership of the device's registers.
[ 822.083620] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[ 822.083670] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 822.083672] NVRM: None of the NVIDIA devices were initialized.
[ 822.084503] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[ 1084.610146] VFIO - User Level meta-driver version: 0.3
[ 1084.777020] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 1084.777030] NVRM: request_mem_region failed for 64M @ 0xd0000000. This can
NVRM: occur when a driver such as rivatv is loaded and claims
NVRM: ownership of the device's registers.
[ 1084.781985] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[ 1084.782028] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1084.782030] NVRM: None of the NVIDIA devices were initialized.
[ 1084.782842] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[ 2110.583706] VFIO - User Level meta-driver version: 0.3
[ 2110.748528] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 2110.748538] NVRM: request_mem_region failed for 64M @ 0xd0000000. This can
NVRM: occur when a driver such as rivatv is loaded and claims
NVRM: ownership of the device's registers.
[ 2110.753225] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[ 2110.753289] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 2110.753291] NVRM: None of the NVIDIA devices were initialized.
[ 2110.754234] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
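The NVRM message says that request_mem_region failed for 64M at 0xd0000000, meaning something already holds that address range. That owner does not have to be a PCI driver that `lspci -k` would list, and `/proc/iomem` shows who holds each physical range. A minimal sketch for finding the entries that cover the address; `iomem_owners` is a hypothetical helper name. One possibility, which is an assumption and not something the log confirms, is a firmware or boot framebuffer entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read /proc/iomem-style lines ("start-end : owner") on
# stdin and print every entry whose range contains the given address.
iomem_owners() {
    local addr=$(( $1 ))            # accepts hex such as 0xd0000000
    local line range start end
    while IFS= read -r line; do
        range=${line%% : *}             # keep the "start-end" part
        range=${range//[[:space:]]/}    # drop the nesting indentation
        start=${range%%-*}
        end=${range##*-}
        # Skip anything that is not a pair of hex numbers.
        [[ $start =~ ^[0-9a-fA-F]+$ && $end =~ ^[0-9a-fA-F]+$ ]] || continue
        if (( addr >= 16#$start && addr <= 16#$end )); then
            printf '%s\n' "$line"
        fi
    done
}

# Usage on the host (as root, otherwise the addresses read as zeros):
#   iomem_owners 0xd0000000 < /proc/iomem
```

The most deeply indented line in the output is the most specific owner of the range the NVIDIA module is trying to claim.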
Every example I can find of a similar error log was resolved by making sure nouveau and vfio are not claiming the device, but the lspci output indicates that nothing is claiming it. I'm not sure what to try next. If anyone could provide any help, it would be greatly appreciated.
Edit:
I should probably also point out that Secure Boot is not enabled:
Code:
root@pve02:~# mokutil --sb-state
SecureBoot disabled
I also realized that I had installed only the 6.14.11-5 headers and not the plain 6.14 headers package (proxmox-headers-6.14), so I installed that as well and got the same error.
Code:
root@pve02:~# apt list --installed | grep -E 'proxmox-headers|pve-headers'
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
proxmox-headers-6.14.11-5-pve/stable,now 6.14.11-5 amd64 [installed]
proxmox-headers-6.14/stable,now 6.14.11-5 all [installed]
I ran the pve-nvidia-vgpu-helper setup command and it wanted to install the 6.17 headers, which seemed counterintuitive, but I let it install them and it then reported that everything was OK. I re-ran the installer and got the same error as above.
Code:
root@pve02:~# pve-nvidia-vgpu-helper setup
You are running the Proxmox kernel 6.14.11-5, searching the associated and newer kernel headers package.
The following packages are missing:
proxmox-default-headers
proxmox-headers-6.17
Would you like to install them now (y/N)? y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
proxmox-headers-6.17.4-2-pve
The following NEW packages will be installed:
proxmox-default-headers proxmox-headers-6.17 proxmox-headers-6.17.4-2-pve
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 15.3 MB of archives.
After this operation, 105 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 proxmox-headers-6.17.4-2-pve amd64 6.17.4-2 [15.3 MB]
Get:2 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 proxmox-headers-6.17 all 6.17.4-2 [11.3 kB]
Get:3 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 proxmox-default-headers all 2.0.2 [1,944 B]
Fetched 15.3 MB in 2s (7,785 kB/s)
Selecting previously unselected package proxmox-headers-6.17.4-2-pve.
(Reading database ... 101215 files and directories currently installed.)
Preparing to unpack .../proxmox-headers-6.17.4-2-pve_6.17.4-2_amd64.deb ...
Unpacking proxmox-headers-6.17.4-2-pve (6.17.4-2) ...
Selecting previously unselected package proxmox-headers-6.17.
Preparing to unpack .../proxmox-headers-6.17_6.17.4-2_all.deb ...
Unpacking proxmox-headers-6.17 (6.17.4-2) ...
Selecting previously unselected package proxmox-default-headers.
Preparing to unpack .../proxmox-default-headers_2.0.2_all.deb ...
Unpacking proxmox-default-headers (2.0.2) ...
Setting up proxmox-headers-6.17.4-2-pve (6.17.4-2) ...
Setting up proxmox-headers-6.17 (6.17.4-2) ...
Setting up proxmox-default-headers (2.0.2) ...
All done, you can continue with the NVIDIA vGPU driver installation.
root@pve02:~# pve-nvidia-vgpu-helper setup
You are running the Proxmox kernel 6.14.11-5, searching the associated and newer kernel headers package.
All required packages are already installed.
All done, you can continue with the NVIDIA vGPU driver installation.