NVIDIA driver install error

g4m3r7ag

New Member
May 7, 2024
I installed an RTX 5060 in one of my Proxmox nodes and am trying to install the driver on the host so that LXCs can use the GPU. It keeps failing with a kernel module error. I saw that there are known issues with the 6.17 kernel, so, since I am running Proxmox VE 9.1, I reinstalled the latest 6.14 kernel (6.14.11-5) and, following the documentation, removed proxmox-default-headers. The 6.17 kernel itself wasn't removed because proxmox-default-kernel depends on it, so only the 6.17 headers were removed.

Code:
root@pve02:~# uname -r
6.14.11-5-pve

root@pve02:~# apt list --installed | grep -E '(pve|proxmox)-kernel'

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

proxmox-kernel-6.14.11-5-pve-signed/stable,now 6.14.11-5 amd64 [installed,automatic]
proxmox-kernel-6.14/stable,now 6.14.11-5 all [installed]
proxmox-kernel-6.17.4-2-pve-signed/stable,now 6.17.4-2 amd64 [installed,automatic]
proxmox-kernel-6.17/stable,now 6.17.4-2 all [installed,automatic]
proxmox-kernel-helper/stable,now 9.0.4 all [installed]

root@pve02:~# apt list --installed | grep -E 'proxmox-headers|pve-headers'

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

proxmox-headers-6.14.11-5-pve/stable,now 6.14.11-5 amd64 [installed]
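A quick way to double-check that the headers really match the running kernel is to look for the build tree the compiler uses. This is a minimal sketch, assuming the proxmox-headers package for a release provides /lib/modules/<release>/build (the usual place DKMS and the .run installer compile against). The function name headers_present and its second argument are hypothetical; the argument only exists so the check can be tested.

```shell
# Sketch: check that a kernel build tree exists for a given kernel release.
# Assumption: the matching headers package provides /lib/modules/<release>/build.
headers_present() {
  local release=${1:-$(uname -r)}      # defaults to the running kernel
  local modroot=${2:-/lib/modules}     # hypothetical override, for testing only
  if [ -e "$modroot/$release/build" ]; then
    echo "ok: build tree found for $release"
  else
    # -e is false for a dangling symlink, i.e. headers removed but link left
    echo "missing: no build tree for $release"
    return 1
  fi
}
```

Run `headers_present` with no arguments on the host to check the running kernel (6.14.11-5-pve here).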

Everything else appears to be the typical install process.

Blacklist nouveau

Code:
root@pve02:/etc/modprobe.d# cat blacklist-nouveau.conf
blacklist nouveau
blacklist nvidiafb
blacklist snd_hda_intel
options nouveau modeset=0

root@pve02:~# update-initramfs -u -k $(uname -r)
update-initramfs: Generating /boot/initrd.img-6.14.11-5-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.

root@pve02:~# reboot
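To confirm the blacklist is part of the configuration modprobe actually uses, the effective config can be dumped and checked. A minimal sketch, assuming the output of `modprobe --showconfig` is piped in; the helper name check_blacklisted is hypothetical. Keep in mind that a blacklist entry only stops alias-based autoloading; a module can still be loaded explicitly or pulled in as a dependency.

```shell
# Sketch: report which of the given modules carry a "blacklist <name>" line
# in the modprobe configuration read from stdin.
check_blacklisted() {
  local config mod
  config=$(cat)
  for mod in "$@"; do
    if printf '%s\n' "$config" | grep -qx "blacklist $mod"; then
      echo "$mod: blacklisted"
    else
      echo "$mod: NOT blacklisted"
    fi
  done
}
```

Usage: `modprobe --showconfig | check_blacklisted nouveau nvidiafb`.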

After rebooting, I verified that no kernel driver is in use on the device (lspci lists the available modules but shows no "Kernel driver in use" line):

Code:
root@pve02:~# lspci -k
...
01:00.0 VGA compatible controller: NVIDIA Corporation GB206 [GeForce RTX 5060] (rev a1)
        Subsystem: Gigabyte Technology Co., Ltd Device 41a2
        Kernel modules: nvidiafb, nouveau
01:00.1 Audio device: NVIDIA Corporation Device 22eb (rev a1)
        Subsystem: NVIDIA Corporation Device 0000
        Kernel modules: snd_hda_intel
...
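The same check can be done straight from sysfs: a PCI function with a bound driver has a `driver` symlink in its device directory. A minimal sketch; the default address 0000:01:00.0 comes from the lspci output above, while the function name bound_driver and its sysfs-root argument are hypothetical (the argument exists only for testing).

```shell
# Sketch: show which driver, if any, is bound to a PCI function via sysfs.
bound_driver() {
  local bdf=${1:-0000:01:00.0}               # GPU address from lspci above
  local sysfs=${2:-/sys/bus/pci/devices}     # hypothetical override, for testing
  local dev="$sysfs/$bdf"
  if [ ! -d "$dev" ]; then
    echo "$bdf: no such device"
    return 1
  fi
  if [ -L "$dev/driver" ]; then
    # The link points at .../drivers/<name>; keep only the driver name
    echo "$bdf: bound to $(basename "$(readlink "$dev/driver")")"
  else
    echo "$bdf: no driver bound"
  fi
}
```

`bound_driver` checks the GPU; `bound_driver 0000:01:00.1` checks its audio function.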

The documentation lists NVIDIA driver 580.105.06, but when I search NVIDIA's website and view more versions, the latest 580.105 release is 580.105.08, so that is what I downloaded and ran:

Code:
root@pve02:~# ./NVIDIA-Linux-x86_64-580.105.08.run --dkms
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 580.105.08......................................

During the install I chose the MIT option when prompted. The build reaches 100% and then fails, unable to load the kernel module, and points me to this log:

Code:
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[  822.078889] NVRM: request_mem_region failed for 64M @ 0xd0000000. This can
               NVRM: occur when a driver such as rivatv is loaded and claims
               NVRM: ownership of the device's registers.
[  822.083620] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[  822.083670] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  822.083672] NVRM: None of the NVIDIA devices were initialized.
[  822.084503] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[ 1084.610146] VFIO - User Level meta-driver version: 0.3
[ 1084.777020] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 1084.777030] NVRM: request_mem_region failed for 64M @ 0xd0000000. This can
               NVRM: occur when a driver such as rivatv is loaded and claims
               NVRM: ownership of the device's registers.
[ 1084.781985] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[ 1084.782028] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1084.782030] NVRM: None of the NVIDIA devices were initialized.
[ 1084.782842] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[ 2110.583706] VFIO - User Level meta-driver version: 0.3
[ 2110.748528] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 2110.748538] NVRM: request_mem_region failed for 64M @ 0xd0000000. This can
               NVRM: occur when a driver such as rivatv is loaded and claims
               NVRM: ownership of the device's registers.
[ 2110.753225] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[ 2110.753289] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 2110.753291] NVRM: None of the NVIDIA devices were initialized.
[ 2110.754234] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
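Because the installer log and dmesg accumulate entries from every attempt, it can help to reduce the NVRM messages to the distinct regions that failed. A minimal sketch using `grep -ho` (GNU/BusyBox extensions rather than strict POSIX); the function name failed_regions is hypothetical.

```shell
# Sketch: print each distinct "request_mem_region failed" size/address pair
# found in the given files, or in stdin when no file is given.
failed_regions() {
  grep -ho 'request_mem_region failed for [0-9]*[KMG]* @ 0x[0-9a-fA-F]*' "$@" \
    | sed 's/^request_mem_region failed for //' \
    | sort -u
}
```

Usage: `failed_regions /var/log/nvidia-installer.log` or `dmesg | failed_regions`. Every attempt in this thread reports the same 64M region at 0xd0000000, which is also the GPU's first memory BAR in the `lspci -vnnk` output posted later.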

Every example I can find of a similar error log seems to be resolved by making sure nouveau and vfio are not claiming the device, but the lspci output indicates that nothing is claiming it. I'm not sure what to try next; any help would be greatly appreciated.
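Since lspci only reports driver binding, another thing worth checking is what holds the memory region itself, which is what request_mem_region complains about. A minimal sketch, assuming it runs as root (non-root readers see zeroed addresses in /proc/iomem) and defaulting to the 64M region at 0xd0000000 from the log; the function name iomem_overlaps is hypothetical. An entry nested under 0000:01:00.0 that belongs to something else (for example a firmware or boot framebuffer) would be a candidate for what blocks the probe.

```shell
# Sketch: list /proc/iomem entries (read from stdin) that overlap a physical range.
iomem_overlaps() {
  local start=$(( ${1:-0xd0000000} ))
  local size=$(( ${2:-67108864} ))     # default 64 MiB
  local end=$(( start + size - 1 ))
  local line range name lo hi
  while IFS= read -r line; do
    range=${line%% : *}                  # "    d0000000-d3ffffff"
    range=${range#"${range%%[! ]*}"}     # trim leading indentation
    name=${line#* : }
    [ -n "$range" ] || continue
    lo=$(( 0x${range%-*} ))
    hi=$(( 0x${range#*-} ))
    if [ "$lo" -le "$end" ] && [ "$hi" -ge "$start" ]; then
      printf '%s : %s\n' "$range" "$name"
    fi
  done
}
```

Usage, as root: `iomem_overlaps < /proc/iomem`. Parent bus entries are printed too, since they also overlap the range.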

Edit:

I should also point out that Secure Boot is not enabled:

Code:
root@pve02:~# mokutil --sb-state
SecureBoot disabled

I also realized that I had installed only the 6.14.11-5 headers and not the plain 6.14 headers package, so I installed that as well and got the same error.

Code:
root@pve02:~# apt list --installed | grep -E 'proxmox-headers|pve-headers'

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

proxmox-headers-6.14.11-5-pve/stable,now 6.14.11-5 amd64 [installed]
proxmox-headers-6.14/stable,now 6.14.11-5 all [installed]

I ran the pve-nvidia-vgpu-helper setup command and it wanted to install the 6.17 headers, which seemed counterintuitive, but I let it install them and it reported that everything was OK. I re-ran the installer and got the same error as above.

Code:
root@pve02:~# pve-nvidia-vgpu-helper setup
You are running the Proxmox kernel 6.14.11-5, searching the associated and newer kernel headers package.
The following packages are missing:
        proxmox-default-headers
        proxmox-headers-6.17
Would you like to install them now (y/N)? y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  proxmox-headers-6.17.4-2-pve
The following NEW packages will be installed:
  proxmox-default-headers proxmox-headers-6.17 proxmox-headers-6.17.4-2-pve
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 15.3 MB of archives.
After this operation, 105 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 proxmox-headers-6.17.4-2-pve amd64 6.17.4-2 [15.3 MB]
Get:2 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 proxmox-headers-6.17 all 6.17.4-2 [11.3 kB]
Get:3 http://download.proxmox.com/debian/pve trixie/pve-no-subscription amd64 proxmox-default-headers all 2.0.2 [1,944 B]
Fetched 15.3 MB in 2s (7,785 kB/s)
Selecting previously unselected package proxmox-headers-6.17.4-2-pve.
(Reading database ... 101215 files and directories currently installed.)
Preparing to unpack .../proxmox-headers-6.17.4-2-pve_6.17.4-2_amd64.deb ...
Unpacking proxmox-headers-6.17.4-2-pve (6.17.4-2) ...
Selecting previously unselected package proxmox-headers-6.17.
Preparing to unpack .../proxmox-headers-6.17_6.17.4-2_all.deb ...
Unpacking proxmox-headers-6.17 (6.17.4-2) ...
Selecting previously unselected package proxmox-default-headers.
Preparing to unpack .../proxmox-default-headers_2.0.2_all.deb ...
Unpacking proxmox-default-headers (2.0.2) ...
Setting up proxmox-headers-6.17.4-2-pve (6.17.4-2) ...
Setting up proxmox-headers-6.17 (6.17.4-2) ...
Setting up proxmox-default-headers (2.0.2) ...
All done, you can continue with the NVIDIA vGPU driver installation.

root@pve02:~# pve-nvidia-vgpu-helper setup
You are running the Proxmox kernel 6.14.11-5, searching the associated and newer kernel headers package.
All required packages are already installed.
All done, you can continue with the NVIDIA vGPU driver installation.
 
vGPU drivers make little sense with a 5060. Remove all nvidia packages, then follow this: https://gist.github.com/Impact123/3...e3#install-nvidia-driversmodules-via-run-file
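Before following the guide, a quick inventory of what is installed helps. A minimal sketch that filters dpkg-query output for installed packages with "nvidia" in the name; the function name installed_nvidia_pkgs is hypothetical, and packages in the rc state (removed, config files left) are deliberately skipped.

```shell
# Sketch: print installed ("ii") package names containing "nvidia".
# Expects lines of the form "<status-abbrev> <package>" on stdin.
installed_nvidia_pkgs() {
  awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }'
}
```

Usage: `dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | installed_nvidia_pkgs`. A driver installed from the .run file is not tracked by dpkg; the installer has its own `--uninstall` option for that.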
That is what I've been doing, and it gives the same error output:

Code:
root@pve02:~# apt install -y proxmox-default-headers gcc make dkms
proxmox-default-headers is already the newest version (2.0.2).
gcc is already the newest version (4:14.2.0-1).
gcc set to manually installed.
make is already the newest version (4.4.1-2).
make set to manually installed.
dkms is already the newest version (3.2.2-1~deb13u1).
Summary:
  Upgrading: 0, Installing: 0, Removing: 0, Not Upgrading: 0

root@pve02:~# ./$(ls -t NVIDIA*.run | head -n 1) --dkms --disable-nouveau --kernel-module-type proprietary --no-install-libglvnd
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 580.119.02....................

Here is the log output after the above run failed:

Code:
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[57724.077489] NVRM: request_mem_region failed for 64M @ 0xd0000000. This can
               NVRM: occur when a driver such as rivatv is loaded and claims
               NVRM: ownership of the device's registers.
[57724.081842] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[57724.081902] NVRM: The NVIDIA probe routine failed for 1 device(s).
[57724.081906] NVRM: None of the NVIDIA devices were initialized.
[57724.082579] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[57780.426763] VFIO - User Level meta-driver version: 0.3
[57780.592211] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[57780.592219] NVRM: request_mem_region failed for 64M @ 0xd0000000. This can
               NVRM: occur when a driver such as rivatv is loaded and claims
               NVRM: ownership of the device's registers.
[57780.597075] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[57780.597132] NVRM: The NVIDIA probe routine failed for 1 device(s).
[57780.597134] NVRM: None of the NVIDIA devices were initialized.
[57780.597970] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[73071.635495] VFIO - User Level meta-driver version: 0.3
[73071.874740] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[73071.874749] NVRM: request_mem_region failed for 64M @ 0xd0000000. This can
               NVRM: occur when a driver such as rivatv is loaded and claims
               NVRM: ownership of the device's registers.
[73071.881622] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[73071.881648] NVRM: The NVIDIA probe routine failed for 1 device(s).
[73071.881649] NVRM: None of the NVIDIA devices were initialized.
[73071.882867] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
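The log shows "VFIO - User Level meta-driver version: 0.3" just before the second and third attempts, so the vfio core module is being loaded at some point, although that alone does not mean vfio-pci has claimed the GPU. A minimal sketch that filters lsmod output for the modules discussed in this thread; the module list and the function name loaded_gpu_modules are assumptions.

```shell
# Sketch: print which of the GPU-related modules appear in lsmod output (stdin).
loaded_gpu_modules() {
  # NR > 1 skips the "Module Size Used by" header line
  awk 'NR > 1 && $1 ~ /^(nouveau|nvidiafb|vfio|vfio_pci|vfio_pci_core|nvidia|nvidia_drm|nvidia_modeset|nvidia_uvm)$/ { print $1 }'
}
```

Usage: `lsmod | loaded_gpu_modules`.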
 
Maybe your current kernel doesn't match the headers? Check these:
Bash:
uname -a
proxmox-boot-tool kernel list
apt depends proxmox-default-headers

Or maybe you configured PCI(e) passthrough? Check these:
Bash:
lspci -vnnk | awk '/VGA/{print $0}' RS= | grep -Pi --color "^|(?<=Kernel driver in use: |Kernel modules: )[^ ]+"
grep -sREi "blacklist|softdep|vfio|iommu|acs" /etc/modprobe.d/ /etc/default/grub /etc/kernel/cmdline
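The grep above checks configuration files; the options actually in effect for the running kernel are in /proc/cmdline. A minimal sketch that splits the command line into options and prints the ones commonly involved in passthrough or framebuffer handover; the option list and the function name cmdline_flags are assumptions, not an exhaustive set.

```shell
# Sketch: print selected options from a kernel command line file.
cmdline_flags() {
  tr ' ' '\n' < "${1:-/proc/cmdline}" \
    | grep -E '^(intel_iommu|amd_iommu|iommu|vfio[-_]pci\.ids|video|initcall_blacklist|pcie_acs_override|nomodeset)(=|$)'
}
```

`cmdline_flags` reads /proc/cmdline by default; no output (exit status 1) means none of the listed options are set.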
 

The headers match. I'm not really sure why we're checking proxmox-default-headers, since it pulls in the 6.17 headers and the Proxmox documentation (under Known Issues & Breaking Changes) specifically tells you to remove the proxmox-default-headers package, so I listed the installed headers as well:

Code:
root@pve02:~# uname -a
Linux pve02 6.14.11-5-pve #2 SMP PREEMPT_DYNAMIC PMX 6.14.11-5 (2025-12-15T08:44Z) x86_64 GNU/Linux

root@pve02:~# proxmox-boot-tool kernel list
Manually selected kernels:
None.

Automatically selected kernels:
6.14.11-5-pve
6.17.4-2-pve

Pinned kernel:
6.14.11-5-pve

root@pve02:~# apt depends proxmox-default-headers
proxmox-default-headers
  Depends: proxmox-headers-6.17

root@pve02:~# apt list --installed | grep -E 'proxmox-headers|pve-headers'

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

proxmox-headers-6.14.11-5-pve/stable,now 6.14.11-5 amd64 [installed]
proxmox-headers-6.14/stable,now 6.14.11-5 all [installed]
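Since a kernel is pinned here, it is also worth confirming the node actually booted into it. A minimal sketch that parses `proxmox-boot-tool kernel list` output, assuming the "Pinned kernel:" heading format shown above; the function name pinned_matches_running is hypothetical.

```shell
# Sketch: compare the pinned kernel (from stdin) with the running kernel.
pinned_matches_running() {
  local running=${1:-$(uname -r)}
  local pinned
  # Print the first line after the "Pinned kernel:" heading, if any
  pinned=$(awk 'found { print; exit } /^Pinned kernel:/ { found = 1 }')
  if [ -z "$pinned" ]; then
    echo "no pinned kernel"
  elif [ "$pinned" = "$running" ]; then
    echo "running pinned kernel $pinned"
  else
    echo "pinned $pinned but running $running"
    return 1
  fi
}
```

Usage: `proxmox-boot-tool kernel list | pinned_matches_running`. Given the uname output above, it should report 6.14.11-5-pve as both pinned and running.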

I don't believe I did anything to configure passthrough, and I definitely didn't map the device to any VMs:

Code:
root@pve02:~# lspci -vnnk | awk '/VGA/{print $0}' RS= | grep -Pi --color "^|(?<=Kernel driver in use: |Kernel modules: )[^ ]+"
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GB206 [GeForce RTX 5060] [10de:2d05] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Device [1458:41a2]
        Flags: fast devsel, IRQ 243, IOMMU group 11
        Memory at d0000000 (32-bit, non-prefetchable) [size=64M]
        Memory at d4000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] MSI: Enable- Count=1/16 Maskable+ 64bit+
        Capabilities: [60] Express Legacy Endpoint, IntMsgNum 0
        Capabilities: [9c] Vendor Specific Information: Len=14 <?>
        Capabilities: [b0] MSI-X: Enable- Count=9 Masked-
        Capabilities: [100] Secondary PCI Express
        Capabilities: [12c] Latency Tolerance Reporting
        Capabilities: [134] Physical Resizable BAR
        Capabilities: [140] Virtual Resizable BAR
        Capabilities: [14c] Data Link Feature <?>
        Capabilities: [158] Physical Layer 16.0 GT/s <?>
        Capabilities: [188] Physical Layer 32.0 GT/s <?>
        Capabilities: [1b8] Advanced Error Reporting
        Capabilities: [200] Lane Margining at the Receiver
        Capabilities: [248] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [250] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [2a4] Vendor Specific Information: ID=0001 Rev=1 Len=014 <?>
        Capabilities: [2bc] Power Budgeting <?>
        Capabilities: [2f4] Device Serial Number 61-e7-ac-81-b7-2d-b0-48
        Kernel modules: nvidiafb, nouveau

root@pve02:~# grep -sREi "blacklist|softdep|vfio|iommu|acs" /etc/modprobe.d/ /etc/default/grub /etc/kernel/cmdline
/etc/modprobe.d/amd64-microcode-blacklist.conf:blacklist microcode
/etc/modprobe.d/block-nouveau.conf:blacklist nouveau
/etc/modprobe.d/pve-blacklist.conf:blacklist nvidiafb
/etc/modprobe.d/blacklist-nouveau.conf:blacklist nouveau
/etc/modprobe.d/blacklist-nouveau.conf:blacklist nvidiafb
/etc/modprobe.d/blacklist-nouveau.conf:blacklist snd_hda_intel
 
I suggested proxmox-default-headers because it is part of the guide I linked, which I use all the time with success. It only makes sense if you use the latest kernel, though. Why did you pin the older kernel? There's now a 580.119.02 version, but I doubt it changes things. Does it behave the same way if you choose the proprietary option? The rest looks good.
I pinned the older kernel because that is what the Proxmox documentation I linked to advises. I found that documentation while troubleshooting the driver install failures, when I was already on the 6.17 kernel, so I followed it. I also originally started with 580.119.02, but since the Proxmox documentation recommends 580.105.06, I switched to 580.105.08: .06 no longer shows up as a download option on the NVIDIA website, and I assume the only differences between .06 and .08 are minor bug fixes.