Proxmox 8.4.1 - NVIDIA.ko Error During M60 Installation

Scaff · 2025-06-19T10:38:06+0200

Hello everyone,

I'm trying to install NVIDIA vGPU drivers for a Tesla M60 on Proxmox 8.4.1.

I've followed the Polloloco guide and obtained the drivers from the NVIDIA hub (NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm.run), applying the patch found in this post.

No matter what I try, I consistently encounter an error with nvidia.ko.

lsmod | grep -E "nouveau|rivafb|nvidiafb|rivatv" returns nothing, confirming these modules are not loaded.
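An empty lsmod result only shows that those four modules are not loaded. It does not show which driver, if any, is currently bound to the cards. A minimal sketch for reading that from sysfs follows; the function name `bound_driver` and its optional sysfs-root argument are hypothetical, added only for illustration.

```shell
#!/bin/sh
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1 = full PCI address, e.g. 0000:84:00.0
# $2 = sysfs root (assumption: defaults to /sys; overridable for dry runs)
bound_driver() {
    dev="${2:-/sys}/bus/pci/devices/$1"
    if [ -L "$dev/driver" ]; then
        # The "driver" entry is a symlink into .../bus/pci/drivers/<name>
        basename "$(readlink "$dev/driver")"
    else
        echo none
    fi
}
```

Calling `bound_driver 0000:84:00.0` and `bound_driver 0000:85:00.0` should print whichever driver holds each card, or `none` if nothing has claimed it.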

I've hit a wall and could really use some assistance. Any help or insights would be greatly appreciated!

Code:

# dmesg | grep -i iommu
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-11-pve root=/dev/mapper/pve-root ro quiet video=efifb:off intel_iommu=on iommu=pt pci-stub.ids=10de:13f2
[    0.457326] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-11-pve root=/dev/mapper/pve-root ro quiet video=efifb:off intel_iommu=on iommu=pt pci-stub.ids=10de:13f2
[    0.457399] DMAR: IOMMU enabled

Code:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
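The running command line reported by dmesg carries `video=efifb:off` and `pci-stub.ids=10de:13f2`, which the GRUB_CMDLINE_LINUX_DEFAULT value shown here does not, so it may be worth listing exactly which device-ID reservations the booted kernel received. A small sketch, assuming only the `pci-stub.ids=` and `vfio-pci.ids=` parameters are of interest (the function name `reserved_ids` is hypothetical):

```shell
#!/bin/sh
# Read a kernel command line on stdin and print every parameter that
# reserves PCI device IDs for pci-stub or vfio-pci, one per line.
reserved_ids() {
    tr ' ' '\n' | grep -E '^(pci-stub|vfio-pci)\.ids='
}
```

On the host this could be run as `reserved_ids < /proc/cmdline`; no output would mean neither parameter was passed at boot.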

Code:

# uname -r
6.8.12-11-pve

Code:

# dpkg -l | grep proxmox-headers-$(uname -r | sed 's/-pve//')
ii  proxmox-headers-6.8.12-11-pve        6.8.12-11                           amd64        Proxmox Kernel Headers

Code:

# mokutil --sb-state
SecureBoot disabled

Code:

# cat /etc/modules-load.d/modules.conf
# Modules required for PCI passthrough
vfio
vfio_iommu_type1
vfio_pci
pci-stub

Code:

# cat /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0

Code:

# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:13f2,10de:13f2

Code:

# lspci -nnk | grep -i -E "nvidia|vga|3d|pci-stub"
0b:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. G200eR2 [102b:0534] (rev 01)
84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Tesla M60] [10de:13f2] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Tesla M60] [10de:115e]
        Kernel modules: nvidiafb, nouveau
85:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204GL [Tesla M60] [10de:13f2] (rev a1)
        Subsystem: NVIDIA Corporation GM204GL [Tesla M60] [10de:115e]
        Kernel modules: nvidiafb, nouveau
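Note that the grep filter above only keeps lines matching nvidia, vga, 3d, or pci-stub, so a line such as "Kernel driver in use: vfio-pci" would be dropped from this output. A rough sketch that reads unfiltered `lspci -nnk` output instead and reports the driver in use per NVIDIA device (the function name `nvidia_drivers` is hypothetical; 10de is the vendor ID shown in the listing):

```shell
#!/bin/sh
# Read `lspci -nnk` output on stdin; print "<address> <driver>" for every
# device with vendor ID 10de, using "none" when no driver is bound.
nvidia_drivers() {
    awk '
        /^[0-9a-f]/ {                        # unindented line: new device
            if (addr != "") print addr, drv  # flush the previous NVIDIA device
            addr = ""; drv = "none"
            if ($0 ~ /\[10de:[0-9a-f]+\]/) addr = $1
        }
        /Kernel driver in use:/ && addr != "" { drv = $NF }
        END { if (addr != "") print addr, drv }
    '
}
```

Usage would be `lspci -nnk | nvidia_drivers`.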

Code:

# ./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm-custom.run --dkms -m=kernel --no-drm

Code:

-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[  189.016041] vmbr0: port 19(veth1009i0) entered forwarding state
[  307.774150] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[  308.051389] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[  308.051408] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[  308.055165] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[  308.055169] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[  308.055171] NVRM: No NVIDIA devices probed.
[  308.055509] nvidia-nvlink: Unregistered Nvlink Core, major device number 235
[ 2014.294624] perf: interrupt took too long (2506 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 2598.516202] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[ 2598.516222] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[ 2598.520101] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[ 2598.520105] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[ 2598.520108] NVRM: No NVIDIA devices probed.
[ 2598.520543] nvidia-nvlink: Unregistered Nvlink Core, major device number 235
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Edit 1:

I tried the script from https://wvthoog.nl/proxmox-vgpu-v3/ with kernel 6.5 pinned, but got the same nvidia.ko error.

Edit 2:

I also tried:

Code:

# pve-nvidia-vgpu-helper setup
You are running the Proxmox kernel 6.8.12-11, searching the associated and newer kernel headers package.
All required packages are already installed.
All done, you can continue with the NVIDIA vGPU driver installation.

Code:

-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[ 2329.168270] perf: interrupt took too long (3168 > 3143), lowering kernel.perf_event_max_sample_rate to 63000
[ 3178.509846] perf: interrupt took too long (3968 > 3960), lowering kernel.perf_event_max_sample_rate to 50000
[ 4521.853629] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 4522.132360] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[ 4522.132379] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[ 4522.135882] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[ 4522.135885] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[ 4522.135887] NVRM: No NVIDIA devices probed.
[ 4522.136289] nvidia-nvlink: Unregistered Nvlink Core, major device number 235
[ 4707.208477] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[ 4707.208495] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[ 4707.211751] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[ 4707.211754] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[ 4707.211757] NVRM: No NVIDIA devices probed.
[ 4707.212082] nvidia-nvlink: Unregistered Nvlink Core, major device number 235

Scaff · 2025-06-19T16:07:12+0200

Edit 3:

I reset everything and retried, following the Polloloco guide with the newer 535.247.02 drivers and no patch, but that changed nothing.

Code:

     LD [M]  /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/nvidia.o
     LD [M]  /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/nvidia-vgpu-vfio.o
     MODPOST /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/Module.symvers
     CC [M]  /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/nvidia.mod.o
     CC [M]  /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/nvidia-vgpu-vfio.mod.o
     LD [M]  /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/nvidia-vgpu-vfio.ko
     BTF [M] /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/nvidia-vgpu-vfio.ko
   Skipping BTF generation for /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/nvidia-vgpu-vfio.ko due to unavailability of vmlinux
     LD [M]  /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/nvidia.ko
     BTF [M] /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/nvidia.ko
   Skipping BTF generation for /tmp/selfgz35290/NVIDIA-Linux-x86_64-535.247.02-vgpu-kvm/kernel/nvidia.ko due to unavailability of vmlinux
   make[1]: Leaving directory '/usr/src/linux-headers-6.8.12-11-pve'
-> done.
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[  177.813922] vmbr0: port 19(veth1009i0) entered blocking state
[  177.813936] vmbr0: port 19(veth1009i0) entered forwarding state
[  254.890854] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[  255.160904] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[  255.160921] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[  255.164490] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[  255.164493] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[  255.164496] NVRM: No NVIDIA devices probed.
[  255.164832] nvidia-nvlink: Unregistered Nvlink Core, major device number 235
[  768.447726] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
[  768.447745] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[  768.451941] NVRM: This can occur when a driver such as:
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[  768.451945] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[  768.451948] NVRM: No NVIDIA devices probed.
[  768.452305] nvidia-nvlink: Unregistered Nvlink Core, major device number 235
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

Scaff · 2025-06-19T18:48:27+0200

Edit 4:

After moving my GPU(s) to different PCI-e slots, their PCI bus IDs changed from 84:00.0 and 85:00.0 to 06:00.0 and 07:00.0 respectively. The driver installation appeared to complete successfully with the new IDs. However, now when I run nvidia-smi, I receive the error 'No devices were found.' I'm looking for ideas to resolve this.

Code:

# nvidia-smi
No devices were found

Code:

# lsmod | grep nvidia
nvidia_vgpu_vfio       98304  0
nvidia              56795136  2
mdev                   24576  1 nvidia_vgpu_vfio
kvm                  1339392  47 nvidia_vgpu_vfio,kvm_intel
vfio_pci_core          86016  2 nvidia_vgpu_vfio,vfio_pci
irqbypass              12288  89 vfio_pci_core,nvidia_vgpu_vfio,kvm
vfio                   65536  4 vfio_pci_core,nvidia_vgpu_vfio,vfio_iommu_type1,vfio_pci

Code:

06:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)
07:00.0 VGA compatible controller: NVIDIA Corporation GM204GL [Tesla M60] (rev a1)

Code:

-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if your kernel changes later. (Answer: Yes)
-> Registering the kernel modules with DKMS:
-> done.
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (535.161.05):
   executing: '/usr/sbin/ldconfig'...
   executing: '/usr/sbin/depmod -a '...
   executing: '/usr/bin/systemctl daemon-reload'...
-> done.
-> Driver file installation is complete.
-> Running distribution scripts
   executing: '/usr/lib/nvidia/post-install'...

   nvidia-vgpud systemd service successfully installed.

   nvidia-vgpu-mgr systemd service successfully installed.
-> done.
-> Running post-install sanity check:
-> done.
-> Post-install sanity check passed.
-> Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 535.161.05) is now complete.

Scaff · 2025-06-19T20:57:43+0200

Edit 5:

New Finding: Power Cable Issue Identified

After further investigation, I examined the dmesg output and found the following critical messages related to the NVIDIA driver:

Code:

root@r730:~# dmesg | grep NVRM
[   30.235222] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.161.05  Thu Jan 25 17:36:41 UTC 2024
[   32.904274] NVRM: GPU 0000:06:00.0: GPU does not have the necessary power cables connected.
[   32.904961] NVRM: GPU 0000:06:00.0: RmInitAdapter failed! (0x24:0x1c:1435)
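Only 0000:06:00.0 appears in these messages, so it can help to list the per-GPU NVRM lines by PCI address and see whether 07:00.0 logs anything at all. A minimal sketch, assuming the message layout shown above (the function name `nvrm_gpu_messages` is hypothetical):

```shell
#!/bin/sh
# Read dmesg text on stdin; for each "NVRM: GPU <address>: <message>" line
# print "<address> <message>". Other NVRM lines are skipped.
nvrm_gpu_messages() {
    sed -n 's/^.*NVRM: GPU \([0-9a-f:.]*\): \(.*\)$/\1 \2/p'
}
```

Usage would be `dmesg | nvrm_gpu_messages`.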

For reference, the part number for this GPU is N08NH. This information was found here
