Trouble Installing Nvidia Drivers for Quadro P2000 for vGPU

Dec 25, 2023
Hello!

I am having trouble installing the drivers for my Quadro P2000.
The same system also has an HBA installed that is passed through to a VM, so IOMMU is enabled. For this I followed the Proxmox guide.

I have blacklisted the nouveau driver, removed the blacklist entry for nvidiafb, and updated the initramfs.
Output of lspci -k:
Code:
10:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2000] (rev a1)
        Subsystem: Dell GP106GL [Quadro P2000]
        Kernel modules: nvidiafb, nouveau
10:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)
        Subsystem: Dell GP106 High Definition Audio Controller
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel

As there is no kernel driver in use for the GPU itself, I assume the blacklisting has worked.
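For anyone who wants to repeat that check without reading the output by eye, here is a minimal sketch of how it could be scripted. It only parses the text that lspci -k prints; driver_in_use is a hypothetical helper name of my own, not part of pciutils, and it assumes device header lines start in column 0 while detail lines are indented, as in the output above.

```shell
#!/bin/sh
# driver_in_use ADDR
# Reads `lspci -k` output on stdin and prints the "Kernel driver in use"
# value for the device at PCI address ADDR, or "none" if no driver is bound.
# NOTE: driver_in_use is a hypothetical helper, not a pciutils command.
driver_in_use() {
    awk -v addr="$1" '
        # Device header lines start in column 0 with the PCI address.
        /^[0-9a-fA-F]/ { in_dev = (index($0, addr " ") == 1) }
        # Detail lines are indented; only look at those of the wanted device.
        in_dev && /Kernel driver in use:/ {
            sub(/^.*Kernel driver in use:[ \t]*/, "")
            print
            found = 1
        }
        END { if (!found) print "none" }
    '
}
```

On the host this would be called as lspci -k | driver_in_use 10:00.0, where 10:00.0 is the GPU address from the output above.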
I have also installed the Linux headers for the running kernel like this:
Code:
apt-get install linux-headers-`uname -r`
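To double-check that the headers really are where the installer looks for them, something like the following sketch could be used. The installer log further down reports /lib/modules/<kernel version>/build as the kernel source path, so the sketch tests for a Makefile there. Treating the Makefile as proof that the headers are usable is my own assumption, and headers_present is a hypothetical helper name; its optional second argument (an alternative filesystem root) exists only so the function can be tried out without touching the real system.

```shell
#!/bin/sh
# headers_present KVER [ROOT]
# Succeeds if ROOT/lib/modules/KVER/build/Makefile exists.
# NOTE: headers_present is a hypothetical helper; using the top-level
# Makefile as a stand-in for "headers are installed" is an assumption.
headers_present() {
    kver="$1"
    root="${2:-}"   # empty by default, so the real /lib/modules is checked
    [ -e "$root/lib/modules/$kver/build/Makefile" ]
}
```

A typical call would be headers_present "$(uname -r)" && echo "headers found".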

Running the latest driver installer supported for my GPU (16.7), I get this error message:
Code:
./NVIDIA-Linux-x86_64-535.183.04-vgpu-kvm-custom.run --dkms -m=kernel

[..]

ERROR: Unable to load the kernel module 'nvidia-vgpu-vfio.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.

Code:
root@pvenew:~/16.7/Host_Drivers# ./NVIDIA-Linux-x86_64-535.183.04-vgpu-kvm-custom.run --dkms -m=kernel
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 535.183.04........................................................................................................................................................................................................................................................................................
root@pvenew:~/16.7/Host_Drivers# ^Ct /etc/modprobe.d/vifo.conf
root@pvenew:~/16.7/Host_Drivers# cat /var/log/nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Fri Aug 16 12:42:24 2024
installer version: 535.183.04

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

nvidia-installer command line:
    ./nvidia-installer
    --dkms
    -m=kernel

Using: nvidia-installer ncurses v6 user interface
-> Detected 16 CPUs online; setting concurrency level to 16.
-> Installing NVIDIA driver version 535.183.04.
-> Performing CC sanity check with CC="/usr/bin/cc".
-> Performing CC check.
-> Kernel source path: '/lib/modules/6.8.8-4-pve/build'
-> Kernel output path: '/lib/modules/6.8.8-4-pve/build'
-> Performing Compiler check.
-> Performing Dom0 check.
-> Performing Xen check.
-> Performing PREEMPT_RT check.
-> Performing vgpu_kvm check.
-> Cleaning kernel module build directory.
   executing: 'cd kernel; /usr/bin/make -k -j16  NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/6.8.8-4-pve/build" SYSOUT="/lib/modules/6.8.8-4-pve/build" clean'...
   rm -f -r conftest
   make[1]: Entering directory '/usr/src/linux-headers-6.8.8-4-pve'
   make[1]: Leaving directory '/usr/src/linux-headers-6.8.8-4-pve'
-> Building kernel modules
   executing: 'cd kernel; /usr/bin/make -k -j16  NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/6.8.8-4-pve/build" SYSOUT="/lib/modules/6.8.8-4-pve/build" '...
   make[1]: Entering directory '/usr/src/linux-headers-6.8.8-4-pve'
   warning: the compiler differs from the one used to build the kernel
     The kernel was built by: gcc (Debian 12.2.0-14) 12.2.0
     You are using:           cc (Debian 12.2.0-14) 12.2.0
     SYMLINK /tmp/selfgz86248/NVIDIA-Linux-x86_64-535.183.04-vgpu-kvm-custom/kernel/nvidia/nv-kernel.o
    CONFTEST: hash__remap_4k_pfn
    CONFTEST: set_pages_uc
[...]
-> done.
-> Kernel module compilation complete.
-> Unable to determine if Secure Boot is enabled: No such file or directory
ERROR: Unable to load the kernel module 'nvidia-vgpu-vfio.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: Invalid argument
-> Kernel messages:
[  987.821389] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.183.04  Fri May 24 18:08:53 UTC 2024
[  987.831394] failing symbol_get of non-GPLONLY symbol nvidia_vgpu_vfio_get_ops.
[  987.831397] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
[ 1062.358318] nvidia-nvlink: Unregistered Nvlink Core, major device number 508
[ 1103.757751] nvidia-nvlink: Nvlink Core is being initialized, major device number 508

[ 1103.759058] nvidia 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=io+mem
[ 1103.959415] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.183.04  Fri May 24 18:08:53 UTC 2024
[ 1103.970609] failing symbol_get of non-GPLONLY symbol nvidia_vgpu_vfio_get_ops.
[ 1103.970613] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
[ 1105.790390] nvidia-nvlink: Unregistered Nvlink Core, major device number 508
[ 1203.578521] nvidia-nvlink: Nvlink Core is being initialized, major device number 508

[ 1203.579832] nvidia 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=io+mem
[ 1203.779406] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.183.04  Fri May 24 18:08:53 UTC 2024
[ 1203.789585] failing symbol_get of non-GPLONLY symbol nvidia_vgpu_vfio_get_ops.
[ 1203.789589] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
[ 1304.972113] nvidia-nvlink: Unregistered Nvlink Core, major device number 508
[ 1527.191601] nvidia-nvlink: Nvlink Core is being initialized, major device number 508

[ 1527.192983] nvidia 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=io+mem
[ 1527.392431] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.183.04  Fri May 24 18:08:53 UTC 2024
[ 1527.403379] failing symbol_get of non-GPLONLY symbol nvidia_vgpu_vfio_get_ops.
[ 1527.403382] [nvidia-vgpu-vfio] Unable to get symbol for nvidia_vgpu_vfio_get_ops from nvidia.ko
[ 1576.786194] nvidia-nvlink: Unregistered Nvlink Core, major device number 508
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
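
Since the kernel messages in the log repeat for every install attempt, a small filter makes the distinct messages easier to see. This is only a sketch: distinct_kernel_messages is a hypothetical helper name, and it assumes the dmesg-style "[ seconds.microseconds]" prefix shown above.

```shell
#!/bin/sh
# distinct_kernel_messages
# Reads a log on stdin, keeps only lines that begin with a dmesg-style
# "[ 1234.567890] " timestamp, strips the timestamp, and prints each
# remaining message once, in the order it was first seen.
# NOTE: distinct_kernel_messages is a hypothetical helper name.
distinct_kernel_messages() {
    sed -n 's/^\[ *[0-9][0-9]*\.[0-9][0-9]*] //p' | awk '!seen[$0]++'
}
```

It can be run as distinct_kernel_messages < /var/log/nvidia-installer.log.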

Could the issue be related to IOMMU? I'm not sure.

Thanks,
Trace
 
