[SOLVED] Kernel 5.11 & NVIDIA Linux vgpu-kvm

mishki

How do I install the driver on a 5.11 kernel?
It works on 5.4.


Proxmox 7. Kernel 5.11. PRIMERGY RX2540 M5. RTX 6000

# uname -a
Linux omega 5.11.22-3-pve #1 SMP PVE 5.11.22-6 (Wed, 28 Jul 2021 10:51:12 +0200) x86_64 GNU/Linux

# lspci -nn | grep NVID
18:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] [10de:1e30] (rev a1)

# bash NVIDIA-Linux-x86_64-460.91.03-vgpu-kvm.run

# dkms status
nvidia, 460.91.03: added

# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
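
In the 5.11 attempt above, `dkms status` reports `added`, which means the source is registered with DKMS but no module has been built or installed for any kernel, so `nvidia-smi` has no driver to talk to. A minimal sketch for pulling the state out of a status line; `dkms_state` is a hypothetical helper name, and the parsing assumes the `name, version: state` layout shown in the post:

```shell
dkms_state() {
    # Print everything after the last ": " in a dkms status line.
    local line="$1"
    printf '%s\n' "${line##*: }"
}

state=$(dkms_state 'nvidia, 460.91.03: added')
if [ "$state" != "installed" ]; then
    echo "module state is '$state'; check the DKMS build log before running nvidia-smi"
fi
```

On a live host the argument would come from `dkms status` itself rather than a literal string.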

-------
Proxmox 7. Kernel 5.4

# uname -a
Linux omega 5.4.128-1-pve #1 SMP PVE 5.4.128-1 (Wed, 21 Jul 2021 18:32:02 +0200) x86_64 GNU/Linux

# lspci -nn | grep NVID
18:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] [10de:1e30] (rev a1)

# bash NVIDIA-Linux-x86_64-460.91.03-vgpu-kvm.run

Code:
The CC version check failed:

  The kernel was built with gcc version 8.3.0 (Debian 8.3.0-6), but the current compiler version is cc (Debian  
  10.2.1-6) 10.2.1 20210110.

  This may lead to subtle problems; if you are not certain whether the mismatched compiler will be compatible    
  with your kernel, you may wish to abort installation, set the CC environment variable to the name of the      
  compiler used to compile your kernel, and restart installation.

                           Ignore CC version check    OK

Code:
WARNING: Ignoring CC version mismatch:

           The kernel was built with gcc version 8.3.0 (Debian 8.3.0-6), but the current compiler version is cc  
           (Debian 10.2.1-6) 10.2.1 20210110.

                                                        OK

Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 460.91.03) is now complete.
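
The installer's own suggestion is to set `CC` to the compiler that built the kernel rather than ignoring the check. A rough sketch of extracting that version from the installer message; `built_with_gcc` is a hypothetical helper, and the `gcc-8` binary name in the comment is an assumption that may not be packaged for your release:

```shell
built_with_gcc() {
    # Read text on stdin and print the first "gcc version X.Y.Z" found.
    sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

msg='The kernel was built with gcc version 8.3.0 (Debian 8.3.0-6), but the current compiler version is cc (Debian 10.2.1-6) 10.2.1 20210110.'
kernel_gcc=$(printf '%s\n' "$msg" | built_with_gcc)
echo "kernel was built with gcc $kernel_gcc"

# If a matching compiler is installed (assumed here to be called gcc-8),
# the installer could be pointed at it instead of skipping the check:
#   CC=gcc-8 bash NVIDIA-Linux-x86_64-460.91.03-vgpu-kvm.run
```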


# nvidia-smi
Tue Aug 10 21:33:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 6000     Off  | 00000000:18:00.0 Off |                  Off |
| 33%   41C    P0    77W / 260W |    159MiB / 24575MiB |      0%      Default |
|                               |                      |                  N/A |
 

After today's pve-kernel updates for 5.4 and 5.11:

Bash:
uname -a
Linux omega 5.4.128-1-pve #1 SMP PVE 5.4.128-2 (Wed, 18 Aug 2021 16:20:02 +0200) x86_64 GNU/Linux
Bash:
bash NVIDIA-Linux-x86_64-470.63-vgpu-kvm.run --no-cc-version-check

Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 470.63) is now complete.

And on 5.11:
Bash:
uname -a
Linux omega 5.11.22-3-pve #1 SMP PVE 5.11.22-7 (Wed, 18 Aug 2021 15:06:12 +0200) x86_64 GNU/Linux
Bash:
bash NVIDIA-Linux-x86_64-470.63-vgpu-kvm.run --no-cc-version-check

ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.
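
The installer only points at its log, so the next step is to look for the first compiler errors in it. A small sketch, assuming GNU grep; `first_errors` is a hypothetical helper name:

```shell
first_errors() {
    # Usage: first_errors <logfile> [max_lines]
    # Print up to max_lines (default 10) lines containing "error", with line numbers.
    grep -n -i -m "${2:-10}" 'error' "$1"
}

log=/var/log/nvidia-installer.log
if [ -r "$log" ]; then
    first_errors "$log" 20 || echo "no lines containing 'error' in $log"
else
    echo "no readable log at $log"
fi
```

The first reported error is usually the one worth searching for; later ones are often consequences of it.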
 
Thanks to this post and thread, the driver now installs on 5.11:

Linux omega 5.11.22-3-pve #1 SMP PVE 5.11.22-7 (Wed, 18 Aug 2021 15:06:12 +0200) x86_64 GNU/Linux

download and unzip patch:
Bash:
cd /opt
wget https://github.com/rupansh/vgpu_unlock_5.12/archive/refs/heads/master.zip
unzip master.zip

prepare and install:
Bash:
bash NVIDIA-Linux-x86_64-470.63-vgpu-kvm.run -x
cd NVIDIA-Linux-x86_64-470.63-vgpu-kvm/
patch -p0 < ../vgpu_unlock_5.12-master/twelve.patch
sed -i.bak '/MODULE_LICENSE("MIT");/c MODULE_LICENSE("Dual MIT/GPL");' kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.c
sed -i.bak '/MODULE_LICENSE("NVIDIA");/c MODULE_LICENSE("Dual MIT/GPL");' kernel/nvidia/nv-frontend.c

./nvidia-installer --dkms


Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 470.63) is now complete.
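
Before running the installer it can be worth confirming that the two `sed` edits actually changed the license strings. A sketch that replays the same `sed` expression on a throwaway file rather than the real driver tree; `show_module_license` is a hypothetical helper, and GNU sed is assumed (the one-line `c` form is a GNU extension):

```shell
show_module_license() {
    # Print every MODULE_LICENSE(...) line in the given files, prefixed by file name.
    grep -H 'MODULE_LICENSE(' "$@"
}

# Demo on a temporary copy, not on the extracted driver sources.
demo_dir=$(mktemp -d)
printf '%s\n' '#include <linux/module.h>' 'MODULE_LICENSE("NVIDIA");' > "$demo_dir/nv-frontend.c"

# Same sed expression as in the post above.
sed -i.bak '/MODULE_LICENSE("NVIDIA");/c MODULE_LICENSE("Dual MIT/GPL");' "$demo_dir/nv-frontend.c"

show_module_license "$demo_dir/nv-frontend.c"
```

In the extracted driver directory, the same helper could be pointed at `kernel/nvidia-vgpu-vfio/nvidia-vgpu-vfio.c` and `kernel/nvidia/nv-frontend.c`.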
 
I can confirm that the patched 470.63 driver works, but 460.32.04 and 450.89 (and possibly older versions) fail during the build with sys_close and kmap_types errors, even when patched!

I think this is because NVIDIA fixed (part of) the 5.11 kernel compatibility in 460.67 and later:

https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-460.67-Linux-Driver

(See the release highlights)
https://www.nvidia.com/Download/driverResults.aspx/171392/en-us
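
The build failures reported above line up with driver version, so a quick version comparison before installing can save a failed build. A small sketch using `sort -V` from GNU coreutils; `version_ge` is a hypothetical helper, and the 460.67 threshold is only the poster's guess from the release notes, not a verified cutoff for the vGPU branch:

```shell
version_ge() {
    # Succeed if version $1 is greater than or equal to version $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

for v in 450.89 460.32.04 470.63; do
    if version_ge "$v" 460.67; then
        echo "$v: at or above 460.67"
    else
        echo "$v: below 460.67"
    fi
done
```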
 
This worked on Debian 11.2 kernel 5.10.0-11-amd64!
Thanks so much for your help and documentation!
 
PVE 7.1, Debian 11.3

# uname -a
Linux tgn-pve1 5.13.19-6-pve #1 SMP PVE 5.13.19-15 (Tue, 29 Mar 2022 15:59:50 +0200) x86_64 GNU/Linux

I successfully installed NVIDIA-Linux-x86_64-510.47.03-vgpu-kvm.run, but mdevctl types returns nothing:

# nvidia-smi
Thu May 26 08:27:36 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:18:00.0 Off |                    0 |
| 30%   40C    P8    33W / 300W |      0MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000    On   | 00000000:3B:00.0 Off |                    0 |
| 30%   46C    P8    31W / 300W |      0MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A6000    On   | 00000000:86:00.0 Off |                    0 |
| 30%   53C    P8    42W / 300W |      0MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A6000    On   | 00000000:AF:00.0 Off |                    0 |
| 30%   44C    P8    35W / 300W |      0MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@tgn-pve1:~# mdevctl types
root@tgn-pve1:~# mdevctl list

This driver also gives me an error on the newer kernel 5.15.35-1-pve.
 
Thanks for the guide!
After installing the custom drivers, I still get no output from:
mdevctl types
Maybe my graphics card (NVIDIA RTX A6000) is not supported by the custom driver?
Also, nvidia-topologyd.service failed to start:
Code:
# systemctl status nvidia-topologyd.service
● nvidia-topologyd.service - NVIDIA Topology Daemon
     Loaded: loaded (/lib/systemd/system/nvidia-topologyd.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2022-06-03 10:50:17 MSK; 6min ago
    Process: 1718 ExecStart=/usr/bin/nvidia-topologyd (code=exited, status=203/EXEC)
        CPU: 765us

Jun 03 10:50:17 tgn-pve1 systemd[1]: Starting NVIDIA Topology Daemon...
Jun 03 10:50:17 tgn-pve1 systemd[1718]: nvidia-topologyd.service: Failed to locate executable /usr/bin/nvidia-topologyd: No such file or directory
Jun 03 10:50:17 tgn-pve1 systemd[1718]: nvidia-topologyd.service: Failed at step EXEC spawning /usr/bin/nvidia-topologyd: No such file or directory
Jun 03 10:50:17 tgn-pve1 systemd[1]: nvidia-topologyd.service: Control process exited, code=exited, status=203/EXEC
Jun 03 10:50:17 tgn-pve1 systemd[1]: nvidia-topologyd.service: Failed with result 'exit-code'.
Jun 03 10:50:17 tgn-pve1 systemd[1]: Failed to start NVIDIA Topology Daemon.
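
The `status=203/EXEC` result in this log means systemd could not execute the binary named in `ExecStart`, and the journal lines confirm that the file is missing. A small sketch for checking this from the unit file; `unit_exec_path` is a hypothetical helper and only handles a simple single-line `ExecStart=`:

```shell
unit_exec_path() {
    # Print the executable from the first ExecStart= line,
    # dropping any leading systemd prefix characters such as "-" or "@".
    sed -n 's/^ExecStart=[-@:+!]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

unit=/lib/systemd/system/nvidia-topologyd.service
if [ -r "$unit" ]; then
    bin=$(unit_exec_path "$unit")
    if [ -x "$bin" ]; then
        echo "$bin is present"
    else
        echo "$bin is missing or not executable"
    fi
else
    echo "unit file $unit not found"
fi
```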
 
Also nvidia-topologyd.service failed to load:
Code:
# systemctl status nvidia-topologyd.service
nvidia-topologyd.service - NVIDIA Topology Daemon

I don't know what nvidia-topologyd is.

What does this command show?
nvidia-smi vgpu
 
Maybe my graphics card (NVIDIA RTX A6000) is not supported by the custom driver?
No, something else is going wrong. If the driver installs, then that part is fine.
Try a clean install of the updated driver (NVIDIA-Linux-x86_64-510.73.06-vgpu-kvm.run); it is compatible with Linux 5.15.35-1-pve.
 
I just installed NVIDIA-Linux-x86_64-510.73.06-vgpu-kvm.run and still have the same issue:

Code:
# nvidia-smi vgpu
Fri Jun  3 14:22:14 2022      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.06              Driver Version: 510.73.06                 |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  NVIDIA RTX A6000           | 00000000:3B:00.0             |   0%       |
+---------------------------------+------------------------------+------------+
|   1  NVIDIA RTX A6000           | 00000000:86:00.0             |   0%       |
+---------------------------------+------------------------------+------------+
|   2  NVIDIA RTX A6000           | 00000000:AF:00.0             |   1%       |
+---------------------------------+------------------------------+------------+
# mdevctl types
#
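
`mdevctl types` lists the mediated device types that the kernel exposes in sysfs, so an empty result can be cross-checked against the `mdev_supported_types` directory of each GPU. A minimal sketch; `count_mdev_types` is a hypothetical helper, and the sysfs root is passed as a parameter only so the function can be tried against a test directory:

```shell
count_mdev_types() {
    # Usage: count_mdev_types <sysfs_root> <pci_address>
    # Count the type directories under the device's mdev_supported_types.
    local dir="$1/bus/pci/devices/$2/mdev_supported_types"
    local n=0 entry
    for entry in "$dir"/*/; do
        if [ -d "$entry" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}

# 0000:3b:00.0 is the sysfs spelling of the 00000000:3B:00.0 bus ID shown by nvidia-smi.
echo "vGPU types exposed: $(count_mdev_types /sys 0000:3b:00.0)"
```

A count of 0 means no vGPU types are registered for that device, which matches the empty `mdevctl types` output.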
 
With the 3 GPUs listed, can you now see all 3 as available PCI devices for passthrough in the VM options under "add PCI"? The A6000 at GPU 0 should be the only one using the HDMI/DP ports, with 1 and 2 used solely for vGPU.

And for the whole install, @mishki, are the steps listed in polloloco/vgpu-5.15 the only ones required?
And, of course, the card must not be listed in vfio.conf, as in: options vfio-pci ids=xxxx:xxxx disable_vga=1
 
The server has gone into production use, so I'm limited in what I can experiment with right now.
I'm currently using PCI passthrough, which works without any issues. When I tried to get vGPU working, I commented out this line in vfio.conf:
options vfio-pci ids=10de:2230,10de:1aef disable_vga=1
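
Since a device claimed by `vfio-pci` cannot be bound to the NVIDIA host driver at the same time, it helps to confirm which IDs are still listed on active (uncommented) lines. A sketch, assuming the file lives at `/etc/modprobe.d/vfio.conf` (the post only says `vfio.conf`); `vfio_ids` is a hypothetical helper name:

```shell
vfio_ids() {
    # Print one vendor:device ID per line from uncommented "options vfio-pci" lines.
    grep -E '^[[:space:]]*options[[:space:]]+vfio-pci' "$1" \
        | grep -oE 'ids=[^[:space:]]+' \
        | cut -d= -f2 \
        | tr ',' '\n'
}

conf=/etc/modprobe.d/vfio.conf   # assumed location
if [ -r "$conf" ]; then
    ids=$(vfio_ids "$conf")
    echo "IDs still claimed by vfio-pci: ${ids:-none}"
else
    echo "no readable file at $conf"
fi
```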
The a6000 gpu 0 should be the only one with the hdmi/dp port, and 1/2 as sole vgpu.
What do you mean? The display mode selector utility?
 
Hi,
I need help with the following problem. When I try to install the NVIDIA vGPU driver with DKMS, I get the error below.
I have applied all the different patches from GitHub.

ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 470.63 -k 5.15.35-2-pve`:
Kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area...
'make' -j32 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.15.35-2-pve IGNORE_CC_MISMATCH='' modules...(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.15.35-2-pve (x86_64)
Consult /var/lib/dkms/nvidia/470.63/build/make.log for more information.


ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the
README available on the Linux driver download page at www.nvidia.com.
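
The DKMS message names a per-module build log, and the actual compiler error is recorded there rather than in the summary above. A small sketch that builds the log path from the module name and version and prints the first few errors; `dkms_make_log` is a hypothetical helper, and GNU grep is assumed:

```shell
dkms_make_log() {
    # Usage: dkms_make_log <module> <version>
    printf '/var/lib/dkms/%s/%s/build/make.log\n' "$1" "$2"
}

log=$(dkms_make_log nvidia 470.63)
if [ -r "$log" ]; then
    # Show the first few compiler errors with two lines of context each.
    grep -n -m 5 -A 2 'error:' "$log" || echo "no 'error:' lines found in $log"
else
    echo "no readable log at $log"
fi
```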
 