[TUTORIAL] PVE 8.2.2 / Kernel 6.8 and NVidia vGPU

dooferorg

Member
Apr 12, 2024
I thought I'd make this post since I've seen other threads (and commented on them myself) about getting NVidia vGPU working under PVE 8.2.2 and kernel 6.8.

Basically read through this guide:

https://gitlab.com/polloloco/vgpu-proxmox

There is a caveat with kernel 6.8, though: the driver needs a patch. I was able to find the author of a patch that gets 16.5 compiling on kernel 6.8 (https://gitlab.com/polloloco/vgpu-proxmox/-/merge_requests/9). However, that patch file seems to be somewhat corrupted, so I've attached the one I used successfully (.txt suffix added).

Patch your downloaded NVidia driver. I went with 16.5 because I have a Tesla P4 card. You can see if your card is supported here: https://docs.nvidia.com/grid/gpus-supported-by-vgpu.html

./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm.run --apply-patch vGPU-Grid-16.5-535.161.05-Linux-6.8.patch

You should see that the patch applied cleanly.
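For anyone who wants a sanity check around this step, here is a minimal sketch. The function names `custom_installer_name` and `apply_vgpu_patch` are made up for illustration. The only assumption is that `--apply-patch` writes a file ending in `-custom.run`, which matches the filename used in the install step of this post.

```shell
#!/bin/sh
# Hypothetical helper: derive the name of the patched installer that
# --apply-patch is expected to write (assumption based on the filenames
# used in this post).
custom_installer_name() {
    # Strip the trailing ".run" and append "-custom.run".
    printf '%s-custom.run\n' "${1%.run}"
}

# Hypothetical wrapper: apply the patch, then confirm the patched file exists.
apply_vgpu_patch() {
    run_file=$1
    patch_file=$2
    [ -f "$run_file" ] || { echo "missing installer: $run_file" >&2; return 1; }
    [ -f "$patch_file" ] || { echo "missing patch: $patch_file" >&2; return 1; }
    chmod +x "$run_file"
    "./$run_file" --apply-patch "$patch_file" || return 1
    out=$(custom_installer_name "$run_file")
    [ -f "$out" ] || { echo "expected $out was not created" >&2; return 1; }
    echo "patched installer: $out"
}

# Example (run in the directory holding both files):
# apply_vgpu_patch NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm.run \
#     vGPU-Grid-16.5-535.161.05-Linux-6.8.patch
```

The wrapper stops early if either file is missing, so a typo in a filename does not end up looking like a patch failure.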

Then run and install the 16.5 installer:

./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm-custom.run --dkms -m=kernel


Compiles and installs fine. Proxmox 8.2.2 on Kernel 6.8. mdev devices are available and work well.
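A rough way to confirm the same result on your own host is sketched below. `kernel_series` and `check_vgpu_host` are hypothetical helper names. The commands they call (`uname`, `dkms status`, `nvidia-smi`, `mdevctl types`) are the standard tools, but what counts as "good" output is only described in the comments, not checked.

```shell
#!/bin/sh
# Hypothetical helper: reduce a kernel release string to "major.minor".
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Hypothetical check: print the running kernel, then run the usual tools.
# Nothing is parsed here; read the output yourself.
check_vgpu_host() {
    release=$(uname -r)
    echo "kernel: $release (series $(kernel_series "$release"))"
    for tool in dkms nvidia-smi mdevctl; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "not found: $tool" >&2
            return 1
        }
    done
    dkms status      # the nvidia module should be listed for the running kernel
    nvidia-smi       # should list the GPU
    mdevctl types    # should list the available vGPU profiles
}

# Example:
# check_vgpu_host
```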

Nothing groundbreaking and not my work to get it going but I thought others might like to have it referenced here.

It should be noted that I applied this patch to the original NVidia host vGPU drivers (16.5) as downloaded, since that is all my card needed. YMMV if you have a card that needs the polloloco patch applied first. You may also be able to extract the polloloco-patched archive and then apply this additional patch (e.g. patch -p1 < vGPU-Grid-16.5-535.161.05-Linux-6.8.patch) to end up with a fully patched source tree. I didn't test that, though, since I have the Tesla card. Just wanted to mention it.
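As a sketch of that extract-then-patch idea, untested against a real driver just like the suggestion itself: `apply_in_tree` is a hypothetical helper name, and the extracted directory name in the example is an assumption (the installer's `--extract-only` option normally unpacks into a directory named after the .run file).

```shell
#!/bin/sh
# Untested sketch of the "extract, then patch -p1" idea.
# apply_in_tree is a hypothetical helper, not part of any NVIDIA tooling.
apply_in_tree() {
    tree=$1
    patch_file=$2
    [ -d "$tree" ] || { echo "no such directory: $tree" >&2; return 1; }
    [ -f "$patch_file" ] || { echo "no such patch: $patch_file" >&2; return 1; }
    # Resolve the patch to an absolute path because we cd into the tree below.
    patch_abs=$(cd "$(dirname "$patch_file")" && pwd)/$(basename "$patch_file")
    (
        cd "$tree" || exit 1
        # Dry run first so a failing hunk does not leave a half-patched tree.
        patch -p1 --dry-run < "$patch_abs" >/dev/null || exit 1
        patch -p1 < "$patch_abs"
    )
}

# Example, assuming the polloloco-patched installer is the -custom.run file
# and that it extracts into a directory of the same name:
# ./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm-custom.run --extract-only
# apply_in_tree NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm-custom \
#     vGPU-Grid-16.5-535.161.05-Linux-6.8.patch
```

If the dry run fails, nothing in the tree is touched, which makes it easier to tell whether the two patches conflict.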
 

I have an RTX 2060 and I don't know how to apply both the polloloco patch and this patch. I tried patching the driver with polloloco first and then with this patch, but I got an error... Can you help me?
Thanks a lot
 
Thanks for posting this. I was getting a bit frustrated at why 6.8 kernels wouldn't build the DKMS modules. I built your patch sans the binary file for my 535.154.02 build running a Tesla P40 (just deleted the binary out of the patch). Appreciate the thought of putting this out into the atmosphere.
 
Hi,
Thanks for this. I have an A40 and would like to get ver. 17.1 compiling the same way. Would the same patch work, or would we need a different patch for the latest version?
 
I am going to configure this today with the following setup:
Nvidia GPU: Nvidia A100 PCIe
Processor: Intel Xeon
Proxmox version: 8.2.2
Kernel: 6.8.4-2

If I hit any roadblocks, I will post here.
 
I went through the whole process on https://gitlab.com/polloloco/vgpu-proxmox and everything installed perfectly, no errors. But after rebooting and running `nvidia-smi` I get "No devices were found" :(

Maybe this is not the right thread but does anyone have any idea what this might be?

For more context:

Code:
root@proxmox:~# dmesg | grep -i nvidia
[    5.725757] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:ae/0000:ae:00.0/0000:af:00.1/sound/card0/input6
[    5.725926] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:ae/0000:ae:00.0/0000:af:00.1/sound/card0/input7
[    5.726078] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:ae/0000:ae:00.0/0000:af:00.1/sound/card0/input8
[    5.726223] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:ae/0000:ae:00.0/0000:af:00.1/sound/card0/input9
[    5.856866] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    6.009097] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[    6.013186] nvidia 0000:af:00.0: enabling device (0000 -> 0003)
[    6.013366] nvidia 0000:af:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    6.059905] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.161.05  Thu Jan 25 17:36:41 UTC 2024
[    6.739172] audit: type=1400 audit(1716473858.486:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1402 comm="apparmor_parser"
[    6.739182] audit: type=1400 audit(1716473858.486:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1402 comm="apparmor_parser"

Code:
root@proxmox:~# lspci  | grep -i nvidia
af:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 SUPER] (rev a1)
af:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
af:00.2 USB controller: NVIDIA Corporation TU106 USB 3.1 Host Controller (rev a1)
af:00.3 Serial bus controller: NVIDIA Corporation TU106 USB Type-C UCSI Controller (rev a1)

Code:
root@proxmox:~# lsmod | grep -i nvidia
nvidia_vgpu_vfio       98304  0
nvidia              56803328  2
mdev                   24576  1 nvidia_vgpu_vfio
kvm                  1372160  56 nvidia_vgpu_vfio,kvm_intel
vfio_pci_core          86016  2 nvidia_vgpu_vfio,vfio_pci
irqbypass              12288  113 vfio_pci_core,nvidia_vgpu_vfio,kvm
vfio                   69632  4 vfio_pci_core,nvidia_vgpu_vfio,vfio_iommu_type1,vfio_pci
i2c_nvidia_gpu         12288  0
i2c_ccgx_ucsi          12288  1 i2c_nvidia_gpu
 
This version also did not work for me (I spent nearly a week trying). I am waiting for a new ISO to try again...
 
I just gave up, reinstalled, pinned the kernel to ver. 6.5, and everything just works beautifully.
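For reference, a small sketch of how the pinning might be scripted. `newest_in_series` is a hypothetical helper, and the version strings in the comments are sample values only. `proxmox-boot-tool kernel list` and `proxmox-boot-tool kernel pin` are the actual commands, and the filter assumes that installed versions appear one per line in the `list` output.

```shell
#!/bin/sh
# Hypothetical helper: read kernel versions on stdin (one per line) and print
# the newest one that belongs to the given major.minor series.
newest_in_series() {
    # Keep only lines that start with "<series>.", then version-sort them.
    # sort -V is a GNU coreutils option.
    awk -v s="$1." 'index($0, s) == 1' | sort -V | tail -n 1
}

# Example on a Proxmox host (versions shown are sample values only):
# proxmox-boot-tool kernel list
# target=$(proxmox-boot-tool kernel list | newest_in_series 6.5)
# proxmox-boot-tool kernel pin "$target"    # e.g. 6.5.13-5-pve
# reboot
```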

The only thing is that we don't have a good VDI solution, as SPICE just sucks with no utilization of the GPU.

The only option is RDP, so we are exploring a web-based Guacamole setup with some GPU acceleration if possible.

Wrong thread, I know! But any hints in that direction?
 
Thanks for this. I just ran thru this without error. Note that 535.161.05.patch is available on PolloLoco's site

  1. ./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm.run --dkms -m=kernel --uninstall
  2. proxmox-boot-tool kernel unpin
  3. reboot, run uname -r to verify kernel 6.8 was loaded after the reboot
  4. ./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm.run --apply-patch 535.161.05.patch
  5. ./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm-custom.run --dkms -m=kernel

After install, I verified it still showed up: "nvidia-smi"
However, "mdevctl types" didn't work and my VM failed to start. After another reboot, both "nvidia-smi" and "mdevctl types" worked as expected and my VM booted up.
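The five steps above can be written down as a small plan printer. This is only a sketch: `reinstall_plan` is a hypothetical function that prints the commands instead of running them, since there is a reboot in the middle and the sequence cannot run as a single script.

```shell
#!/bin/sh
# Hypothetical helper: print the reinstall steps for a given driver version
# and patch file. It only prints; nothing is executed.
reinstall_plan() {
    version=$1
    patch_file=$2
    base="NVIDIA-Linux-x86_64-${version}-vgpu-kvm"
    cat <<EOF
./${base}.run --dkms -m=kernel --uninstall
proxmox-boot-tool kernel unpin
reboot    # afterwards, confirm with: uname -r
./${base}.run --apply-patch ${patch_file}
./${base}-custom.run --dkms -m=kernel
EOF
}

# Example:
# reinstall_plan 535.161.05 535.161.05.patch
```

Printing the plan first makes it easy to compare the filenames against what is actually in the download directory before running anything.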

NOTE: I used PolloLoco's guide when I first installed my P4 with unpatched drivers back on kernel 6.5, so I followed all the "Have a vgpu supported card? Read here!" notes.
 
I am using an Nvidia A100. It supports only the v15 GPU driver, and I don't see patches for that.
 
Does Parsec work? I just got my Minisforum MS-01 and about to configure iGPU to pass thru to VM.
 
I did not try Parsec. I am trying to get Moonlight to work, but I am stuck on the first step: I am not able to get the Nvidia RTX Experience to install.

See: the Nvidia A40-8Q with 8GB vRAM in an RDP session inside Proxmox works perfectly. But the NVIDIA RTX Experience will not install... Anyone with bright ideas on this one?


1717471982275.png
 
I just did a brand new install of Proxmox 8.2.2. Followed PolloLoco's guide and since I have a Tesla P4 installed I used 535.161.05 and applied the patch. Everything worked every step of the way without fail, but when I do mdevctl types I get a list of profiles for the Tesla P40. I can only find one other instance of this happening in this thread [https://forum.proxmox.com/threads/vgpu-tesla-p4-wrong-mdevctl-gpu.143247/] but other than repeatedly trying other drivers until they ended back up on 535.104.06, they don't know what caused the profiles to start displaying correctly. Does anyone here have an idea?

1717957383120.png
1717957449528.png
 
I'll be firing this up again Monday and can perform some further testing alongside you.
 