[TUTORIAL] PVE 8.2.2 / Kernel 6.8 and NVidia vGPU

dooferorg · May 15, 2024

I thought I'd make this post since I've seen others, and commented on them myself, about getting NVidia vGPU working under PVE 8.2.2 and kernel 6.8.

Basically read through this guide:

https://gitlab.com/polloloco/vgpu-proxmox

There is a caveat with kernel 6.8, though: the driver requires a patch. I was able to track down a patch that gets 16.5 compiling on kernel 6.8: https://gitlab.com/polloloco/vgpu-proxmox/-/merge_requests/9. However, that patch file seems to be corrupted in places. I've attached the one I used successfully (.txt suffix added).

Patch your downloaded NVidia driver. I went with 16.5 because I have a Tesla P4 card. You can see if your card is supported here: https://docs.nvidia.com/grid/gpus-supported-by-vgpu.html

./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm.run --apply-patch vGPU-Grid-16.5-535.161.05-Linux-6.8.patch

You should see that the patch applied cleanly.

Then run the patched 16.5 installer:

./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm-custom.run --dkms -m=kernel

It compiles and installs fine on Proxmox 8.2.2 with kernel 6.8. mdev devices are available and work well.
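
The two commands above can be wrapped in a small script. This is only a minimal sketch: the helper names and the DRY_RUN switch are made up for illustration, and it assumes the installer and the patch file sit in the current directory. The only things taken from the post are the two installer invocations and the "-custom" suffix of the patched installer.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative, not NVIDIA tooling.

# --apply-patch produces a new installer with "-custom" before ".run"
# (as seen in the two commands in the post).
custom_installer_name() {
    printf '%s\n' "${1%.run}-custom.run"
}

# Print the command when DRY_RUN=1 (the default here); run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Patch the stock installer, then install from the patched copy via DKMS.
patch_and_install() {
    installer=$1
    patch_file=$2
    run "./$installer" --apply-patch "$patch_file" || return 1
    run "./$(custom_installer_name "$installer")" --dkms -m=kernel
}
```

Calling `patch_and_install NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm.run vGPU-Grid-16.5-535.161.05-Linux-6.8.patch` only prints the two commands; set DRY_RUN=0 to actually run them.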

Nothing groundbreaking, and it's not my work that got it going, but I thought others might like to have it referenced here.

Note that I applied this patch to the original NVidia host vGPU drivers (16.5) as downloaded, which is what my card needs. YMMV if you have a card that needs the polloloco patch applied first. You may also be able to extract the polloloco-patched archive, apply this additional patch (e.g. patch -p1 < vGPU-Grid-16.5-535.161.05-Linux-6.8.patch), and end up with a fully patched source tree. I didn't test that, though, since I have the Tesla card. Just wanted to mention it.
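
For the untested "extract, then patch -p1" route, a cautious approach is to do a dry run before touching the tree. The sketch below is an assumption-laden illustration, not something verified against the NVidia archive: the function name is made up, and it relies on GNU patch's --dry-run and -d options.

```shell
#!/bin/sh
# Sketch only: apply a patch to an extracted driver tree, but only if a
# dry run succeeds first. The function name is illustrative.
apply_patch_checked() {
    tree=$1
    patch_file=$2
    # --dry-run (GNU patch) reports problems without modifying any file.
    if ! patch -d "$tree" -p1 --dry-run < "$patch_file" >/dev/null; then
        echo "patch does not apply cleanly to $tree" >&2
        return 1
    fi
    # The dry run passed, so apply the patch for real.
    patch -d "$tree" -p1 < "$patch_file"
}
```

Usage would be something like `apply_patch_checked <extracted-driver-dir> vGPU-Grid-16.5-535.161.05-Linux-6.8.patch`, where the directory is whatever the archive extracted to.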

grignun · May 15, 2024

I have an RTX 2060 and I don't know how to apply both the polloloco patch and this patch. I tried patching the driver with polloloco first and then with this patch, but I got an error... Can you help me?
Thanks a lot.

ndrew · May 16, 2024

Thanks for posting this. I was getting a bit frustrated that the DKMS modules wouldn't build on 6.8 kernels. I applied your patch, minus the binary file, to my 535.154.02 build running a Tesla P40 (I just deleted the binary from the patch). Appreciate the thought of putting this out into the atmosphere.

deepcloud · May 22, 2024

Hi,
Thanks for this. I have an A40 and would like to get version 17.1 compiling the same way. Would the same patch work, or would we need a different patch for the latest version?

Sujith Arangan · May 22, 2024

I am going to configure this today. My setup is below.
Nvidia GPU: Nvidia A100 PCIe
Processor: Intel Xeon
Proxmox version: 8.2.2
Kernel: 6.8.4-2

If I hit any roadblocks, I will post here.

jolene · May 22, 2024

Where do you download 8.2.2? On the download page the only version I can find is 8.2-1.

Sujith Arangan · May 22, 2024

jolene said:
Where do you download 8.2.2? On the download page the only version I can find is 8.2-1.

Once installed it shows as 8.2.2.

deepcloud · May 22, 2024

Sujith Arangan said:
Once installed it shows as 8.2.2.

You get 8.2.2 by updating Proxmox after installing 8.2.
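
To confirm which version and kernel a host is actually running after the update, the output of pveversion can be picked apart. A minimal sketch, with made-up helper names, assuming the output is a single line of the form "pve-manager/8.2.2/<build-id> (running kernel: 6.8.4-2-pve)":

```shell
#!/bin/sh
# Sketch only: helper names are illustrative. Each helper takes one line of
# `pveversion` output, assumed to look like:
#   pve-manager/8.2.2/<build-id> (running kernel: 6.8.4-2-pve)

# Print the Proxmox VE version (second "/"-separated field).
pve_version_of() {
    printf '%s\n' "$1" | cut -d/ -f2
}

# Print the running kernel named inside the parentheses.
pve_kernel_of() {
    printf '%s\n' "$1" | sed -n 's/.*running kernel: \([^)]*\)).*/\1/p'
}
```

On the host this would be used as `pve_version_of "$(pveversion)"` and `pve_kernel_of "$(pveversion)"`.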

deepcloud · May 22, 2024

No luck.

I removed the old drivers and installed the new ones.
I installed the patched drivers and also signed them, since we have Secure Boot on.

pedromcaraujo · May 23, 2024

I went through the whole process on https://gitlab.com/polloloco/vgpu-proxmox and everything installed perfectly, no errors. But after rebooting and running `nvidia-smi` I get "No devices were found".

Maybe this is not the right thread, but does anyone have any idea what this might be?

For more context:

Code:

root@proxmox:~# dmesg | grep -i nvidia
[    5.725757] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:ae/0000:ae:00.0/0000:af:00.1/sound/card0/input6
[    5.725926] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:ae/0000:ae:00.0/0000:af:00.1/sound/card0/input7
[    5.726078] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:ae/0000:ae:00.0/0000:af:00.1/sound/card0/input8
[    5.726223] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:ae/0000:ae:00.0/0000:af:00.1/sound/card0/input9
[    5.856866] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    6.009097] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[    6.013186] nvidia 0000:af:00.0: enabling device (0000 -> 0003)
[    6.013366] nvidia 0000:af:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=none:owns=none
[    6.059905] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.161.05  Thu Jan 25 17:36:41 UTC 2024
[    6.739172] audit: type=1400 audit(1716473858.486:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1402 comm="apparmor_parser"
[    6.739182] audit: type=1400 audit(1716473858.486:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1402 comm="apparmor_parser"

Code:

root@proxmox:~# lspci  | grep -i nvidia
af:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 SUPER] (rev a1)
af:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
af:00.2 USB controller: NVIDIA Corporation TU106 USB 3.1 Host Controller (rev a1)
af:00.3 Serial bus controller: NVIDIA Corporation TU106 USB Type-C UCSI Controller (rev a1)

Code:

root@proxmox:~# lsmod | grep -i nvidia
nvidia_vgpu_vfio       98304  0
nvidia              56803328  2
mdev                   24576  1 nvidia_vgpu_vfio
kvm                  1372160  56 nvidia_vgpu_vfio,kvm_intel
vfio_pci_core          86016  2 nvidia_vgpu_vfio,vfio_pci
irqbypass              12288  113 vfio_pci_core,nvidia_vgpu_vfio,kvm
vfio                   69632  4 vfio_pci_core,nvidia_vgpu_vfio,vfio_iommu_type1,vfio_pci
i2c_nvidia_gpu         12288  0
i2c_ccgx_ucsi          12288  1 i2c_nvidia_gpu
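
When comparing logs like the ones above, two things are worth pulling out of dmesg: which driver version the kernel actually loaded, and whether a module failed signature verification. The helpers below are a minimal sketch with made-up names; they only match the message formats visible in this post's dmesg output.

```shell
#!/bin/sh
# Sketch only: both helpers read kernel log text on stdin; names are illustrative.

# Print the driver version from the "NVRM: loading NVIDIA UNIX ..." line.
nvrm_driver_version() {
    sed -n 's/.*NVRM: loading NVIDIA UNIX .* Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Succeed if the log reports a module that failed signature verification.
has_unsigned_module_warning() {
    grep -q 'module verification failed'
}
```

For example, `dmesg | nvrm_driver_version` on the log above prints 535.161.05. Note that in that log the NVRM line appears after the verification warning, so the module was still loaded.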

jolene · May 23, 2024

This version also did not work for me (I spent nearly a week trying). I am waiting for a new ISO to try again...

deepcloud · May 25, 2024

I just gave up, reinstalled, pinned the kernel to version 6.5, and everything just works beautifully.
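
Pinning is done with proxmox-boot-tool (the same tool whose "kernel unpin" subcommand appears later in this thread). To pick the newest installed kernel of a given series, something like the sketch below could help. The helper name is made up, it assumes "proxmox-boot-tool kernel list" prints one kernel version per line among its other text, and it relies on GNU sort's -V option.

```shell
#!/bin/sh
# Sketch only: reads text on stdin, keeps lines that start with the given
# kernel series (e.g. "6.5."), and prints the highest version among them.
# The helper name is illustrative.
newest_kernel_in_series() {
    series=$1
    while IFS= read -r line; do
        case $line in
            "$series".*) printf '%s\n' "$line" ;;
        esac
    done | sort -V | tail -n 1
}
```

Usage would be `proxmox-boot-tool kernel list | newest_kernel_in_series 6.5`, followed by `proxmox-boot-tool kernel pin` with the version it prints, and a reboot.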

The only thing is that we don't have a good VDI solution, as SPICE just sucks, with no utilization of the GPU.

The only option is RDP, so we are exploring a web-based Guacamole ("guac") setup with some GPU acceleration if possible.

Wrong thread, I know! But any hints in that direction?

Randell · May 31, 2024

Thanks for this. I just ran through it without error. Note that 535.161.05.patch is available on PolloLoco's site.

./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm.run --dkms -m=kernel --uninstall
proxmox-boot-tool kernel unpin
(reboot, then run uname -r to verify that kernel 6.8 was loaded)
./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm.run --apply-patch 535.161.05.patch
./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm-custom.run --dkms -m=kernel

After the install, I verified the card still showed up with "nvidia-smi".
However, "mdevctl types" didn't work and my VM failed to start. After another reboot, both "nvidia-smi" and "mdevctl types" worked as expected and my VM booted up.

NOTE: I used PolloLoco's guide when I first installed my P4 with unpatched drivers back on kernel 6.5, so I followed all the "Have a vgpu supported card? Read here!" notes.
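
The two manual checks in this procedure (kernel series after the reboot, then nvidia-smi and mdevctl types after the install) can be sketched as small helpers. The function names are made up, and treating empty "mdevctl types" output as "not ready yet" is an assumption based on the behaviour described above, not documented behaviour.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative.

# Succeed if a kernel release string (e.g. from `uname -r`) belongs to the
# given series, e.g. kernel_in_series "$(uname -r)" 6.8
kernel_in_series() {
    case $1 in
        "$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed only if nvidia-smi runs and mdevctl lists at least one type.
# Assumption: an empty `mdevctl types` means profiles are not registered yet.
vgpu_ready() {
    nvidia-smi >/dev/null 2>&1 || return 1
    [ -n "$(mdevctl types 2>/dev/null)" ]
}
```

If `vgpu_ready` fails right after the install, the experience above suggests trying one more reboot before digging further.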

Sujith Arangan · May 31, 2024

I am using an Nvidia A100. It only supports the v15 GPU driver. I don't see patches for that.

Randell · May 31, 2024

Sujith Arangan said:
I am using an Nvidia A100. It only supports the v15 GPU driver. I don't see patches for that.

I don't know anything about the A100. Does it require different drivers than the A10?

https://docs.nvidia.com/grid/16.0/grid-vgpu-release-notes-generic-linux-kvm/index.html

Here they explicitly mention support for the A10.

Sujith Arangan · May 31, 2024

Randell said:
I don't know anything about the A100. Does it require different drivers than the A10?

https://docs.nvidia.com/grid/16.0/grid-vgpu-release-notes-generic-linux-kvm/index.html

Here they explicitly mention support for the A10.

Yes, the A100 supports only up to v15.

https://docs.nvidia.com/grid/gpus-supported-by-vgpu.html

Thunderusb · Jun 3, 2024

deepcloud said:
I just gave up, reinstalled, pinned the kernel to version 6.5, and everything just works beautifully.

The only thing is that we don't have a good VDI solution, as SPICE just sucks, with no utilization of the GPU.

The only option is RDP, so we are exploring a web-based Guacamole ("guac") setup with some GPU acceleration if possible.

Wrong thread, I know! But any hints in that direction?

Does Parsec work? I just got my Minisforum MS-01 and am about to configure the iGPU to pass through to a VM.

deepcloud · Jun 4, 2024

Thunderusb said:
Does Parsec work? I just got my Minisforum MS-01 and am about to configure the iGPU to pass through to a VM.

I did not try Parsec. I am trying to get Moonlight to work, but I am stuck at the first step: I'm not able to get the Nvidia RTX Experience to install.

The Nvidia A40-8Q profile, with 8 GB vRAM, works perfectly in an RDP session inside Proxmox. But the NVIDIA RTX Experience will not install... Anyone with bright ideas on this one?

Terydan · Jun 9, 2024

I just did a brand new install of Proxmox 8.2.2. I followed PolloLoco's guide and, since I have a Tesla P4 installed, I used 535.161.05 and applied the patch. Everything worked every step of the way, but when I run mdevctl types I get a list of profiles for the Tesla P40. I can only find one other instance of this happening, in this thread [https://forum.proxmox.com/threads/vgpu-tesla-p4-wrong-mdevctl-gpu.143247/], but other than repeatedly trying other drivers until they ended up back on 535.104.06, they don't know what caused the profiles to start displaying correctly. Does anyone here have an idea?

Fimeg · Jun 9, 2024

I'll be firing this up again Monday and can perform some further testing alongside you.
