vGPU with nVIDIA on Kernel 6.8

that's a question you have to ask nvidia. i'm not sure what their current policy regarding lts + newer kernels is, but only they have the ability to update the older driver branches for newer kernels...

As of v16.6 (or possibly v16.5), the kernel module for the R535 drivers does build correctly for the 6.8 kernel. I was asking specifically about this new way of handling vGPUs. As I recently found, booting into the 6.8 kernel with the v16.6 kernel module seems to work correctly, except that no mdevs show up under mdevctl.

@AbsolutelyFree: https://docs.nvidia.com/vgpu/index.html - the R470 and R535 branches are still available. I don't know whether they've been updated for 6.8 yet, since Ubuntu LTS is still on 6.5.

It is "safe" to pin 6.5 for a while, depending on your security profile and the features you want from the 6.8 kernel. I'm not sure whether Canonical backports things into 6.5 for LTS.

In your OP, you said that this new way of handling vGPUs requires the R550 drivers. However, the relevant section of the documentation for the R535 drivers ( https://docs.nvidia.com/vgpu/16.0/pdf/grid-vgpu-user-guide.pdf#page71 ) seems quite similar to the documentation for the R550 drivers and implies that this new method of vGPU handling works with the R535 drivers as well. There was also another recent v16.7 update which I haven't had a chance to install yet, and AFAIK the R535 drivers are supported until 2026, so I believe support for the 6.8 kernel is being provided for the R535 branch as well; I just haven't had time to try it out yet.
 
@AbsolutelyFree: I could not get 535.154.02 (by now about 2 versions old) to compile for kernel 6.8, and the other ones I had access to (this was a month ago now) didn't seem to work either.

However, nVIDIA has released newer drivers in the past few weeks, which I have not tested; from the docs, it seems they may now be compatible. I'm not sure whether they have feature parity; things like mixed-size vGPUs (the -shm setting) may or may not work.

Yes, it is normal that no mdevs show up under kernel 6.8, as it now uses the vendor-specific framework; you can see how that works in the docs, the OP, or in my Python hook script.
 
@AbsolutelyFree: I could not get 535.154.02 (by now about 2 versions old) to compile for kernel 6.8, and I had also tested a newer version they released to me. However, nVIDIA has released newer drivers, which I have not tested; from the docs, it seems they may now be compatible. I'm not sure whether they have feature parity; things like mixed-size vGPUs may or may not work.

Yes, the 535.154.02 (16.3) version of the host drivers did not work with the 6.8 kernel. I believe it was either the 535.161.05 (16.5) or the 535.183.04 (16.6) version of the host drivers that started installing and working successfully on the 6.8 kernel. I am currently pinning the 6.5 kernel on my vGPU nodes because the old mdev method still works on that kernel, and I wasn't aware of this new method of handling vGPUs until I saw this thread.

Yes, it is normal that no mdevs show up under kernel 6.8, as it now uses the vendor-specific framework; you can see how that works in the docs, the OP, or in my Python hook script.

Exactly, and the fact that the documentation for the new vGPU creation method on the R550 drivers appears to be the same as the documentation for the R535 drivers is what leads me to believe that this new method should be compatible with the latest R535 drivers as well. I hope to have some time to try it out soon.
 
Unfortunately, it seems to be a hard requirement that your GPU supports SR-IOV for this new method of handling vGPUs on kernel 6.8 to work. The Tesla P4, which I am using, does not support SR-IOV. According to the documentation though, the latest versions of the v16 drivers should work on kernel 6.8 as long as the GPU supports SR-IOV.
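
If you're not sure whether a given card advertises SR-IOV, it shows up as an extended capability in lspci; a quick check (the PCI address here is only an example, substitute your GPU's):

Code:
# run as root; prints a line if the GPU exposes the SR-IOV capability
lspci -s 0000:e3:00.0 -vvv | grep -i "single root i/o virtualization"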
 
I updated my Gist: https://gist.github.com/guruevi/7d9673c6f44f49b1841eaf49bbd727f9 - it now handles multiple vGPUs assigned to the same VM.

To use it, use the recommended -device vfio-pci,sysfsdev=/sys/bus/pci/devices/{available_vgpu} and plop another one right behind it: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/{second_available_vgpu}. It will use the tag in Proxmox to load both vGPUs with the same type.
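
As a sketch, the relevant lines in the VM config would look roughly like this (the VF addresses, the UUID placeholder and the type ID behind the tag are examples, not universal values):

Code:
args: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:e3:00.4 -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:e3:00.5 -uuid <vm-uuid>
tags: nvidia-1150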

It also handles LIVE MIGRATION! It works between two nodes; they have to have the same drivers, and obviously the vGPU type has to be available and match, etc.

If your guest has older drivers (e.g. 535) and the host has 550, it won't crash, but the GPU won't be available until you install the proper drivers and reboot. I got "Unable to determine the device handle for GPU0000:00:02.0: Unknown Error" post-migration in that case. Typically nVIDIA allows the guest and the host to be up to 2 major versions apart (if the guest has the lower version and the host the higher one), but in this case they do seem to want a match, or perhaps I need an updated 535 driver. Test it before you rely on this.

I also saw that someone forked it and made it more Pythonic; when I have some time, I'll merge that work, or make a repo with some sample Ansible playbooks.
 
fyi: i posted a patch series to our devel mailing list, so feel free to test it (if you want/can):

https://lists.proxmox.com/pipermail/pve-devel/2024-August/065046.html
What's the process for implementing this on 6.8.8-4-pve? I've not run any patches before.

I have everything running on a number of dual 6000 Ada machines using guruevi's gist and some manual operations. The cards are all working well; however, it would be good to have things run a bit more smoothly. I need to add some updates to this script as well.

I was going to test your patch on one machine to see how it goes.
 
@Tallscott: The patches apply changes directly to the underlying files, rather than being applied through typical package management (apt update).

If you are unfamiliar with the patch and diff commands, they are a way of communicating changes to a file. A patch file specifically is a list of + and - lines and other instructions that tell the patch command which file(s) to modify and which line(s) in the file to add/remove.

You'd have to manually copy the patch file onto the underlying OS, then run the patch command with the patch file as input, pointing it at the directory it references. I haven't looked at the files in depth; I just skimmed the code and don't have time to test it today, but I will later this week. I wouldn't suggest doing this on a production system, though.
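
Roughly, and only as a sketch (not tested; the patch filename is a placeholder, and the directory and -p level depend on which files the series actually touches, so read the patch headers first):

Code:
# copy the patch onto the node, then dry-run before touching anything
cd /usr/share/perl5               # the installed PVE perl modules live under PVE/ here
patch -p1 --dry-run < /root/vgpu-vendor-framework.patch
patch -p1 < /root/vgpu-vendor-framework.patch
# restart the daemons that load the patched modules (which ones depend on the patch)
systemctl restart pvedaemon pveproxy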

https://www.howtogeek.com/415442/how-to-apply-a-patch-to-a-file-and-create-patches-in-linux/
 
a bit of general developer info: https://pve.proxmox.com/wiki/Developer_Documentation

the correct way to test those changes is to check out the source repositories, apply the patches with e.g. `git am` and then rebuild the packages and install them
i'm not expecting anybody here to do that though, just wanted to mention it, in case there are some motivated people with dev experience ;)
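
a rough sketch of that, assuming the series targets qemu-server (check the patch subjects for the actual repositories) and that the build dependencies from debian/control are installed:

Code:
# check out the repository the patches were written against
git clone git://git.proxmox.com/git/qemu-server.git
cd qemu-server
# apply the patch mails saved from the list archive
git am /path/to/vgpu-series/*.patch
# build the debian package and install it
make deb
apt install ./qemu-server_*.deb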
 
I tested the patch on my instance and got 2 errors:
- missing find_on_current_node method: this might be due to the fact that I applied the live migration patch from July first. The easy fix was to recreate the method.
- Invalid configuration: 'mdev' does not match for 'test' (0 != 1): I'm able to create the mediated device mappings in the cluster UI, but as soon as I try to add a mapped device to a VM, I get this error for all the mappings that I created.

Again, the second error might also be due to the fact that I have the live migration patch applied.
 
ah ok, yes the live-migration patches will probably interfere/not work with that series

i focused for now on getting it to work again, and afterwards i'll send a new version of the live migration patches adapted to the changes
 
Hi!

I am looking to get vGPU working for L40S GPUs on my server. I think I'm 90% there but need a push to get me over the finish line :)

I have Proxmox 8.2 installed. One of the GPUs is 0000:e3:00.0. I did the following:

Code:
# /usr/lib/nvidia/sriov-manage -e 0000:e3:00.0
# cd /sys/bus/pci/devices/0000:e3:00.4/nvidia
# echo 1150 > current_vgpu_type
Then I went into the shell and added the following to the VM config:

Code:
args: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:e3:00.4 -uuid XXX

I already added a PCI device to the VM: 0000:e3:00.4. I then tried to start the VM; it got stuck in an infinite loop. The only way to get out of it was to restart the entire server.

The idea would be to have a vGPU (e.g. L40S-4Q) attached to the VM. What am I missing? Is there any setting I still need to set?

Thanks!
 
Great! I could now get into the VM with those args included in the conf file. I also managed to start a VM with an L40S-4Q profile, which is great!!

However, I notice three things:

1/ After a restart of the hypervisor, I had to redo the sriov-manage etc. as all the settings were lost. Is that normal?
2/ I got a warning when starting the VM:
Code:
root@pve:~# qm start 100
swtpm_setup: Not overwriting existing state file.
kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:e3:00.4: warning: vfio 0000:e3:00.4: Could not enable error recovery for the device
root@pve:~#
3/ When I stop the VM (either via Windows Start | Shutdown, or a forced Stop in PVE), I cannot seem to restart the VM anymore. It gets stuck at the boot screen:
[screenshot: VM stuck at the boot screen]

After a while, it can be restarted. I checked with nvidia-smi whether the vGPU was gone, and it was.

[screenshot of nvidia-smi output]

Maybe there is some kind of parameter that should be set at VM shutdown?

Thanks!
 
@patrickob: That's why I use the hook functionality to reset the GPU on every stop/start. Technically, current_vgpu_type should remain the same (you can read the setting as well as write it) until the GPU is reset.

HOWEVER:
It is possible that you have 'hung' the GPU in the guest (the Windows driver framework is unstable garbage in general), and then you do indeed need to remove the vGPU setting (set current_vgpu_type to 0, then set it back to the value you want). I have found that the Windows driver doesn't like a complete power off and on (the reset option in Proxmox will work if you didn't crash Windows), but a Linux guest doesn't seem to care.

You shouldn't have to restart the physical GPU (which is what sriov-manage -e does behind the scenes) unless your guest software is doing some ugly stuff (I've had it happen though). Make sure you are using the matching driver (technically you can skip over 2 major versions, with caveats).
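
For reference, a minimal sketch of that reset, reusing the VF path and type ID from earlier in this thread:

Code:
# clear the vGPU type on the virtual function that was assigned to the guest...
echo 0 > /sys/bus/pci/devices/0000:e3:00.4/nvidia/current_vgpu_type
# ...then set it back to the type you want (1150 was the example type ID above)
echo 1150 > /sys/bus/pci/devices/0000:e3:00.4/nvidia/current_vgpu_type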
 
Haha, "Windows is unstable" rings a bell :)

Which hook functionality are you pointing to? Sounds interesting. Resetting current_vgpu_type does not sound optimal, especially if I have multiple VMs running with the same vGPU type...

I just had it again, but after 30 min or so, it magically worked again. Very stable indeed...
 
You only need to reset current_vgpu_type on the particular virtual function assigned to that guest; it doesn't affect the other vGPUs (as you can now have multiple vGPU types). I would refer you to my initial post(s) in this thread: https://forum.proxmox.com/threads/vgpu-with-nvidia-on-kernel-6-8.150840/post-685963 - I wrote a quick and dirty Python script that hooks into the hookscript functionality of Proxmox to "dynamically" start/stop the VF.
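
Not the actual script from the Gist, but a minimal shell sketch of the idea, with a hard-coded VF and type ID for illustration (Proxmox calls the hookscript with the VM ID and a phase name):

Code:
#!/bin/bash
# /var/lib/vz/snippets/vgpu-hook.sh (sketch only; $1 is the VM ID, $2 the phase)
PHASE="$2"
VF=/sys/bus/pci/devices/0000:e3:00.4/nvidia   # virtual function to manage (hard-coded here)
TYPE=1150                                     # vGPU type ID to create

case "$PHASE" in
    pre-start) echo "$TYPE" > "$VF/current_vgpu_type" ;;  # create the vGPU before the VM starts
    post-stop) echo 0 > "$VF/current_vgpu_type" ;;        # tear it down after the VM stops
esac
exit 0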
 
Great! I have attached the hook script to the VM. However, somehow I now have an infinite boot loading screen. It does not want to boot into Windows... I have restarted the server, but now it goes into Windows recovery on boot. After that, again the infinite boot screen.

[screenshot of the boot screen where the VM gets stuck]

These are my settings in 100.conf:
Code:
agent: 1
args: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:e3:03.3 -uuid e4afc546-88b4-4772-82df-ea93eb17188c
bios: ovmf
boot: order=scsi0;ide0;ide2;net0
cores: 16
cpu: host
cpulimit: 32
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hookscript: local:snippets/nvidia_allocator.py
ide0: none,media=cdrom
ide2: none,media=cdrom
machine: pc-q35-9.0
memory: 65536
meta: creation-qemu=9.0.2,ctime=1724057417
name: windev-100
net0: virtio=BC:24:11:5E:37:32,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: local-lvm:vm-100-disk-1,cache=writeback,discard=on,iothread=1,size=128G
scsihw: virtio-scsi-single
smbios1: uuid=e4afc546-88b4-4772-82df-ea93eb17188c
sockets: 2
tags: nvidia-1150
tpmstate0: local-lvm:vm-100-disk-2,size=4M,version=v2.0
vmgenid: 747d8476-353d-467f-8ac6-766443bafe97

Wrt the current_vgpu_type reset, I think I (finally) got it. You state that resetting current_vgpu_type will only affect one VM, as there is a one-to-one link between a VF and a vGPU. However, how do I avoid creating too many of the same type and effectively overrunning the available VRAM (e.g. I create 10 L40S-8Q while only 6 can fit)?
 
So, I don't know why Windows won't boot. Do you have a Linux machine to test, or can you uninstall the current GPU drivers in Windows and boot without drivers? That way you will find out whether the GPU hardware is properly recognized. With Linux you can at least see where it is hanging and check the kernel output.

I don't know what your hardware config is like, but you should enable NUMA, as the GPU has to map into physical memory attached to the CPU the GPU is connected to. You need to be able to map all your VM memory into that NUMA node (although when I run into that limit, mine just times out trying to start). Other than that, it looks very similar to mine (I use virtio for the drives instead of SCSI, but that shouldn't matter).
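
In the config you posted, that would just mean flipping the NUMA flag (a sketch; whether to also go down to 1 socket depends on your topology):

Code:
numa: 1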

As far as your other question, the nVIDIA card won't let you map the n-th vGPU if you run out of VRAM. So if you create 10 L40S-8Q, you can start 6 of them, but the 7th won't be allowed by the driver to create its vGPU; the hook script will error out when you attempt to create it and hopefully print an error message to that effect.
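
If you want to check up front what still fits, I believe the new framework also exposes a list of creatable types next to current_vgpu_type; treat the exact filename as an assumption and verify it on your system:

Code:
# should list the vGPU types that can still be created on this VF; no output means no room left
cat /sys/bus/pci/devices/0000:e3:00.4/nvidia/creatable_vgpu_types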
 
Good point on the NUMA. At the moment, I have 2 sockets in the config. Maybe I should rather put all vCPUs on 1 socket. I'm not sure how to ensure that the GPU is tied to the same NUMA node as the one the vCPUs are located on, though...

I got it to run on an Ubuntu 22 VM with an L40S-8Q:
[screenshot of the L40S-8Q running in the Ubuntu VM]

After a reboot, it also ran 2 times on Windows, but then it got stuck again at the same point in the boot :S The second time I got a blue screen for a kernel fault. I also noticed that the GPU drivers were not installed (anymore) the first time I booted into Windows. I cannot access Windows anymore, so I'm not able to uninstall the driver. I managed to run it once more after changing the machine type from pc-q35-9.0 to 8.2: it first went into recovery mode, then it booted into Windows. The next time after shutdown, it was stuck in the boot loop again... Changing it back or to another value did not ensure a Windows startup either.

So there is no really reproducible pattern to why this thing gets stuck...

While I am super happy that the vGPU finally works, and it does seem to work (for now) in Ubuntu, we sadly need it to work with Windows as well... Are there any logs in Proxmox for me to go through?
 
