vGPU with nVIDIA on Kernel 6.8

guruevi

There are quite a few changes to the vGPU software on kernel 6.8. The new feature is called the "Vendor-Specific VFIO Framework".

Note that you will need driver 550.90.05 for this to work, which you can download from NVIDIA. Mixed-size vGPU (you can now have different memory allocations on the same GPU) and live migration on KVM are now features (I haven't tested live migration yet, more to come).
Note: current versions of the 535 and perhaps other branches may now work as well (which you may need if you are using Pascal-era (2016) hardware). Review "Vendor-Specific VFIO" under the KVM portion of the NVIDIA documentation.

Do NOT add the GPU as a Mediated Device or PCI device through the GUI. Once you do that, you will get a big kernel error in dmesg related to vfio and you won't be able to start the VM with the GPU anymore.
This may now be fixed in more recent kernels.

Solution:
Create a vGPU allocation according to the documentation (https://docs.nvidia.com/vgpu/latest/pdf/grid-vgpu-user-guide.pdf#page71 - PDF page 71, printed page 57):
Bash:
# Go to the NVIDIA vGPU directory of the chosen virtual function (substitute your VF's PCI address)
cd /sys/bus/pci/devices/<domain>:<bus>:<vf-slot>.<v-function>/nvidia
# List the vGPU types that can still be created on this VF
cat creatable_vgpu_types
# Write the desired vGPU type ID to allocate it on this VF
echo <TYPE> > current_vgpu_type
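For example, using the VF address from the config snippet further down (both the address and the type ID here are purely illustrative; use a VF address from your own system and a type ID reported by creatable_vgpu_types):
Bash:
cd /sys/bus/pci/devices/0000:0d:01.1/nvidia
cat creatable_vgpu_types        # lists the type IDs and names still creatable on this VF
echo 557 > current_vgpu_type    # 557 is a placeholder ID; pick one from the list above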

Where the domain, bus, etc. correspond to the PCI slot your GPU is plugged into. Note that this can change whenever hardware is added or removed, depending on your system. You can find it with:

Bash:
nvidia-smi
And look under Bus-Id
Code:
|   0  NVIDIA L40S                    On  |   00000000:0D:00.0 Off |                    0 |
| N/A   29C    P8             39W /  350W |   44545MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40S                    On  |   00000000:B5:00.0 Off |                    0 |
| N/A   32C    P8             38W /  350W |       0MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
Add this to /etc/pve/qemu-server/<VMID>.conf
Code:
args: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:0d:01.1 -uuid aabababa-aabb-aabb-aabb-aabbccddeeff
Where 0000:0d:01.1 is the domain:bus:vf-slot.v-function you wrote the GPU type to. The UUID is the one you can find under smbios1 (whether it is set depends on when your config was made; in Proxmox VE 6 and 7 it doesn't seem to have always been there). If it's not set, set one through the SMBIOS option in the GUI (just generate a random one).
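If smbios1 has no UUID yet, you can also set one from the CLI instead of the GUI; a minimal sketch, assuming VM ID 999 (uuidgen ships with the uuid-runtime package on Debian/Proxmox):
Bash:
# Generate a random UUID and store it in the VM's SMBIOS settings
qm set 999 --smbios1 uuid=$(uuidgen)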
 

Hey, thanks for bringing this info. I was desperately looking for it in the huge NVIDIA documents. I tested this on an NVIDIA RTX A5000 and was able to set up vGPU successfully. Mixed-size vGPU also works quite well. I also used vgpu_params to disable the frame_rate_limiter for a VM to use it for gaming.
 
So I'm not sure whether it is feasible for Proxmox to support Vendor-Specific VFIO, because it is REALLY vendor-specific to get it to work. This is what I did (for each VM) to migrate them to vendor-specific VFIO:

Set up an nvidia-sriov.service:
INI:
[Unit]
Description=Enable NVIDIA SR-IOV
After=network.target nvidia-vgpud.service nvidia-vgpu-mgr.service
Before=pve-guests.service

[Service]
Type=oneshot
ExecStart=/usr/lib/nvidia/sriov-manage -e ALL
ExecStart=/usr/bin/nvidia-smi vgpu -shm 1

[Install]
WantedBy=multi-user.target
This enables SR-IOV and mixed-size vGPU mode on all GPUs.
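To activate it (a sketch; this assumes the unit file above was saved as /etc/systemd/system/nvidia-sriov.service):
Bash:
systemctl daemon-reload
systemctl enable --now nvidia-sriov.service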

Publish this hookscript to /var/lib/vz/snippets on each node. You may not want it in a central location if your nodes are not all upgraded simultaneously; you want it to fail on nodes that don't have the software yet.
https://gist.github.com/guruevi/7d9673c6f44f49b1841eaf49bbd727f9

Then to migrate the VM:
  • Turn the VM off
  • Remove the existing GPU from the VM in the GUI
  • Use this command to find an available slot for the GPU, e.g. if you're looking for a "6Q" device (see the NVIDIA docs; 6Q is a 6 GB virtual workstation slice). If none are found, you need to move the VM to a different host or pick a different type.
    • /var/lib/vz/snippets/nvidia_allocator.py find_gpu -6Q
    • Pick one of the returned "/sys/bus/pci/devices/..." entries - you cannot have 2 VMs on the same host with the same Bus-ID, so choose sequentially for each VM on the system. Once a VM has started, its device will no longer appear in the find_gpu command above.
  • You will also find a numeric ID associated with that name (e.g. 561 on an A40-6Q; again, some of this is in the NVIDIA docs and is specific to your GPU). Set a tag on the VM of the form nvidia-xxxx, where xxxx is the ID of the GPU type (e.g. nvidia-561).
  • These commands set the configuration, where 999 is the VM ID (see the GUI), the devices/0000:... path is what you got from the find_gpu command above, and the uuid is the one under Options → SMBIOS settings in the GUI for the VM:
    • qm set 999 --args "-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:3b:03.0 -uuid 7183686d-2d9b-4e3f-8f49-87246a9379c7"
    • qm set 999 --hookscript local:snippets/nvidia_allocator.py
  • Start the VM, install the drivers on the guest etc.
This works unless the machine doesn't start due to an error, in which case the hookscript does not get called. You can manually call the hookscript with post-stop on a VMID, or use nvidia-smi -r / reboot in case of major issues (this resets the ENTIRE GPU, so be careful nothing is running on it).
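For reference, a sketch of those recovery steps (the hookscript call assumes it follows the standard Proxmox hookscript interface of <vmid> <phase>; the GPU reset affects the whole card, so check nothing is running on it first):
Bash:
# Manually run the cleanup phase of the hookscript for VM 999 (hypothetical VM ID)
/var/lib/vz/snippets/nvidia_allocator.py 999 post-stop
# Last resort: reset GPU 0 entirely (make sure no VM or process is using it)
nvidia-smi -r -i 0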

There are still cases where things go haywire with VFIO (see my first post) if you forget to disable legacy VFIO first and boot the VM with it still set.
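A quick sanity check before booting (hypothetical VM ID 999): confirm no legacy hostpciX / mediated-device entry is still present alongside the new args line:
Bash:
qm config 999 | grep -E '^(hostpci|args)'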

I have tried to find a way to dynamically allocate the sysfsdev; unless we can modify the configuration pre-launch or pass "args" from the hook script, that won't happen. The parameter after sysfsdev seems to get parsed and the address then used somehow; it does not get used as an actual file descriptor (you can't symlink from somewhere else, for example).
 
I noticed there are symlinks to the VFIO devices (virtfn0, ...) that appear after the sriov-manage script has been run. Maybe this can be of some use:

Code:
# Grab the GPU's PCI bus address from nvidia-smi (works as-is on a single-GPU host)
root@a40# bus=$(nvidia-smi -q | grep ^GPU | awk -F " 0000" '{print tolower($2)}')
# Enable SR-IOV on that GPU
root@a40# /usr/lib/nvidia/sriov-manage -e $bus
# Count the virtfn* symlinks that appear afterwards
root@a40# ls /sys/bus/pci/devices/$bus/ | grep ^virtfn | wc -l
32
This is mentioned in this thread: https://forums.developer.nvidia.com...es-and-i-cant-create-vgpus-instances/172907/3
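Each virtfnN entry is a symlink to the VF's own sysfs device directory, so it can be resolved to the address you would pass to sysfsdev; a small sketch, assuming $bus is set as in the snippet above:
Bash:
# Resolve the first VF symlink to its PCI address (e.g. 0000:0d:00.4)
basename $(readlink -f /sys/bus/pci/devices/$bus/virtfn0)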
 
Thanks for the pointer. I went and opened the sriov-manage command; it is just a (proprietary) bash script.

There is potential for hypervisor managers to write a custom version that creates and releases VFs on demand.

The biggest problem is that I cannot control the ID of the VF (without hacking the kernel); IDs are always assigned sequentially from 0..n, and after 32 it stops working. With some smart placement you could make sure all your clients always get "their" unique addresses, especially if your cluster is small and homogeneous.
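The 32-VF ceiling can be checked through the standard SR-IOV sysfs attributes (assuming $bus points at the physical function, as in the earlier snippet):
Bash:
cat /sys/bus/pci/devices/$bus/sriov_totalvfs   # maximum number of VFs the device supports
cat /sys/bus/pci/devices/$bus/sriov_numvfs     # number of VFs currently enabled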

But you could potentially write a driver that sits between the NVIDIA driver and pci-vf-stub and does some funky stuff, and rewrite sriov-manage, but that is well above my capabilities.
 
Just FYI, we're aware of the changes that NVIDIA made here (only very recently, though) and we're thinking about how we can go forward. (This is one reason why there's no 6.8 compatibility on the wiki yet.)
 
This one really threw me for a loop. I upgraded to Proxmox 8.2.x for the ESXi migration stuff, and lost all my vGPUs. Glad I found this thread!

Question tho - is it safe to pin Kernel 6.5.13-5-pve and reinstall the NVIDIA GRID drivers for now?
 
I was on a pinned kernel before I found this thread, and it worked well for the 5 VMs that I had. But that way it was not possible to get the benefits of kernel and Proxmox updates. Now, with Vendor-Specific VFIO, the kernel is the latest and I get all updates. So you need to decide for yourself :)
 
I completely agree. I'm just looking for a temporary solution until a more permanent, and supported, one is available.
 
What if you have hardware that is not supported by the R550 drivers? Are users of Pascal-generation hardware permanently stuck on the 6.5 kernel?
 
That's a question you have to ask NVIDIA. I'm not sure what their current policy regarding LTS + newer kernels is, but only they can update the older branch of the driver for newer kernels...
 
Is the noVNC console supposed to work after adding a vGPU to a Windows VM using this method? I'm getting a black screen once the OS has booted. RDP does work.

Edit: the VM has a standard VGA display besides the vGPU.
 
No, VNC cannot access the display of the virtual GPU at the moment (there is an additional property for QEMU's PCI device, 'display=on', that could work, but in my last tests this was very unstable and not really usable...).
 
I thought the noVNC console used the "Standard VGA" display, not the vGPU, as many vGPUs do not even have a display output. In fact, I tried sending Ctrl+Alt+Del, typed the password, and the VM did log in; then I could see the VM desktop. I mean, the noVNC console is black at the login screen, but it works once logged in.
 
OK, I probably left too much out of my answer. The VNC part cannot access the vGPU, only the emulated GPU. You generally won't have much fun with such a setup, though, since mouse input via VNC is a bit weird AFAIR, and Windows probably will not use the NVIDIA GPU on the display of the emulated GPU. The best experience (currently) is to install remoting software inside the guest and use that (RDP/VNC/Parsec/HP Anyware/etc.).
 

Yes, that was clear to me. The console access is just for some basic admin tasks like setting network configs, doing updates, etc. It's true that the mouse behavior is quite funky and barely usable. It seems as if Windows detects two monitors and joins the noVNC and vGPU displays, making the mouse misbehave.
 
Update: I got the noVNC console to work again.

The problem is that Windows extends the desktop to both the noVNC and vGPU displays and sets the vGPU one as "primary". As the login screen is shown on the primary display only, the noVNC console remained black.

Once logged in, use Windows Key + I to open Settings, then Windows Key + Shift + arrow to move the Settings window to the noVNC display. Then change the setting to use only screen 1.

Use keyboard shortcuts: on Windows Server 2019 the mouse cursor is shown and may help with changing these settings. On Windows Server 2022 the cursor disappears and you literally have to guess where the pointer is.
 
VNC works on my setup; my display does not get treated as 2 displays but as 1 (A40/L40 Q-series vGPU). There is an "x-vga" option to make the GPU an output at boot time, at which point only the UEFI-type stuff will be displayed, although NVIDIA does have accelerated x11vnc etc.

Make sure you install the VirtIO drivers and the NVIDIA OpenGL patch for RDP on Windows. I'm playing around with Sunshine for a VDI streaming solution.

@AbsolutelyFree: https://docs.nvidia.com/vgpu/index.html - the R470 and R535 branches are still available. I don't know whether they've been updated for 6.8 yet, since Ubuntu LTS is still on 6.5.

It is "safe" to pin 6.5 for a while, depending on your security profile and the features you want from the 6.8 kernel. I'm not sure whether Canonical backports things into 6.5 for LTS.
 

I ended up going the kernel-pinning route...for now. Everything is working fine. This cluster is in production, with NVIDIA GRID-licensed A10 GPUs. After getting a renewal quote from Broadcom for VMware licensing at 10x last year's amount, it was a no-brainer. We have already been running Proxmox as our production server cluster for several years, but this one is for VDI, where Horizon View was king (unfortunately).
 
