NVIDIA vGPU Software 18 Support for Proxmox VE

t.lamprecht

Proxmox Staff Member
Proxmox VE is the newest addition to the hypervisors supported by NVIDIA vGPU, beginning with NVIDIA vGPU software v18.0, released today.

NVIDIA vGPU software enables multiple virtual machines to share a single supported physical GPU; learn more at https://www.nvidia.com/en-us/data-center/virtual-solutions/.

A comprehensive guide on how to configure NVIDIA's Virtual GPU (vGPU) technology within Proxmox VE is available from Proxmox at https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE.
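
For reference, once the host driver is set up and the virtual functions exist, assigning a vGPU profile to a VM ends up as a hostpci entry with a mediated-device type. A minimal sketch with placeholder values (VM ID, PCI address, and profile name are examples only; the wiki covers the full procedure, including the resource-mapping based setup):

Code:
# attach one virtual function with an example vGPU profile to VM 100
qm set 100 --hostpci0 0000:01:00.4,mdev=nvidia-660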

Enterprise support for NVIDIA vGPU software on Proxmox VE is available; it requires a Proxmox subscription at the Basic, Standard, or Premium level as well as an NVIDIA vGPU entitlement.
 
I want to use vGPU with an NVIDIA RTX 5000 Ada on the latest version of Proxmox VE.
Yesterday I downloaded the current Proxmox VE ISO, installed it from scratch, registered it with the subscription key, updated everything to the latest state, and rebooted. This brought me here:

Code:
root@pve01:~# pveversion
pve-manager/8.3.4/65224a0f9cd294a3 (running kernel: 6.8.12-8-pve)

Then I followed the instructions given here:

https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE
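
One general prerequisite for this (as for any PCI(e) passthrough) is a working IOMMU; a quick, generic way to check that on the host (standard kernel interfaces, nothing vGPU-specific):

Code:
# should show DMAR/IOMMU initialisation messages and populated IOMMU groups
dmesg | grep -i -e DMAR -e IOMMU
ls /sys/kernel/iommu_groups/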

I used the latest product version 18 driver bundle, which seemed suitable to me:
NVIDIA-GRID-Linux-KVM-570.124.03-570.124.06-572.60
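
The host driver install from that bundle followed the wiki and looked roughly like this (the exact .run filename depends on the bundle contents; the --dkms option rebuilds the module on kernel updates):

Code:
# build dependencies for the kernel module (package names as of PVE 8)
apt install dkms pve-headers
# host driver .run from the bundle (exact filename may differ)
chmod +x NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm.run
./NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm.run --dkms
reboot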

Everything seems to work, but when I run this command:

Code:
lspci -d 10de:
81:00.0 VGA compatible controller: NVIDIA Corporation AD102GL [RTX 5000 Ada Generation] (rev a1)
81:00.1 Audio device: NVIDIA Corporation AD102 High Definition Audio Controller (rev a1)

I do not get any virtual functions like the ones shown in your documentation:

Code:
# lspci -d 10de:
01:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.4 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.5 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
...
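
A generic way to check whether SR-IOV is exposed at the PCI level at all is the sriov_* sysfs attributes (standard kernel interface, independent of the NVIDIA tooling):

Code:
# total / currently enabled virtual functions for the physical function
cat /sys/bus/pci/devices/0000:81:00.0/sriov_totalvfs
cat /sys/bus/pci/devices/0000:81:00.0/sriov_numvfs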

Here is some analysis:

Code:
root@pve01:~# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03             Driver Version: 570.124.03     CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 5000 Ada Gene...    On  |   00000000:81:00.0 Off |                    0 |
| 30%   27C    P8             25W /  250W |       0MiB /  30712MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

To me this looks as expected.

Code:
root@pve01:~# nvidia-smi vgpu
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03             Driver Version: 570.124.03                |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  NVIDIA RTX 5000 Ada Ge...  | 00000000:81:00.0             |   0%       |
+---------------------------------+------------------------------+------------+

To me this looks as expected.

Code:
root@pve01:~# nvidia-smi vgpu -c
GPU 00000000:81:00.0
    NVIDIA RTX5000-Ada-1B
    NVIDIA RTX5000-Ada-2B
    NVIDIA RTX5000-Ada-1Q
    NVIDIA RTX5000-Ada-2Q
    NVIDIA RTX5000-Ada-4Q
    NVIDIA RTX5000-Ada-8Q
    NVIDIA RTX5000-Ada-16Q
    NVIDIA RTX5000-Ada-32Q
    NVIDIA RTX5000-Ada-1A
    NVIDIA RTX5000-Ada-2A
    NVIDIA RTX5000-Ada-4A
    NVIDIA RTX5000-Ada-8A
    NVIDIA RTX5000-Ada-16A
    NVIDIA RTX5000-Ada-32A

To me this looks as expected.

Code:
root@pve01:~# nvidia-smi vgpu -q
GPU 00000000:81:00.0
    Active vGPUs                          : 0

To me this looks as expected.

Code:
root@pve01:~# lspci -d 10de:
81:00.0 VGA compatible controller: NVIDIA Corporation AD102GL [RTX 5000 Ada Generation] (rev a1)
81:00.1 Audio device: NVIDIA Corporation AD102 High Definition Audio Controller (rev a1)

And here I'm stuck: no additional virtual devices.

To me it seems there is some step not mentioned in the Proxmox documentation, at least for an RTX 5000 Ada GPU?

Do I need to create the virtual functions on an RTX 5000 Ada GPU somehow?
This seems to be the case, according to the NVIDIA documentation for version 18, which you can find here:

https://docs.nvidia.com/vgpu/latest...ndex.html#creating-vgpu-device-red-hat-el-kvm
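
The step described there for SR-IOV-capable GPUs is to enable the virtual functions with the sriov-manage script that ships with the vGPU host driver; a minimal sketch based on that documentation (script path as given by NVIDIA, PCI address is my card's):

Code:
# enable the virtual functions for the GPU at 81:00.0
/usr/lib/nvidia/sriov-manage -e 0000:81:00.0
# or, for all SR-IOV-capable NVIDIA GPUs in the host
/usr/lib/nvidia/sriov-manage -e ALL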

If I follow along there I get stuck here:

Code:
root@pve01:~# ls /sys/bus/pci/devices/0000\:81\:00.0/
aer_dev_correctable  aer_dev_fatal  aer_dev_nonfatal  ari_enabled  boot_vga  broken_parity_status  class  config  consistent_dma_mask_bits  consumer:pci:0000:81:00.1
current_link_speed  current_link_width  d3cold_allowed  device  dma_mask_bits  driver  driver_override  enable  i2c-10  i2c-11
i2c-6  i2c-7  i2c-8  i2c-9  iommu  iommu_group  irq  link  local_cpulist  local_cpus
max_link_speed  max_link_width  modalias  msi_bus  msi_irqs  numa_node  power  power_state  remove  rescan
reset  reset_method  resource  resource0  resource0_resize  resource1  resource1_resize  resource1_wc  resource3  resource3_resize
resource3_wc  resource5  revision  rom  subsystem  subsystem_device  subsystem_vendor  uevent  vendor

According to the NVIDIA documentation, the mentioned directory should contain entries called virtfnNN, like this:

Code:
# ls -l /sys/bus/pci/devices/0000:41:00.0/ | grep virtfn
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn0 -> ../0000:41:00.4
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn1 -> ../0000:41:00.5
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn10 -> ../0000:41:01.6
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn11 -> ../0000:41:01.7
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn12 -> ../0000:41:02.0
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn13 -> ../0000:41:02.1
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn14 -> ../0000:41:02.2
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn15 -> ../0000:41:02.3
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn16 -> ../0000:41:02.4
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn17 -> ../0000:41:02.5
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn18 -> ../0000:41:02.6
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn19 -> ../0000:41:02.7
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn2 -> ../0000:41:00.6
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn20 -> ../0000:41:03.0
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn21 -> ../0000:41:03.1
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn22 -> ../0000:41:03.2
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn23 -> ../0000:41:03.3
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn24 -> ../0000:41:03.4
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn25 -> ../0000:41:03.5
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn26 -> ../0000:41:03.6
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn27 -> ../0000:41:03.7
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn28 -> ../0000:41:04.0
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn29 -> ../0000:41:04.1
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn3 -> ../0000:41:00.7
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn30 -> ../0000:41:04.2
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn31 -> ../0000:41:04.3
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn4 -> ../0000:41:01.0
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn5 -> ../0000:41:01.1
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn6 -> ../0000:41:01.2
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn7 -> ../0000:41:01.3
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn8 -> ../0000:41:01.4
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn9 -> ../0000:41:01.5

But I do not get any virtfnNN entries here... :(

This also does not look like it is actually working (not sure if relevant):
Code:
root@pve01:~# mdevctl types

root@pve01:~#
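
In case it helps with debugging, the next things I would check on the host are the vGPU manager services and the kernel log (service names are the ones installed by the vGPU host driver, so treat them as an assumption):

Code:
# vGPU manager daemons shipped with the host driver
systemctl status nvidia-vgpud.service nvidia-vgpu-mgr.service
# kernel messages from the nvidia/vfio modules
dmesg | grep -i -e nvidia -e vfio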
 
Hi,

Did anyone install this on a Windows 11 22H2 virtual machine with CPU type `host`?
It seems that nested virtualization doesn't work anymore after this updated guest driver.
 
Hi,

Are you sure it has something to do with the vGPU driver? (I can't imagine that it would make much of a difference.) If yes, please open a new thread so it can be properly discussed.
 
Per the documentation on the NVIDIA website:
Nested Virtualization Is Not Supported by NVIDIA vGPU
 
Yes, if you want to get support you have to have an NVIDIA entitlement according to their licensing terms: https://docs.nvidia.com/vgpu/18.0/grid-licensing-user-guide/index.html
Thanks. For me this is just a hobby and I won't hold you or anyone on this list to anything, but is it your understanding that if I do not want NVIDIA support I will:
  1. get full use of the paid-for NVIDIA hardware,
  2. get full use of whatever software/drivers are needed for virtualization with Proxmox, and
  3. not need to pay NVIDIA, other than for their hardware?
 
You actually have to pay NVIDIA for a license to get hold of the vGPU host and guest drivers. Also note that this normally only works with some of their data center/workstation GPUs, not with consumer ones.