vGPU running on an NVIDIA RTX 5000 Ada on the latest version of Proxmox

Mar 5, 2025
I want to use vGPU on an NVIDIA RTX 5000 Ada on the latest version of Proxmox.
Yesterday I downloaded the current version of Proxmox, registered it with the subscription key, then updated everything to the latest state and rebooted.
This is where that left me:
Code:
root@pve01:~# pveversion
pve-manager/8.3.4/65224a0f9cd294a3 (running kernel: 6.8.12-8-pve)

Then I followed these instructions given here:

https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE

I used the latest driver that seemed suitable to me: NVIDIA-GRID-Linux-KVM-570.124.03-570.124.06-572.60
This is the latest driver of the version 18 release.

Everything seems to work, but when I run this command:

Code:
lspci -d 10de:
81:00.0 VGA compatible controller: NVIDIA Corporation AD102GL [RTX 5000 Ada Generation] (rev a1)
81:00.1 Audio device: NVIDIA Corporation AD102 High Definition Audio Controller (rev a1)

-> I do not get any virtual functions.

Code:
root@pve01:~# nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03             Driver Version: 570.124.03     CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 5000 Ada Gene...    On  |   00000000:81:00.0 Off |                    0 |
| 30%   27C    P8             25W /  250W |       0MiB /  30712MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

This looks as expected to me.

Code:
root@pve01:~# nvidia-smi vgpu
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.03             Driver Version: 570.124.03                |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  NVIDIA RTX 5000 Ada Ge...  | 00000000:81:00.0             |   0%       |
+---------------------------------+------------------------------+------------+

This looks as expected to me.

Code:
root@pve01:~# nvidia-smi vgpu -c
GPU 00000000:81:00.0
    NVIDIA RTX5000-Ada-1B
    NVIDIA RTX5000-Ada-2B
    NVIDIA RTX5000-Ada-1Q
    NVIDIA RTX5000-Ada-2Q
    NVIDIA RTX5000-Ada-4Q
    NVIDIA RTX5000-Ada-8Q
    NVIDIA RTX5000-Ada-16Q
    NVIDIA RTX5000-Ada-32Q
    NVIDIA RTX5000-Ada-1A
    NVIDIA RTX5000-Ada-2A
    NVIDIA RTX5000-Ada-4A
    NVIDIA RTX5000-Ada-8A
    NVIDIA RTX5000-Ada-16A
    NVIDIA RTX5000-Ada-32A

This also looks as expected to me.

Code:
root@pve01:~# nvidia-smi vgpu -q
GPU 00000000:81:00.0
    Active vGPUs                          : 0

This looks as expected to me as well.

According to the Proxmox documentation, I should now get this:

Code:
# lspci -d 10de:
01:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.4 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
01:00.5 3D controller: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)

This is what I get:

Code:
root@pve01:~# lspci -d 10de:
81:00.0 VGA compatible controller: NVIDIA Corporation AD102GL [RTX 5000 Ada Generation] (rev a1)
81:00.1 Audio device: NVIDIA Corporation AD102 High Definition Audio Controller (rev a1)

And this is where I'm stuck.

It seems to me that the Proxmox documentation is missing a step, at least for an RTX 5000 Ada GPU.

Do I need to create the virtual functions on an RTX 5000 Ada GPU somehow?

This seems to be the case, according to the NVIDIA documentation for version 18, which you can find here:

https://docs.nvidia.com/vgpu/latest...ndex.html#creating-vgpu-device-red-hat-el-kvm

If I follow the steps there, I get stuck here:

Code:
root@pve01:~# ls /sys/bus/pci/devices/0000\:81\:00.0/
aer_dev_correctable  broken_parity_status    current_link_speed  driver         i2c-6  iommu_group    max_link_speed  numa_node    reset      resource1        resource3_wc  subsystem_device
aer_dev_fatal         class    current_link_width  driver_override  i2c-7  irq       max_link_width  power    reset_method      resource1_resize  resource5      subsystem_vendor
aer_dev_nonfatal     config    d3cold_allowed        enable         i2c-8  link       modalias       power_state    resource      resource1_wc        revision      uevent
ari_enabled         consistent_dma_mask_bits    device        i2c-10         i2c-9  local_cpulist  msi_bus       remove    resource0      resource3        rom      vendor
boot_vga         consumer:pci:0000:81:00.1    dma_mask_bits        i2c-11         iommu  local_cpus       msi_irqs       rescan    resource0_resize  resource3_resize  subsystem

According to the NVIDIA documentation, this directory should contain entries named virtfnNN, like this:

Code:
# ls -l /sys/bus/pci/devices/0000:41:00.0/ | grep virtfn
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn0 -> ../0000:41:00.4
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn1 -> ../0000:41:00.5
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn10 -> ../0000:41:01.6
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn11 -> ../0000:41:01.7
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn12 -> ../0000:41:02.0
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn13 -> ../0000:41:02.1
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn14 -> ../0000:41:02.2
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn15 -> ../0000:41:02.3
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn16 -> ../0000:41:02.4
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn17 -> ../0000:41:02.5
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn18 -> ../0000:41:02.6
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn19 -> ../0000:41:02.7
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn2 -> ../0000:41:00.6
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn20 -> ../0000:41:03.0
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn21 -> ../0000:41:03.1
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn22 -> ../0000:41:03.2
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn23 -> ../0000:41:03.3
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn24 -> ../0000:41:03.4
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn25 -> ../0000:41:03.5
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn26 -> ../0000:41:03.6
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn27 -> ../0000:41:03.7
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn28 -> ../0000:41:04.0
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn29 -> ../0000:41:04.1
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn3 -> ../0000:41:00.7
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn30 -> ../0000:41:04.2
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn31 -> ../0000:41:04.3
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn4 -> ../0000:41:01.0
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn5 -> ../0000:41:01.1
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn6 -> ../0000:41:01.2
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn7 -> ../0000:41:01.3
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn8 -> ../0000:41:01.4
lrwxrwxrwx. 1 root root           0 Jul 16 04:42 virtfn9 -> ../0000:41:01.5

But I do not get any virtfnNN entries here... :(
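
The check above can be condensed into a small helper. This is only a sketch: it assumes the standard Linux PCI sysfs attribute `sriov_totalvfs` (which the kernel creates only for devices that currently advertise SR-IOV) and the `virtfnN` links shown in the NVIDIA documentation. The function name `check_sriov` is made up for illustration and is not part of any NVIDIA or Proxmox tool.

```shell
#!/bin/sh
# Sketch: report the SR-IOV state of a PCI device from its sysfs directory.
# check_sriov is a hypothetical helper name, not an NVIDIA or Proxmox tool.
check_sriov() {
    dev_dir="$1"

    if [ ! -d "$dev_dir" ]; then
        echo "no such device directory: $dev_dir"
        return 2
    fi

    # sriov_totalvfs is only present when the device exposes the
    # SR-IOV capability in its current configuration.
    if [ ! -r "$dev_dir/sriov_totalvfs" ]; then
        echo "SR-IOV capability not exposed"
        return 1
    fi

    total=$(cat "$dev_dir/sriov_totalvfs")

    # Each enabled virtual function shows up as a virtfnN symlink.
    enabled=0
    for vf in "$dev_dir"/virtfn*; do
        if [ -L "$vf" ]; then
            enabled=$((enabled + 1))
        fi
    done

    echo "SR-IOV supported: $enabled of $total virtual functions enabled"
}

# Example call on the host from this thread:
#   check_sriov /sys/bus/pci/devices/0000:81:00.0
```

For the directory listing shown earlier, which contains no `sriov_*` entries at all, this would print "SR-IOV capability not exposed". That suggests the card is not offering SR-IOV in its current state, rather than the virtual functions merely being disabled.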
 
Download the Display Mode Selector tool here:

https://developer.nvidia.com/displaymodeselector

Yes, I had to register with NVIDIA once more, this time as a developer, to be allowed to download the command-line tool.

Then uninstall the driver:

./NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm.run --uninstall

Run the tool:

displaymodeselector --gpumode compute --auto

Reinstall the driver:

./NVIDIA-Linux-x86_64-570.124.03-vgpu-kvm.run --dkms

Reboot.

Now the virtual devices appear as in the instructions:

Code:
root@pve01:~# lspci -d 10de: 
81:00.0 3D controller: NVIDIA Corporation AD102GL [RTX 5000 Ada Generation] (rev a1)
81:00.4 3D controller: NVIDIA Corporation AD102GL [RTX 5000 Ada Generation] (rev a1)
81:00.5 3D controller: NVIDIA Corporation AD102GL [RTX 5000 Ada Generation] (rev a1)
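
To compare the before and after state without reading the listing by eye, the `lspci` output can be summarized with a short filter. This is a rough sketch based only on the two listings in this thread: it assumes that a card still in display mode enumerates as "VGA compatible controller" and that the physical function plus its virtual functions enumerate as "3D controller". The name `summarize_nvidia_functions` is hypothetical.

```shell
#!/bin/sh
# Sketch: summarize `lspci -d 10de:` output read from stdin.
# summarize_nvidia_functions is a hypothetical helper name.
summarize_nvidia_functions() {
    awk '
        / VGA compatible controller: / { vga++ }
        / 3D controller: /             { threed++ }
        END {
            printf "VGA functions: %d, 3D controller functions: %d\n", vga + 0, threed + 0
            # Heuristic taken from the listings in this thread, not an official rule.
            if (threed + 0 > 1)
                print "virtual functions appear to be present"
            else if (vga + 0 > 0)
                print "GPU still enumerates as a display device"
            else
                print "no virtual functions visible"
        }'
}

# Example call on the host:
#   lspci -d 10de: | summarize_nvidia_functions
```

Fed the first listing in this thread, it reports one VGA function and no 3D controllers. Fed the listing after the mode switch, it reports three 3D controller functions.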

:cool:
 