vGPU with nVIDIA on Kernel 6.8

Hey folks, so for fun I added a P6000 to the box as well... and that did not make for happy lights.
I actually discovered that the two cards do not co-exist very well. According to NVIDIA, the L40S should work with the 16.1 and later drivers.

But it does not show up under 16.5.
Code:
root@PVE-S02:~/NVIDIA-GRID-16.5/Host_Drivers# ./NVIDIA-Linux-x86_64-535.161.05-vgpu-kvm-custom.run --dkms
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 535.161.05........................................................................................................................................................................................................................................................................................
root@PVE-S02:~/NVIDIA-GRID-16.5/Host_Drivers# nvidia-smi
Tue Sep 17 02:30:23 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.05             Driver Version: 535.161.05   CUDA Version: N/A      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P6000                   Off | 00000000:89:00.0 Off |                  Off |
| 21%   30C    P0              70W / 250W |     54MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                        
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Then, again for fun, I re-installed 17.3 just to see what I would get, and:
Code:
root@PVE-S02:~/NVIDIA-GRID-17.3/Host_Drivers# ./NVIDIA-Linux-x86_64-550.90.05-vgpu-kvm.run --dkms
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.90.05..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................
root@PVE-S02:~/NVIDIA-GRID-17.3/Host_Drivers# nvidia-smi
Tue Sep 17 02:31:23 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.05              Driver Version: 550.90.05      CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    Off |   00000000:C4:00.0 Off |                    0 |
| N/A   33C    P0            120W /  350W |       0MiB /  46068MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Now, isn't this more fun!

I also added the polloloco patch on top of 17.3, and then only the P6000 was found...
 
For people dealing with crashes:
I ran into a similar problem on Linux and called NVIDIA support; it turned out the previous driver was conflicting with the new one. Uninstall the old driver completely (on Linux with apt purge, not just apt remove). On Windows, make sure you don't have drivers "accidentally" installed by Windows Update. I also found that you need a current QEMU machine type for Windows guests (I selected 8.1).
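If it helps, a rough cleanup sequence on a Proxmox/Debian host might look like the sketch below (the package patterns and VMID 100 are placeholders; check what is actually installed first):
Code:
# See which distro NVIDIA packages are present before removing anything
dpkg -l | grep -i nvidia
# Purge them completely (adjust the patterns to what the list above shows)
apt purge 'nvidia-*' 'libnvidia-*'
apt autoremove --purge
# For a Windows guest, pin a recent QEMU machine type (VMID 100 is a placeholder)
qm set 100 --machine pc-q35-8.1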

As for the question on NUMA: set the number of virtual sockets equal to the number of sockets in your server, and QEMU will then apparently split the guest memory across the right nodes, IF there is free memory on that NUMA node. So if you have many guests that are all using memory on one CPU, or are all allocating GPUs, your guest may not get its memory on the CPU the GPU is attached to despite there apparently being enough, which causes all traffic to run over the QuickPath Interconnect.
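A quick sanity check, sketched here with one of the PCI addresses that comes up later in this thread standing in for your own GPU:
Code:
# NUMA node the GPU (or VF) hangs off; -1 means no NUMA locality information
cat /sys/bus/pci/devices/0000:c4:00.0/numa_node
# Free memory per node, to see whether the guest can still be satisfied locally
numactl --hardware | grep -E '^node [0-9]+ free'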

For Danny: please look at the earlier posts and the hook scripts for how to allocate a GPU on the new kernel. There is currently no GUI solution for it.

For multiple displays: most OSes can render multiple virtual displays on one GPU; RDP can do it. I'm not sure what you are trying to do; I have systems with multiple vGPUs attached (2x A40-48Q) for some high-end workloads.

Hi @guruevi, thanks for the heads-up. Indeed we went through all the steps again, but somewhere we are just missing something. The Python script works and outputs this, for example:

/var/lib/vz/snipperts# ./nvidia_allocator.py 1003 get_command 12C
Found available: /sys/bus/pci/devices/0000:c4:01.1
nVIDIA ID, type: 687 : NVIDIA A30-12C
qm set 1003 --hookscript local:snippets/nvidia_allocator.py
qm set 1003 --args "-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:c4:01.1 -uuid 3d4294b4-29e5-43ec-a67a-d42bfd0a197b"
qm set 1003 --tags "nvidia-687"

We added this to the VM's config, booted the VM, and still get this:
kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:c4:01.1: vfio 0000:c4:01.1: error getting device from group 105: No such device
Verify all devices in group 105 are bound to vfio-<bus> or pci-stub and not already in use
TASK ERROR: start failed: QEMU exited with code 1

But the device is there... why?
What are we missing?
 
Can you print the output of
Code:
lspci -v

You should get something like this:
Code:
ca:01.7 3D controller: NVIDIA Corporation GA102GL [A40] (rev a1)
        Subsystem: NVIDIA Corporation GA102GL [A40]
        Flags: bus master, fast devsel, latency 0, NUMA node 1, IOMMU group 245
        Memory at e82c0000 (32-bit, non-prefetchable) [virtual] [size=256K]
        Memory at 28e580000000 (64-bit, prefetchable) [virtual] [size=2G]
        Memory at 28f016000000 (64-bit, prefetchable) [virtual] [size=32M]
        Capabilities: [40] Express Endpoint, MSI 00
        Capabilities: [7c] MSI-X: Enable- Count=3 Masked-
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_vgpu_vfio, nvidia

Make sure the kernel driver in use is nvidia, and use a few greps to make sure IOMMU group 245 (in my case; your numbers will differ) doesn't have "other" devices in it (typically a problem on AMD platforms).
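A quick loop for that check might look like this (245 is just the example group from above; use the group number lspci -v reports for your card):
Code:
# List every device that shares the IOMMU group with the GPU/VF
GROUP=245
for dev in /sys/kernel/iommu_groups/$GROUP/devices/*; do
    lspci -nns "$(basename "$dev")"   # anything unexpected here will block vfio
done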

Then check this:
Code:
dmesg | grep -e DMAR -e IOMMU
You should have
IOMMU enabled
You may have
Enabled IRQ remapping in x2apic mode
and
Intel(R) Virtualization Technology for Directed I/O

Also print the output of this
Code:
cat /proc/cmdline
You should have
intel_iommu=on iommu=pt in there
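If those options are missing, this is roughly how they get added on Proxmox (pick the variant matching your bootloader, then reboot and re-check /proc/cmdline):
Code:
# GRUB-booted hosts: append the options to GRUB_CMDLINE_LINUX_DEFAULT, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
nano /etc/default/grub && update-grub
# systemd-boot hosts (e.g. ZFS root): append the same options to /etc/kernel/cmdline, then
proxmox-boot-tool refresh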

For
Code:
dmesg | grep -i nvidia
When starting the GPU, you should have many of these
nvidia 0000:ca:00.4: enabling device (0000 -> 0002)
 
Thx for the super fast response... maybe I will have a few hairs on my head after all, lol.
Jokes aside, running lspci -v
we get (well, find) this:

44:00.0 3D controller: NVIDIA Corporation GA100GL [A30 PCIe] (rev a1)
Subsystem: NVIDIA Corporation GA100GL [A30 PCIe]
Flags: bus master, fast devsel, latency 0, IRQ 274, IOMMU group 64
Memory at f2000000 (32-bit, non-prefetchable) [size=16M]
Memory at 2f000000000 (64-bit, prefetchable) [size=32G]
Memory at 30010000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [68] Null
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [c8] MSI-X: Enable+ Count=6 Masked-
Capabilities: [100] Virtual Channel
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Physical Resizable BAR
Capabilities: [bcc] Single Root I/O Virtualization (SR-IOV)
Capabilities: [c14] Alternative Routing-ID Interpretation (ARI)
Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
Capabilities: [d00] Lane Margining at the Receiver <?>
Capabilities: [e00] Data Link Feature <?>
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_vgpu_vfio, nvidia

Which all looks good.
This is an AMD-based system, and when we run dmesg | grep -e DMAR -e IOMMU we see the following, which looked correct to us:

[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 1.043296] pci 0000:c0:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.045236] pci 0000:80:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.046595] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.047893] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 1.051838] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 1.051842] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[ 1.051845] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[ 1.051848] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).

And finally, running cat /proc/cmdline we get the output below, matching what another member in this thread with the same hardware posted (he is lucky and already has his L40S; we are still waiting for ours):

BOOT_IMAGE=/boot/vmlinuz-6.8.12-2-pve root=/dev/mapper/pve-root ro quiet iommu=pt pcie_acs_override=downstream,multifunction

Then, just to share all the details, running systemctl status --now nvidia-sriov.service shows a positive result:

nvidia-sriov.service - Enable NVIDIA SR-IOV
Loaded: loaded (/lib/systemd/system/nvidia-sriov.service; enabled; preset: enabled)
Active: inactive (dead) since Mon 2024-09-30 18:44:26 CEST; 7min ago
Process: 14626 ExecStart=/usr/lib/nvidia/sriov-manage -e ALL (code=exited, status=0/SUCCESS)
Process: 14709 ExecStart=/usr/bin/nvidia-smi vgpu -shm 1 (code=exited, status=0/SUCCESS)
Main PID: 14709 (code=exited, status=0/SUCCESS)
CPU: 416ms
Sep 30 18:44:23 godzilla systemd[1]: Starting nvidia-sriov.service - Enable NVIDIA SR-IOV...
Sep 30 18:44:23 godzilla sriov-manage[14630]: Enabling VFs on 0000:44:00.0
Sep 30 18:44:25 godzilla sriov-manage[14676]: Enabling VFs on 0000:c4:00.0
Sep 30 18:44:26 godzilla nvidia-smi[14709]: Unable to enable vGPU heterogeneous mode for GPU 00000000:44:00.0: Not Supported
Sep 30 18:44:26 godzilla nvidia-smi[14709]: Unable to enable vGPU heterogeneous mode for GPU 00000000:C4:00.0: Not Supported
Sep 30 18:44:26 godzilla systemd[1]: nvidia-sriov.service: Deactivated successfully.
Sep 30 18:44:26 godzilla systemd[1]: Finished nvidia-sriov.service - Enable NVIDIA SR-IOV.

I am still troubleshooting the "Unable to enable vGPU heterogeneous mode" part, as this should be possible given the GPU type and driver.
 
If I understand correctly, you are trying to create a type "C" profile... can you try a "Q" profile instead? I think only "A" and "Q" profiles were supported on Linux, but I don't have the documentation at hand to verify that right now. One more thing: from the provided info you seem to have two GPUs in the system with different PCIe bus IDs, one at C4:00.0 and the other at 44:00.0. The A30 you pasted from "lspci -v" sits at 44:00.0, yet in your first post you tried to assign a VF on the card at C4:00.0 (C4:01.1). I think the system is confused. Maybe remove one card if possible :) and also try a "Q" profile if possible.
 
You should have amd_iommu=on and iommu=pt in your kernel options, no? Because right now it doesn't look like IOMMU is actually enabled: dmesg should explicitly state "IOMMU enabled" (the other messages are about capabilities/metrics, not about whether the IOMMU is being used).

In your BIOS/UEFI firmware, you should also have IOMMU, Above 4G decoding, Resizable BAR and ACS enabled and CSM (BIOS emulation) disabled.

As far as "Unable to enable vGPU heterogeneous mode for GPU 00000000:44:00.0: Not Supported" - your GPU may not actually support that (nvidia-smi -q will tell you whether it is an option).

The amd_iommu= option gives you access to various settings:
Code:
amd_iommu=    [HW,X86-64]
            Pass parameters to the AMD IOMMU driver in the system.
            Possible values are:
            fullflush - Deprecated, equivalent to iommu.strict=1
            off      - do not initialize any AMD IOMMU found in
                    the system
            force_isolation - Force device isolation for all
                      devices. The IOMMU driver is not
                      allowed anymore to lift isolation
                      requirements as needed. This option
                      does not override iommu=pt
            force_enable    - Force enable the IOMMU on platforms known
                          to be buggy with IOMMU enabled. Use this
                          option with care.
            pgtbl_v1        - Use v1 page table for DMA-API (Default).
            pgtbl_v2        - Use v2 page table for DMA-API.
            irtcachedis     - Disable Interrupt Remapping Table (IRT) caching.
            nohugepages     - Limit page-sizes used for v1 page-tables
                          to 4 KiB.
            v2_pgsizes_only - Limit page-sizes used for v1 page-tables
                          to 4KiB/2Mib/1GiB.
 
Had a short break and I'm back at it again.
From my perspective everything should be just fine (but it isn't).

In the BIOS, everything required for an AMD server mainboard is 100% enabled.
I fixed up the service that starts the GPU pieces, and looking at the journal one sees that the AMD IOMMU is started by default (I believe this was a kernel change where the IOMMU is automatically enabled on AMD-based servers).

But here is a dump just to show that everything should be working:

root@godzilla:~# systemctl status --now nvidia-sriov.service
○ nvidia-sriov.service - Enable NVIDIA SR-IOV
Loaded: loaded (/lib/systemd/system/nvidia-sriov.service; enabled; preset: enabled)
Active: inactive (dead) since Mon 2024-09-30 21:05:25 CEST; 57s ago
Process: 1840 ExecStartPre=/bin/sleep 15 (code=exited, status=0/SUCCESS)
Process: 2443 ExecStart=/usr/lib/nvidia/sriov-manage -e ALL (code=exited, status=0/SUCCESS)
Process: 2529 ExecStart=/usr/bin/nvidia-smi vgpu -shm 1 (code=exited, status=0/SUCCESS)
Main PID: 2529 (code=exited, status=0/SUCCESS)
CPU: 656ms

Sep 30 21:05:07 godzilla systemd[1]: Starting nvidia-sriov.service - Enable NVIDIA SR-IOV...
Sep 30 21:05:22 godzilla sriov-manage[2447]: Enabling VFs on 0000:44:00.0
Sep 30 21:05:23 godzilla sriov-manage[2486]: Enabling VFs on 0000:c4:00.0
Sep 30 21:05:25 godzilla nvidia-smi[2529]: Enabled vGPU heterogeneous mode for GPU 00000000:44:00.0
Sep 30 21:05:25 godzilla nvidia-smi[2529]: Enabled vGPU heterogeneous mode for GPU 00000000:C4:00.0
Sep 30 21:05:25 godzilla systemd[1]: nvidia-sriov.service: Deactivated successfully.
Sep 30 21:05:25 godzilla systemd[1]: Finished nvidia-sriov.service - Enable NVIDIA SR-IOV.
root@godzilla:~# journalctl -b 0 | grep -i iommu
Sep 30 21:05:03 godzilla kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-2-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction
Sep 30 21:05:03 godzilla kernel: Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
Sep 30 21:05:03 godzilla kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-2-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction
Sep 30 21:05:03 godzilla kernel: iommu: Default domain type: Passthrough (set via kernel command line)
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:00.2: AMD-Vi: IOMMU performance counters supported
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:01.0: Adding to iommu group 0
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:01.1: Adding to iommu group 1
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:02.0: Adding to iommu group 2
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:03.0: Adding to iommu group 3
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:03.3: Adding to iommu group 4
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:03.4: Adding to iommu group 5
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:04.0: Adding to iommu group 6
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:05.0: Adding to iommu group 7
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:07.0: Adding to iommu group 8
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:07.1: Adding to iommu group 9
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:08.0: Adding to iommu group 10
Sep 30 21:05:03 godzilla kernel: pci 0000:c0:08.1: Adding to iommu group 11
Sep 30 21:05:03 godzilla kernel: pci 0000:c1:00.0: Adding to iommu group 12
Sep 30 21:05:03 godzilla kernel: pci 0000:c1:00.1: Adding to iommu group 13
Sep 30 21:05:03 godzilla kernel: pci 0000:c2:00.0: Adding to iommu group 14
Sep 30 21:05:03 godzilla kernel: pci 0000:c2:01.0: Adding to iommu group 15
Sep 30 21:05:03 godzilla kernel: pci 0000:c4:00.0: Adding to iommu group 16
Sep 30 21:05:03 godzilla kernel: pci 0000:c7:00.0: Adding to iommu group 17
Sep 30 21:05:03 godzilla kernel: pci 0000:c7:00.2: Adding to iommu group 18
Sep 30 21:05:03 godzilla kernel: pci 0000:c8:00.0: Adding to iommu group 19
Sep 30 21:05:03 godzilla kernel: pci 0000:c8:00.2: Adding to iommu group 20
Sep 30 21:05:03 godzilla kernel: pci 0000:80:00.2: AMD-Vi: IOMMU performance counters supported
Sep 30 21:05:03 godzilla kernel: pci 0000:80:01.0: Adding to iommu group 21
Sep 30 21:05:03 godzilla kernel: pci 0000:80:01.1: Adding to iommu group 22
Sep 30 21:05:03 godzilla kernel: pci 0000:80:01.2: Adding to iommu group 23
Sep 30 21:05:03 godzilla kernel: pci 0000:80:01.4: Adding to iommu group 24
Sep 30 21:05:03 godzilla kernel: pci 0000:80:01.5: Adding to iommu group 25
Sep 30 21:05:03 godzilla kernel: pci 0000:80:02.0: Adding to iommu group 26
Sep 30 21:05:03 godzilla kernel: pci 0000:80:03.0: Adding to iommu group 27
Sep 30 21:05:03 godzilla kernel: pci 0000:80:03.1: Adding to iommu group 28
Sep 30 21:05:03 godzilla kernel: pci 0000:80:04.0: Adding to iommu group 29
Sep 30 21:05:03 godzilla kernel: pci 0000:80:05.0: Adding to iommu group 30
Sep 30 21:05:03 godzilla kernel: pci 0000:80:07.0: Adding to iommu group 31
Sep 30 21:05:03 godzilla kernel: pci 0000:80:07.1: Adding to iommu group 32
Sep 30 21:05:03 godzilla kernel: pci 0000:80:08.0: Adding to iommu group 33
Sep 30 21:05:03 godzilla kernel: pci 0000:80:08.1: Adding to iommu group 34
Sep 30 21:05:03 godzilla kernel: pci 0000:80:08.2: Adding to iommu group 35
Sep 30 21:05:03 godzilla kernel: pci 0000:81:00.0: Adding to iommu group 36
Sep 30 21:05:03 godzilla kernel: pci 0000:82:00.0: Adding to iommu group 37
Sep 30 21:05:03 godzilla kernel: pci 0000:82:00.1: Adding to iommu group 38
Sep 30 21:05:03 godzilla kernel: pci 0000:83:00.0: Adding to iommu group 39
Sep 30 21:05:03 godzilla kernel: pci 0000:84:00.0: Adding to iommu group 39
Sep 30 21:05:03 godzilla kernel: pci 0000:85:00.0: Adding to iommu group 40
Sep 30 21:05:03 godzilla kernel: pci 0000:86:00.0: Adding to iommu group 41
Sep 30 21:05:03 godzilla kernel: pci 0000:86:00.1: Adding to iommu group 42
Sep 30 21:05:03 godzilla kernel: pci 0000:87:00.0: Adding to iommu group 43
Sep 30 21:05:03 godzilla kernel: pci 0000:87:01.0: Adding to iommu group 44
Sep 30 21:05:03 godzilla kernel: pci 0000:8a:00.0: Adding to iommu group 45
Sep 30 21:05:03 godzilla kernel: pci 0000:8a:00.2: Adding to iommu group 46
Sep 30 21:05:03 godzilla kernel: pci 0000:8b:00.0: Adding to iommu group 47
Sep 30 21:05:03 godzilla kernel: pci 0000:8b:00.2: Adding to iommu group 48
Sep 30 21:05:03 godzilla kernel: pci 0000:8c:00.0: Adding to iommu group 49
Sep 30 21:05:03 godzilla kernel: pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
Sep 30 21:05:03 godzilla kernel: pci 0000:40:01.0: Adding to iommu group 50
Sep 30 21:05:03 godzilla kernel: pci 0000:40:01.1: Adding to iommu group 51
Sep 30 21:05:03 godzilla kernel: pci 0000:40:02.0: Adding to iommu group 52
Sep 30 21:05:03 godzilla kernel: pci 0000:40:03.0: Adding to iommu group 53
Sep 30 21:05:03 godzilla kernel: pci 0000:40:04.0: Adding to iommu group 54
Sep 30 21:05:03 godzilla kernel: pci 0000:40:05.0: Adding to iommu group 55
Sep 30 21:05:03 godzilla kernel: pci 0000:40:07.0: Adding to iommu group 56
Sep 30 21:05:03 godzilla kernel: pci 0000:40:07.1: Adding to iommu group 57
Sep 30 21:05:03 godzilla kernel: pci 0000:40:08.0: Adding to iommu group 58
Sep 30 21:05:03 godzilla kernel: pci 0000:40:08.1: Adding to iommu group 59
Sep 30 21:05:03 godzilla kernel: pci 0000:41:00.0: Adding to iommu group 60
Sep 30 21:05:03 godzilla kernel: pci 0000:41:00.1: Adding to iommu group 61
Sep 30 21:05:03 godzilla kernel: pci 0000:42:00.0: Adding to iommu group 62
Sep 30 21:05:03 godzilla kernel: pci 0000:42:01.0: Adding to iommu group 63
Sep 30 21:05:03 godzilla kernel: pci 0000:44:00.0: Adding to iommu group 64
Sep 30 21:05:03 godzilla kernel: pci 0000:45:00.0: Adding to iommu group 65
Sep 30 21:05:03 godzilla kernel: pci 0000:45:00.2: Adding to iommu group 66
Sep 30 21:05:03 godzilla kernel: pci 0000:46:00.0: Adding to iommu group 67
Sep 30 21:05:03 godzilla kernel: pci 0000:46:00.1: Adding to iommu group 68
Sep 30 21:05:03 godzilla kernel: pci 0000:46:00.2: Adding to iommu group 69
Sep 30 21:05:03 godzilla kernel: pci 0000:46:00.3: Adding to iommu group 70
Sep 30 21:05:03 godzilla kernel: pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
Sep 30 21:05:03 godzilla kernel: pci 0000:00:01.0: Adding to iommu group 71
Sep 30 21:05:03 godzilla kernel: pci 0000:00:02.0: Adding to iommu group 72
Sep 30 21:05:03 godzilla kernel: pci 0000:00:03.0: Adding to iommu group 73
Sep 30 21:05:03 godzilla kernel: pci 0000:00:03.1: Adding to iommu group 74
Sep 30 21:05:03 godzilla kernel: pci 0000:00:04.0: Adding to iommu group 75
Sep 30 21:05:03 godzilla kernel: pci 0000:00:05.0: Adding to iommu group 76
Sep 30 21:05:03 godzilla kernel: pci 0000:00:07.0: Adding to iommu group 77
Sep 30 21:05:03 godzilla kernel: pci 0000:00:07.1: Adding to iommu group 78
Sep 30 21:05:03 godzilla kernel: pci 0000:00:08.0: Adding to iommu group 79
Sep 30 21:05:03 godzilla kernel: pci 0000:00:08.1: Adding to iommu group 80
Sep 30 21:05:03 godzilla kernel: pci 0000:00:14.0: Adding to iommu group 81
Sep 30 21:05:03 godzilla kernel: pci 0000:00:14.3: Adding to iommu group 81
Sep 30 21:05:03 godzilla kernel: pci 0000:00:18.0: Adding to iommu group 82
Sep 30 21:05:03 godzilla kernel: pci 0000:00:18.1: Adding to iommu group 82
Sep 30 21:05:03 godzilla kernel: pci 0000:00:18.2: Adding to iommu group 82
Sep 30 21:05:03 godzilla kernel: pci 0000:00:18.3: Adding to iommu group 82
Sep 30 21:05:03 godzilla kernel: pci 0000:00:18.4: Adding to iommu group 82
Sep 30 21:05:03 godzilla kernel: pci 0000:00:18.5: Adding to iommu group 82
Sep 30 21:05:03 godzilla kernel: pci 0000:00:18.6: Adding to iommu group 82
Sep 30 21:05:03 godzilla kernel: pci 0000:00:18.7: Adding to iommu group 82
Sep 30 21:05:03 godzilla kernel: pci 0000:01:00.0: Adding to iommu group 83
Sep 30 21:05:03 godzilla kernel: pci 0000:01:00.1: Adding to iommu group 84
Sep 30 21:05:03 godzilla kernel: pci 0000:02:00.0: Adding to iommu group 85
Sep 30 21:05:03 godzilla kernel: pci 0000:02:01.0: Adding to iommu group 86
Sep 30 21:05:03 godzilla kernel: pci 0000:05:00.0: Adding to iommu group 87
Sep 30 21:05:03 godzilla kernel: pci 0000:05:00.2: Adding to iommu group 88
Sep 30 21:05:03 godzilla kernel: pci 0000:06:00.0: Adding to iommu group 89
Sep 30 21:05:03 godzilla kernel: pci 0000:06:00.2: Adding to iommu group 90
Sep 30 21:05:03 godzilla kernel: pci 0000:06:00.3: Adding to iommu group 91
Sep 30 21:05:03 godzilla kernel: perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
Sep 30 21:05:03 godzilla kernel: perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
Sep 30 21:05:03 godzilla kernel: perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
Sep 30 21:05:03 godzilla kernel: perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).
Sep 30 21:05:22 godzilla kernel: pci 0000:44:00.4: Adding to iommu group 92
Sep 30 21:05:22 godzilla kernel: pci 0000:44:00.5: Adding to iommu group 93
Sep 30 21:05:22 godzilla kernel: pci 0000:44:00.6: Adding to iommu group 94
Sep 30 21:05:22 godzilla kernel: pci 0000:44:00.7: Adding to iommu group 95
Sep 30 21:05:22 godzilla kernel: pci 0000:44:01.0: Adding to iommu group 96
Sep 30 21:05:22 godzilla kernel: pci 0000:44:01.1: Adding to iommu group 97
Sep 30 21:05:22 godzilla kernel: pci 0000:44:01.2: Adding to iommu group 98
Sep 30 21:05:22 godzilla kernel: pci 0000:44:01.3: Adding to iommu group 99
Sep 30 21:05:24 godzilla kernel: pci 0000:c4:00.4: Adding to iommu group 100
Sep 30 21:05:24 godzilla kernel: pci 0000:c4:00.5: Adding to iommu group 101
Sep 30 21:05:24 godzilla kernel: pci 0000:c4:00.6: Adding to iommu group 102
Sep 30 21:05:24 godzilla kernel: pci 0000:c4:00.7: Adding to iommu group 103
Sep 30 21:05:24 godzilla kernel: pci 0000:c4:01.0: Adding to iommu group 104
Sep 30 21:05:24 godzilla kernel: pci 0000:c4:01.1: Adding to iommu group 105
Sep 30 21:05:24 godzilla kernel: pci 0000:c4:01.2: Adding to iommu group 106
Sep 30 21:05:24 godzilla kernel: pci 0000:c4:01.3: Adding to iommu group 107
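For anyone copying this setup, a unit file matching the ExecStart lines in the status output above might look roughly like the sketch below (the sleep, the ordering and the unit path are assumptions; adapt to your install):
Code:
cat <<'EOF' > /etc/systemd/system/nvidia-sriov.service
[Unit]
Description=Enable NVIDIA SR-IOV
After=network.target

[Service]
Type=oneshot
ExecStartPre=/bin/sleep 15
ExecStart=/usr/lib/nvidia/sriov-manage -e ALL
ExecStart=/usr/bin/nvidia-smi vgpu -shm 1

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable --now nvidia-sriov.service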
 
The weird thing is that these messages start counting from 0 - your existing hardware (SAS, network, USB controllers etc.) should already be in an IOMMU group if IOMMU is enabled; there should not still be an IOMMU group 0 being created. In my case, by the time the nVIDIA driver loads I am already at IOMMU group 234, with 0 being the PCI bridge and then everything down from there.

Just as a hint: the /usr/lib/nvidia/sriov-manage you're calling is just a bash script, so you can read in there what it does as it loops through the devices. It doesn't actually 'check' whether the devices are functional when it creates the stubs for them; only when you start the device does it 'know' whether a device is actually 'there'.

Without loading the drivers, do ls -l /sys/kernel/iommu_groups after a reboot; if it is empty, IOMMU is not functional.
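A slightly more verbose check, if the directory is not empty, is to print each group with its member devices:
Code:
# One line per IOMMU group, listing the PCI addresses it contains
for g in /sys/kernel/iommu_groups/*; do
    echo "group $(basename "$g"): $(ls "$g"/devices | tr '\n' ' ')"
done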
 
I want to run an RTX 2080 as a vGPU with kernel 6.8.12-2-pve - which driver version should I use?
Right now I'm running kernel 6.5.13-6-pve and everything works fine.
 
Finally got everything up and running.
A massive thank you for your support; in the next days I will write up what I did to get this working.

However, I am still stuck on one thing that I am not sure how to resolve.
If I have 4 GPUs in a node and want to assign them all to a VM for a short period of time, how is that possible with the Python script? It seems to only assign 1 GPU.
 
My Python script currently loops through all the available GPUs and picks the 'first' available VF. Once you start the VM, that VF is 'consumed' and the script will suggest the next one; once the first GPU is "filled" (no longer has a usable VF), it should suggest the next GPU. If you want to pass through an entire GPU, either use a VF with the complete VRAM (which is what we do for simplicity's sake) or pass the whole card through; once the VM has started, it won't be available anymore. My script was made with the assumption that you provision your VMs one by one.

I should really write a little GUI thingy, but haven't had time yet.
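For the curious, the 'first available' logic boils down to walking sysfs; a rough bash equivalent is sketched below. The nvidia/current_vgpu_type attribute is an assumption based on the SR-IOV vGPU sysfs layout of recent GRID host drivers, the address globs match the A30 VFs from earlier in this thread, and the real script does more (profile lookup, tagging, etc.).
Code:
# Walk the VFs of one physical GPU and report the first one with no vGPU type set
for vf in /sys/bus/pci/devices/0000:44:00.[4-7] /sys/bus/pci/devices/0000:44:01.[0-3]; do
    [ -e "$vf/nvidia/current_vgpu_type" ] || continue   # skip non-vGPU functions
    if [ "$(cat "$vf/nvidia/current_vgpu_type")" = "0" ]; then
        echo "first available VF: $vf"
        break
    fi
done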
 
Thanks for the explanation; I do understand, and I truly love this script.
However, if I may suggest one improvement:
Say I have 4 physical GPUs and for a short time I would like to add all 4 to one machine.
Your script correctly outputs the details, but they are not all written to the VM's conf file; it always ends up with only 1 GPU.
Why? Because qm (at least I haven't found the correct syntax) doesn't amend the args: line, it replaces it.
Hence you need to do it manually, similar to this:
Output from script 1:
/var/lib/vz/snippets/nvidia_allocator.py 1002 get_command -24C
Found available: /sys/bus/pci/devices/0000:44:01.3
nVIDIA ID, type: 688 : NVIDIA A30-24C
qm set 1002 --hookscript local:snippets/nvidia_allocator.py
qm set 1002 --args "-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:44:01.3 -uuid e0288ba3-bcbf-44f5-b199-753781dbc6ed"
qm set 1002 --tags "nvidia-688"

Output from script 2:
/var/lib/vz/snippets/nvidia_allocator.py 1002 get_command -24C
Found available: /sys/bus/pci/devices/0000:89:00.6
nVIDIA ID, type: 688 : NVIDIA A30-24C
qm set 1002 --hookscript local:snippets/nvidia_allocator.py
qm set 1002 --args "-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:89:00.6 -uuid e0288ba3-bcbf-44f5-b199-753781dbc6ed"
qm set 1002 --tags "nvidia-688"

Manual change to conf:
agent: 1
args: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:89:00.6 -uuid e0288ba3-bcbf-44f5-b199-753781dbc6ed -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:44:01.3 -uuid e0288ba3-bcbf-44f5-b199-753781dbc6ed
boot: order=scsi0;ide2
cores: 6
cpu: x86-64-v2-AES
hookscript: local:snippets/nvidia_allocator.py
ide2: none,media=cdrom
memory: 32768
meta: creation-qemu=9.0.2,ctime=1725389188
name: vSrv-qdia1002
net0: virtio=BC:24:11:1D:C9:D6,bridge=vmbr0,tag=1002
numa: 0
ostype: l26
scsi0: Monster-ceph-vm-storage:vm-1002-disk-0,cache=writeback,iothread=1,size=500G
scsihw: virtio-scsi-single
smbios1: uuid=e0288ba3-bcbf-44f5-b199-753781dbc6ed
sockets: 4
tags: nvidia-688
vmgenid: 73724e37-07f1-49ff-8667-ee15e3a07077

When the VM boots, in this example 2 GPUs are shown.

Still just love the script and truly truly appreciate it, many thanks again!
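For what it's worth, until there is proper multi-device support, the two manual entries above can also be set in a single call, since qm set --args replaces the whole line anyway (addresses and UUID taken from the example above; whether the -uuid should be repeated per device is something I have not verified):
Code:
qm set 1002 --args "-device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:44:01.3 -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:89:00.6 -uuid e0288ba3-bcbf-44f5-b199-753781dbc6ed"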
 
The real issue is that until the VM has started there is no way of knowing whether a GPU has actually been assigned, so the script can't know whether a particular GPU will still be available, which makes it difficult to suggest 2 GPUs that will work in any particular configuration.

The hookscript also can't modify the configuration once the VM has entered pre-start. There is a patch coming that may be able to handle less fixed assignments.
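For context, a Proxmox hookscript is invoked with just the VMID and a phase name, which is why the device choice has to be final by pre-start; a minimal bash skeleton (not the actual Python allocator) looks like this:
Code:
#!/bin/bash
# Minimal hookscript skeleton: Proxmox calls it as <script> <vmid> <phase>
vmid="$1"; phase="$2"
case "$phase" in
    pre-start)  echo "reserve/validate the VF for VM $vmid here" ;;
    post-start) echo "VM $vmid is running; the VF is now consumed" ;;
    pre-stop|post-stop) echo "release or refresh bookkeeping for VM $vmid" ;;
esac
exit 0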
 
@guruevi, how have you found the performance of your cards in vGPU mode relative to full passthrough / bare metal? I just finished a temporary project where I had 20x 6000 Ada spun up on 10 hosts (all rented hardware). The initial idea was pure passthrough, but I had a ton of problems with drivers (consumer 4090s on a sister setup worked fine) and ended up using your scripts and the knowledge from this thread to fire up a vGPU workflow. Everything worked and the machines ran, but we had rendering issues that we didn't have on our consumer hosts. I couldn't fully test what was going on (time wasn't on my side), but it would be interesting to hear other people's thoughts.

PCIe bandwidth was one thing I couldn't get to the bottom of; it was hard to tell what the actual bandwidth was, and GPU-Z always read 1 lane of PCIe!
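One way to read the negotiated link from the driver itself, if GPU-Z is suspect (whether these fields are populated inside a vGPU guest is something to verify):
Code:
nvidia-smi --query-gpu=name,pcie.link.gen.current,pcie.link.width.current,pcie.link.width.max --format=csv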

What have your results been (are you rendering, running workstations, doing AI or just compute)?
 
I've updated the nVIDIA Allocator with a simple Curses GUI
https://github.com/guruevi/proxmox_gpu_allocator

As far as performance goes, we don't really have 'bare metal' to compare it with, but using a quick TensorFlow benchmark (our JupyterLab runs on Python 3.12 and Proxmox on 3.11, and I'm not breaking our Proxmox installs), our VM with a 6G vGPU assigned is actually faster by ~1-2 s training a small model: 12 s on the bare metal host vs 10 s in Jupyter.

I think it largely depends on what you're doing; for desktops, comparing against bare metal is kind of pointless, because you probably don't have workstations with an L40S in them, and you're probably using vGPU to split the GPU up anyway. In some Windows benchmarks we are achieving 350,000 OpenCL "points" in Geekbench on the L40S and 200k points on the A40 (that is with a huge 48Q profile), and you have to squeeze pretty hard (down to a 6Q) before you drop into the lower 300k range (provided there is no competition for the resources). That seems comparable to the native scores they are publishing, whatever that score means; for comparison, an RTX 4090 gaming card gets ~300k.
 
