How to enable DMA for passthrough GPUs?

Kristen Royle

I have an 8-GPU server running Proxmox. The GPUs themselves work great using:

Code:
qm set 315 --hostpci0 4f:00,pcie=1
qm set 315 --hostpci1 52:00,pcie=1
qm set 315 --hostpci2 56:00,pcie=1
qm set 315 --hostpci3 57:00,pcie=1
qm set 315 --hostpci4 ce:00,pcie=1
qm set 315 --hostpci5 d1:00,pcie=1
qm set 315 --hostpci6 d5:00,pcie=1
qm set 315 --hostpci7 d6:00,pcie=1

But running nvidia-smi topo -m inside the VM gives this:

Code:
    GPU0        GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx5_0  CPU Affinity    NUMA Affinity
GPU0     X      NV4     PHB     PHB     PHB     PHB     PHB     PHB     PHB     0-31            N/A
GPU1    NV4      X      PHB     PHB     PHB     PHB     PHB     PHB     PHB     0-31            N/A
GPU2    PHB     PHB      X      NV4     PHB     PHB     PHB     PHB     PHB     0-31            N/A
GPU3    PHB     PHB     NV4      X      PHB     PHB     PHB     PHB     PHB     0-31            N/A
GPU4    PHB     PHB     PHB     PHB      X      NV4     PHB     PHB     PHB     0-31            N/A
GPU5    PHB     PHB     PHB     PHB     NV4      X      PHB     PHB     PHB     0-31            N/A
GPU6    PHB     PHB     PHB     PHB     PHB     PHB      X      NV4     PHB     0-31            N/A
GPU7    PHB     PHB     PHB     PHB     PHB     PHB     NV4      X      PHB     0-31            N/A
mlx5_0  PHB     PHB     PHB     PHB     PHB     PHB     PHB     PHB      X

Here PHB means "Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)", i.e. the data goes through the CPU instead of directly through the PCIe switch. The measured communication speed confirms this. What I'd like to see instead is "PIX = Connection traversing at most a single PCIe bridge".

It's not clear how to do this. Do I create a VFIO group and put all the devices in it? And if so, what comes next: how do I tell Proxmox to pass through the entire group instead of individual GPUs?
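
For context, here is roughly how the host-side PCIe topology and IOMMU grouping can be inspected (standard sysfs paths and lspci only; just a sketch):

Code:
# Show the PCIe tree: which GPUs hang off which bridge/switch
lspci -tv

# List the IOMMU groups and the devices in each
for d in /sys/kernel/iommu_groups/*/devices/*; do
    g=${d%/devices/*}; g=${g##*/}
    echo "IOMMU group $g: $(lspci -nns ${d##*/})"
done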

Or maybe it's too much for Proxmox and I need to use KVM directly?

How do I tell Proxmox to dump /usr/bin/kvm calls to stdout or stderr when I run "qm start"?

Kristen
 
What is your hardware setup? Is there a host bridge for each PCI slot, maybe? The hardware assignment in PVE is rather fixed, but you can use the 'args' parameter in the VM config to pass arbitrary QEMU parameters (if you know what you're doing).
Also, setting the IOMMU to passthrough (if not already) could help: just add 'iommu=pt' to the kernel command line and reboot.
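
For example, on a GRUB-booted host (for systemd-boot, edit /etc/kernel/cmdline and run 'proxmox-boot-tool refresh' instead; intel_iommu=on is just the usual companion on Intel systems, adjust for your CPU):

Code:
# /etc/default/grub: append iommu=pt (and intel_iommu=on on Intel) to the default line
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# apply the change and reboot
update-grub
reboot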
 

I actually found the solution. Now I have the next problem: p2p works, but it's not fast enough :D

Here is what you need to do:

Code:
-device 'vfio-pci,host=0000:4f:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on,x-nv-gpudirect-clique=1'

To see the command that starts the VM, use qm showcmd <vmid>.

This "x-nv-gpudirect-clique" flag is only mentioned on the internet five times or so, but that's the right answer. I haven't found a way to add this flag via qm set yet, though.
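
For the other GPUs it's the same pattern. As far as I understand, GPUs that get the same clique value (0-15) are the ones the guest driver will allow to do p2p with each other. The bus/addr values below just follow the pattern qm showcmd prints; take the exact ones from your own output:

Code:
-device 'vfio-pci,host=0000:4f:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on,x-nv-gpudirect-clique=1'
-device 'vfio-pci,host=0000:52:00.0,id=hostpci1.0,bus=ich9-pcie-port-2,addr=0x0.0,multifunction=on,x-nv-gpudirect-clique=1'
# ... same for the remaining six GPUs, with the same clique value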
 
I actually found the solution. Now I have the next problem: p2p works, but it's not fast enough :D
good find ;)

No, there is no way to set it with qm set on the hostpci devices, but you can use the 'args' parameter:
Code:
qm set ID -args ' -device ....'
You have to remove the hostpci entry then. Also, PVE then has no idea about the passthrough, so some assumptions don't hold anymore.
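
I.e. something along these lines (vmid and PCI address taken from the posts above; after the change, double-check with 'qm showcmd' that the bus= you reference still exists in the generated command):

Code:
# pass the device via raw qemu args instead of hostpci0
qm set 315 -args "-device vfio-pci,host=0000:4f:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on,x-nv-gpudirect-clique=1"

# remove the corresponding hostpci entry so the device is not assigned twice
qm set 315 -delete hostpci0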

For the speed issue, did you try my suggestion with IOMMU passthrough? That could maybe help.
 
For the speed issue, did you try my suggestion with IOMMU passthrough? That could maybe help.

I found that iommu=pt and pcie_acs_override=downstream have no effect.

Maybe I have a different problem that I need to fix first to see an effect. Still exploring ...
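
For reference, a quick way to confirm the parameters are at least active on the booted kernel (nothing Proxmox-specific):

Code:
# check that the booted kernel actually got the parameters
cat /proc/cmdline

# look for IOMMU/DMAR initialization messages
dmesg | grep -i -e iommu -e dmar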

I'm very open to more ideas to try. I'm thinking about reconstructing a PCIe switch inside the VM; maybe that will help somehow (rough sketch below). It seems I need something that makes address translation easier.
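
What I have in mind is roughly this: use QEMU's generic PCIe switch devices (an upstream port with downstream ports behind a root port) and put the GPUs behind the downstream ports. Purely a sketch of the idea via extra -device arguments (the ids are arbitrary names I made up); I haven't verified that the guest driver actually reports PIX for it:

Code:
# emulated PCIe switch: root port -> upstream port -> downstream ports
-device pcie-root-port,id=sw_root,bus=pcie.0,chassis=10,slot=10
-device x3130-upstream,id=sw_up,bus=sw_root
-device xio3130-downstream,id=sw_dn1,bus=sw_up,chassis=11,slot=0
-device xio3130-downstream,id=sw_dn2,bus=sw_up,chassis=12,slot=0

# GPUs behind the emulated switch, same p2p clique
-device vfio-pci,host=0000:4f:00.0,bus=sw_dn1,addr=0x0,x-nv-gpudirect-clique=1
-device vfio-pci,host=0000:52:00.0,bus=sw_dn2,addr=0x0,x-nv-gpudirect-clique=1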

remove the hostpci
Yes, good idea, thanks.
 
Even after the above parameter changes, nvidia-smi topo -m still shows PHB for the NIC, not SYS or PIX.
Any other suggestions?

(attached screenshot of the nvidia-smi topo -m output)
 
