Multi GPU Passthrough - 4G decoding error?

p.lakis

I can pass through four individual Tesla P100s to four separate VMs, but when I combine more than one card into a single VM I get the following error when running dmesg | grep NVRM inside the guest.

One of the four cards works; with any number above one, the output below is produced.

Code:
admin@gpu-host:~$ dmesg | grep NVRM
[    4.550588] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:02:00.0)
[    4.550589] NVRM: The system BIOS may have misconfigured your GPU.
[    4.550843] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:03:00.0)
[    4.550844] NVRM: The system BIOS may have misconfigured your GPU.
[    4.551092] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:04:00.0)
[    4.551093] NVRM: The system BIOS may have misconfigured your GPU.
[    4.551108] NVRM: The NVIDIA probe routine failed for 3 device(s).
[    4.551109] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  410.78  Sat Nov 10 22:09:04 CST 2018 (using threaded interrupts)

I have seen this on systems that can't decode above 4G. My VMID.conf is below.

Code:
agent: 1
bios: ovmf
bootdisk: scsi0
cores: 12
cpu: host
efidisk0: vm1028gq:vm-122-disk-1,size=128K
hostpci0: 05:00,pcie=1,x-vga=on
hostpci1: 06:00,pcie=1,x-vga=on
hostpci2: 84:00,pcie=1,x-vga=on
hostpci3: 85:00,pcie=1,x-vga=on
hugepages: 2
ide2: iso:iso/ubuntu-16.04.4-desktop-amd64.iso,media=cdrom
machine: q35
memory: 131072
name: U16.04-Tensor-Box
net0: virtio=DE:FC:7F:0B:27:04,bridge=vmbr1
numa: 1
ostype: l26
scsi0: vm1028gq:vm-122-disk-2,cache=writethrough,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=65d62e28-3f97-430b-be89-68567bc9fc2b
sockets: 2
args: -machine pc,max-ram-below-4g=4G

I have tried 1G, 2G, and 4G. All return the same NVRM errors. Is there something I'm missing?

Thanks
 
I guess your 'args' line has no effect, since you specify q35 (so the machine option gets overwritten by us again).
There is an open bug for this: https://bugzilla.proxmox.com/show_bug.cgi?id=1267

Until the bug I mentioned is fixed, you can get the QEMU command line with 'qm showcmd ID --pretty', add ',max-ram-below-4g=X' to the "-machine 'type=q35'" part, and execute that by hand.
Alternatively, did you try without q35? (Of course you then have to remove pcie=1 and x-vga=on.)
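
A minimal sketch of that manual workaround (the VMID 152 and the output path are just placeholders):

Code:
# dump the generated QEMU command line to a file (152 is only an example VMID)
qm showcmd 152 --pretty > /root/vm152-start.sh

# edit the file by hand and extend the machine option, e.g.
#   -machine 'type=q35,max-ram-below-4g=1G' \

# then launch the VM by running the edited command directly instead of 'qm start 152'
bash /root/vm152-start.sh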
 
In my VM config I added the following:

Code:
machine: q35,max-ram-below-4g=1G

This in turn gives the following QEMU command:

Code:
# qm showcmd 152 --pretty
/usr/bin/kvm \
  -id 152 \
  -name Base-Window10 \
  -chardev 'socket,id=qmp,path=/var/run/qemu-server/152.qmp,server,nowait' \
  -mon 'chardev=qmp,mode=control' \
  -pidfile /var/run/qemu-server/152.pid \
  -daemonize \
  -smbios 'type=1,uuid=c5b75794-9838-4a19-91c2-0ca588bcd49b' \
  -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' \
  -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,file=/dev/zvol/vm1028pool/vm-152-disk-2' \
  -smp '16,sockets=2,cores=8,maxcpus=16' \
  -nodefaults \
  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
  -vga none \
  -nographic \
  -no-hpet \
  -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=proxmox,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,kvm=off' \
  -m 64512 \
  -object 'memory-backend-ram,id=ram-node0,size=32256M' \
  -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' \
  -object 'memory-backend-ram,id=ram-node1,size=32256M' \
  -numa 'node,nodeid=1,cpus=8-15,memdev=ram-node1' \
  -readconfig /usr/share/qemu-server/pve-q35.cfg \
  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
  -device 'vfio-pci,host=05:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0' \
  -device 'vfio-pci,host=06:00.0,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0' \
  -chardev 'socket,path=/var/run/qemu-server/152.qga,server,nowait,id=qga0' \
  -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' \
  -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' \
  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:6b9a7558eb39' \
  -drive 'file=/mnt/pve/iso/template/iso/virtio-win-0.1.149.iso,if=none,id=drive-ide0,media=cdrom,aio=threads' \
  -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200' \
  -drive 'file=/mnt/pve/iso/template/iso/SW_DVD9_Win_Pro_Ent_Edu_N_10_1803_64BIT_English_-4_MLF_X21-87129.ISO,if=none,id=drive-ide2,media=cdrom,aio=threads' \
  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=201' \
  -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
  -drive 'file=/dev/zvol/vm1028pool/vm-152-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' \
  -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
  -netdev 'type=tap,id=net0,ifname=tap152i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' \
  -device 'e1000,mac=DE:D6:56:85:F1:53,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
  -rtc 'driftfix=slew,base=localtime' \
  -machine 'type=q35,max-ram-below-4g=1G' \
  -global 'kvm-pit.lost_tick_policy=discard'

Still no luck getting the resources to the VM for the GPUs.

Is there a way in QEMU to define the following?
  • Above 4G Decoding = Enabled
  • MMIOH Base = 256G
  • MMIO High Size = 128G
This is how the BIOS is set up on the host system. If we could pass these settings over to QEMU, I'm confident we could get it to work.
I'm struggling to find any documentation on MMIOH and QEMU.

EDIT:
I want to add that this issue only occurs with these Tesla chips. Prior testing on a GeForce system allowed all four cards to be passed through without issue.

Thanks
 
Does it work when you pass the 4 cards to 4 different VMs?

[ 4.550589] NVRM: The system BIOS may have misconfigured your GPU.

Since this line appears on the host, maybe the host BIOS really is not configuring the system correctly...
 
Yes, I can create 4 VMs with a single GPU each, all running concurrently.

That line appears inside the VM, which boots with the OVMF BIOS.
If we could emulate the MMIO area in QEMU, that could overcome this.
 
Did you manage to find any solution for passing through 4x Tesla on the OVMF BIOS?
Based on NVIDIA's KB, it seems like we need to be very specific with the BIOS settings:
https://nvidia.custhelp.com/app/answers/detail/a_id/4119/~/incorrect-bios-settings-on-a-server-when-used-with-a-hypervisor-can-cause-mmio
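
For reference, the MMIO space each card needs can be read from its 64-bit BAR sizes on the host (a quick sketch; 05:00.0 is just the hostpci0 address from the config earlier in the thread):

Code:
# on the Proxmox host: list the memory regions (BARs) of one of the Teslas
lspci -vv -s 05:00.0 | grep -i region
# a 16 GB Tesla P100 typically reports a 16G 64-bit prefetchable BAR,
# which is why the host BIOS needs Above 4G Decoding and a large MMIO-high window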
 
I registered here specifically to provide future users with a resolution for this problem that seemed to escape me on some simple searches.

The issue when passing multiple GPUs through to a Q35 KVM machine with PCI passthrough using OVMF appears to be a lack of addressable PCI memory space. Adding pci=realloc to your GRUB boot command line can assist with reallocating 32-bit memory and bring the second device "up", but any attempt to actually use the device fails, as memory allocation on the 64-bit bus fails.

I was able to resolve this with a -global flag passed via qm set.

Bash:
qm set VMID -args '-global q35-pcihost.pci-hole64-size=2048G'

It appears the default hole size, which from what I can find is 1024G, is insufficient to successfully address two NVIDIA cards used in this way. In my specific instance this was a Tesla K80, which is physically a single card but presents itself as two GPUs with two separate PCI addresses. Your mileage may vary.
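
A quick way to sanity-check the change (a sketch; the VMID and the guest PCI address are placeholders):

Code:
# on the host: confirm the flag ends up on the generated command line
qm showcmd VMID --pretty | grep pci-hole64-size

# inside the guest: check that the 64-bit BARs were actually assigned
# (01:00.0 is only an example address for the passed-through card)
lspci -vv -s 01:00.0 | grep -i region
dmesg | grep NVRM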
 
Would you mind detailing the complete set of configuration changes you had to make to your system to have 2x Tesla K80s working with Proxmox? I am banging my head against the wall trying to solve this problem. Thanks.
 
Apologies for the delay here, I missed this one.

I had to make two changes. The first was to modify /etc/default/grub:

Code:
GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/rl-swap rd.lvm.lv=rl/root rd.lvm.lv=rl/swap pci=realloc rd.driver.blacklist=nouveau"
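
For completeness, after editing /etc/default/grub the bootloader configuration has to be regenerated and the host rebooted before the new kernel command line takes effect (a minimal sketch, assuming a standard GRUB setup on the host):

Code:
update-grub
reboot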

The second was, as described above,
Code:
qm set VMID -args '-global q35-pcihost.pci-hole64-size=2048G'

Hopefully that's helpful.
 
