Proxmox VM PCIe Quantity

Fu Lei

Hello everyone:
I have a question I hope you can answer.
On Proxmox 5.4 my virtual machine can only be configured with these 4 PCIe GPU devices.
I want to ask whether this is a limitation or a problem on my side.
 
There is not much information to go on, but depending on the CPU type, the number of PCIe lanes might be an issue (4 × 16 = 64 lanes).
 
Thanks. I have uploaded the CPU information of the server and a screenshot of the virtual machine that cannot add more PCIe devices.
Please take another look.

Has nobody else run into this?

Attachments: Screenshot 2019-08-28 11.20.57 AM.png, Screenshot 2019-08-28 11.21.18 AM.png
 
On Proxmox 5.4 my virtual machine can only be configured with these 4 PCIe GPU devices.

Yes, currently only 4 "hostpci" devices can be added to a VM by design. I'm curious, why do you need more?

We could probably increase it; it would just be great to have a good reason. :)
 
Thanks,

We use Proxmox mainly as a GPU server. There are 8 graphics cards in the host, and I want to add them all to the same virtual machine to increase the VM's GPU computing performance.
 
I'd say allowing something like 10-16 PCIe devices by default would seem like a good limit. We cannot allow an unlimited number, as we need to pre-reserve the PCIe addresses to ensure hotplug is possible and that they stay stable.
 
Proxmox VE 6.0-7
4x GTX 1080 Ti
4x RTX 2080 Ti

I have succeeded in passing through up to 4 GPUs to a VM (Ubuntu Server 18.04.3), but trying to pass through a 5th GPU results in network problems.

I replaced the files /usr/share/perl5/PVE/QemuServer/PCI.pm and /usr/share/perl5/PVE/QemuServer.pm of my Proxmox installation with the corresponding files from the Git repository (master @ 6cb7b041cec6220b1b105f2b2f22a38216f7e110).
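
A sketch of the steps involved (the repository URL and the final service restart are assumptions, not taken from the post; the commit hash is the one mentioned above):
Code:
# sketch: fetch the newer qemu-server Perl modules and drop them into place
# (repo URL and the pvedaemon/pveproxy restart are assumptions)
git clone git://git.proxmox.com/git/qemu-server.git
cd qemu-server
git checkout 6cb7b041cec6220b1b105f2b2f22a38216f7e110
cp PVE/QemuServer/PCI.pm /usr/share/perl5/PVE/QemuServer/PCI.pm
cp PVE/QemuServer.pm /usr/share/perl5/PVE/QemuServer.pm
# restart the PVE services so the replaced modules are picked up
systemctl restart pvedaemon pveproxy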

Initially the VM does not start:
Code:
TASK ERROR: start failed: command '/usr/bin/kvm -id 101 -name test -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/101.pid -daemonize -smbios 'type=1,uuid=4308a1ec-7f0e-413d-8d68-44c94a1704b3' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,file=/dev/zvol/rpool/data/vm-101-disk-1' -smp '16,sockets=1,cores=16,maxcpus=16' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -cpu 'host,-md-clear,-pcid,-spec-ctrl,-ibpb,-virt-ssbd,-amd-ssbd,-amd-no-ssb,+pdpe1gb,-hv-tlbflush,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off' -m 98304 -device 'vmgenid,guid=9391eca2-6181-4ca3-be2c-97588e840eb9' -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=1b:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=1b:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -device 'vfio-pci,host=1c:00.0,id=hostpci1.0,bus=ich9-pcie-port-2,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=1c:00.1,id=hostpci1.1,bus=ich9-pcie-port-2,addr=0x0.1' -device 'vfio-pci,host=1d:00.0,id=hostpci2.0,bus=ich9-pcie-port-3,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=1d:00.1,id=hostpci2.1,bus=ich9-pcie-port-3,addr=0x0.1' -device 'vfio-pci,host=1e:00.0,id=hostpci3.0,bus=ich9-pcie-port-4,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=1e:00.1,id=hostpci3.1,bus=ich9-pcie-port-4,addr=0x0.1' -device 'pcie-root-port,id=ich9-pcie-port-5,addr=10.0,x-speed=16,x-width=32,multifunction=on,bus=pcie.0,port=5,chassis=5' -device 'vfio-pci,host=3d:00.0,id=hostpci4.0,bus=ich9-pcie-port-5,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=3d:00.1,id=hostpci4.1,bus=ich9-pcie-port-5,addr=0x0.1' -device 'vfio-pci,host=3d:00.2,id=hostpci4.2,bus=ich9-pcie-port-5,addr=0x0.2' -device 'vfio-pci,host=3d:00.3,id=hostpci4.3,bus=ich9-pcie-port-5,addr=0x0.3' -chardev 'socket,path=/var/run/qemu-server/101.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:6675b9c82fb6' -drive 'file=/var/lib/vz/template/iso/ubuntu-18.04.3-live-server-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/rpool/data/vm-101-disk-0,if=none,id=drive-scsi0,cache=directsync,format=raw,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=0A:04:96:2A:99:E5,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=q35'' failed: got timeout


After rebooting Proxmox and trying to start the VM again, its status in the Proxmox web VM summary view is (and stays) "running".

Unfortunately I can't reach the VM anymore using SSH. Pinging the VM from the Proxmox server also no longer works. As I can't use the Proxmox web VM console (noVNC) anymore (due to the passthrough), I do not know whether the actual passthrough succeeded or failed.

Passing through the following combinations works, so the GPU type itself is not the cause of the problem:
  • 4x GTX 1080 Ti
  • 4x RTX 2080 Ti
  • 1x GTX 1080 Ti + 3x RTX 2080 Ti

Could you please help me set up a VM with 8 GPUs?


VM config:
Code:
agent: 1
bios: ovmf
bootdisk: scsi0
cores: 16
cpu: host,flags=-md-clear;-pcid;-spec-ctrl;-ibpb;-virt-ssbd;-amd-ssbd;-amd-no-ssb;+pdpe1gb;-hv-tlbflush
efidisk0: local-zfs:vm-101-disk-1,size=128K
ide2: local:iso/ubuntu-18.04.3-live-server-amd64.iso,media=cdrom
machine: q35
memory: 98304
name: test
net0: virtio=0A:04:96:2A:99:E5,bridge=vmbr1
numa: 0
ostype: l26
scsi0: local-zfs:vm-101-disk-0,cache=directsync,size=1T
scsihw: virtio-scsi-pci
smbios1: uuid=4308a1ec-7f0e-413d-8d68-44c94a1704b3
sockets: 1
vmgenid: 9391eca2-6181-4ca3-be2c-97588e840eb9
hostpci0: 1b:00,pcie=1,x-vga=1
hostpci1: 1c:00,pcie=1
hostpci2: 1d:00,pcie=1
hostpci3: 1e:00,pcie=1
hostpci4: 3d:00,pcie=1
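
The full 8-GPU configuration I am aiming for would presumably just extend that list with the three remaining cards (their device IDs show up in the dmesg output further down):
Code:
hostpci5: 3f:00,pcie=1
hostpci6: 40:00,pcie=1
hostpci7: 41:00,pcie=1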


Proxmox server /etc/network/interfaces
Code:
auto lo

iface lo inet loopback

iface eno0 inet manual

iface ens4f1 inet manual

auto vmbr0
iface vmbr0 inet static
        address  <ip1>
        netmask  255.255.255.0
        gateway  <ip2>
        bridge-ports eno0
        bridge-stp off
        bridge-fd 0

auto vmbr1
iface vmbr1 inet static
        address  192.168.0.1
        netmask  24
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up iptables -t nat -A POSTROUTING -s '192.168.0.0/24' -o vmbr0 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s '192.168.0.0/24' -o vmbr0 -j MASQUERADE
        post-up iptables -t nat -A PREROUTING -i vmbr0 -p tcp --dport 1000 -j DNAT --to 192.168.0.2:22
        post-down iptables -t nat -D PREROUTING -i vmbr0 -p tcp --dport 1000 -j DNAT --to 192.168.0.2:22
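
With the DNAT rule above, the VM's SSH should then be reachable from outside via port 1000 on the host, e.g. (user name as a placeholder):
Code:
ssh -p 1000 <user>@<ip1>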
 
I have succeeded in passing through up to 4 GPUs to a VM (Ubuntu Server 18.04.3), but trying to pass through a 5th GPU results in network problems.
Check the IOMMU Groups column. Your problem sounds a lot like the problematic GPU and your main network interface being in the same group. You can only pass through a whole group, not individual devices of the same group.
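
Something like this lists the members of each group straight from sysfs (standard kernel interface, nothing Proxmox-specific):
Code:
# list every PCI device grouped by its IOMMU group
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -n "  "; lspci -nns "${d##*/}"
    done
done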
 
Thank you for the hint! Since I am not very familiar with the topic, I hope I am interpreting it correctly; if not, please correct me.

The GPU devices themselves seem to be in separate groups with no other device of any sort having the same group ID:
# the GPUs have the device IDs 1b, 1c, 1d, 1e, 3d, 3f, 40 and 41

$ dmesg | egrep group | awk '{print $NF" "$0}' | sort -n

[...]
29 [ 3.034135] iommu: Adding device 0000:19:14.0 to group 29
30 [ 3.034382] iommu: Adding device 0000:1b:00.0 to group 30
30 [ 3.034499] iommu: Adding device 0000:1b:00.1 to group 30
31 [ 3.034720] iommu: Adding device 0000:1c:00.0 to group 31
31 [ 3.034834] iommu: Adding device 0000:1c:00.1 to group 31
32 [ 3.035048] iommu: Adding device 0000:1d:00.0 to group 32
32 [ 3.035162] iommu: Adding device 0000:1d:00.1 to group 32
33 [ 3.035377] iommu: Adding device 0000:1e:00.0 to group 33
33 [ 3.035493] iommu: Adding device 0000:1e:00.1 to group 33
34 [ 3.035561] iommu: Adding device 0000:3a:00.0 to group 34
[...]
47 [ 3.040326] iommu: Adding device 0000:3c:14.0 to group 47
48 [ 3.040667] iommu: Adding device 0000:3d:00.0 to group 48
48 [ 3.040796] iommu: Adding device 0000:3d:00.1 to group 48
48 [ 3.040918] iommu: Adding device 0000:3d:00.2 to group 48
48 [ 3.041040] iommu: Adding device 0000:3d:00.3 to group 48
49 [ 3.041378] iommu: Adding device 0000:3f:00.0 to group 49
49 [ 3.041505] iommu: Adding device 0000:3f:00.1 to group 49
49 [ 3.041627] iommu: Adding device 0000:3f:00.2 to group 49
49 [ 3.041749] iommu: Adding device 0000:3f:00.3 to group 49
50 [ 3.042087] iommu: Adding device 0000:40:00.0 to group 50
50 [ 3.042215] iommu: Adding device 0000:40:00.1 to group 50
50 [ 3.042339] iommu: Adding device 0000:40:00.2 to group 50
50 [ 3.042463] iommu: Adding device 0000:40:00.3 to group 50
51 [ 3.042801] iommu: Adding device 0000:41:00.0 to group 51
51 [ 3.042933] iommu: Adding device 0000:41:00.1 to group 51
51 [ 3.043056] iommu: Adding device 0000:41:00.2 to group 51
51 [ 3.043179] iommu: Adding device 0000:41:00.3 to group 51
52 [ 3.043244] iommu: Adding device 0000:5d:02.0 to group 52
[...]

Update 2019-09-18:
I inspected the logs of the startup with 8 GPUs and found that with more than 4 GPUs the network adapter receives a different name: enp6s18 becomes enp10s18. Since netplan had been configured for enp6s18 during the installation (with 0-4 GPUs), the network no longer worked when booting with more than 4 GPUs and the resulting unassigned enp10s18 adapter.

The solution for flexibly passing through different numbers of GPUs, without having to deal with changing network adapter names, was to use netplan's match, macaddress and set-name properties (example).
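
A minimal sketch of the resulting netplan config in my case (file name and interface name are arbitrary; the MAC address is the one from the VM's net0 line and the IP is the one the host forwards to):
Code:
# /etc/netplan/01-netcfg.yaml inside the guest
network:
  version: 2
  renderer: networkd
  ethernets:
    gpunet:
      # match the NIC by MAC so the name no longer depends on the PCI slot
      match:
        macaddress: "0a:04:96:2a:99:e5"
      set-name: gpunet
      addresses: [192.168.0.2/24]
      gateway4: 192.168.0.1
# apply with: netplan apply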
 