PROXMOX VM PCIE Quantity

Fu Lei · Aug 23, 2019

Hello everyone,
I have a question about Proxmox 5.4:
my virtual machine can only be configured with 4 PCIe GPU devices.
I would like to know whether this is a limitation or a problem with my setup.

Alwin · Aug 23, 2019

There is not much information to go on, but depending on the CPU type, the number of PCIe lanes might be an issue (4 GPUs at x16 each = 64 lanes).

Fu Lei · Aug 28, 2019

Thanks for the reply.
I uploaded the CPU information of the server and a screenshot of the virtual machine that cannot add more PCIe devices.
Please take another look.

Has nobody else run into this?

t.lamprecht · Aug 28, 2019

Fu Lei said:
On Proxmox 5.4, my virtual machine can only be configured with 4 PCIe GPU devices,

Yes, currently only 4 "hostpci" devices can be added to a VM, by design. I'm curious, why do you need more?

We could probably increase it, would just be great to have a good reason.

Fu Lei · Aug 28, 2019

Thanks,

Proxmox is mainly used as a GPU server here. There are 8 graphics cards in the host, and I want to add all of them to the same virtual machine to increase its GPU computing performance.

t.lamprecht · Aug 28, 2019

OK, seems reasonable, thanks for providing a specific use case!

Would you be willing to open an "Enhancement Request" at https://bugzilla.proxmox.com/ (maybe link to this thread) so that this can get tracked? That would be great!

t.lamprecht · Aug 28, 2019

I'd say allowing something like 10-16 PCIe devices by default would seem like a good limit. We cannot allow an unlimited number, as we need to pre-reserve the PCIe addresses to ensure that hotplug is possible and that the addresses stay stable.

Fu Lei · Aug 28, 2019

Hi,
I can't find where to submit enhancement requests.

t.lamprecht · Aug 28, 2019

Just click on "File a Bug", then "pve"; the "Severity" field is then already pre-selected to "Enhancement".

Fu Lei · Aug 30, 2019

OK, the enhancement request has been submitted.

m0xpr0x · Sep 10, 2019

Proxmox VE 6.0-7
4x GTX 1080 Ti
4x RTX 2080 Ti

I have succeeded in passing through up to 4 GPUs to a VM (Ubuntu Server 18.04.3), but trying to pass through a 5th GPU results in network problems.

I replaced the files /usr/share/perl5/PVE/QemuServer/PCI.pm and /usr/share/perl5/PVE/QemuServer.pm of my Proxmox installation with the corresponding files from the Git repository (master @ 6cb7b041cec6220b1b105f2b2f22a38216f7e110).

Initially the VM does not start:

Code:

TASK ERROR: start failed: command '/usr/bin/kvm -id 101 -name test -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/101.pid -daemonize -smbios 'type=1,uuid=4308a1ec-7f0e-413d-8d68-44c94a1704b3' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,file=/dev/zvol/rpool/data/vm-101-disk-1' -smp '16,sockets=1,cores=16,maxcpus=16' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -cpu 'host,-md-clear,-pcid,-spec-ctrl,-ibpb,-virt-ssbd,-amd-ssbd,-amd-no-ssb,+pdpe1gb,-hv-tlbflush,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off' -m 98304 -device 'vmgenid,guid=9391eca2-6181-4ca3-be2c-97588e840eb9' -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=1b:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=1b:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -device 'vfio-pci,host=1c:00.0,id=hostpci1.0,bus=ich9-pcie-port-2,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=1c:00.1,id=hostpci1.1,bus=ich9-pcie-port-2,addr=0x0.1' -device 'vfio-pci,host=1d:00.0,id=hostpci2.0,bus=ich9-pcie-port-3,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=1d:00.1,id=hostpci2.1,bus=ich9-pcie-port-3,addr=0x0.1' -device 'vfio-pci,host=1e:00.0,id=hostpci3.0,bus=ich9-pcie-port-4,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=1e:00.1,id=hostpci3.1,bus=ich9-pcie-port-4,addr=0x0.1' -device 'pcie-root-port,id=ich9-pcie-port-5,addr=10.0,x-speed=16,x-width=32,multifunction=on,bus=pcie.0,port=5,chassis=5' -device 'vfio-pci,host=3d:00.0,id=hostpci4.0,bus=ich9-pcie-port-5,addr=0x0.0,multifunction=on' -device 
'vfio-pci,host=3d:00.1,id=hostpci4.1,bus=ich9-pcie-port-5,addr=0x0.1' -device 'vfio-pci,host=3d:00.2,id=hostpci4.2,bus=ich9-pcie-port-5,addr=0x0.2' -device 'vfio-pci,host=3d:00.3,id=hostpci4.3,bus=ich9-pcie-port-5,addr=0x0.3' -chardev 'socket,path=/var/run/qemu-server/101.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:6675b9c82fb6' -drive 'file=/var/lib/vz/template/iso/ubuntu-18.04.3-live-server-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/rpool/data/vm-101-disk-0,if=none,id=drive-scsi0,cache=directsync,format=raw,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=0A:04:96:2A:99:E5,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=q35'' failed: got timeout

After rebooting Proxmox and trying to start the VM again, its status in the Proxmox web VM summary view is (and stays) "running".

Unfortunately I can't reach the VM anymore using SSH. Pinging the VM from the Proxmox server also no longer works. As I can't use the Proxmox web VM console (noVNC) anymore (due to the passthrough), I do not know whether the actual passthrough succeeded or failed.

Passing through the following combinations works, so the GPU type itself is not the cause of the problem:

4x GTX 1080 Ti
4x RTX 2080 Ti
1x GTX 1080 Ti + 3x RTX 2080 Ti

Could you please help me set up a VM with 8 GPUs?

VM config:

Code:

agent: 1
bios: ovmf
bootdisk: scsi0
cores: 16
cpu: host,flags=-md-clear;-pcid;-spec-ctrl;-ibpb;-virt-ssbd;-amd-ssbd;-amd-no-ssb;+pdpe1gb;-hv-tlbflush
efidisk0: local-zfs:vm-101-disk-1,size=128K
ide2: local:iso/ubuntu-18.04.3-live-server-amd64.iso,media=cdrom
machine: q35
memory: 98304
name: test
net0: virtio=0A:04:96:2A:99:E5,bridge=vmbr1
numa: 0
ostype: l26
scsi0: local-zfs:vm-101-disk-0,cache=directsync,size=1T
scsihw: virtio-scsi-pci
smbios1: uuid=4308a1ec-7f0e-413d-8d68-44c94a1704b3
sockets: 1
vmgenid: 9391eca2-6181-4ca3-be2c-97588e840eb9
hostpci0: 1b:00,pcie=1,x-vga=1
hostpci1: 1c:00,pcie=1
hostpci2: 1d:00,pcie=1
hostpci3: 1e:00,pcie=1
hostpci4: 3d:00,pcie=1

Proxmox server /etc/network/interfaces

Code:

auto lo

iface lo inet loopback

iface eno0 inet manual

iface ens4f1 inet manual

auto vmbr0
iface vmbr0 inet static
        address  <ip1>
        netmask  255.255.255.0
        gateway  <ip2>
        bridge-ports eno0
        bridge-stp off
        bridge-fd 0

auto vmbr1
iface vmbr1 inet static
        address  192.168.0.1
        netmask  24
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up iptables -t nat -A POSTROUTING -s '192.168.0.0/24' -o vmbr0 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s '192.168.0.0/24' -o vmbr0 -j MASQUERADE
        post-up iptables -t nat -A PREROUTING -i vmbr0 -p tcp --dport 1000 -j DNAT --to 192.168.0.2:22
        post-down iptables -t nat -D PREROUTING -i vmbr0 -p tcp --dport 1000 -j DNAT --to 192.168.0.2:22

aaron · Sep 16, 2019

m0xpr0x said:
I have succeeded in passing through up to 4 GPUs to a VM (Ubuntu Server 18.04.3), but trying to pass through a 5th GPU results in network problems.

Check the IOMMU Groups column. Your problem sounds a lot like the one problematic GPU and your main network interface being in the same group. You can only pass through a whole group, not individual devices of the same group.
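
For readers who prefer to check the grouping from a shell on the host, here is a minimal sketch in Python. It assumes the standard Linux sysfs layout `/sys/kernel/iommu_groups/<group>/devices/<pci-address>`. The function name `iommu_groups` is made up for this example and is not part of any Proxmox tool.

```python
from collections import defaultdict
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group number: sorted list of PCI addresses} read from sysfs.

    `root` is a parameter only so the function can be pointed at a test
    directory; on a real host the default path is the one to use.
    """
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled, or not a Linux host: nothing to report.
        return {}
    groups = defaultdict(list)
    for dev in base.glob("*/devices/*"):
        group = dev.parent.parent.name  # the <group> path component
        if group.isdigit():
            groups[int(group)].append(dev.name)
    return {g: sorted(devs) for g, devs in sorted(groups.items())}


if __name__ == "__main__":
    for group, devices in iommu_groups().items():
        print(f"group {group}: {' '.join(devices)}")
```

If a GPU and the host's network card show up on the same output line, they share a group and cannot be passed through separately.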

m0xpr0x · Sep 16, 2019

Thank you for your hint! Since I am not very familiar with the topic I hope that I am interpreting it right. If not, then please correct me.

The GPU devices themselves seem to be in separate groups with no other device of any sort having the same group ID:

# the GPUs are on PCI buses 1b, 1c, 1d, 1e, 3d, 3f, 40 and 41

$ dmesg | egrep group | awk '{print $NF" "$0}' | sort -n

[...]
29 [ 3.034135] iommu: Adding device 0000:19:14.0 to group 29
30 [ 3.034382] iommu: Adding device 0000:1b:00.0 to group 30
30 [ 3.034499] iommu: Adding device 0000:1b:00.1 to group 30
31 [ 3.034720] iommu: Adding device 0000:1c:00.0 to group 31
31 [ 3.034834] iommu: Adding device 0000:1c:00.1 to group 31
32 [ 3.035048] iommu: Adding device 0000:1d:00.0 to group 32
32 [ 3.035162] iommu: Adding device 0000:1d:00.1 to group 32
33 [ 3.035377] iommu: Adding device 0000:1e:00.0 to group 33
33 [ 3.035493] iommu: Adding device 0000:1e:00.1 to group 33
34 [ 3.035561] iommu: Adding device 0000:3a:00.0 to group 34
[...]
47 [ 3.040326] iommu: Adding device 0000:3c:14.0 to group 47
48 [ 3.040667] iommu: Adding device 0000:3d:00.0 to group 48
48 [ 3.040796] iommu: Adding device 0000:3d:00.1 to group 48
48 [ 3.040918] iommu: Adding device 0000:3d:00.2 to group 48
48 [ 3.041040] iommu: Adding device 0000:3d:00.3 to group 48
49 [ 3.041378] iommu: Adding device 0000:3f:00.0 to group 49
49 [ 3.041505] iommu: Adding device 0000:3f:00.1 to group 49
49 [ 3.041627] iommu: Adding device 0000:3f:00.2 to group 49
49 [ 3.041749] iommu: Adding device 0000:3f:00.3 to group 49
50 [ 3.042087] iommu: Adding device 0000:40:00.0 to group 50
50 [ 3.042215] iommu: Adding device 0000:40:00.1 to group 50
50 [ 3.042339] iommu: Adding device 0000:40:00.2 to group 50
50 [ 3.042463] iommu: Adding device 0000:40:00.3 to group 50
51 [ 3.042801] iommu: Adding device 0000:41:00.0 to group 51
51 [ 3.042933] iommu: Adding device 0000:41:00.1 to group 51
51 [ 3.043056] iommu: Adding device 0000:41:00.2 to group 51
51 [ 3.043179] iommu: Adding device 0000:41:00.3 to group 51
52 [ 3.043244] iommu: Adding device 0000:5d:02.0 to group 52
[...]
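
The same check can be automated rather than read off by eye. The following is a rough sketch, assuming the `iommu: Adding device ... to group N` message format shown in the output above; other kernel versions may word this message differently. The helper names `groups_from_dmesg` and `shared_groups` are invented for illustration.

```python
import re
from collections import defaultdict

# Matches the dmesg format seen in this thread, e.g.
# "iommu: Adding device 0000:1b:00.0 to group 30"
LINE = re.compile(
    r"iommu: Adding device "
    r"([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) to group (\d+)"
)


def groups_from_dmesg(text):
    """Parse dmesg text into {group number: set of PCI addresses}."""
    groups = defaultdict(set)
    for addr, group in LINE.findall(text):
        groups[int(group)].add(addr)
    return dict(groups)


def shared_groups(groups, gpu_buses):
    """Return the groups in which a GPU sits next to a device from another slot.

    `gpu_buses` is a set of two-digit hex bus numbers such as {"1b", "3d"}.
    Several functions of one card (.0, .1, ...) count as a single slot.
    """
    shared = {}
    for group, devices in groups.items():
        slots = {addr.rsplit(".", 1)[0] for addr in devices}  # drop function
        has_gpu = any(slot.split(":")[1] in gpu_buses for slot in slots)
        if has_gpu and len(slots) > 1:
            shared[group] = sorted(devices)
    return shared
```

Usage would be along the lines of `shared_groups(groups_from_dmesg(dmesg_output), {"1b", "1c", "1d", "1e", "3d", "3f", "40", "41"})`, where `dmesg_output` holds the text of `dmesg`. An empty result means each GPU has its IOMMU group to itself.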

Update 2019-09-18:
I inspected the logs of the startup with 8 GPUs and found that with more than 4 GPUs the network adapter receives a different name: enp6s18 becomes enp10s18. Because netplan had been configured for enp6s18 during the installation (with 0-4 GPUs), networking no longer worked when booting with more than 4 GPUs, since the resulting enp10s18 adapter had no configuration assigned.

The solution for flexible passthrough of different numbers of GPUs, without having to deal with changing network adapter names, was to use netplan's match, macaddress and set-name properties.
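
As an illustration of that approach, here is a small sketch that prints a netplan stanza pinning an interface name to a MAC address. The keys `match`, `macaddress` and `set-name` are the netplan properties named above. The interface name `lan0` is a hypothetical choice, and the address 192.168.0.2/24 is an assumption inferred from the DNAT rule in the host configuration earlier in the thread, not a confirmed guest setting. Routes and DNS are left out.

```python
def render_netplan(mac, name, addresses):
    """Return netplan YAML that matches a NIC by MAC and renames it to `name`."""
    mac = mac.lower()  # lower-cased for consistency only
    lines = [
        "network:",
        "  version: 2",
        "  ethernets:",
        f"    {name}:",
        "      match:",
        f'        macaddress: "{mac}"',  # quoted so YAML keeps it a string
        f"      set-name: {name}",
        "      addresses:",
    ]
    lines += [f"        - {addr}" for addr in addresses]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # MAC taken from the net0 line of the VM config posted above;
    # "lan0" and the address are assumptions for this example.
    print(render_netplan("0A:04:96:2A:99:E5", "lan0", ["192.168.0.2/24"]))
```

The output would go into a file under /etc/netplan/ inside the guest and be applied with `netplan apply`. Because the match is on the MAC address, the configuration stays valid whether the kernel names the adapter enp6s18 or enp10s18.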
