Greater than 16GB RAM on VM with SR-IOV enabled

gimpbully

Member
Aug 7, 2015
I have a Proxmox node with a Mellanox ConnectX-3 card set up for SR-IOV and VIP interfaces. This works fine in most cases: I can pass a VIP each to several VMs. But once I configure a VM with more than 16GB of RAM, the VM times out when I try to start it via the GUI or the qm command. Curiously, it does actually start if you take the raw command and run it via ssh. Any ideas where to start troubleshooting this?

Code:
root@proxmox2:~# qm start 110
start failed: command '/usr/bin/kvm -id 110 -name robinhood -chardev 'socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/110.pid -daemonize -smbios 'type=1,uuid=1f065d88-7bc9-4109-b633-49976e46a991' -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/110.vnc,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 30000 -object 'memory-backend-ram,id=ram-node0,size=30000M' -numa 'node,nodeid=0,cpus=0-3,memdev=ram-node0' -object 'iothread,id=iothread-virtio0' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=d3f6dbd5-8992-4470-ab4b-dc5d095d52c9' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'vfio-pci,host=0000:a1:00.3,id=hostpci0,bus=pci.0,addr=0x10' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:ea794bfec1a' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/rpool/data/vm-110-disk-0,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -drive 'file=/dev/zvol/rpool/data/vm-110-disk-1,if=none,id=drive-virtio0,discard=on,format=raw,cache=none,aio=native,detect-zeroes=unmap' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0' -netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=xx:xx:xx:xx:xx:xx,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' 
-machine 'type=pc+pve1'' failed: got timeout
root@proxmox2:~#
 
It really does seem to be a timeout thing. Is there a way to manually change the timeout for the GUI or qm startup?
real 1m50.741s
user 0m0.044s
sys 0m0.004s
 
Same issue here.

The VM starts with 14336MB of RAM, but with 16384MB or more it does not start via the qm command or the Proxmox web UI.
The raw command works fine in the command line.

/usr/bin/kvm -id 501 -name nasferatu -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/501.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/501.pid -daemonize -smbios 'type=1,uuid=9d940794-cc76-46f3-b0fc-e74345b311bb' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/zvol/rpool/data/vm-501-disk-2' -smp '4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/501.vnc,password -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt -m 16384 -object 'memory-backend-ram,id=ram-node0,size=8192M' -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' -object 'memory-backend-ram,id=ram-node1,size=8192M' -numa 'node,nodeid=1,cpus=2-3,memdev=ram-node1' -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=5aeefdbb-f1e5-4968-a098-ac1b2d7d2e0b' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:04:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0' -device 'VGA,id=vga,bus=pcie.0,addr=0x1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:fc0df5182c6' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/rpool/data/vm-501-disk-1,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=native,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,rotation_rate=1,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap501i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on,queues=4' -device 
'virtio-net-pci,mac=5A:13:3E:70:47:B3,netdev=net0,bus=pci.0,addr=0x12,id=net0,vectors=10,mq=on' -machine 'type=q35+pve0'

proxmox-ve: 6.3-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-4.15: 5.4-7
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-4.15.18-19-pve: 4.15.18-45
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-4
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

qm config 501
agent: 0
balloon: 0
bios: ovmf
boot: order=scsi0;ide2
cores: 2
cpu: host
cpuunits: 32768
efidisk0: local-zfs:vm-501-disk-2,size=1M
hostpci0: 04:00,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 14336
name: nasferatu
net0: virtio=5A:13:3E:70:47:B3,bridge=vmbr0,queues=4
numa: 1
onboot: 1
ostype: other
scsi0: local-zfs:vm-501-disk-1,discard=on,size=12G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=9d940794-cc76-46f3-b0fc-e74345b311bb
sockets: 2
startup: order=3,up=60
unused0: local-zfs:vm-501-disk-0
vmgenid: 5aeefdbb-f1e5-4968-a098-ac1b2d7d2e0b

Is more information necessary for troubleshooting?
 
I do recall some reports here in the forum on big memory VMs where the allocation of memory was not fast enough and hence the VM did not start properly.
Have you used the search trying to find something like "big memory VM won't start" or similar?
 
When using PCI passthrough, QEMU must preallocate all of the memory assigned to the VM. And yes, it seems the reason is that this takes too long, so you are running into a timeout.
I would suggest using hugepages. This way you get:

1. better memory performance
2. memory allocations that take place in larger chunks and therefore do not take as long
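
To make the hugepages suggestion concrete, here is a rough sketch. It only prints what it would do and changes nothing. The helper names `hugepages_needed` and `suggest_hugepages` are made up for illustration, and VMID 501 with 16384 MiB is simply taken from the post above. It assumes the `numa` and `hugepages` VM options as documented for `qm set`, and that NUMA must be enabled for hugepages to be used.

```shell
#!/bin/sh
# Sketch only: prints a suggested command instead of running it.

# Number of hugepages needed to back a VM:
# VM memory (MiB) divided by hugepage size (MiB), rounded up.
hugepages_needed() {
    mem_mib=$1
    page_mib=$2
    echo $(( (mem_mib + page_mib - 1) / page_mib ))
}

# Print the suggested settings for one VM.
# Arguments (all placeholders): VMID, VM memory in MiB, hugepage size in MiB.
suggest_hugepages() {
    vmid=$1
    mem_mib=$2
    page_mib=$3
    pages=$(hugepages_needed "$mem_mib" "$page_mib")
    echo "# host needs at least $pages free hugepages of ${page_mib} MiB"
    echo "qm set $vmid --numa 1 --hugepages $page_mib"
}

suggest_hugepages 501 16384 1024
```

For 2 MiB pages, pass `2` as the last argument instead of `1024`. On most hosts 1 GiB pages have to be reserved at boot through kernel parameters, while 2 MiB pages can often be allocated at runtime, so check what your host actually has free before switching. Separately, newer qemu-server versions document a `--timeout` option for `qm start`; I have not checked whether it is available on the 6.3 packages listed above.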
 
