I have a machine with an EPYC 7401P (24C/48T) and 128 GB of RAM, and I want to run 4 VMs on it.
I can run any combination of 3 VMs, but whichever VM is started last fails with the hugepage allocation error. The host somehow creates 400-700 extra hugepages but never allocates any of the free ones.
Initially I had issues setting up the first VM and giving it varying amounts of RAM, but that seemed to be fixed by using hugepages and configuring more sockets. Setting up the template went well, but now I have hit another problem: when I try to start 4 identical VMs, the last one won't start (whichever VM I start last fails). Each VM gets 10 vCPUs, 30 GB of RAM and one passed-through GPU.
The error I get is:
Code:
TASK ERROR: start failed: hugepage allocation failed at usr/share/perl5/PVE/QemuServer/Memory.pm line 532.
This is my grub config:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on hugepagesz=2M hugepages=61000 default_hugepagesz=2M"
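As a sanity check on that reservation, here is a minimal sketch of the arithmetic. It assumes the `memory: 30000` value from the VM config is in MiB and that each VM is backed entirely by 2 MiB pages; the constant and function names are only illustrative.

```python
# Rough capacity check: do 4 VMs fit into the boot-time hugepage pool?
PAGE_SIZE_MIB = 2        # hugepagesz=2M
RESERVED_PAGES = 61000   # hugepages=61000
VM_MEMORY_MIB = 30000    # "memory: 30000" per VM (assumed to be MiB)
VM_COUNT = 4


def pages_needed(memory_mib, page_size_mib=PAGE_SIZE_MIB):
    """Number of hugepages needed to back memory_mib, rounded up."""
    return -(-memory_mib // page_size_mib)


reserved_mib = RESERVED_PAGES * PAGE_SIZE_MIB
demand_pages = VM_COUNT * pages_needed(VM_MEMORY_MIB)
headroom_pages = RESERVED_PAGES - demand_pages

print(f"reserved: {reserved_mib} MiB ({reserved_mib / 1024:.1f} GiB)")
print(f"demand:   {demand_pages} pages for {VM_COUNT} VMs")
print(f"headroom: {headroom_pages} pages")
```

In aggregate the pool is large enough for all four VMs, so a plain shortage of total hugepages does not explain the failure.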
State before starting any VMs:
Code:
root@ProxS01:~# cat /proc/meminfo | grep -i huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 61000
HugePages_Free: 61000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
After starting 3 VMs, everything looks as expected:
Code:
root@ProxS01:~# cat /proc/meminfo | grep -i huge
AnonHugePages: 45056 kB
ShmemHugePages: 0 kB
HugePages_Total: 61000
HugePages_Free: 16000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
After attempting to start the 4th VM:
Code:
root@ProxS01:~# cat /proc/meminfo | grep -i huge
AnonHugePages: 45056 kB
ShmemHugePages: 0 kB
HugePages_Total: 61409
HugePages_Free: 16409
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
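To make the difference between the last two snapshots explicit, the following sketch parses the `HugePages_*` counters and compares them. The helper name `parse_hugepages` is hypothetical, and the figure of 15000 pages per VM assumes 30000 MiB backed by 2 MiB pages.

```python
def parse_hugepages(meminfo_text):
    """Return the HugePages_* counters from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("HugePages_"):
            fields[key] = int(value.split()[0])
    return fields


# Values copied from the two snapshots above.
after_three_vms = parse_hugepages(
    "HugePages_Total: 61000\nHugePages_Free: 16000\n"
    "HugePages_Rsvd: 0\nHugePages_Surp: 0\n"
)
after_failed_start = parse_hugepages(
    "HugePages_Total: 61409\nHugePages_Free: 16409\n"
    "HugePages_Rsvd: 0\nHugePages_Surp: 0\n"
)

used_before = after_three_vms["HugePages_Total"] - after_three_vms["HugePages_Free"]
used_after = after_failed_start["HugePages_Total"] - after_failed_start["HugePages_Free"]
extra_pages = after_failed_start["HugePages_Total"] - after_three_vms["HugePages_Total"]

PAGES_PER_VM = 15000  # 30000 MiB / 2 MiB per page (assumption stated above)
print(f"pages in use before/after the failed start: {used_before}/{used_after}")
print(f"extra pages created by the failed start: {extra_pages}")
print(f"free pages >= one VM: {after_failed_start['HugePages_Free'] >= PAGES_PER_VM}")
```

The number of pages in use does not change across the failed start: the pool grows, the new pages stay free, and the free total is still larger than what one VM needs.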
VM config:
Code:
balloon: 0
bootdisk: scsi0
cores: 5
cpu: host,hidden=1
hostpci0: 21:00
hugepages: 2
ide2: none,media=cdrom
memory: 30000
name: ProxS01M01
net0: virtio=CA:F9:B7:37:26:2D,bridge=vmbr0
numa: 1
ostype: l26
scsi0: local-lvm:vm-101-disk-0,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=8bef3d1c-c51f-47de-ba63-f67aa343f407
sockets: 2
vmgenid: 7d1f8100-002c-420a-a85d-8253031c3a90
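Because the config above enables `numa: 1`, it may be worth looking at how the hugepages are spread across the host's NUMA nodes and not only at the totals in `/proc/meminfo`. The sketch below reads the standard per-node sysfs counters; whether the per-node distribution is actually related to the failure is an assumption, not something confirmed here, and the function name is hypothetical.

```python
from pathlib import Path


def per_node_hugepages(sysfs_root="/sys/devices/system/node", size_kib=2048):
    """Return {node_name: (total_pages, free_pages)} for one hugepage size."""
    result = {}
    for node_dir in sorted(Path(sysfs_root).glob("node[0-9]*")):
        hp_dir = node_dir / "hugepages" / f"hugepages-{size_kib}kB"
        if not hp_dir.is_dir():
            continue  # this hugepage size is not available on this node
        total = int((hp_dir / "nr_hugepages").read_text())
        free = int((hp_dir / "free_hugepages").read_text())
        result[node_dir.name] = (total, free)
    return result


# Print one line per host NUMA node (prints nothing if sysfs is unavailable).
for node, (total, free) in per_node_hugepages().items():
    print(f"{node}: total={total} free={free}")
```

Running this on the host before and after each VM start would show which node the pages are taken from.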
Giving each VM fewer resources makes them work, e.g. 4 vCPUs and 10 GB of RAM.
I have tried configuring the VMs with 1 socket and 10 cores, 2 sockets with 5 cores each, and 4 sockets with 2-3 cores each, with ballooning on or off; the result is always the same. I am up to date with the latest no-subscription repository.
When not using hugepages I get a timeout error like this:
Code:
TASK ERROR: start failed: command '/usr/bin/kvm -id 103 -name ProxS01M03 -chardev 'socket,id=qmp,path=/var/run/qemu-server/103.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/103.pid -daemonize -smbios 'type=1,uuid=235f9ccb-2831-4b93-9e7a-e0674e4b08db' -smp '12,sockets=4,cores=3,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/103.vnc,x509,password -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off' -m 30000 -object 'memory-backend-ram,id=ram-node0,size=7500M' -numa 'node,nodeid=0,cpus=0-2,memdev=ram-node0' -object 'memory-backend-ram,id=ram-node1,size=7500M' -numa 'node,nodeid=1,cpus=3-5,memdev=ram-node1' -object 'memory-backend-ram,id=ram-node2,size=7500M' -numa 'node,nodeid=2,cpus=6-8,memdev=ram-node2' -object 'memory-backend-ram,id=ram-node3,size=7500M' -numa 'node,nodeid=3,cpus=9-11,memdev=ram-node3' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=d7a87059-2048-4803-973a-0cfd1ae50cb3' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'vfio-pci,host=43:00.0,id=hostpci0.0,bus=pci.0,addr=0x10.0,multifunction=on' -device 'vfio-pci,host=43:00.1,id=hostpci0.1,bus=pci.0,addr=0x10.1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8ff6963ed07e' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-103-disk-0,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 
'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap103i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=5E:41:72:1B:09:14,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc'' failed: got timeout
I intend to get a paid subscription when this machine goes into production, but I'm not sold on Proxmox if I can't get my system to work.