How to fix "time out" error properly?

nwongrat

Feb 16, 2023
Below is the timeout error.

TASK ERROR: start failed: command '/usr/bin/kvm -id 2215 -name 'EMBY,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/2215.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/2215.pid -daemonize -smbios 'type=1,uuid=f7d0f7e3-60f2-4227-90c9-3ecc4becab65' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd' -drive 'if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/zvol/rpool/data/vm-2215-disk-0,size=540672' -smp '16,sockets=2,cores=8,maxcpus=16' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/2215.vnc,password=on' -no-hpet -cpu 'kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 16000 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=bf3600f2-7839-4929-89da-2f5c7cbaa8a5' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:08:12.1,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0' -device 'vfio-pci,host=0000:41:00.0,id=hostpci1,bus=ich9-pcie-port-2,addr=0x0' -device 'vfio-pci,host=0000:41:00.1,id=hostpci2,bus=ich9-pcie-port-3,addr=0x0' -device 'VGA,id=vga,bus=pcie.0,addr=0x1' -chardev 'socket,path=/var/run/qemu-server/2215.qga,server=on,wait=off,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:1f5a6f1b8753' -drive 'file=/dev/zvol/rpool/data/vm-2215-disk-1,if=none,id=drive-ide0,cache=writeback,format=raw,aio=io_uring,detect-zeroes=on' -device 
'ide-hd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=100' -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-7.1+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout


It happens to me every single day. Once any VM gets a timeout, it keeps timing out forever, even when there is no workload at all. I then have to restart the node to fix it. Is there a proper way to fix this other than restarting the node?

Thank you.
 
If you don't get any errors in journalctl when starting the VM and only a timeout, then Proxmox has trouble allocating enough memory for the VM. It uses PCI(e) passthrough and therefore all VM memory must be pinned into actual RAM (and ballooning and KSM won't work). Start the VM with less memory or leave more memory free by reducing the memory of other VMs, limiting ZFS ARC, etc.
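
As a rough way to check this before starting the VM, here is a minimal Python sketch that compares the VM's configured memory against what the host reports as available. The function names and the 2048 MiB headroom are assumptions for illustration, not anything Proxmox provides, and `MemAvailable` is only an approximation: memory held by the ZFS ARC is generally not counted as available even though it can shrink under pressure.

```python
def parse_meminfo_kib(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts and are returned as-is.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def vm_fits(meminfo_text: str, vm_mib: int, headroom_mib: int = 2048) -> bool:
    """Return True if MemAvailable covers the VM plus some headroom.

    headroom_mib is a hypothetical safety margin, not a Proxmox setting.
    """
    available_mib = parse_meminfo_kib(meminfo_text)["MemAvailable"] // 1024
    return available_mib >= vm_mib + headroom_mib
```

On the node, this could be called as `vm_fits(open("/proc/meminfo").read(), vm_mib=16000)`, where 16000 matches the `-m 16000` in the failing command above.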
 
Speaking of memory, that leads to another question, as follows.

This node is used for backup and entertainment.

Service-01 is mainly Veeam backup for ESXi
PMBK-01 is the PBS
EMBY is the entertainment

Nothing fancy in the configuration; everything was left at the defaults. The disk controller is in HBA mode. The storage consists of 4 disks in ZFS RAID10.

Here are the current workload and specifications of the server:
QEzxEjn.png



Basically, the total RAM in this server is about 94G. For those 3 VMs I set a combined maximum of 38G, which should leave roughly 56G for the node itself.

However, as you can see in the image, the total RAM usage right now is 89G. That is about 50G more than I allocated. I know that the node uses RAM itself, but 50G? That seems weird.

FRcAiP7.png


ftN8w49.png


nvj0b5C.png



If you look at the RAM usage of each VM in detail, they consume about 30G in total.

The question is: where is all my RAM?
 
ZFS ARC takes up to 50% of the memory unless you limit it according to the Proxmox manual.
VMs with PCI(e) passthrough always use all of their memory, even when it looks like they are not using all of their memory.
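
To make the accounting concrete, here is a small Python sketch of the arithmetic, using the figures from this thread (94G host RAM, 38G configured for the VMs). It assumes the default ARC ceiling of 50% mentioned above and, as a worst case, treats all VM memory as fully used, which strictly applies only to the VMs with passthrough. The 8 GiB limit is a hypothetical value chosen for illustration, not a recommendation. The generated line is the form used in `/etc/modprobe.d/zfs.conf`; when the root filesystem is on ZFS, the initramfs typically has to be refreshed and the node rebooted for it to take effect.

```python
GIB = 1024 ** 3


def default_arc_ceiling_gib(host_ram_gib: float) -> float:
    """ARC ceiling when no limit is set, assuming the 50% default."""
    return host_ram_gib * 0.5


def arc_max_option(limit_gib: int) -> str:
    """Build the modprobe option line; zfs_arc_max is given in bytes."""
    return f"options zfs zfs_arc_max={limit_gib * GIB}"


host_ram_gib = 94   # total RAM reported in the thread
vm_total_gib = 38   # combined maximum configured for the three VMs

arc_ceiling = default_arc_ceiling_gib(host_ram_gib)
worst_case = vm_total_gib + arc_ceiling

print(f"ARC may grow to about {arc_ceiling:.0f}G")
print(f"VMs plus ARC could reach about {worst_case:.0f}G of {host_ram_gib}G")
print(arc_max_option(8))  # hypothetical 8 GiB limit
```

With these numbers, VMs plus ARC alone can account for about 85G, which is close to the 89G shown in the screenshot before counting the host's own processes.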
If I understand correctly, with my limited Proxmox knowledge:
Are you saying that the RAM is currently being used by the node itself for whatever it needs to do? If that is correct, I really need to limit it.

There were many times when the Veeam service was very slow even though the VM itself was not using all of its RAM, while the total RAM on the node was fully used.

With PCIe passthrough, is the memory used by the host or by the VM? For example, if I have a node full of VMs with PCIe passthrough, should I reserve the RAM for the host or for the VMs that use passthrough?

Thanks a lot. You have enlightened me.