Hello,
we are having trouble recovering from a node outage.
The scenario is:
- 4 nodes
- a VM on node A (part of a HA group)
- we cut off the power to node A
- after a while the VM is migrated to node B
- the start task on node B fails (see the error below), but the VM status is shown as running and the HA state as started
- the VM does not respond to ping
- no VNC connection is possible
Code:
TASK ERROR: start failed: command '/usr/bin/kvm -id 102 -name debian1 -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=1ff44064-a6a1-4c53-8ca4-a1952157d65e' -smp '4,sockets=2,cores=2,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/102.vnc,password -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 2048 -object 'iothread,id=iothread-virtioscsi0' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5' -device 'vmgenid,guid=516f5b55-7636-47d5-ba06-dd468370d4ce' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev 'socket,path=/var/run/qemu-server/102.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:abaf84a6f7c7' -drive 'file=/mnt/pve/cephfs/template/iso/debian-10.6.0-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0' -drive 'file=rbd:rbd_pool/vm-102-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/rbd_pool.keyring,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,rotation_rate=1,bootindex=100' -netdev 
'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=1E:A4:4B:7A:57:DB,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc+pve0'' failed: got timeout
After a long time (about 13 minutes) the VM does start properly. The logs contain many entries like this:
Code:
Oct 13 11:20:49 px01 pvedaemon[2594]: VM 102 qmp command failed - VM 102 qmp command 'guest-ping' failed - unable to connect to VM 102 qga socket - timeout after 31 retries
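To get a rough idea of how often these entries occur per VM, something like the following sketch could be used. It is only an illustration: the regular expression is derived from the single log line quoted above, and the function name `count_qga_timeouts` is made up for this example. Other log formats are not handled.

```python
import re
from collections import Counter

# Pattern derived from the one log line quoted above (assumption:
# all relevant entries follow exactly this wording).
QGA_TIMEOUT = re.compile(
    r"VM (\d+) qmp command 'guest-ping' failed - unable to connect to VM \d+ qga socket"
)


def count_qga_timeouts(lines):
    """Return a Counter mapping VM ID (as a string) to the number of matching entries."""
    counts = Counter()
    for line in lines:
        match = QGA_TIMEOUT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Oct 13 11:20:49 px01 pvedaemon[2594]: VM 102 qmp command failed - "
    "VM 102 qmp command 'guest-ping' failed - unable to connect to VM 102 "
    "qga socket - timeout after 31 retries",
]
print(count_qga_timeouts(sample))
```

The function accepts any iterable of log lines, so an open log file can be passed in directly instead of the `sample` list.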
This issue seems related to https://forum.proxmox.com/threads/task-error-start-failed.72450/, but I'm not sure.
Could you help us?
Cedric