[SOLVED] TASK ERROR: start failed

May 27, 2020
Hi,

I have a VM (Oracle Linux). I shut the VM down and then (e.g. the next day) start it again, and on the first click of the Start button I get the error shown below and the VM does not start (the green arrow does appear on the VM icon and some CPU and RAM usage is shown, so it may indicate that the machine runs for a few seconds).
When I click the Start button again, the machine starts normally.

How can I find the problem? I don't mind having to click twice, it is only a little annoying, but I do care whether this is a sign of a bigger problem that will lead to the VM becoming completely inaccessible in the future.


ERROR:
activating and using 'data1:vm-100-state-suspend-2020-07-02' as vmstate
TASK ERROR: start failed: command '/usr/bin/kvm -id 100 -name Oracle12c-DRSV -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=e5ceddea-847f-4a7b-9d20-07af96b8eb87' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/zvol/data1/vm-100-disk-1' -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/100.vnc,password -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 8192 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=8d090401-390b-442c-b837-21edebbe4c44' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8f70ffbca41a' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=/dev/zvol/data1/vm-100-disk-0,if=none,id=drive-sata0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=100' -drive 'file=/dev/zvol/data1/vm-100-disk-2,if=none,id=drive-sata1,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ahci0.1,drive=drive-sata1,id=sata1' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=86:B5:70:7E:DC:57,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc+pve0' -loadstate /dev/zvol/data1/vm-100-state-suspend-2020-07-02' failed: got timeou

Regards,
Sašo
 
Hi,
It seems like you hit a timeout. For suspended VMs this should be at least 5 minutes. Approximately how long did it take for the error to appear? How long did the VM take to start the second time around? Could you share the output of pveversion -v?
 
Hi Fabien,

1. It takes approximately 30 seconds for the error to appear.
It also looks like the error happens only when the host machine is shut down and started again as well. If I shut down only the VM and then start it again, it boots immediately (the noVNC console showing the VM's boot screen can be opened after about 10 seconds).

2. Output of pveversion -v:

proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1


Regards,
Sašo
 
Well, 30 seconds is the default timeout if you don't resume from a suspended state (and don't have a lot of memory or hugepages active). Could you share the configuration of the VM (located in /etc/pve/nodes/<NODE>/qemu-server/100.conf)? Does the problem happen only with this machine and only with hibernation/suspension?

EDIT: please provide the config from before you start it for the first time after rebooting the host.
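For example, you could grab it right after the host boots, before the first Start attempt (a minimal sketch; replace <NODE> with your node's name, and qm config 100 is simply an alternative way to read the same configuration):

# Read the config file directly on the host:
cat /etc/pve/nodes/<NODE>/qemu-server/100.conf

# Or let the qm tool print the current configuration of VM 100:
qm config 100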
 
Hi,

1. The problem appears only on this VM (100) and only if the physical machine is also shut down after the VM is shut down (note: the VM is always shut down properly using /sbin/shutdown now, and the host machine is shut down via the Proxmox node shutdown).

2. The problem does not appear on the other VM (Ubuntu).


3. 100.conf contains:

bios: ovmf
bootdisk: sata0
cores: 2
efidisk0: data1:vm-100-disk-1,size=1M
ide2: none,media=cdrom
memory: 8192
name: Oracle12c-DRSV
net0: virtio=86:B5:70:7E:DC:57,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
parent: InstaliranOracle
runningcpu: kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep
runningmachine: pc-i440fx-5.0+pve0
sata0: data1:vm-100-disk-0,size=32G
sata1: data1:vm-100-disk-2,size=180G
scsihw: virtio-scsi-pci
smbios1: uuid=e5ceddea-847f-4a7b-9d20-07af96b8eb87
sockets: 1
vmgenid: 8d090401-390b-442c-b837-21edebbe4c44
vmstate: data1:vm-100-state-suspend-2020-07-02

[InstaliranOracle]
bios: ovmf
bootdisk: sata0
cores: 2
efidisk0: data1:vm-100-disk-1,size=1M
ide2: local:iso/V995537-01.iso,media=cdrom
memory: 8192
name: Oracle12c-DRSV
net0: virtio=86:B5:70:7E:DC:57,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
parent: PredInstalacijoOracla
sata0: data1:vm-100-disk-0,size=32G
sata1: data1:vm-100-disk-2,size=180G
scsihw: virtio-scsi-pci
smbios1: uuid=e5ceddea-847f-4a7b-9d20-07af96b8eb87
snaptime: 1592909038
sockets: 1
vmgenid: 8d090401-390b-442c-b837-21edebbe4c44

[PrazenLinux]
bios: ovmf
bootdisk: sata0
cores: 2
efidisk0: data1:vm-100-disk-1,size=1M
ide2: local:iso/V995537-01.iso,media=cdrom
memory: 8192
name: Oracle12c-DRSV
net0: virtio=86:B5:70:7E:DC:57,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
sata0: data1:vm-100-disk-0,size=32G
sata1: data1:vm-100-disk-2,size=180G
scsihw: virtio-scsi-pci
smbios1: uuid=e5ceddea-847f-4a7b-9d20-07af96b8eb87
snaptime: 1592854586
sockets: 1
vmgenid: 891b9536-c098-4293-a28d-9191d7742bf3

[PredInstalacijoOracla]
bios: ovmf
bootdisk: sata0
cores: 2
efidisk0: data1:vm-100-disk-1,size=1M
ide2: local:iso/V995537-01.iso,media=cdrom
memory: 8192
name: Oracle12c-DRSV
net0: virtio=86:B5:70:7E:DC:57,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
parent: PrazenLinux
runningcpu: kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep
runningmachine: pc-i440fx-5.0+pve0
sata0: data1:vm-100-disk-0,size=32G
sata1: data1:vm-100-disk-2,size=180G
scsihw: virtio-scsi-pci
smbios1: uuid=e5ceddea-847f-4a7b-9d20-07af96b8eb87
snaptime: 1592904196
sockets: 1
vmgenid: 8d090401-390b-442c-b837-21edebbe4c44
vmstate: data1:vm-100-state-PredInstalacijoOracla



4. 101.conf (VM without such problem) contains:

bootdisk: sata0
cores: 1
ide2: none,media=cdrom
memory: 4096
name: UbuntuMSSqlServer
net0: e1000=8E:D9:8D:68:19:35,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
runningcpu: kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep
runningmachine: pc-i440fx-5.0+pve0
sata0: local-zfs:vm-101-disk-0,size=61815M
scsihw: virtio-scsi-pci
smbios1: uuid=76dd9aba-1e17-40fa-b52e-0b996a12735a
sockets: 1
vmgenid: 9abab29c-241d-4528-99b5-5532e8131b4f
vmstate: local-zfs:vm-101-state-suspend-2020-07-02

Regards,
Sašo
 
Ok, I think I found the issue. There is no suspended lock on the machine, even though it has a statefile that's going to be loaded when the VM is started. So in fact the VM is resumed every time (see -loadstate /dev/zvol/data1/vm-100-state-suspend-2020-07-02 in the task error you posted).

Our code currently does not expect this possibility (and it normally shouldn't happen), so it uses the default timeout as opposed to the larger one for resuming a VM from suspension. Additionally it doesn't remove the state file after resuming, so it will resume from the same state each time the machine is started.

To fix it, please do the following:
  1. Don't suspend the VM in the meantime.
  2. Before starting the VM next time, use qm set 100 --lock suspended.
  3. Start the VM. Now the bigger timeout will be used and the state file will be removed afterwards.
I'd suggest doing the same for the other VM too, as it seems to be in the same situation (it has no lock, but a vmstate).
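Put together, the commands would look roughly like this (a sketch; it assumes both VMs are currently stopped, and that the other VM is ID 101 as in the config you posted):

# Check for a leftover vmstate with no corresponding lock:
qm config 100 | grep -E 'lock|vmstate'

# Mark the VM as suspended so that start uses the longer resume timeout
# and removes the state file after resuming:
qm set 100 --lock suspended
qm start 100

# The other VM appears to be in the same situation:
qm set 101 --lock suspended
qm start 101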
 