[SOLVED] TASK ERROR: start failed

Hi,

I have a VM (Oracle Linux). I shut the VM down and then (e.g. the next day) start it again, and on the first click of the Start button I get the error shown below and the VM does not start (the green arrow does appear on the VM icon and some CPU and RAM usage is shown, so it may indicate that the machine starts for a few seconds).
When I click the Start button again, the machine starts normally.

How can I find the problem? I don't mind having to click twice, even if it's a little annoying, but I do care if this is a sign of a bigger problem that will eventually make the VM inaccessible.


ERROR:
activating and using 'data1:vm-100-state-suspend-2020-07-02' as vmstate
TASK ERROR: start failed: command '/usr/bin/kvm -id 100 -name Oracle12c-DRSV -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=e5ceddea-847f-4a7b-9d20-07af96b8eb87' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/zvol/data1/vm-100-disk-1' -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/100.vnc,password -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 8192 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=8d090401-390b-442c-b837-21edebbe4c44' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:8f70ffbca41a' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=/dev/zvol/data1/vm-100-disk-0,if=none,id=drive-sata0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=100' -drive 'file=/dev/zvol/data1/vm-100-disk-2,if=none,id=drive-sata1,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ahci0.1,drive=drive-sata1,id=sata1' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=86:B5:70:7E:DC:57,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc+pve0' -loadstate /dev/zvol/data1/vm-100-state-suspend-2020-07-02' failed: got timeou

regards,
Sašo
 
Hi,
seems like you hit a timeout. For suspended VMs this should be at least 5 minutes. Approximately how long did it take for the error to appear? How long did it take for the VM to start the second time around? Could you share your pveversion -v?
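
If it helps, here is a minimal sketch of how the requested details could be collected from the node's shell (VMID 100 is taken from the task error above; adjust as needed):

# show installed Proxmox package versions
pveversion -v
# time how long the failing start takes before it errors out
time qm start 100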
 
Hi Fabien,

1. It takes approximately 30 seconds for the error to appear.
It also looks like the error only happens when the host machine is shut down and started again as well. If I shut down only the VM and then start it again, it boots immediately (the noVNC console showing the VM's boot screen can be opened after about 10 seconds).

2. Output of pveversion -v :

proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1


Regards,
Sašo
 
Well, 30 seconds is the default timeout if you don't resume from a suspended state (and don't have a lot of memory or hugepages active). Could you share the configuration of the VM (located in /etc/pve/nodes/<NODE>/qemu-server/100.conf)? Does the problem happen only with this machine, and only with hibernation/suspension?

EDIT: please provide the config from before you start it for the first time after rebooting the host.
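
For reference, a sketch of how the config could be captured before that first start (replace <NODE> with your node name):

# print the stored VM configuration file, including snapshot sections
cat /etc/pve/nodes/<NODE>/qemu-server/100.conf
# or just the current (non-snapshot) part via the CLI
qm config 100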
 
Hi,

1. The problem appears only on this VM (100) and only if the physical machine is also shut down after the VM is shut down (note: the VM is always shut down properly using /sbin/shutdown now, and the host machine is shut down via the Proxmox node shutdown).

2. The problem does not appear on the other VM (Ubuntu).


3. 100.conf contains:

bios: ovmf
bootdisk: sata0
cores: 2
efidisk0: data1:vm-100-disk-1,size=1M
ide2: none,media=cdrom
memory: 8192
name: Oracle12c-DRSV
net0: virtio=86:B5:70:7E:DC:57,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
parent: InstaliranOracle
runningcpu: kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep
runningmachine: pc-i440fx-5.0+pve0
sata0: data1:vm-100-disk-0,size=32G
sata1: data1:vm-100-disk-2,size=180G
scsihw: virtio-scsi-pci
smbios1: uuid=e5ceddea-847f-4a7b-9d20-07af96b8eb87
sockets: 1
vmgenid: 8d090401-390b-442c-b837-21edebbe4c44
vmstate: data1:vm-100-state-suspend-2020-07-02

[InstaliranOracle]
bios: ovmf
bootdisk: sata0
cores: 2
efidisk0: data1:vm-100-disk-1,size=1M
ide2: local:iso/V995537-01.iso,media=cdrom
memory: 8192
name: Oracle12c-DRSV
net0: virtio=86:B5:70:7E:DC:57,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
parent: PredInstalacijoOracla
sata0: data1:vm-100-disk-0,size=32G
sata1: data1:vm-100-disk-2,size=180G
scsihw: virtio-scsi-pci
smbios1: uuid=e5ceddea-847f-4a7b-9d20-07af96b8eb87
snaptime: 1592909038
sockets: 1
vmgenid: 8d090401-390b-442c-b837-21edebbe4c44

[PrazenLinux]
bios: ovmf
bootdisk: sata0
cores: 2
efidisk0: data1:vm-100-disk-1,size=1M
ide2: local:iso/V995537-01.iso,media=cdrom
memory: 8192
name: Oracle12c-DRSV
net0: virtio=86:B5:70:7E:DC:57,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
sata0: data1:vm-100-disk-0,size=32G
sata1: data1:vm-100-disk-2,size=180G
scsihw: virtio-scsi-pci
smbios1: uuid=e5ceddea-847f-4a7b-9d20-07af96b8eb87
snaptime: 1592854586
sockets: 1
vmgenid: 891b9536-c098-4293-a28d-9191d7742bf3

[PredInstalacijoOracla]
bios: ovmf
bootdisk: sata0
cores: 2
efidisk0: data1:vm-100-disk-1,size=1M
ide2: local:iso/V995537-01.iso,media=cdrom
memory: 8192
name: Oracle12c-DRSV
net0: virtio=86:B5:70:7E:DC:57,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
parent: PrazenLinux
runningcpu: kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep
runningmachine: pc-i440fx-5.0+pve0
sata0: data1:vm-100-disk-0,size=32G
sata1: data1:vm-100-disk-2,size=180G
scsihw: virtio-scsi-pci
smbios1: uuid=e5ceddea-847f-4a7b-9d20-07af96b8eb87
snaptime: 1592904196
sockets: 1
vmgenid: 8d090401-390b-442c-b837-21edebbe4c44
vmstate: data1:vm-100-state-PredInstalacijoOracla



4. 101.conf (the VM without this problem) contains:

bootdisk: sata0
cores: 1
ide2: none,media=cdrom
memory: 4096
name: UbuntuMSSqlServer
net0: e1000=8E:D9:8D:68:19:35,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
runningcpu: kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep
runningmachine: pc-i440fx-5.0+pve0
sata0: local-zfs:vm-101-disk-0,size=61815M
scsihw: virtio-scsi-pci
smbios1: uuid=76dd9aba-1e17-40fa-b52e-0b996a12735a
sockets: 1
vmgenid: 9abab29c-241d-4528-99b5-5532e8131b4f
vmstate: local-zfs:vm-101-state-suspend-2020-07-02

Regards,
Sašo
 
Ok, I think I found the issue. There is no suspended lock on the machine, even though it has a state file that gets loaded when the VM is started. So the VM is in fact resumed every time (see -loadstate /dev/zvol/data1/vm-100-state-suspend-2020-07-02 in the task error you posted).

Our code currently does not expect this situation (and it normally shouldn't happen), so it uses the default timeout instead of the larger one for resuming a VM from suspension. Additionally, it does not remove the state file after resuming, so the VM will resume from the same state each time it is started.
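
As an illustration, this can be confirmed from the node's shell (the grep just picks out the two relevant lines):

# a VM hibernated via the GUI should normally carry 'lock: suspended';
# here only the vmstate line shows up, which is the inconsistency described above
qm config 100 | grep -E '^(lock|vmstate):'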

To fix it, please do the following:
  1. Don't suspend the VM in the meantime.
  2. Before starting the VM next time, use qm set 100 --lock suspended.
  3. Start the VM. Now the bigger timeout will be used and the state file will be removed afterwards.
I'd suggest doing the same for the other VM too, as it seems to be in the same situation (it has no lock, but a vmstate). A sketch of the full sequence follows below.
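
The sequence below assumes it is run from the node's shell before the next start; repeat analogously with VMID 101:

# re-add the missing lock so the start is treated as a resume from suspension
qm set 100 --lock suspended
# start the VM; the longer resume timeout applies and the state file is removed afterwards
qm start 100
# the vmstate reference should now be gone from the config
qm config 100 | grep vmstate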
 
