VM unresponsive after reboot, unable to start again

Hi,

I rebooted a Debian 11 KVM VM (from within the VM itself), but it didn't come back up. The PVE web UI was unable to fetch the VM's details, and when opening a console I got the following error:

Code:
Jul 10 09:55:49 de-fns1-node1 pvestatd[1467]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 51 retries
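
In case it helps anyone debugging a similar hang: the QMP socket can also be probed directly. A rough sketch, assuming socat is installed; a responsive QEMU prints a JSON greeting as soon as you connect, while a hung one produces no output:

Code:
# connect to the VM's QMP socket; a healthy QEMU immediately sends a
# JSON greeting ({"QMP": {...}}), a hung one just sits there
socat - UNIX-CONNECT:/var/run/qemu-server/102.qmp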

I stopped the VM using qm, but when starting it again I got the following error:

Code:
Jul 10 10:02:11 de-fns1-node1 pvedaemon[1771723]: start VM 102: UPID:de-fns1-node1:001B08CB:34E4ECB2:668E4003:qmstart:102:root@pam:
Jul 10 10:02:11 de-fns1-node1 pvedaemon[1351221]: <root@pam> starting task UPID:de-fns1-node1:001B08CB:34E4ECB2:668E4003:qmstart:102:root@pam:
Jul 10 10:02:11 de-fns1-node1 pvedaemon[1771723]: Use of uninitialized value in split at /usr/share/perl5/PVE/QemuServer/Cloudinit.pm line 102.
Jul 10 10:02:31 de-fns1-node1 pvedaemon[1771723]: timeout waiting on systemd
Jul 10 10:02:31 de-fns1-node1 pvedaemon[1351221]: <root@pam> end task UPID:de-fns1-node1:001B08CB:34E4ECB2:668E4003:qmstart:102:root@pam: timeout waiting on systemd
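
For reference, the stop/start commands were along these lines:

Code:
qm stop 102
qm start 102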

I can see there's a kvm process with ID 102 still running on the host, but I have no way of controlling it. Even a kill -9 doesn't work: as the ps output below shows, the process is in uninterruptible sleep (D state), so it's stuck in the kernel and ignores signals.

Code:
root     2154588 34.2 21.0 18241976 13855236 ?   D    Jun07 16268:35 /usr/bin/kvm -id 102 -name de-fns1-nac1,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/102.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/102.pid -daemonize -smbios type=1,uuid=1056a8bd-1704-4153-b7ca-92758a268077 -smp 8,sockets=1,cores=8,maxcpus=8 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/102.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 16384 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device vmgenid,guid=b301dcaf-a7e8-4b4f-bcd3-b71528359ce5 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -chardev socket,id=serial0,path=/var/run/qemu-server/102.serial0,server=on,wait=off -device isa-serial,chardev=serial0 -device VGA,id=vga,bus=pci.0,addr=0x2 -chardev socket,path=/var/run/qemu-server/102.qga,server=on,wait=off,id=qga0 -device virtio-serial,id=qga0,bus=pci.0,addr=0x8 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:1af4b1ebb73b -drive file=/var/lib/vz/images/102/vm-102-cloudinit.qcow2,if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2 -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/var/lib/vz/images/102/vm-102-disk-0.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=101 -netdev type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=DA:C4:1C:75:18:32,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024 -machine type=pc+pve0
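
To see where the kernel has the process stuck, something like the following can help (the stack dump needs root and may be empty depending on kernel config):

Code:
# STAT shows D (uninterruptible sleep); WCHAN is the kernel function it waits in
ps -o pid,stat,wchan:32 -p 2154588
# kernel stack trace of the stuck task
cat /proc/2154588/stack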

Other VMs are working just fine. I've tried disabling the ballooning device and switching the disk controller, as suggested in other threads, but that didn't help. After rebooting the PVE host, the VM started fine again.
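
In case anyone wants to try the same changes, they were roughly as follows (the controller type is just an example, adjust to your setup):

Code:
# disable the ballooning device (a balloon target of 0 turns it off)
qm set 102 --balloon 0
# switch the SCSI controller, e.g. to virtio-scsi-single
qm set 102 --scsihw virtio-scsi-single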

I'll try to find a way to reproduce this issue, but I'm not quite sure what the original cause was.

Information:

Code:
# systemctl status 102.scope
● 102.scope
     Loaded: loaded (/run/systemd/transient/102.scope; transient)
  Transient: yes
     Active: inactive (dead) since Wed 2024-07-10 09:58:56 CEST; 26min ago
      Tasks: 1 (limit: 76968)
     Memory: 13.2G
        CPU: 1w 4d 7h 8min 42.739s
     CGroup: /qemu.slice/102.scope
             └─2154588 /usr/bin/kvm -id 102 -name de-fns1-nac1,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/102.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id>

Jun 07 11:35:20 de-fns1-node1 systemd[1]: Started 102.scope.
Jun 07 11:35:20 de-fns1-node1 ovs-vsctl[2154598]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap102i0
Jun 07 11:35:20 de-fns1-node1 ovs-vsctl[2154598]: ovs|00002|db_ctl_base|ERR|no port named tap102i0
Jun 07 11:35:20 de-fns1-node1 ovs-vsctl[2154599]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln102i0
Jun 07 11:35:20 de-fns1-node1 ovs-vsctl[2154599]: ovs|00002|db_ctl_base|ERR|no port named fwln102i0
Jul 10 09:58:56 de-fns1-node1 systemd[1]: 102.scope: Succeeded.
Jul 10 09:58:56 de-fns1-node1 systemd[1]: Stopped 102.scope.
Jul 10 09:58:56 de-fns1-node1 systemd[1]: 102.scope: Consumed 1w 4d 7h 8min 42.739s CPU time.
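
Note the scope is already inactive (dead) while the kvm process is still listed in its cgroup. For a stale scope it may be worth checking the cgroup directly and asking systemd to forget the unit before resorting to a host reboot; untested for this particular hang, and the path assumes cgroup v2 with the default qemu.slice:

Code:
# check whether the dead scope's cgroup still holds the stuck PID
cat /sys/fs/cgroup/qemu.slice/102.scope/cgroup.procs
# try to make systemd drop the stale unit state
systemctl stop 102.scope
systemctl reset-failed 102.scope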

VM config:

Code:
agent: enabled=1
boot: c
bootdisk: scsi0
cores: 8
ide2: local:102/vm-102-cloudinit.qcow2,media=cdrom,size=4M
ipconfig0: ip=172.29.241.2/26,gw=172.29.241.1
memory: 16384
meta: creation-qemu=7.2.0,ctime=1711721980
name: de-fns1-nac1
nameserver: 8.8.8.8 8.8.4.4
net0: virtio=DA:C4:1C:75:18:32,bridge=vmbr2,tag=11
numa: 0
onboot: 1
scsi0: local:102/vm-102-disk-0.raw,size=82G
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=1056a8bd-1704-4153-b7ca-92758a268077
sockets: 1
vga: serial0
vmgenid: b301dcaf-a7e8-4b4f-bcd3-b71528359ce5

Output of pveversion -v:

Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.15.108-1-pve: 5.15.108-2
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
openvswitch-switch: 2.15.0+ds1-2+deb11u4
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
 
