Hi,
I rebooted a Debian 11 KVM VM (from within the VM itself), but it didn't come back up. PVE could not fetch the VM's status, and when I opened a console, the following error was logged:
Code:
Jul 10 09:55:49 de-fns1-node1 pvestatd[1467]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 51 retries
I stopped the VM using qm, but starting it again fails with the following error:
Code:
Jul 10 10:02:11 de-fns1-node1 pvedaemon[1771723]: start VM 102: UPID:de-fns1-node1:001B08CB:34E4ECB2:668E4003:qmstart:102:root@pam:
Jul 10 10:02:11 de-fns1-node1 pvedaemon[1351221]: <root@pam> starting task UPID:de-fns1-node1:001B08CB:34E4ECB2:668E4003:qmstart:102:root@pam:
Jul 10 10:02:11 de-fns1-node1 pvedaemon[1771723]: Use of uninitialized value in split at /usr/share/perl5/PVE/QemuServer/Cloudinit.pm line 102.
Jul 10 10:02:31 de-fns1-node1 pvedaemon[1771723]: timeout waiting on systemd
Jul 10 10:02:31 de-fns1-node1 pvedaemon[1351221]: <root@pam> end task UPID:de-fns1-node1:001B08CB:34E4ECB2:668E4003:qmstart:102:root@pam: timeout waiting on systemd
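For reference when reading the task log above: the UPID appears to pack the worker PID and the task start time as hexadecimal fields. The following is a minimal sketch that decodes it, assuming the layout `UPID:node:pid:pstart:starttime:type:id:user:`; the function name `decode_upid` is made up for illustration and is not part of any PVE tooling.

```python
from datetime import datetime, timezone


def decode_upid(upid: str) -> dict:
    """Split a PVE task UPID into its fields (assumed layout, see lead-in)."""
    parts = upid.split(":")
    if len(parts) < 8 or parts[0] != "UPID":
        raise ValueError(f"not a UPID: {upid!r}")
    _, node, pid, pstart, starttime, task_type, task_id, user = parts[:8]
    start_epoch = int(starttime, 16)  # seconds since the Unix epoch
    return {
        "node": node,
        "pid": int(pid, 16),        # PID of the worker process
        "pstart": int(pstart, 16),  # start time of that process
        "starttime": start_epoch,
        "started_utc": datetime.fromtimestamp(start_epoch, tz=timezone.utc),
        "type": task_type,
        "id": task_id,
        "user": user,
    }


task = decode_upid(
    "UPID:de-fns1-node1:001B08CB:34E4ECB2:668E4003:qmstart:102:root@pam:"
)
print(task["pid"])          # 1771723, matching pvedaemon[1771723] in the log
print(task["started_utc"])  # 2024-07-10 08:02:11+00:00, i.e. 10:02:11 CEST
```

The decoded PID and start time line up with the `pvedaemon[1771723]` entries at 10:02:11, so the 20-second gap before "timeout waiting on systemd" belongs to this same start attempt.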
I can see there's a kvm instance with ID 102 still running on the host, but I have no way of controlling it. Even a kill -9 doesn't work; the process is in state D (uninterruptible sleep) in the ps output below.
Code:
root 2154588 34.2 21.0 18241976 13855236 ? D Jun07 16268:35 /usr/bin/kvm -id 102 -name de-fns1-nac1,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/102.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/102.pid -daemonize -smbios type=1,uuid=1056a8bd-1704-4153-b7ca-92758a268077 -smp 8,sockets=1,cores=8,maxcpus=8 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/102.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 16384 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device vmgenid,guid=b301dcaf-a7e8-4b4f-bcd3-b71528359ce5 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -chardev socket,id=serial0,path=/var/run/qemu-server/102.serial0,server=on,wait=off -device isa-serial,chardev=serial0 -device VGA,id=vga,bus=pci.0,addr=0x2 -chardev socket,path=/var/run/qemu-server/102.qga,server=on,wait=off,id=qga0 -device virtio-serial,id=qga0,bus=pci.0,addr=0x8 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:1af4b1ebb73b -drive file=/var/lib/vz/images/102/vm-102-cloudinit.qcow2,if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2 -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/var/lib/vz/images/102/vm-102-disk-0.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=101 -netdev 
type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=DA:C4:1C:75:18:32,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024 -machine type=pc+pve0
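The `D` in the STAT column above means uninterruptible sleep, which is why SIGKILL has no visible effect: the signal is only acted on once the process leaves the kernel wait. As a minimal sketch, the same state can be read directly from `/proc/<pid>/stat` on a Linux host; the helper names `parse_stat_state` and `process_state` are illustrative only.

```python
from pathlib import Path


def parse_stat_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The second field (comm) is wrapped in parentheses and may itself
    # contain spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def process_state(pid: int) -> str:
    """Read the current state of a process (Linux only)."""
    return parse_stat_state(Path(f"/proc/{pid}/stat").read_text())


# Example on the affected host (PID taken from the ps output above):
#   process_state(2154588)
# 'D' would confirm the process is still stuck in uninterruptible sleep.
```

If the state stays `D`, `/proc/<pid>/wchan` (and `/proc/<pid>/stack` as root) may hint at which kernel function the process is blocked in.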
Other VMs are working just fine. I tried disabling the ballooning device and switching the disk controller, as suggested in other topics, but that didn't help. After rebooting the PVE host, the VM worked fine again.
I'll try to find a way to reproduce this issue, but I'm not quite sure what the original cause was.
Information:
Code:
# systemctl status 102.scope
● 102.scope
Loaded: loaded (/run/systemd/transient/102.scope; transient)
Transient: yes
Active: inactive (dead) since Wed 2024-07-10 09:58:56 CEST; 26min ago
Tasks: 1 (limit: 76968)
Memory: 13.2G
CPU: 1w 4d 7h 8min 42.739s
CGroup: /qemu.slice/102.scope
└─2154588 /usr/bin/kvm -id 102 -name de-fns1-nac1,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/102.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id>
Jun 07 11:35:20 de-fns1-node1 systemd[1]: Started 102.scope.
Jun 07 11:35:20 de-fns1-node1 ovs-vsctl[2154598]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap102i0
Jun 07 11:35:20 de-fns1-node1 ovs-vsctl[2154598]: ovs|00002|db_ctl_base|ERR|no port named tap102i0
Jun 07 11:35:20 de-fns1-node1 ovs-vsctl[2154599]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln102i0
Jun 07 11:35:20 de-fns1-node1 ovs-vsctl[2154599]: ovs|00002|db_ctl_base|ERR|no port named fwln102i0
Jul 10 09:58:56 de-fns1-node1 systemd[1]: 102.scope: Succeeded.
Jul 10 09:58:56 de-fns1-node1 systemd[1]: Stopped 102.scope.
Jul 10 09:58:56 de-fns1-node1 systemd[1]: 102.scope: Consumed 1w 4d 7h 8min 42.739s CPU time.
Code:
agent: enabled=1
boot: c
bootdisk: scsi0
cores: 8
ide2: local:102/vm-102-cloudinit.qcow2,media=cdrom,size=4M
ipconfig0: ip=172.29.241.2/26,gw=172.29.241.1
memory: 16384
meta: creation-qemu=7.2.0,ctime=1711721980
name: de-fns1-nac1
nameserver: 8.8.8.8 8.8.4.4
net0: virtio=DA:C4:1C:75:18:32,bridge=vmbr2,tag=11
numa: 0
onboot: 1
scsi0: local:102/vm-102-disk-0.raw,size=82G
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=1056a8bd-1704-4153-b7ca-92758a268077
sockets: 1
vga: serial0
vmgenid: b301dcaf-a7e8-4b4f-bcd3-b71528359ce5
Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.15.108-1-pve: 5.15.108-2
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
openvswitch-switch: 2.15.0+ds1-2+deb11u4
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1