Hello,
I've encountered an issue on a single-node PVE installation. It's a fairly new setup (started on PVE 8.2) that had been working fine so far.
After updating the BIOS on the ASRock Rack B650D4U motherboard to version 10.15 (previously 3.11), not a single VM starts anymore.
For firmware context: this update introduces AGESA 1.1.0.3, which is rather new. The CPU is a Ryzen 7900, by the way.
The update was performed to enable ECC memory support, and ECC is in fact working now.
The error looks identical for all VMs:
Code:
Jul 20 11:45:01 pve01 pvedaemon[1884]: VM 105 qmp command failed - VM 105 qmp command 'guest-ping' failed - unable to connect to VM 105 qga socket - timeout after 31 retr>
Jul 20 11:45:03 pve01 pvestatd[1857]: VM 105 qmp command failed - VM 105 qmp command 'query-proxmox-support' failed - unable to connect to VM 105 qmp socket - timeout aft>
Full error output:
TASK ERROR: start failed: command '/usr/bin/kvm -id 105 -name 'anothervm105,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/105.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/105.pid -daemonize -smbios 'type=1,uuid=492e81ca-276e-4fc9-a1df-8d3722711a2c' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.fd' -drive 'if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/zvol/rpool/data/vm-105-disk-0,size=540672' -smp '2,sockets=2,cores=1,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/105.vnc,password=on' -cpu qemu64,+aes,enforce,-hv-evmcs,-hv-tlbflush,+kvm_pv_eoi,+kvm_pv_unhalt,-pcid,+pni,+popcnt,-spec-ctrl,-ssbd,+sse4.1,+sse4.2,+ssse3 -m 8192 -object 'iothread,id=iothread-virtioscsi0' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5' -device 'vmgenid,guid=abb1fa64-23a9-4c8c-a6c3-d7fb53867b8b' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev 'socket,path=/var/run/qemu-server/105.qga,server=on,wait=off,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:0eb81c34d3e' -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0' -drive 'file=/dev/zvol/rpool/data/vm-105-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap105i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=BC:24:11:12:E3:F5,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=102' -machine 'type=pc+pve0'' failed: got timeout
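For reference, the same start command can be regenerated in a readable, line-wrapped form with:
Code:
qm showcmd 105 --pretty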
Things I have tried so far:
1. Created a new VM without an EFI disk or hard disk, booting from an ISO on the IDE controller > same error.
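A minimal CLI sketch of that test (the VM ID, bridge, and ISO path are just example values):
Code:
qm create 9999 --name testvm --memory 2048 --cores 2 \
  --net0 virtio,bridge=vmbr0 \
  --ide2 local:iso/debian-12-netinst.iso,media=cdrom --boot order=ide2
qm start 9999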
2. Checked logs in dmesg and journalctl
- Noticed RRDC errors (potentially because the RTC time was reset by the BIOS update):
Code:
Jul 20 11:45:03 pve01 pmxcfs[1769]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/105: -1
Jul 20 11:45:03 pve01 pmxcfs[1769]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve01/pve_back: -1
Jul 20 11:45:03 pve01 pmxcfs[1769]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve01/local: -1
Followed this thread to fix the issue:
https://forum.proxmox.com/threads/rrdc-and-rrd-update-errors.76219/
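The gist of that fix was backing up the RRD database and letting it be recreated, roughly (a sketch, not verbatim from the thread; paths as in my logs above):
Code:
systemctl stop rrdcached
mv /var/lib/rrdcached/db /var/lib/rrdcached/db.bak   # keep a backup of the old database
systemctl start rrdcached                            # files are recreated on demand
systemctl restart pve-cluster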
>> While the DB backup and rebuild seemed to work, VMs are still unable to boot.
2.1 Eventually fixed the hwclock and set it to my local time zone (GMT+2).
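Roughly via:
Code:
timedatectl set-timezone Europe/Vienna
hwclock --systohc    # write the NTP-synced system time back to the RTC
Verification afterwards: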
hwclock
2024-07-20 14:19:16.867045+02:00
timedatectl
Local time: Sat 2024-07-20 14:19:25 CEST
Universal time: Sat 2024-07-20 12:19:25 UTC
RTC time: Sat 2024-07-20 12:19:25
Time zone: Europe/Vienna (CEST, +0200)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
>> Funnily, the RTC time does not match the hwclock output. (On closer inspection this is expected: with "RTC in local TZ: no" the RTC is kept in UTC, while hwclock prints local time, so 12:19 UTC and 14:19+02:00 are the same instant.)
3. Grabbed the latest package updates and rebooted:
ifupdown2/stable 3.2.0-1+pmx9 all [upgradable from: 3.2.0-1+pmx8]
proxmox-kernel-6.8/stable 6.8.8-3 all [upgradable from: 6.8.8-2]
pve-firmware/stable 3.13-1 all [upgradable from: 3.12-1]
qemu-server/stable 8.2.2 amd64 [upgradable from: 8.2.1]
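The update itself was just the standard procedure:
Code:
apt update
apt full-upgrade
reboot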
Full output of the running versions:
Code:
pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-2
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.0-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
4. Checked ZFS service states and logs; they look mostly OK.
Code:
root@pve01:/var/log# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:33 with 0 errors on Sun Jul 14 00:24:34 2024
config:

        NAME                                 STATE     READ WRITE CKSUM
        rpool                                ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            nvme-eui.6479a78a8f000fc6-part3  ONLINE       0     0     0
            nvme-eui.6479a78a8f00238c-part3  ONLINE       0     0     0

errors: No known data errors
zfs-import-cache.service has a warning (from systemctl status zfs-import-cache.service):
Code:
Condition: start condition failed at Sat 2024-07-20 12:02:00 CEST; 13min ago
Jul 20 12:02:00 pve01 systemd[1]: zfs-import-cache.service - Import ZFS pools by cache file was skipped because of an unmet condition check (ConditionFileNotEmpty=/etc/zfs/zpool.cache).
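>> If that is just a missing or empty cache file, it can presumably be regenerated with (pool name as above):
Code:
zpool set cachefile=/etc/zfs/zpool.cache rpool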
5. Checked whether SVM is actually enabled:
lscpu | grep AMD-V
Virtualization: AMD-V
lscpu | grep svm
Flags: svm svm_lock [....]
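For completeness, the KVM side can be double-checked as well, e.g.:
Code:
lsmod | grep kvm              # kvm_amd and kvm should both be loaded
dmesg | grep -iE 'kvm|svm'    # look for errors like "kvm: disabled by bios"
ls -l /dev/kvm                # the device node must exist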
I am a bit stuck at this point and would appreciate a lead/tip. Many thanks.