[SOLVED] VMs do not start after BIOS-update-induced net interface rename

net_again64

Jul 20, 2024
Hello,

I've encountered an issue on a single-node PVE installation. This is a fairly new installation (started on PVE 8.2) that was working fine so far.

After updating the BIOS on the ASRock Rack B650D4U motherboard to version 10.15 (previously 3.11), no VM starts anymore.

For firmware context: this update introduces AGESA 1.1.0.3, which is rather new. The CPU is a Ryzen 7900, by the way.

The update was performed to enable ECC memory support; ECC is indeed working now.

Error for all VMs seems identical:
Jul 20 11:45:01 pve01 pvedaemon[1884]: VM 105 qmp command failed - VM 105 qmp command 'guest-ping' failed - unable to connect to VM 105 qga socket - timeout after 31 retr>
Jul 20 11:45:03 pve01 pvestatd[1857]: VM 105 qmp command failed - VM 105 qmp command 'query-proxmox-support' failed - unable to connect to VM 105 qmp socket - timeout aft>

Long Error Output:
TASK ERROR: start failed: command '/usr/bin/kvm -id 105 -name 'anothervm105,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/105.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/105.pid -daemonize -smbios 'type=1,uuid=492e81ca-276e-4fc9-a1df-8d3722711a2c' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.fd' -drive 'if=pflash,unit=1,id=drive-efidisk0,format=raw,file=/dev/zvol/rpool/data/vm-105-disk-0,size=540672' -smp '2,sockets=2,cores=1,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/105.vnc,password=on' -cpu qemu64,+aes,enforce,-hv-evmcs,-hv-tlbflush,+kvm_pv_eoi,+kvm_pv_unhalt,-pcid,+pni,+popcnt,-spec-ctrl,-ssbd,+sse4.1,+sse4.2,+ssse3 -m 8192 -object 'iothread,id=iothread-virtioscsi0' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5' -device 'vmgenid,guid=abb1fa64-23a9-4c8c-a6c3-d7fb53867b8b' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev 'socket,path=/var/run/qemu-server/105.qga,server=on,wait=off,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:0eb81c34d3e' -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0' -drive 'file=/dev/zvol/rpool/data/vm-105-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap105i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=BC:24:11:12:E3:F5,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=256,bootindex=102' -machine 'type=pc+pve0'' failed: got timeout


Things that I have tried so far:
1. Created a new VM without an EFI disk or hard disk and booted from an ISO on the IDE controller > same error.

2. Checked logs in dmesg and journalctl
- Noticed RRDC errors (potentially because the RTC time was reset during the BIOS update):
Code:
Jul 20 11:45:03 pve01 pmxcfs[1769]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-vm/105: -1
Jul 20 11:45:03 pve01 pmxcfs[1769]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve01/pve_back: -1
Jul 20 11:45:03 pve01 pmxcfs[1769]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/pve01/local: -1

Followed this thread to fix the issue:
https://forum.proxmox.com/threads/rrdc-and-rrd-update-errors.76219/
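For reference, the backup-and-rebuild from that thread boils down to roughly the following (a sketch; the db path is the one from the log above, the service names are the standard Proxmox ones):
Code:
# stop the caching daemon so the RRD database directory is not in use
systemctl stop rrdcached
# move the stale databases aside (they are recreated automatically)
mv /var/lib/rrdcached/db /var/lib/rrdcached/db.old
# start rrdcached again and restart the services that write the stats
systemctl start rrdcached
systemctl restart pve-cluster pvestatd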
>> While the db backup and rebuild seemed to work, VMs are still not able to boot.

2.1 Eventually fixed the hwclock and set the time zone to my local one (GMT+2):
hwclock
2024-07-20 14:19:16.867045+02:00

timedatectl
Local time: Sat 2024-07-20 14:19:25 CEST
Universal time: Sat 2024-07-20 12:19:25 UTC
RTC time: Sat 2024-07-20 12:19:25
Time zone: Europe/Vienna (CEST, +0200)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no

>> Oddly, the RTC time does not match the hwclock output.
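For completeness, bringing the RTC back in line with the NTP-synced system clock is something like the following (generic systemd/util-linux commands, not taken from this thread):
Code:
# make sure NTP synchronization is on, then write the system time back to the RTC
timedatectl set-ntp true
hwclock --systohc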

3. Grabbed the latest package updates and rebooted:
ifupdown2/stable 3.2.0-1+pmx9 all [upgradable from: 3.2.0-1+pmx8]
proxmox-kernel-6.8/stable 6.8.8-3 all [upgradable from: 6.8.8-2]
pve-firmware/stable 3.13-1 all [upgradable from: 3.12-1]
qemu-server/stable 8.2.2 amd64 [upgradable from: 8.2.1]

Full Output of running versions:

pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-2
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.0-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1


4. Checked ZFS service states and logs. Looks mostly OK.
Code:
root@pve01:/var/log# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:33 with 0 errors on Sun Jul 14 00:24:34 2024
config:

        NAME                                 STATE     READ WRITE CKSUM
        rpool                                ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            nvme-eui.6479a78a8f000fc6-part3  ONLINE       0     0     0
            nvme-eui.6479a78a8f00238c-part3  ONLINE       0     0     0

errors: No known data errors

zfs-import-cache.service has a warning (from systemctl status zfs-import-cache.service):
zfs-import-cache.service Condition: start condition failed at Sat 2024-07-20 12:02:00 CEST; 13min ago
Jul 20 12:02:00 pve01 systemd[1]: zfs-import-cache.service - Import ZFS pools by cache file was skipped because of an unmet condition check (ConditionFileNotEmpty=/etc/zfs/zpool.cache).
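That condition just means /etc/zfs/zpool.cache is empty or missing; the pool is clearly imported anyway (see zpool status above). If one wanted the cache file repopulated, something like this should do it (assuming the pool name rpool from above):
Code:
# writes the current pool configuration into the cache file
zpool set cachefile=/etc/zfs/zpool.cache rpool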

5. Checked whether SVM is actually enabled:
lscpu | grep AMD-V
Virtualization: AMD-V
lscpu | grep svm
Flags: svm svm_lock [....]
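Besides lscpu, a couple of generic checks can confirm that KVM is actually usable (a sketch, not specific to this board):
Code:
# the kvm/kvm_amd modules should be loaded and /dev/kvm should exist
lsmod | grep kvm
ls -l /dev/kvm
# if SVM were disabled in firmware, dmesg usually contains a hint
# (e.g. a message about it being disabled by the BIOS)
dmesg | grep -i -E 'kvm|svm'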


I am a bit stuck at this point and would appreciate a lead/tip. Many thanks.
 
rolled back BIOS to the previous version, and reset all BIOS settings to defaults.
Behavior stays the same.
No new log insights so far.
 
I've noticed on Ryzen AM4 that a BIOS update and/or CMOS reset disables SVM, which means hardware virtualization (KVM) is not supported. The setting was also not in an obvious place in the BIOS settings. Maybe it's the same or something similar in your case?
 
The SVM option in the BIOS did move to another menu after the update to 10.15, so I suspected the same.
I checked lscpu on both the newer 10.15 and the older 3.11 BIOS, and both show these two lines, which to my understanding means it is enabled.
lscpu
flags svm
Virtualization features:
Virtualization: AMD-V

These are the checks for SVM that I know of, do you know of any others?
 
Update on this issue:

Narrowed down the potential cause by creating a new VM without a network interface > the VM does boot.
Add a NIC to the VM > the VM does not boot.

Checked for changes in the network interfaces with:
echo /sys/class/net/*
One interface was renamed during the BIOS update:
enp7s0 > enp5s0
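To see at a glance which names exist now and which PCI device sits behind each one, generic iproute2/sysfs commands can help (a sketch, not from the original post):
Code:
# compact overview of current interfaces, state and MAC addresses
ip -br link
# the device symlink shows which PCI address is behind each interface name
ls -l /sys/class/net/*/device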


Removed the orphaned interface reference from /etc/network/interfaces, added the newly renamed interface, and rebooted the PVE host.

Unfortunately, VMs still do not boot, but it looks like I am getting close to the actual issue.

Looked for references to the orphaned network interface enp7s0 and removed them where present:
sudo rgrep enp7s0 /etc
/etc/network/interfaces-back:iface enp7s0 inet manual

sudo find /etc -name '*enp7s0*'
#no result

Validated the network interface config:
ifup -a --no-act ; echo "status: $?"
status: 0
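Since ifupdown2 is installed (see the package list above), the edited /etc/network/interfaces can in principle also be applied without a full reboot:
Code:
# re-apply the interfaces configuration in place (ifupdown2)
ifreload -a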

Has anyone experienced this before and knows how to resolve it?

The orphaned interface is actually not part of the LACP port group where the bridge resides, so it is not in use.
cat /etc/network/interfaces
Code:
auto lo
iface lo inet loopback

# Set intel-x550-t2 interfaces to static
iface enp1s0f0 inet manual
iface enp1s0f1 inet manual

# LACP Conf
auto bond0
iface bond0 inet manual
      bond-slaves enp1s0f0 enp1s0f1
      bond-miimon 100
      bond-mode 802.3ad
      bond-xmit-hash-policy layer2+3

# vlan24 interface
# iface bond0.24 inet manual

# vlan24 ipv4 conf
auto vmbr0.24
iface vmbr0.24 inet static
        address  172.16.24.21/24
        gateway  172.16.24.1

# VLAN-aware-Bridge
auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094


iface enx2add5a93b6bf inet manual
iface enp5s0 inet manual
iface enp6s0 inet manual
#auto enp6s0
#iface enp6s0 inet static
#        address  192.168.88.3/24
#        gateway  192.168.88.1
 
The issue is solved.

Summary:
The BIOS update led to the renaming of one unused network interface, which resulted in VMs not starting if they had a virtual NIC attached.

Behavior:
PVE itself was working normally.
VMs would not start if they had a virtual NIC attached (TASK error: see above).
VMs would start without a virtual NIC attached.

Solution:
Stay on the new BIOS version AND fix the references in Debian to the old orphaned interface name (in this case enp7s0 > enp5s0), then reboot PVE after the changes.

No solution:
Downgrading to the old BIOS version AND fixing the references in Debian to the old orphaned interface name (in this case enp7s0 > enp5s0).
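For anyone hitting the same thing: one generic way to make interface names immune to PCI renumbering is to pin them to the MAC address with a systemd .link file (a sketch; the file name, MAC, and interface name below are placeholders, not values from this system):
Code:
# /etc/systemd/network/10-lan0.link  (placeholder values)
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

[Link]
Name=lan0

After that, /etc/network/interfaces would reference lan0 instead of enpXsY; regenerating the initramfs (update-initramfs -u) and rebooting is usually needed so the rename takes effect early in boot.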
 
Please note that enabling or disabling onboard devices (i.e. adding or removing devices) can also cause PCI IDs to change (and network configuration to break). This happening with a BIOS update is new to me, but since the BIOS determines the PCI ID order, it does make sense.
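As an aside, one can inspect the properties the predictable name is derived from; if the BIOS changes the PCI path, the path-based name changes with it (generic udev tooling, a sketch using the renamed NIC from this thread):
Code:
# show the path- and MAC-based name candidates udev computes for the NIC
udevadm test-builtin net_id /sys/class/net/enp5s0 2>/dev/null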
 
