[SOLVED] Certain VMs from a cluster cannot be backed up and managed

gradinaruvasile

We have a 4-node cluster of Proxmox 6:

Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-5 (running version: 6.0-5/f8a710d7)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-4.15: 5.4-7
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-4.15.18-19-pve: 4.15.18-45
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-10-pve: 4.15.18-32
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-3
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-63
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-5
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1

On one of the nodes we have some VMs that cannot be managed:
- VNC fails with
Code:
VM ID qmp command 'change' failed - got timeout
TASK ERROR: Failed to run vncproxy.
- Migration fails with
Code:
Task started by HA resource agent
2019-08-16 14:56:20 ERROR: migration aborted (duration 00:00:03): VM 118 qmp command 'query-machines' failed - got timeout
TASK ERROR: migration aborted
- Backup fails with
Code:
INFO: Starting Backup of VM 118 (qemu)
INFO: Backup started at 2019-08-16 11:33:30
INFO: status = running
INFO: update VM 118: -lock backup
INFO: VM Name: VMNAME
INFO: include disk 'scsi0' 'zesan-lvm:vm-118-disk-0' 25G
/dev/sdc: open failed: No medium found
/dev/sdc: open failed: No medium found
/dev/sdc: open failed: No medium found
/dev/sdc: open failed: No medium found
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: The backup started
INFO: creating archive '/mnt/pve/mntpoint/dump/vzdump-qemu-118-2019_08_16-11_33_30.vma.lzo'
ERROR: got timeout
INFO: aborting backup job
ERROR: VM 118 qmp command 'backup-cancel' failed - got timeout
ERROR: Backup of VM 118 failed - got timeout
INFO: Failed at 2019-08-16 11:43:36
INFO: Backup ended
INFO: Backup job finished with errors
TASK ERROR: job errors

It seems that if we shut down and start the VMs they will start working.
This already happened a few days back; we have restarted the VMs since then (actually the whole cluster), but now the issue is back for random machines on the same node.
This is breaking automatic backups for us and possibly HA.

How can I diagnose this?

Edit:

The logs for the pve-ha-lrm service are full of entries like this:

Code:
Aug 16 11:43:38 srv pve-ha-lrm[25059]: VM 118 qmp command 'query-status' failed - got timeout
Aug 16 11:43:38 srv pve-ha-lrm[25059]: VM 118 qmp command failed - VM 118 qmp command 'query-status' failed - got timeout
Aug 16 11:43:25 srv pve-ha-lrm[25012]: VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
Aug 16 11:43:25 srv pve-ha-lrm[25012]: VM 118 qmp command failed - VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
Aug 16 11:43:15 srv pve-ha-lrm[24963]: VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
Aug 16 11:43:15 srv pve-ha-lrm[24963]: VM 118 qmp command failed - VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
Aug 16 11:43:05 srv pve-ha-lrm[24912]: VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
Aug 16 11:43:05 srv pve-ha-lrm[24912]: VM 118 qmp command failed - VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
Aug 16 11:42:55 srv pve-ha-lrm[24851]: VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
Aug 16 11:42:55 srv pve-ha-lrm[24851]: VM 118 qmp command failed - VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
Aug 16 11:42:45 srv pve-ha-lrm[24802]: VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
Aug 16 11:42:45 srv pve-ha-lrm[24802]: VM 118 qmp command failed - VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
Aug 16 11:42:35 srv pve-ha-lrm[24753]: VM 118 qmp command 'query-status' failed - unable to connect to VM 118 qmp socket - timeout after 31 retries
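What I plan to check the next time it happens is whether the KVM process of a stuck VM is still alive and whether any of its threads sit in uninterruptible sleep, which would hint at hung storage I/O blocking QEMU's main loop and therefore QMP. This is only a rough sketch and a guess on my part; VMID 118 is just an example, the pid/socket paths are the standard /var/run/qemu-server ones.

Code:
#!/bin/bash
# Rough sketch: is the stuck VM's KVM process alive, and are any of its
# threads blocked in uninterruptible sleep ("D" state, usually hung I/O)?
VMID=118
PID=$(cat /var/run/qemu-server/${VMID}.pid)

ps -o pid,stat,etime,cmd -p "$PID"          # process state and uptime
ls -l /var/run/qemu-server/${VMID}.qmp      # QMP socket still present?

# Per-thread state and kernel wait channel; print header plus any "D" threads
ps -L -o tid,stat,wchan:32,comm -p "$PID" | awk 'NR==1 || $2 ~ /D/'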
 
So it seems this issue does not just go away.
I had one node where all except one VM exhibited this issue.
Only some of the qmp commands work: shutdown works, and some others, but not migrate, backup or even monitor (the console isn't working).
And over time this seems to spread to other VMs. No idea what to look for.
I rebooted all VMs on a node, then migrated them and restarted the node.
Now it is working, but the issue has appeared on another VM on another node (one which had been working fine for some time).
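One thing that might help narrow it down (a sketch, assuming socat is installed via "apt install socat"): talk to the QMP socket of an affected VM directly and see whether the monitor answers at all. A healthy VM returns the QMP greeting and the query result almost instantly; in the broken state I would expect this to hang until the timeout.

Code:
# Manual probe of the QMP monitor socket (VMID 118 as an example).
# QMP requires the capabilities handshake before any other command,
# so both commands are sent in one go.
VMID=118
printf '%s\n' '{"execute":"qmp_capabilities"}' '{"execute":"query-status"}' \
  | timeout 15 socat -t 10 - UNIX-CONNECT:/var/run/qemu-server/${VMID}.qmp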
 
I am actually having the same issue running Virtual Environment 6.0-5.

I cannot connect to the console or run the reset command after a VM has been active for some time. Only restarting the VM helps.

Any idea what can cause this issue? I have not found anything helpful in the logs yet either.
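In case it helps, these are the places I intend to grep the next time a VM gets into this state (just a sketch; the services listed are the standard PVE 6 daemons that talk QMP, and the hung-task check is only a guess that blocked storage I/O could be involved):

Code:
journalctl --since=-2h -u pvestatd -u pvedaemon -u pve-ha-lrm -u qmeventd | grep -Ei 'qmp|timeout'
dmesg -T | grep -i 'blocked for more than'
grep -ri 'got timeout' /var/log/pve/tasks/ | tail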
 
I have the same problem. VM on qcow2.

Console problem
VM 111 qmp command 'change' failed - got timeout
TASK ERROR: Failed to run vncproxy.


Backup problem
INFO: starting new backup job: vzdump 111 --storage only2copy --node pm7 --remove 0 --mode snapshot --compress lzo
INFO: Starting Backup of VM 111 (qemu)
INFO: Backup started at 2019-08-20 09:26:50
INFO: status = running
INFO: update VM 111: -lock backup
INFO: VM Name: vda-dev-b24
INFO: include disk 'scsi0' 'local:111/vm-111-disk-1.qcow2' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/var/lib/vz/only2copy/dump/vzdump-qemu-111-2019_08_20-09_26_50.vma.lzo'
ERROR: unable to connect to VM 111 qmp socket - timeout after 31 retries
INFO: aborting backup job
ERROR: VM 111 qmp command 'backup-cancel' failed - got timeout
ERROR: Backup of VM 111 failed - unable to connect to VM 111 qmp socket - timeout after 31 retries
INFO: Failed at 2019-08-20 09:37:31
INFO: Backup job finished with errors
TASK ERROR: job errors
 
Exactly the same thing on 6.0-5:
some calls fail with "qmp command 'query-machines' failed" and backups get a timeout. It seems that only restarting the VMs helps.
 
Can you guys please tell me on what kind of storage those VMs are running? (LVM, ZFS, qcow2 on NFS, ...)?

This might help us to narrow it down and reproduce the problem.
 
If I remember correctly most were LVM (on top of iSCSI), but I think there were also some NFS ones. Unfortunately I cannot say for sure about the NFS ones because lately we have migrated quite a few storages.
But this issue is not only backup related. The machines cannot be managed at all: no migration, not even VNC.
 
Unfortunately, we have had no luck reproducing this so far. Could you please post the following
Code:
cat /etc/pve/storage.cfg
For the relevant VM(s) with id X
Code:
qm config X
Code:
qm showcmd X --pretty
 
Unfortunately, we have had no luck reproducing this so far. Could you please post the following
Code:
cat /etc/pve/storage.cfg
For the relevant VM(s) with id X
Code:
qm config X
Code:
qm showcmd X --pretty

Unfortunately I have no known misbehaving VMs right now (I don't know how to issue bulk commands that might trigger it, and the VMs show no indication otherwise until you need a console, a migration or a backup). I restarted all of them and since then the issue does not seem to have appeared again.
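For the record, the bulk check I had in mind would look roughly like this (an assumption on my part: "qm status --verbose" has to query the running instance over QMP, so a VM in the broken state should hang and hit the timeout instead of answering):

Code:
#!/bin/bash
# Sketch: probe every running VM on this node for a responsive QMP monitor.
for vmid in $(qm list | awk '$3 == "running" {print $1}'); do
    if timeout 20 qm status "$vmid" --verbose >/dev/null 2>&1; then
        echo "VM $vmid: QMP answers"
    else
        echo "VM $vmid: QMP does NOT answer - possibly affected"
    fi
done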
 
Code:
dir: local
        path /var/lib/vz
        content rootdir,images
        maxfiles 0
        shared 0

cifs: iso-images
        path /mnt/pve/iso-images
        server storage.supertux.lan
        share iso-images
        content vztmpl,snippets,iso
        domain supertux.lan
        username proxmox

cifs: sicherung
        path /mnt/pve/sicherung
        server srv-backup01.supertux.lan
        share sicherung
        content backup
        maxfiles 11
        username localbackup

cifs: archiv
        path /mnt/pve/archiv
        server srv-backup01.supertux.lan
        share archiv
        content backup
        maxfiles 5
        username localbackup

cifs: archiv-replica
        path /mnt/pve/archiv-replica
        server srv-backup01.supertux.lan
        share archiv-replica
        content backup
        maxfiles 1
        username localbackup

cifs: archiv02
        path /mnt/pve/archiv02
        server srv-backup02.supertux.lan
        share archiv02
        content backup
        maxfiles 1
        username localbackup

dir: ssd_vm
        path /mnt/pve/ssd_vm
        content rootdir,images
        is_mountpoint 1
        nodes srv-virtu02
        shared 0

Code:
agent: 1
boot: c
bootdisk: virtio0
cores: 12
cpu: kvm64,flags=+pcid
cpulimit: 4
description: VOIP-Server f%C3%BCr Telefonie
hotplug: disk,network,usb,memory
ide2: none,media=cdrom
memory: 4096
name: vsrv-voip.supertux.lan
net0: bridge=vmbr0,virtio=62:64:34:32:37:65,tag=13
numa: 1
onboot: 1
ostype: l26
protection: 1
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=9cf6cdfa-8d21-48a9-b46a-c8eaf46ba6df
sockets: 2
usb0: spice
vga: qxl
virtio0: local:142/vm-142-disk-1.qcow2,discard=on,size=50G

Code:
/usr/bin/kvm \
  -id 142 \
  -name vsrv-voip.supertux.lan \
  -chardev 'socket,id=qmp,path=/var/run/qemu-server/142.qmp,server,nowait' \
  -mon 'chardev=qmp,mode=control' \
  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  -mon 'chardev=qmp-event,mode=control' \
  -pidfile /var/run/qemu-server/142.pid \
  -daemonize \
  -smbios 'type=1,uuid=9cf6cdfa-8d21-48a9-b46a-c8eaf46ba6df' \
  -smp '24,sockets=2,cores=12,maxcpus=24' \
  -nodefaults \
  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
  -vnc unix:/var/run/qemu-server/142.vnc,password \
  -cpu kvm64,+pcid,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce \
  -m 'size=1024,slots=255,maxmem=4194304M' \
  -object 'memory-backend-ram,id=ram-node0,size=512M' \
  -numa 'node,nodeid=0,cpus=0-11,memdev=ram-node0' \
  -object 'memory-backend-ram,id=ram-node1,size=512M' \
  -numa 'node,nodeid=1,cpus=12-23,memdev=ram-node1' \
  -object 'memory-backend-ram,id=mem-dimm0,size=512M' \
  -device 'pc-dimm,id=dimm0,memdev=mem-dimm0,node=0' \
  -object 'memory-backend-ram,id=mem-dimm1,size=512M' \
  -device 'pc-dimm,id=dimm1,memdev=mem-dimm1,node=1' \
  -object 'memory-backend-ram,id=mem-dimm2,size=512M' \
  -device 'pc-dimm,id=dimm2,memdev=mem-dimm2,node=0' \
  -object 'memory-backend-ram,id=mem-dimm3,size=512M' \
  -device 'pc-dimm,id=dimm3,memdev=mem-dimm3,node=1' \
  -object 'memory-backend-ram,id=mem-dimm4,size=512M' \
  -device 'pc-dimm,id=dimm4,memdev=mem-dimm4,node=0' \
  -object 'memory-backend-ram,id=mem-dimm5,size=512M' \
  -device 'pc-dimm,id=dimm5,memdev=mem-dimm5,node=1' \
  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
  -readconfig /usr/share/qemu-server/pve-usb.cfg \
  -chardev 'spicevmc,id=usbredirchardev0,name=usbredir' \
  -device 'usb-redir,chardev=usbredirchardev0,id=usbredirdev0,bus=ehci.0' \
  -chardev 'socket,id=serial0,path=/var/run/qemu-server/142.serial0,server,nowait' \
  -device 'isa-serial,chardev=serial0' \
  -device 'qxl-vga,id=vga,bus=pci.0,addr=0x2' \
  -chardev 'socket,path=/var/run/qemu-server/142.qga,server,nowait,id=qga0' \
  -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' \
  -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' \
  -spice 'tls-port=61004,addr=127.0.0.1,tls-ciphers=HIGH,seamless-migration=on' \
  -device 'virtio-serial,id=spice,bus=pci.0,addr=0x9' \
  -chardev 'spicevmc,id=vdagent,name=vdagent' \
  -device 'virtserialport,chardev=vdagent,name=com.redhat.spice.0' \
  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:d389d4b77d5' \
  -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' \
  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2' \
  -drive 'file=/var/lib/vz/images/142/vm-142-disk-1.qcow2,if=none,id=drive-virtio0,discard=on,format=qcow2,cache=none,aio=native,detect-zeroes=unmap' \
  -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' \
  -netdev 'type=tap,id=net0,ifname=tap142i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
  -device 'virtio-net-pci,mac=62:64:34:32:37:65,netdev=net0,bus=pci.0,addr=0x12,id=net0' \
  -machine 'type=pc'
 
Code:
root@pm7:~# cat /etc/pve/storage.cfg
dir: ssd480
    path /mntssd
    content images
    nodes pm7
    shared 0

dir: local
    path /var/lib/vz
    content backup,vztmpl,rootdir,snippets,images,iso
    maxfiles 5
    shared 0

dir: only2copy
    path /var/lib/vz/only2copy
    content backup
    maxfiles 2
    shared 0

Code:
root@pm7:~# qm config 100
bootdisk: scsi0
cores: 1
ide2: none,media=cdrom
memory: 24000
name: deb-teleg-smsbanan
net0: e1000=AE:08:A0:43:25:CA,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: ssd480:100/vm-100-disk-0.qcow2,discard=on,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=05b40800-c959-478a-a4b8-849ba8d7cb02
sockets: 2

Code:
root@pm7:~# qm showcmd 100 --pretty
/usr/bin/kvm \
  -id 100 \
  -name deb-teleg-smsbanan \
  -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' \
  -mon 'chardev=qmp,mode=control' \
  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  -mon 'chardev=qmp-event,mode=control' \
  -pidfile /var/run/qemu-server/100.pid \
  -daemonize \
  -smbios 'type=1,uuid=05b40800-c959-478a-a4b8-849ba8d7cb02' \
  -smp '2,sockets=2,cores=1,maxcpus=2' \
  -nodefaults \
  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
  -vnc unix:/var/run/qemu-server/100.vnc,password \
  -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce \
  -m 24000 \
  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:48073f11533' \
  -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' \
  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
  -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
  -drive 'file=/mntssd/images/100/vm-100-disk-0.qcow2,if=none,id=drive-scsi0,discard=on,format=qcow2,cache=none,aio=native,detect-zeroes=unmap' \
  -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' \
  -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' \
  -device 'e1000,mac=AE:08:A0:43:25:CA,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
  -machine 'type=pc'
 
I assume the VM goes into this state after trying to use the console in the admin panel.
Try switching consoles quickly between several VMs.
 
I assume the VM goes into this state after trying to use the console in the admin panel.
Try switching consoles quickly between several VMs.
It doesn't work, I tried it. When a VM is in this state, console, migration and backup are all unusable. Sometimes I only found the affected machines when the backups finished and I saw the failure emails, so it is not triggered by the VNC console. And nothing seems to help except stopping the VM and starting it again.
 
I currently have the exact same behavior. 6-node cluster, all running on Proxmox-based Ceph storage.
It started this morning and I feel like I have a vague idea of what caused it.

I use InfiniBand for all Ceph + Proxmox traffic (which has been working fine). I decided to also use InfiniBand for the NAS (FreeNAS via NFS), and it was a bit hit and miss; some of my hosts couldn't reliably talk to the NAS. So last night, after most of my backups failed, I changed back to a standard 1GbE copper link. Now, one host has 11 machines that are affected by this (out of about 20 machines on the host).

VMID  Name             Status  Duration  Result
140   bumblebee        FAILED  00:10:06  got timeout
141   0095-printsys-2  OK      00:06:37  2.95GB  /mnt/pve/backups/dump/vzdump-qemu-141-2019_09_10-05_03_38.vma.lzo
142   nagios-slave     FAILED  00:10:05  got timeout
145   proxmox-mail     FAILED  00:10:06  got timeout
147   0104-haproxy4    FAILED  00:10:05  got timeout
151   saltmaster       FAILED  00:10:09  got timeout


I cannot migrate to another host and I cannot use the VNC console. If I go onto the host itself and try a migration, it clearly tries:

root@hydra1:~# qm migrate 138 hydra3 --online
Requesting HA migration for VM 138 to node hydra3

but then the HA manager fails

task started by HA resource agent
2019-09-10 08:50:21 ERROR: migration aborted (duration 00:00:03): VM 138 qmp command 'query-machines' failed - got timeout
TASK ERROR: migration aborted


Unsure if it's related, but dmesg shows that a lot of the interfaces were disabled, then enabled, then put into a blocking state at roughly the same time:

[Tue Sep 10 03:06:04 2019] fwbr118i0: port 2(tap118i0) entered blocking state
[Tue Sep 10 03:06:04 2019] fwbr118i0: port 2(tap118i0) entered disabled state
[Tue Sep 10 03:06:04 2019] fwbr118i0: port 2(tap118i0) entered blocking state
[Tue Sep 10 03:06:04 2019] fwbr118i0: port 2(tap118i0) entered forwarding state
[Tue Sep 10 03:11:10 2019] fwbr118i0: port 2(tap118i0) entered disabled state
[Tue Sep 10 03:11:10 2019] fwbr118i0: port 1(fwln118i0) entered disabled state
[Tue Sep 10 03:11:10 2019] vmbr0: port 19(fwpr118p0) entered disabled state
[Tue Sep 10 03:11:10 2019] device fwln118i0 left promiscuous mode
[Tue Sep 10 03:11:10 2019] fwbr118i0: port 1(fwln118i0) entered disabled state
[Tue Sep 10 03:11:10 2019] device fwpr118p0 left promiscuous mode
[Tue Sep 10 03:11:10 2019] vmbr0: port 19(fwpr118p0) entered disabled state



Also seeing this in the syslog

Sep 10 00:00:03 hydra1 systemd-udevd[38337]: Using default interface naming scheme 'v240'.
Sep 10 00:00:03 hydra1 systemd-udevd[38337]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 10 00:00:03 hydra1 systemd-udevd[38337]: Could not generate persistent MAC address for tap100i0: No such file or directory
Sep 10 00:00:04 hydra1 pmxcfs[1433]: [dcdb] notice: data verification successful
Sep 10 00:00:04 hydra1 kernel: [472579.807640] device tap100i0 entered promiscuous mode
Sep 10 00:00:04 hydra1 systemd-udevd[38342]: Using default interface naming scheme 'v240'.
Sep 10 00:00:04 hydra1 systemd-udevd[38342]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 10 00:00:04 hydra1 systemd-udevd[38342]: Could not generate persistent MAC address for fwbr100i0: No such file or directory
Sep 10 00:00:04 hydra1 systemd-udevd[38337]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 10 00:00:04 hydra1 systemd-udevd[38337]: Could not generate persistent MAC address for fwpr100p0: No such file or directory
Sep 10 00:00:04 hydra1 systemd-udevd[38340]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 10 00:00:04 hydra1 systemd-udevd[38340]: Using default interface naming scheme 'v240'.
Sep 10 00:00:04 hydra1 systemd-udevd[38340]: Could not generate persistent MAC address for fwln100i0: No such file or directory
Sep 10 00:00:04 hydra1 kernel: [472579.849073] fwbr100i0: port 1(fwln100i0) entered blocking state
Sep 10 00:00:04 hydra1 kernel: [472579.849077] fwbr100i0: port 1(fwln100i0) entered disabled state
Sep 10 00:00:04 hydra1 kernel: [472579.849232] device fwln100i0 entered promiscuous mode
Sep 10 00:00:04 hydra1 kernel: [472579.849324] fwbr100i0: port 1(fwln100i0) entered blocking state
Sep 10 00:00:04 hydra1 kernel: [472579.849326] fwbr100i0: port 1(fwln100i0) entered forwarding state
Sep 10 00:00:04 hydra1 kernel: [472579.854674] vmbr0: port 19(fwpr100p0) entered blocking state
Sep 10 00:00:04 hydra1 kernel: [472579.854677] vmbr0: port 19(fwpr100p0) entered disabled state
Sep 10 00:00:04 hydra1 kernel: [472579.854820] device fwpr100p0 entered promiscuous mode
Sep 10 00:00:04 hydra1 kernel: [472579.854881] vmbr0: port 19(fwpr100p0) entered blocking state
Sep 10 00:00:04 hydra1 kernel: [472579.854883] vmbr0: port 19(fwpr100p0) entered forwarding state
Sep 10 00:00:04 hydra1 kernel: [472579.859972] fwbr100i0: port 2(tap100i0) entered blocking state
Sep 10 00:00:04 hydra1 kernel: [472579.859974] fwbr100i0: port 2(tap100i0) entered disabled state
Sep 10 00:00:04 hydra1 kernel: [472579.860189] fwbr100i0: port 2(tap100i0) entered blocking state
Sep 10 00:00:04 hydra1 kernel: [472579.860191] fwbr100i0: port 2(tap100i0) entered forwarding state

EDIT:
This is the KVM command line of one of the affected machines:
/usr/bin/kvm -id 142 -name nagios-slave -chardev 'socket,id=qmp,path=/var/run/qemu-server/142.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/142.pid -daemonize -smbios 'type=1,uuid=d24a50e0-9b4e-478d-ba86-4974d37aa272' -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/142.vnc,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 1024 -k en-gb -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:2022e4fa592' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=rbd:hdd-pool/vm-142-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/hdd-pool.keyring,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap142i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=D2:03:96:17:77:E9,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc'

EDIT 2:
Can't monitor the VM from the host either:


root@hydra1:/var/run/qemu-server# qm monitor 142
Entering Qemu Monitor for VM 142 - type 'help' for help
qm> help
ERROR: VM 142 qmp command 'human-monitor-command' failed - got timeout
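Given the NFS trouble described above, the next thing I want to rule out is hung I/O on the host: anything stuck in uninterruptible sleep, hung-task warnings in the kernel log, or an NFS mount that no longer responds. A sketch only, not a confirmed root cause; /mnt/pve/backups is the backup storage from the failed jobs above.

Code:
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'      # processes in uninterruptible sleep
dmesg -T | grep -i 'blocked for more than'          # kernel hung-task warnings
# A hung NFS mount blocks here instead of returning:
timeout 10 stat -f /mnt/pve/backups || echo "/mnt/pve/backups is not responding"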
 
Good day,

I am having the same issue: some VMs back up and can be managed via noVNC, others get a timeout when backing up and cannot be managed via noVNC.
 
