Proxmox Backup failure "VM not running"

Eduardino

Feb 3, 2025
Good day,
On my Proxmox server around 100 virtual machines are running and being backed up. However, only 2 of them consistently fail: during the backup, the task aborts with the error "ERROR: VM 356 not running", which breaks the whole backup job.
The VM disks are in raw format (on a Ceph-analogue storage), and using fs-freeze does not resolve the problem.
What can I do to resolve this?

Code:
INFO: starting new backup job: vzdump 356 --storage storage-bkp --notes-template '{{guestname}}' --protected 1 --mode snapshot --compress zstd --notification-mode auto --node node008 --remove 0
INFO: Starting Backup of VM 356 (qemu)
INFO: Backup started at 2025-02-02 20:02:59
INFO: status = running
INFO: VM Name: DB
INFO: include disk 'scsi0' 'vitastor:vm-356-disk-0' 4500G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/storage-bkp/dump/vzdump-qemu-356-2025_02_02-20_02_59.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '75524ba9-f141-4f63-ac6f-02c7af5ba461'
INFO: resuming VM again
INFO:   0% (714.1 MiB of 4.4 TiB) in 3s, read: 238.0 MiB/s, write: 230.3 MiB/s
ERROR: VM 356 not running
INFO: aborting backup job
ERROR: VM 356 not running
INFO: resuming VM again
ERROR: Backup of VM 356 failed - VM 356 not running
INFO: Failed at 2025-02-02 20:04:17
INFO: Backup job finished with errors
TASK ERROR: job errors
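For reference, the guest agent and the freeze/thaw path can also be exercised manually outside of a backup; a minimal sketch, assuming qemu-guest-agent is installed and running inside the guest and using the fsfreeze-* subcommands of current qm:
Code:
# check that the guest agent responds
qm guest cmd 356 ping

# freeze and thaw the guest filesystems by hand, checking the status in between
qm guest cmd 356 fsfreeze-freeze
qm guest cmd 356 fsfreeze-status
qm guest cmd 356 fsfreeze-thaw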
 
Hi,

Have you checked the syslog during the backup time? Do you see any high I/O during the backup?
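In case it is useful, a sketch of how that can be checked (the timestamps are just taken from the backup log above and should be adjusted; iostat comes from the sysstat package, which may need to be installed first):
Code:
# syslog entries around the failed backup run
journalctl --since "2025-02-02 20:00" --until "2025-02-02 20:10"

# follow the journal live during the next backup
journalctl -f

# watch per-disk I/O on the node while the backup runs (from the sysstat package)
apt install sysstat
iostat -xm 2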
Hi,
There is one abnormal thing in the syslog during the backup window:
Code:
Feb 02 12:22:46 node004 QEMU[717541]: malloc(): unaligned fastbin chunk detected
Feb 02 12:22:46 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:46 node004 kernel: fwbr411i0: port 1(tap411i0) entered disabled state
Feb 02 12:22:46 node004 kernel: tap411i0 (unregistering): left allmulticast mode
Feb 02 12:22:46 node004 kernel: fwbr411i0: port 1(tap411i0) entered disabled state
Feb 02 12:22:47 node004 qmeventd[1776836]: Starting cleanup for 411
Feb 02 12:22:47 node004 ovs-vsctl[1776840]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln411o0
Feb 02 12:22:47 node004 kernel: fwbr411i0: port 2(fwln411o0) entered disabled state
Feb 02 12:22:47 node004 kernel: fwln411o0 (unregistering): left allmulticast mode
Feb 02 12:22:47 node004 kernel: fwln411o0 (unregistering): left promiscuous mode
Feb 02 12:22:47 node004 kernel: fwbr411i0: port 2(fwln411o0) entered disabled state
Feb 02 12:22:47 node004 ovs-vsctl[1776844]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln411i0
Feb 02 12:22:47 node004 ovs-vsctl[1776844]: ovs|00002|db_ctl_base|ERR|no port named fwln411i0
Feb 02 12:22:47 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:47 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:47 node004 pveproxy[725488]: worker exit
Feb 02 12:22:48 node004 ovs-vsctl[1776848]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap411i0
Feb 02 12:22:48 node004 ovs-vsctl[1776848]: ovs|00002|db_ctl_base|ERR|no port named tap411i0
Feb 02 12:22:48 node004 qmeventd[1776836]: Finished cleanup for 411
Feb 02 12:22:48 node004 pvedaemon[1770048]: worker exit
Feb 02 12:23:12 node004 systemd[1]: 411.scope: Deactivated successfully.
Feb 02 12:23:12 node004 systemd[1]: 411.scope: Consumed 3w 5h 4min 25.524s CPU time.
 
Hi,
There is one abnormal thing in the syslog during the backup window:
Code:
Feb 02 12:22:46 node004 QEMU[717541]: malloc(): unaligned fastbin chunk detected
Sounds like it might be a memory corruption. I'd run a memtest on the host during the next maintenance window.
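A sketch of the two usual options, assuming a Debian/PVE host where these packages are acceptable: memtest86+ adds a boot-menu entry for an offline test, while memtester can at least exercise part of the RAM from userspace without a reboot:
Code:
# option 1: offline test, then reboot into the "Memory test" entry added to the boot menu
apt install memtest86+

# option 2: userspace test of a chunk of free RAM, no reboot needed (size/iterations are examples)
apt install memtester
memtester 4G 1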

Please share the VM configuration qm config <ID> and the output of pveversion -v.
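That is, with <ID> replaced by the affected VMID (356 in the log above):
Code:
qm config 356
pveversion -v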

To further debug the issue please run apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym. The next time a crash happens afterwards, you can run coredumpctl -1 gdb and then in the GDB prompt thread apply all backtrace. This will obtain a backtrace of the crash.
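Put together as commands, exactly the steps described above (the "(gdb)" prefix just marks what is typed at the gdb prompt):
Code:
# install debug symbols and coredump handling
apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym

# after the next crash: open the most recent coredump in gdb ...
coredumpctl -1 gdb

# ... and inside gdb collect a backtrace of all threads
(gdb) thread apply all backtrace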
 
Hi,
I tried a different cluster with different hardware; same problem.
VM config:
Code:
agent: 1
balloon: 0
boot: order=scsi0;net0
cores: 52
memory: 262144
meta: creation-qemu=7.2.0,ctime=1711354461
name: DB
net0: virtio=9A:AF:84:CB:57:D4,bridge=vmbr0,firewall=1,tag=3515
numa: 0
ostype: l26
scsi0: vitastor:vm-411-disk-0,format=raw,iothread=1,size=2024G
scsihw: virtio-scsi-single
smbios1: uuid=0d8d80b7-7839-49f1-8e74-8ceacde03a3e
sockets: 2
vmgenid: 3e2653e9-9bdc-4732-bc4c-d6618ff703a2

pveversion -v:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-7-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-7
proxmox-kernel-6.8.12-7-pve-signed: 6.8.12-7
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-4+vitastor1
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
 
Code:
pve-qemu-kvm: 9.0.2-4+vitastor1
Where did you get that QEMU package from? I don't think it's from us, or was it given to you by some staff member for other debugging purposes? EDIT: no, definitely not from us.
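For checking where an installed package comes from, apt can list the candidate versions together with their origin repositories; a sketch (repository names will differ per setup):
Code:
# installed version plus the repositories each available version comes from
apt policy pve-qemu-kvm

# maintainer/origin metadata of the installed package
dpkg -s pve-qemu-kvm | grep -E '^(Version|Maintainer)'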
 
Looks like you are running a fork and not a Proxmox VE system. I wonder why you think you can get Proxmox support for third-party packages.

=> contact the creator of the "custom" pve-qemu-kvm package

I highly recommend using only official Proxmox packages, not modified or "improved" third-party packages.
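If the goal is to get back onto the stock build, a sketch of what that could look like; since the +vitastor1 suffix makes that version sort higher than the official one, an explicit version has to be given, and the exact version string should be taken from apt policy rather than from this example:
Code:
# check which versions the configured Proxmox repositories offer
apt policy pve-qemu-kvm

# downgrade to the official build (version shown here is an example, confirm it first)
apt install pve-qemu-kvm=9.0.2-4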
 