Proxmox Backup failure "VM not running"

Eduardino

Feb 3, 2025
Good day,
On my Proxmox server around 100 virtual machines are running and being backed up. However, only 2 of them consistently fail: during the backup, the task aborts with the error "ERROR: VM 356 not running", which breaks the whole backup job.
The VM disks are in raw format (on a Ceph-analogue storage), and using fs-freeze does not resolve the problem.
What can I do to resolve this?

Code:
INFO: starting new backup job: vzdump 356 --storage storage-bkp --notes-template '{{guestname}}' --protected 1 --mode snapshot --compress zstd --notification-mode auto --node node008 --remove 0
INFO: Starting Backup of VM 356 (qemu)
INFO: Backup started at 2025-02-02 20:02:59
INFO: status = running
INFO: VM Name: DB
INFO: include disk 'scsi0' 'vitastor:vm-356-disk-0' 4500G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/storage-bkp/dump/vzdump-qemu-356-2025_02_02-20_02_59.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '75524ba9-f141-4f63-ac6f-02c7af5ba461'
INFO: resuming VM again
INFO:   0% (714.1 MiB of 4.4 TiB) in 3s, read: 238.0 MiB/s, write: 230.3 MiB/s
ERROR: VM 356 not running
INFO: aborting backup job
ERROR: VM 356 not running
INFO: resuming VM again
ERROR: Backup of VM 356 failed - VM 356 not running
INFO: Failed at 2025-02-02 20:04:17
INFO: Backup job finished with errors
TASK ERROR: job errors
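For reference, the guest agent and the freeze/thaw path can also be exercised manually outside of a backup; a minimal sketch, assuming qemu-guest-agent is installed and running inside the guest and using the fsfreeze-* subcommands of current qm:
Code:
# check that the guest agent responds
qm guest cmd 356 ping

# freeze and thaw the guest filesystems by hand, checking the status in between
qm guest cmd 356 fsfreeze-freeze
qm guest cmd 356 fsfreeze-status
qm guest cmd 356 fsfreeze-thaw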
 
Hi,

Have you checked the syslog during the backup time? Do you see any high I/O during the backup?
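In case it is useful, a sketch of how that can be checked (the timestamps are just taken from the backup log above and should be adjusted; iostat comes from the sysstat package, which may need to be installed first):
Code:
# syslog entries around the failed backup run
journalctl --since "2025-02-02 20:00" --until "2025-02-02 20:10"

# follow the journal live during the next backup
journalctl -f

# watch per-disk I/O on the node while the backup runs (from the sysstat package)
apt install sysstat
iostat -xm 2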
Hi,
There is one abnormal thing in the syslog during the backup window:
Code:
Feb 02 12:22:46 node004 QEMU[717541]: malloc(): unaligned fastbin chunk detected
Feb 02 12:22:46 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:46 node004 kernel: fwbr411i0: port 1(tap411i0) entered disabled state
Feb 02 12:22:46 node004 kernel: tap411i0 (unregistering): left allmulticast mode
Feb 02 12:22:46 node004 kernel: fwbr411i0: port 1(tap411i0) entered disabled state
Feb 02 12:22:47 node004 qmeventd[1776836]: Starting cleanup for 411
Feb 02 12:22:47 node004 ovs-vsctl[1776840]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln411o0
Feb 02 12:22:47 node004 kernel: fwbr411i0: port 2(fwln411o0) entered disabled state
Feb 02 12:22:47 node004 kernel: fwln411o0 (unregistering): left allmulticast mode
Feb 02 12:22:47 node004 kernel: fwln411o0 (unregistering): left promiscuous mode
Feb 02 12:22:47 node004 kernel: fwbr411i0: port 2(fwln411o0) entered disabled state
Feb 02 12:22:47 node004 ovs-vsctl[1776844]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln411i0
Feb 02 12:22:47 node004 ovs-vsctl[1776844]: ovs|00002|db_ctl_base|ERR|no port named fwln411i0
Feb 02 12:22:47 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:47 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:47 node004 pveproxy[725488]: worker exit
Feb 02 12:22:48 node004 ovs-vsctl[1776848]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap411i0
Feb 02 12:22:48 node004 ovs-vsctl[1776848]: ovs|00002|db_ctl_base|ERR|no port named tap411i0
Feb 02 12:22:48 node004 qmeventd[1776836]: Finished cleanup for 411
Feb 02 12:22:48 node004 pvedaemon[1770048]: worker exit
Feb 02 12:23:12 node004 systemd[1]: 411.scope: Deactivated successfully.
Feb 02 12:23:12 node004 systemd[1]: 411.scope: Consumed 3w 5h 4min 25.524s CPU time.
 
Hi,
There is one abnormal thing in the syslog during the backup window:
Code:
Feb 02 12:22:46 node004 QEMU[717541]: malloc(): unaligned fastbin chunk detected
Sounds like it might be a memory corruption. I'd run a memtest on the host during the next maintenance window.
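A sketch of the two usual options, assuming a Debian/PVE host where these packages are acceptable: memtest86+ adds a boot-menu entry for an offline test, while memtester can at least exercise part of the RAM from userspace without a reboot:
Code:
# option 1: offline test, then reboot into the "Memory test" entry added to the boot menu
apt install memtest86+

# option 2: userspace test of a chunk of free RAM, no reboot needed (size/iterations are examples)
apt install memtester
memtester 4G 1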

Please share the VM configuration qm config <ID> and the output of pveversion -v.
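That is, with <ID> replaced by the affected VMID (356 in the log above):
Code:
qm config 356
pveversion -v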

To further debug the issue please run apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym. The next time a crash happens afterwards, you can run coredumpctl -1 gdb and then in the GDB prompt thread apply all backtrace. This will obtain a backtrace of the crash.
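Put together as commands, exactly the steps described above (the "(gdb)" prefix just marks what is typed at the gdb prompt):
Code:
# install debug symbols and coredump handling
apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym

# after the next crash: open the most recent coredump in gdb ...
coredumpctl -1 gdb

# ... and inside gdb collect a backtrace of all threads
(gdb) thread apply all backtrace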
 
Hi,
I tried a different cluster with different hardware; same problem.
VM config:
Code:
agent: 1
balloon: 0
boot: order=scsi0;net0
cores: 52
memory: 262144
meta: creation-qemu=7.2.0,ctime=1711354461
name: DB
net0: virtio=9A:AF:84:CB:57:D4,bridge=vmbr0,firewall=1,tag=3515
numa: 0
ostype: l26
scsi0: vitastor:vm-411-disk-0,format=raw,iothread=1,size=2024G
scsihw: virtio-scsi-single
smbios1: uuid=0d8d80b7-7839-49f1-8e74-8ceacde03a3e
sockets: 2
vmgenid: 3e2653e9-9bdc-4732-bc4c-d6618ff703a2

pveversion -v:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-7-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-7
proxmox-kernel-6.8.12-7-pve-signed: 6.8.12-7
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-4+vitastor1
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
 
Code:
pve-qemu-kvm: 9.0.2-4+vitastor1
Where did you get that QEMU package from? I don't think it's from us, or was it given to you by some staff member for other debugging purposes? EDIT: no, definitely not from us.
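For checking where an installed package comes from, apt can list the candidate versions together with their origin repositories; a sketch (repository names will differ per setup):
Code:
# installed version plus the repositories each available version comes from
apt policy pve-qemu-kvm

# maintainer/origin metadata of the installed package
dpkg -s pve-qemu-kvm | grep -E '^(Version|Maintainer)'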
 
Looks like you are running a fork and not a Proxmox VE system. I wonder why you think you can get Proxmox support for third-party packages.

=> contact the creator of the "custom" pve-qemu-kvm package

I highly recommend using only official Proxmox packages, not modified or "improved" third-party packages.
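If the goal is to get back onto the stock build, a sketch of what that could look like; since the +vitastor1 suffix makes that version sort higher than the official one, an explicit version has to be given, and the exact version string should be taken from apt policy rather than from this example:
Code:
# check which versions the configured Proxmox repositories offer
apt policy pve-qemu-kvm

# downgrade to the official build (version shown here is an example, confirm it first)
apt install pve-qemu-kvm=9.0.2-4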
 