Proxmox Backup failure "VM not running"

Eduardino

New Member
Feb 3, 2025
Good day,
On my Proxmox server, around 100 virtual machines are running and being backed up. However, only 2 of them consistently run into backup issues. During the backup, the task fails with the error "ERROR: VM 356 not running", which breaks the entire backup job.
The VMs use raw-format disks (on vitastor, a Ceph analog), and using fs-freeze does not resolve the problem.
What can I do to resolve this?
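For reference, freeze/thaw can also be triggered manually outside of a backup; here is a sketch of that check (assuming the qm guest cmd interface, exact subcommand names may differ between PVE versions):
Code:
# check that the guest agent responds at all
qm guest cmd 356 ping
# freeze the guest filesystems, check the state, and thaw immediately again
qm guest cmd 356 fsfreeze-freeze
qm guest cmd 356 fsfreeze-status
qm guest cmd 356 fsfreeze-thaw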

Code:
INFO: starting new backup job: vzdump 356 --storage storage-bkp --notes-template '{{guestname}}' --protected 1 --mode snapshot --compress zstd --notification-mode auto --node node008 --remove 0
INFO: Starting Backup of VM 356 (qemu)
INFO: Backup started at 2025-02-02 20:02:59
INFO: status = running
INFO: VM Name: DB
INFO: include disk 'scsi0' 'vitastor:vm-356-disk-0' 4500G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/storage-bkp/dump/vzdump-qemu-356-2025_02_02-20_02_59.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '75524ba9-f141-4f63-ac6f-02c7af5ba461'
INFO: resuming VM again
INFO:   0% (714.1 MiB of 4.4 TiB) in 3s, read: 238.0 MiB/s, write: 230.3 MiB/s
ERROR: VM 356 not running
INFO: aborting backup job
ERROR: VM 356 not running
INFO: resuming VM again
ERROR: Backup of VM 356 failed - VM 356 not running
INFO: Failed at 2025-02-02 20:04:17
INFO: Backup job finished with errors
TASK ERROR: job errors
 
Hi,

Have you checked the syslog during the backup time? Do you see any high I/O during the backup?
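For example, something along these lines (the timestamps are just placeholders for the actual backup window; iostat comes from the sysstat package):
Code:
# journal entries around the failed backup
journalctl --since "2025-02-02 20:00" --until "2025-02-02 20:10"
# rough per-device I/O picture while a backup is running
iostat -x 5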
 
Hi,

Have you checked the syslog during the backup time? Do you see any high I/O during the backup?
Here is an abnormal entry from around the backup time:
Code:
Feb 02 12:22:46 node004 QEMU[717541]: malloc(): unaligned fastbin chunk detected
Feb 02 12:22:46 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:46 node004 kernel: fwbr411i0: port 1(tap411i0) entered disabled state
Feb 02 12:22:46 node004 kernel: tap411i0 (unregistering): left allmulticast mode
Feb 02 12:22:46 node004 kernel: fwbr411i0: port 1(tap411i0) entered disabled state
Feb 02 12:22:47 node004 qmeventd[1776836]: Starting cleanup for 411
Feb 02 12:22:47 node004 ovs-vsctl[1776840]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln411o0
Feb 02 12:22:47 node004 kernel: fwbr411i0: port 2(fwln411o0) entered disabled state
Feb 02 12:22:47 node004 kernel: fwln411o0 (unregistering): left allmulticast mode
Feb 02 12:22:47 node004 kernel: fwln411o0 (unregistering): left promiscuous mode
Feb 02 12:22:47 node004 kernel: fwbr411i0: port 2(fwln411o0) entered disabled state
Feb 02 12:22:47 node004 ovs-vsctl[1776844]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln411i0
Feb 02 12:22:47 node004 ovs-vsctl[1776844]: ovs|00002|db_ctl_base|ERR|no port named fwln411i0
Feb 02 12:22:47 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:47 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:47 node004 pveproxy[725488]: worker exit
Feb 02 12:22:48 node004 ovs-vsctl[1776848]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap411i0
Feb 02 12:22:48 node004 ovs-vsctl[1776848]: ovs|00002|db_ctl_base|ERR|no port named tap411i0
Feb 02 12:22:48 node004 qmeventd[1776836]: Finished cleanup for 411
Feb 02 12:22:48 node004 pvedaemon[1770048]: worker exit
Feb 02 12:23:12 node004 systemd[1]: 411.scope: Deactivated successfully.
Feb 02 12:23:12 node004 systemd[1]: 411.scope: Consumed 3w 5h 4min 25.524s CPU time.
 
Hi,
Here is an abnormal entry from around the backup time:
Code:
Feb 02 12:22:46 node004 QEMU[717541]: malloc(): unaligned fastbin chunk detected
Sounds like it might be memory corruption. I'd run a memtest on the host during the next maintenance window.

Please share the VM configuration (qm config <ID>) and the output of pveversion -v.

To further debug the issue, please run apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym. The next time a crash happens, you can run coredumpctl -1 gdb and then, at the GDB prompt, thread apply all backtrace. This will produce a backtrace of the crash.
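For convenience, the same steps as a copy-paste sketch (the last command is entered at the (gdb) prompt):
Code:
apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym
# after the next crash:
coredumpctl -1 gdb
thread apply all backtrace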
 
Hi,

Sounds like it might be memory corruption. I'd run a memtest on the host during the next maintenance window.

Please share the VM configuration (qm config <ID>) and the output of pveversion -v.

To further debug the issue, please run apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym. The next time a crash happens, you can run coredumpctl -1 gdb and then, at the GDB prompt, thread apply all backtrace. This will produce a backtrace of the crash.
Hi,
I tried a different cluster with different hardware, same problem.
VM config:
Code:
agent: 1
balloon: 0
boot: order=scsi0;net0
cores: 52
memory: 262144
meta: creation-qemu=7.2.0,ctime=1711354461
name: DB
net0: virtio=9A:AF:84:CB:57:D4,bridge=vmbr0,firewall=1,tag=3515
numa: 0
ostype: l26
scsi0: vitastor:vm-411-disk-0,format=raw,iothread=1,size=2024G
scsihw: virtio-scsi-single
smbios1: uuid=0d8d80b7-7839-49f1-8e74-8ceacde03a3e
sockets: 2
vmgenid: 3e2653e9-9bdc-4732-bc4c-d6618ff703a2

pveversion -v:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-7-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-7
proxmox-kernel-6.8.12-7-pve-signed: 6.8.12-7
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-4+vitastor1
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
 
Code:
pve-qemu-kvm: 9.0.2-4+vitastor1
Where did you get that QEMU package from? I don't think it's from us; or was it given to you by a staff member for other debugging purposes? EDIT: no, it's definitely not from us.
 
Looks like you are running a fork and not a Proxmox VE system. I wonder why you think you can get Proxmox support for third-party packages.

=> contact the creator of the "custom" pve-qemu-kvm package

I highly recommend using only official Proxmox packages and not modified or "improved" third-party packages.
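A quick way to check where an installed package actually came from is to look at its origin and maintainer (a sketch; apt policy lists the repository the installed version was pulled from):
Code:
apt policy pve-qemu-kvm
dpkg -s pve-qemu-kvm | grep -E '^(Version|Maintainer)'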
 
Looks like you are running a fork and not a Proxmox VE system. I wonder why you think you can get Proxmox support for third-party packages.

=> contact the creator of the "custom" pve-qemu-kvm package

I highly recommend using only official Proxmox packages and not modified or "improved" third-party packages.
Yes, it's a small fork for compatibility with vitastor (a Ceph analog), no other "improvements".
Here is the core dump after the backup failure:
 

Attachments

It's very likely an issue with those improvements if it happens in relation to VMs using that storage.
 
It's very likely an issue with those improvements if it happens in relation to VMs using that storage.
Got the same error on this one, and there is no vitastor at all:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-8-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph: 19.2.0-pve2
ceph-fuse: 19.2.0-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
 
How did you start the VM then if it's using that storage?
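For example, a quick check of which storage the disk actually lives on (a sketch, with <ID> being the affected VM):
Code:
# list the configured disks and their storage
qm config <ID> | grep -E '^(scsi|virtio|sata|ide|efidisk|tpmstate)'
# list the storages known to the node
pvesm status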
 
If it's a different VM, please share again all the relevant information. And make sure the VM was actually started while the correct QEMU package was installed, not the vitastor one. A VM will continue using the binary it was started with until it is live-migrated to a different node or shut down.
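A sketch of how to verify this (assuming a reasonably recent qemu-server; the verbose status of a running VM includes the QEMU version it was started with):
Code:
# QEMU version the running VM was started with
qm status <ID> --verbose | grep running-qemu
# installed package version for comparison
pveversion -v | grep pve-qemu-kvm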
 
If it's a different VM, please share again all the relevant information. And make sure the VM was actually started while the correct QEMU package was installed, not the vitastor one. A VM will continue using the binary it was started with until it is live-migrated to a different node or shut down.
That VM was here before; it was migrated to vitastor through a backup, not live-migrated.
What about the core dump? Is there nothing in it?
 
The coredump is for a VM running with vitastor storage. Please report the issue to wherever you actually got the problematic package, not here.