Proxmox Backup failure "VM not running"

Eduardino

New Member
Feb 3, 2025
Good day,
On my Proxmox server, around 100 virtual machines are running and being backed up. However, only 2 of them consistently run into backup issues. During the backup, the task fails with the error "ERROR: VM 356 not running", which breaks the entire backup job.
The VMs use raw-format disks (on vitastor, a Ceph analog), and using fs-freeze does not resolve the problem.
What can I do to resolve this?
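For reference, freeze/thaw can also be triggered manually outside of a backup; here is a sketch of that check (assuming the qm guest cmd interface, exact subcommand names may differ between PVE versions):
Code:
# check that the guest agent responds at all
qm guest cmd 356 ping
# freeze the guest filesystems, check the state, and thaw immediately again
qm guest cmd 356 fsfreeze-freeze
qm guest cmd 356 fsfreeze-status
qm guest cmd 356 fsfreeze-thaw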

Code:
INFO: starting new backup job: vzdump 356 --storage storage-bkp --notes-template '{{guestname}}' --protected 1 --mode snapshot --compress zstd --notification-mode auto --node node008 --remove 0
INFO: Starting Backup of VM 356 (qemu)
INFO: Backup started at 2025-02-02 20:02:59
INFO: status = running
INFO: VM Name: DB
INFO: include disk 'scsi0' 'vitastor:vm-356-disk-0' 4500G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/storage-bkp/dump/vzdump-qemu-356-2025_02_02-20_02_59.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '75524ba9-f141-4f63-ac6f-02c7af5ba461'
INFO: resuming VM again
INFO:   0% (714.1 MiB of 4.4 TiB) in 3s, read: 238.0 MiB/s, write: 230.3 MiB/s
ERROR: VM 356 not running
INFO: aborting backup job
ERROR: VM 356 not running
INFO: resuming VM again
ERROR: Backup of VM 356 failed - VM 356 not running
INFO: Failed at 2025-02-02 20:04:17
INFO: Backup job finished with errors
TASK ERROR: job errors
 
Hi,

Have you checked the syslog during the backup time? Do you see any high I/O during the backup?
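For example, something along these lines (the timestamps are just placeholders for the actual backup window; iostat comes from the sysstat package):
Code:
# journal entries around the failed backup
journalctl --since "2025-02-02 20:00" --until "2025-02-02 20:10"
# rough per-device I/O picture while a backup is running
iostat -x 5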
 
Hi,

Have you checked the syslog during the backup time? Do you see any high I/O during the backup?
Here is an abnormal entry from around the backup time:
Code:
Feb 02 12:22:46 node004 QEMU[717541]: malloc(): unaligned fastbin chunk detected
Feb 02 12:22:46 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:46 node004 kernel: fwbr411i0: port 1(tap411i0) entered disabled state
Feb 02 12:22:46 node004 kernel: tap411i0 (unregistering): left allmulticast mode
Feb 02 12:22:46 node004 kernel: fwbr411i0: port 1(tap411i0) entered disabled state
Feb 02 12:22:47 node004 qmeventd[1776836]: Starting cleanup for 411
Feb 02 12:22:47 node004 ovs-vsctl[1776840]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln411o0
Feb 02 12:22:47 node004 kernel: fwbr411i0: port 2(fwln411o0) entered disabled state
Feb 02 12:22:47 node004 kernel: fwln411o0 (unregistering): left allmulticast mode
Feb 02 12:22:47 node004 kernel: fwln411o0 (unregistering): left promiscuous mode
Feb 02 12:22:47 node004 kernel: fwbr411i0: port 2(fwln411o0) entered disabled state
Feb 02 12:22:47 node004 ovs-vsctl[1776844]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln411i0
Feb 02 12:22:47 node004 ovs-vsctl[1776844]: ovs|00002|db_ctl_base|ERR|no port named fwln411i0
Feb 02 12:22:47 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:47 node004 pvescheduler[1761225]: VM 411 qmp command failed - VM 411 not running
Feb 02 12:22:47 node004 pveproxy[725488]: worker exit
Feb 02 12:22:48 node004 ovs-vsctl[1776848]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap411i0
Feb 02 12:22:48 node004 ovs-vsctl[1776848]: ovs|00002|db_ctl_base|ERR|no port named tap411i0
Feb 02 12:22:48 node004 qmeventd[1776836]: Finished cleanup for 411
Feb 02 12:22:48 node004 pvedaemon[1770048]: worker exit
Feb 02 12:23:12 node004 systemd[1]: 411.scope: Deactivated successfully.
Feb 02 12:23:12 node004 systemd[1]: 411.scope: Consumed 3w 5h 4min 25.524s CPU time.
 
Hi,
Here is an abnormal entry from around the backup time:
Code:
Feb 02 12:22:46 node004 QEMU[717541]: malloc(): unaligned fastbin chunk detected
Sounds like it might be memory corruption. I'd run a memtest on the host during the next maintenance window.

Please share the VM configuration (qm config <ID>) and the output of pveversion -v.

To further debug the issue, please run apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym. The next time a crash happens, you can run coredumpctl -1 gdb and then, at the GDB prompt, thread apply all backtrace. This will produce a backtrace of the crash.
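For convenience, the same steps as a copy-paste sketch (the last command is entered at the (gdb) prompt):
Code:
apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym
# after the next crash:
coredumpctl -1 gdb
thread apply all backtrace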
 
Hi,

Sounds like it might be memory corruption. I'd run a memtest on the host during the next maintenance window.

Please share the VM configuration (qm config <ID>) and the output of pveversion -v.

To further debug the issue, please run apt install pve-qemu-kvm-dbgsym gdb systemd-coredump libproxmox-backup-qemu0-dbgsym. The next time a crash happens, you can run coredumpctl -1 gdb and then, at the GDB prompt, thread apply all backtrace. This will produce a backtrace of the crash.
Hi,
I tried a different cluster with different hardware, same problem.
VM config:
Code:
agent: 1
balloon: 0
boot: order=scsi0;net0
cores: 52
memory: 262144
meta: creation-qemu=7.2.0,ctime=1711354461
name: DB
net0: virtio=9A:AF:84:CB:57:D4,bridge=vmbr0,firewall=1,tag=3515
numa: 0
ostype: l26
scsi0: vitastor:vm-411-disk-0,format=raw,iothread=1,size=2024G
scsihw: virtio-scsi-single
smbios1: uuid=0d8d80b7-7839-49f1-8e74-8ceacde03a3e
sockets: 2
vmgenid: 3e2653e9-9bdc-4732-bc4c-d6618ff703a2

pveversion -v:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-7-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-7
proxmox-kernel-6.8.12-7-pve-signed: 6.8.12-7
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-4+vitastor1
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
 
Code:
pve-qemu-kvm: 9.0.2-4+vitastor1
Where did you get that QEMU package from? I don't think it's from us; or was it given to you by a staff member for other debugging purposes? EDIT: no, it's definitely not from us.
 
Looks like you are running a fork and not a Proxmox VE system. I wonder why you think you can get Proxmox support for third-party packages.

=> contact the creator of the "custom" pve-qemu-kvm package

I highly recommend using only official Proxmox packages and not modified or "improved" third-party packages.
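A quick way to check where an installed package actually came from is to look at its origin and maintainer (a sketch; apt policy lists the repository the installed version was pulled from):
Code:
apt policy pve-qemu-kvm
dpkg -s pve-qemu-kvm | grep -E '^(Version|Maintainer)'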
 
Looks like you are running a fork and not a Proxmox VE system. I wonder why you think you can get Proxmox support for third-party packages.

=> contact the creator of the "custom" pve-qemu-kvm package

I highly recommend using only official Proxmox packages and not modified or "improved" third-party packages.
Yes, it's a small fork for compatibility with vitastor (a Ceph analog), no other "improvements".
Here is the core dump after the backup failure:
 

Attachments

It's very likely an issue with those improvements if it happens in relation to VMs using that storage.
 
It's very likely an issue with those improvements if it happens in relation to VMs using that storage.
Got the same error on this one, and there is no vitastor at all:
Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-8-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph: 19.2.0-pve2
ceph-fuse: 19.2.0-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
 
How did you start the VM then if it's using that storage?
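For example, a quick check of which storage the disk actually lives on (a sketch, with <ID> being the affected VM):
Code:
# list the configured disks and their storage
qm config <ID> | grep -E '^(scsi|virtio|sata|ide|efidisk|tpmstate)'
# list the storages known to the node
pvesm status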
 
If it's a different VM, please share again all the relevant information. And make sure the VM was actually started while the correct QEMU package was installed, not the vitastor one. A VM will continue using the binary it was started with until it is live-migrated to a different node or shut down.
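A sketch of how to verify this (assuming a reasonably recent qemu-server; the verbose status of a running VM includes the QEMU version it was started with):
Code:
# QEMU version the running VM was started with
qm status <ID> --verbose | grep running-qemu
# installed package version for comparison
pveversion -v | grep pve-qemu-kvm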
 
If it's a different VM, please share again all the relevant information. And make sure the VM was actually started while the correct QEMU package was installed, not the vitastor one. A VM will continue using the binary it was started with until it is live-migrated to a different node or shut down.
That VM was here before; it was migrated to vitastor through a backup, not live-migrated.
What about the core dump? Is there nothing in it?
 
The coredump is for a VM running with vitastor storage. Please report the issue to wherever you actually got the problematic package, not here.