Proxmox 9.1.5 weird backup problem leaving VM filesystem damaged

alh

setup:
we have a proxmox ve host with 1 vm (debian bookworm).
every sunday we run a weekly backup (mode stop) to a proxmox backup server.

we experience the following issue:
- vm is shut down
- backup starts BUT vm is also powered on again immediately
- vm seems to start fine but large parts of the device mappings/mounts fail
- vm becomes unresponsive
- backup eventually fails with an i/o error
- if the vm is then powered down via pve and restarted, its filesystem (ext4 on lvm) is damaged and the system drops into the initramfs

if i recreate the steps manually:
- shut down machine
- backup to pbs
- start machine
everything works as expected: i have a backup and the vm starts up fine without fs corruption.
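
for reference, the manual sequence above roughly corresponds to the sketch below. it is only an illustration: the vmid 101 and the storage name synology are taken from the log further down, while the helper names and the 180 second shutdown timeout are my own assumptions. by default it just prints the commands and runs nothing.

```python
import shlex
import subprocess


def manual_backup_commands(vmid: int, storage: str, shutdown_timeout: int = 180) -> list[list[str]]:
    """Build the three commands of the manual shutdown / backup / start sequence."""
    return [
        # orderly guest shutdown; the timeout value is an assumption
        ["qm", "shutdown", str(vmid), "--timeout", str(shutdown_timeout)],
        # back up the (now stopped) guest to the given storage
        ["vzdump", str(vmid), "--mode", "stop", "--storage", storage],
        # bring the guest back up once the backup has finished
        ["qm", "start", str(vmid)],
    ]


def run_sequence(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command and execute it only when dry_run is False."""
    printed = []
    for cmd in commands:
        line = shlex.join(cmd)
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True aborts the sequence as soon as one step fails
            subprocess.run(cmd, check=True)
    return printed


if __name__ == "__main__":
    run_sequence(manual_backup_commands(101, "synology"))
```

the point of doing it this way is that the vm stays down for the whole backup, whereas the scheduled stop-mode job starts it again while the backup is still running.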

the logs of the relevant period:

Code:
Feb 15 23:00:01 pve01 pvescheduler[560585]: <root@pam> starting task UPID:pve01:00088DCA:05412CB2:699241E1:vzdump:101:root@pam:
Feb 15 23:00:01 pve01 pvescheduler[560586]: INFO: starting new backup job: vzdump --mode stop --fleecing 0 --all 1 --notes-template '{{guestname}}' --storage synology --quiet 1 --notification-mode notification-system
Feb 15 23:00:01 pve01 pvescheduler[560586]: INFO: Starting Backup of VM 101 (qemu)
Feb 15 23:00:02 pve01 qm[560589]: <root@pam> starting task UPID:pve01:00088DD4:05412CE1:699241E2:qmshutdown:101:root@pam:
Feb 15 23:00:02 pve01 qm[560596]: shutdown VM 101: UPID:pve01:00088DD4:05412CE1:699241E2:qmshutdown:101:root@pam:
Feb 15 23:00:22 pve01 qmeventd[1157507]: read: Connection reset by peer
Feb 15 23:00:22 pve01 qmeventd[560762]: Starting cleanup for 101
Feb 15 23:00:22 pve01 qmeventd[560762]: trying to acquire lock...
Feb 15 23:00:23 pve01 systemd[1]: 101.scope: Deactivated successfully.
Feb 15 23:00:23 pve01 systemd[1]: 101.scope: Consumed 1d 7h 51min 44.827s CPU time, 22G memory peak.
Feb 15 23:00:23 pve01 qmeventd[560762]:  OK
Feb 15 23:00:23 pve01 qm[560589]: <root@pam> end task UPID:pve01:00088DD4:05412CE1:699241E2:qmshutdown:101:root@pam: OK
Feb 15 23:00:23 pve01 qmeventd[560762]: Finished cleanup for 101
Feb 15 23:00:23 pve01 systemd[1]: Started 101.scope.
Feb 15 23:00:24 pve01 pvescheduler[560586]: VM 101 started with PID 560775.
Feb 15 23:17:31 pve01 pvescheduler[560586]: ERROR: Backup of VM 101 failed - backup write data failed: command error: protocol canceled
Feb 15 23:17:31 pve01 pvescheduler[560586]: INFO: Backup job finished with errors
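
to make the timing easier to see, here is a small sketch that pulls the key events out of the log above and computes the gaps between them. the regular expressions and event names are my own and only match the message formats seen in this excerpt; the year 2026 is an assumption, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

LOG = """\
Feb 15 23:00:02 pve01 qm[560596]: shutdown VM 101: UPID:pve01:00088DD4:05412CE1:699241E2:qmshutdown:101:root@pam:
Feb 15 23:00:23 pve01 systemd[1]: 101.scope: Deactivated successfully.
Feb 15 23:00:24 pve01 pvescheduler[560586]: VM 101 started with PID 560775.
Feb 15 23:17:31 pve01 pvescheduler[560586]: ERROR: Backup of VM 101 failed - backup write data failed: command error: protocol canceled
"""

# timestamp, host, program (optionally followed by [pid]), message
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) ([^\[:]+)(?:\[\d+\])?: (.*)$")

# hypothetical event names mapped to the message prefixes seen in the excerpt
EVENTS = {
    "shutdown_requested": re.compile(r"^shutdown VM (\d+):"),
    "vm_stopped": re.compile(r"^(\d+)\.scope: Deactivated successfully"),
    "vm_restarted": re.compile(r"^VM (\d+) started with PID"),
    "backup_failed": re.compile(r"^ERROR: Backup of VM (\d+) failed"),
}


def timeline(log_text: str, year: int = 2026) -> dict[str, datetime]:
    """Return the first timestamp of each known event found in the log text."""
    events: dict[str, datetime] = {}
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue
        stamp, _host, _prog, msg = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        for name, pattern in EVENTS.items():
            if name not in events and pattern.match(msg):
                events[name] = when
    return events


if __name__ == "__main__":
    t = timeline(LOG)
    print("shutdown took:        ", t["vm_stopped"] - t["shutdown_requested"])
    print("vm was down for:      ", t["vm_restarted"] - t["vm_stopped"])
    print("backup failed after:  ", t["backup_failed"] - t["vm_restarted"])
```

for this excerpt that works out to 21 seconds for the shutdown, 1 second of downtime, and a bit over 17 minutes between the vm coming back up and the backup failing.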

i'm grateful for your input.
 
just to add: if i run the backup manually via datacenter > host > vm > backup with the same parameters, it works just fine as well (the vm is also restarted immediately in that case).

all subsequent runs after the first manual backup also succeed now. after the job had failed consistently for several weeks, creating the first backup manually seems to have "solved" the problem. i will re-enable the scheduled backup now and check what happens this weekend.