Hey everyone. First, I hope whoever sees this has a good day.
We're having an issue with our PVE cluster (v8.1.2) when backing up LXC containers that run on a Ceph pool.
Sporadically and without apparent cause, the snapshot stage of our backup job fails, leaving the container locked and the snapshot created but empty.
Backup job log:
Code:
INFO: starting new backup job: vzdump 122 --mode snapshot --notes-template '{{guestname}}' --prune-backups 'keep-last=250' --storage <PBS-REDACTED> --mailto <REDACTED> --quiet 1 --mailnotification failure
INFO: Starting Backup of VM 122 (lxc)
INFO: Backup started at 2024-02-07 12:00:05
INFO: status = running
INFO: CT Name: <REDACTED>
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
In order to recover the container from the locked state, we have to do the following:
Code:
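# Find the hung vzdump task (state "D" = uninterruptible sleep)
ps ax | grep vzdump
# 1046797 ? Ds 0:00 task UPID:<REDACTED>:000FF90D:1DFDE8BC:65C370C5:vzdump:122:root@pam:
kill -9 1046797
# Recreate the stale config lock file, then clear the backup lock
rm /run/lock/lxc/pve-config-122.lock
touch /run/lock/lxc/pve-config-122.lock
pct unlock 122
pct stop 122
# Remove the empty 'vzdump' snapshot left behind by the failed job
pct delsnapshot 122 vzdump --force
pct start 122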
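Since we end up repeating this for different containers, the sequence can be parameterised. Below is a minimal sketch that only prints the commands for review and executes nothing. The function name `recovery_cmds` and its arguments are hypothetical, and the snapshot name is assumed to default to `vzdump`, as in the backup log above.

```shell
#!/bin/sh
# Hypothetical helper: print the recovery commands for one container so they
# can be reviewed before being run by hand. Nothing is executed here.
recovery_cmds() {
    vmid="$1"            # container ID, e.g. 122
    pid="$2"             # PID of the hung vzdump task, taken from `ps ax | grep vzdump`
    snap="${3:-vzdump}"  # snapshot name; assumed to be 'vzdump' as in the backup log
    lock="/run/lock/lxc/pve-config-${vmid}.lock"
    cat <<EOF
kill -9 ${pid}
rm ${lock}
touch ${lock}
pct unlock ${vmid}
pct stop ${vmid}
pct delsnapshot ${vmid} ${snap} --force
pct start ${vmid}
EOF
}

# Example: the container and PID from this post
recovery_cmds 122 1046797
```

The PID still has to be looked up manually each time, since it differs for every hung task.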
Once the container is running and the failed snapshot has been cleaned up, it seems to be fine for a couple of days, and then it happens again. This occurs randomly across all backed-up containers, without obvious cause.
Any advice to rectify this situation would be greatly appreciated, as until we resolve this, we can't have automatic backups of our core services.
Versions:
PVE (Proxmox Virtual Environment) | 8.1.3 |
PBS (Proxmox Backup Server) | 3.1-2 |
Ceph | 17.2.7 (e303afc2e967a4705b40a7e5f76067c10eea0484) quincy (stable) |