LXC snapshot backup fails

gngui

New Member
Oct 11, 2023
Hi,
I have experienced several backup job failures that occur when an LXC backup begins. As the log below shows, the snapshot backups proceed without problems for the VMs until the job reaches my Nextcloud instance. At that point the backup job stops logging and:

1. Proxmox locks the affected LXC
2. The backup job stops and Proxmox seems to hang
3. All other VMs and LXCs look like they are shut down but are actually running

To recover, I have to:

1. Physically restart the server
2. Unlock the affected LXC, as it does not start
3. Delete a snapshot on the affected LXC
3.1. Deleting the snapshot from the GUI gives the error "zfs error: could not find any snapshots to destroy; check snapshot names." and then Proxmox locks the LXC.
3.2. I am forced to unlock the LXC again and run the command pct delsnapshot 107 vzdump -force to delete the problematic snapshot
4. Proxmox backups then run OK for some time until this whole process repeats itself.
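
The manual recovery steps can be sketched as a small script. It assumes container ID 107 and the snapshot name vzdump from this post; the run helper and the DRY_RUN variable are illustrative names, not Proxmox features, and --force is used here as the long form of the flag quoted above. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the recovery commands unless DRY_RUN=0 is set.
# 'run' and DRY_RUN are hypothetical helpers, not part of Proxmox.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"          # show what would be executed
    else
        "$@"               # execute for real (needs root on the PVE host)
    fi
}

recover_ct() {
    ctid="$1"
    run pct unlock "$ctid"                       # clear the stale backup lock
    run pct delsnapshot "$ctid" vzdump --force   # remove the leftover 'vzdump' snapshot
}

recover_ct 107
```

Running it with DRY_RUN=0 on the host would execute the two commands instead of printing them.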


INFO: starting new backup job: vzdump --mailnotification always --prune-backups 'keep-last=4' --storage local --exclude 102,103,104 --all 1 --quiet 1 --compress zstd --mode snapshot --notes-template '{{guestname}}'
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2023-10-10 02:00:03
INFO: status = running
INFO: VM Name: OPNsense
INFO: include disk 'scsi0' 'local-zfs:vm-100-disk-0' 50G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-100-2023_10_10-02_00_03.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '6cf1f205-37c7-46d8-b483-9f7491e72d0a'
INFO: resuming VM again
INFO: 5% (2.6 GiB of 50.0 GiB) in 3s, read: 896.6 MiB/s, write: 403.0 MiB/s
INFO: 7% (3.6 GiB of 50.0 GiB) in 6s, read: 335.9 MiB/s, write: 319.9 MiB/s
INFO: 11% (6.0 GiB of 50.0 GiB) in 9s, read: 801.1 MiB/s, write: 381.4 MiB/s
INFO: 19% (9.6 GiB of 50.0 GiB) in 12s, read: 1.2 GiB/s, write: 322.6 MiB/s
INFO: 30% (15.2 GiB of 50.0 GiB) in 15s, read: 1.9 GiB/s, write: 192.3 MiB/s
INFO: 38% (19.2 GiB of 50.0 GiB) in 18s, read: 1.3 GiB/s, write: 161.9 MiB/s
INFO: 40% (20.1 GiB of 50.0 GiB) in 21s, read: 335.9 MiB/s, write: 283.0 MiB/s
INFO: 43% (22.0 GiB of 50.0 GiB) in 24s, read: 617.5 MiB/s, write: 316.9 MiB/s
INFO: 49% (24.9 GiB of 50.0 GiB) in 27s, read: 1002.9 MiB/s, write: 277.6 MiB/s
INFO: 55% (27.9 GiB of 50.0 GiB) in 30s, read: 1019.8 MiB/s, write: 250.4 MiB/s
INFO: 66% (33.3 GiB of 50.0 GiB) in 33s, read: 1.8 GiB/s, write: 177.6 MiB/s
INFO: 81% (40.6 GiB of 50.0 GiB) in 36s, read: 2.4 GiB/s, write: 97.1 MiB/s
INFO: 86% (43.0 GiB of 50.0 GiB) in 39s, read: 833.5 MiB/s, write: 340.3 MiB/s
INFO: 87% (43.9 GiB of 50.0 GiB) in 42s, read: 310.5 MiB/s, write: 309.6 MiB/s
INFO: 89% (45.0 GiB of 50.0 GiB) in 45s, read: 350.4 MiB/s, write: 345.8 MiB/s
INFO: 91% (46.0 GiB of 50.0 GiB) in 48s, read: 354.0 MiB/s, write: 353.0 MiB/s
INFO: 93% (46.9 GiB of 50.0 GiB) in 51s, read: 293.6 MiB/s, write: 292.1 MiB/s
INFO: 95% (47.7 GiB of 50.0 GiB) in 54s, read: 299.7 MiB/s, write: 299.2 MiB/s
INFO: 97% (48.8 GiB of 50.0 GiB) in 57s, read: 379.4 MiB/s, write: 375.3 MiB/s
INFO: 99% (49.7 GiB of 50.0 GiB) in 1m, read: 290.0 MiB/s, write: 289.0 MiB/s
INFO: 100% (50.0 GiB of 50.0 GiB) in 1m 1s, read: 308.6 MiB/s, write: 306.1 MiB/s
INFO: backup is sparse: 32.74 GiB (65%) total zero data
INFO: transferred 50.00 GiB in 61 seconds (839.3 MiB/s)
INFO: archive file size: 5.11GB
INFO: adding notes to backup
INFO: prune older backups with retention: keep-last=4
INFO: removing backup 'local:backup/vzdump-qemu-100-2023_10_06-02_00_03.vma.zst'
INFO: pruned 1 backup(s) not covered by keep-retention policy
INFO: Finished Backup of VM 100 (00:01:01)
INFO: Backup finished at 2023-10-10 02:01:04
INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2023-10-10 02:01:04
INFO: status = running
INFO: VM Name: FreeIPA
INFO: include disk 'scsi0' 'local-zfs:vm-101-disk-0' 20G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-101-2023_10_10-02_01_04.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '9df7799d-bcaf-4768-b862-6b26dd6c8f10'
INFO: resuming VM again
INFO: 22% (4.5 GiB of 20.0 GiB) in 3s, read: 1.5 GiB/s, write: 339.4 MiB/s
INFO: 28% (5.7 GiB of 20.0 GiB) in 6s, read: 400.9 MiB/s, write: 320.3 MiB/s
INFO: 51% (10.2 GiB of 20.0 GiB) in 9s, read: 1.5 GiB/s, write: 241.2 MiB/s
INFO: 64% (13.0 GiB of 20.0 GiB) in 12s, read: 929.2 MiB/s, write: 240.7 MiB/s
INFO: 100% (20.0 GiB of 20.0 GiB) in 15s, read: 2.3 GiB/s, write: 37.7 MiB/s
INFO: backup is sparse: 16.54 GiB (82%) total zero data
INFO: transferred 20.00 GiB in 15 seconds (1.3 GiB/s)
INFO: archive file size: 1.80GB
INFO: adding notes to backup
INFO: prune older backups with retention: keep-last=4
INFO: removing backup 'local:backup/vzdump-qemu-101-2023_10_06-02_01_06.vma.zst'
INFO: pruned 1 backup(s) not covered by keep-retention policy
INFO: Finished Backup of VM 101 (00:00:15)
INFO: Backup finished at 2023-10-10 02:01:19
INFO: Starting Backup of VM 105 (qemu)
INFO: Backup started at 2023-10-10 02:01:19
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: Nextcloud_old
INFO: include disk 'scsi0' 'local-zfs:vm-105-disk-0' 100G
INFO: snapshots found (not included into backup)
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-105-2023_10_10-02_01_19.vma.zst'
INFO: starting kvm to execute backup task
INFO: started backup task 'cfe2fca7-a7ab-4ada-aa54-5dabfef1ad7b'
INFO: 2% (2.4 GiB of 100.0 GiB) in 3s, read: 821.7 MiB/s, write: 502.2 MiB/s
INFO: 3% (3.9 GiB of 100.0 GiB) in 6s, read: 504.4 MiB/s, write: 432.0 MiB/s
INFO: 6% (6.4 GiB of 100.0 GiB) in 9s, read: 841.7 MiB/s, write: 247.0 MiB/s
INFO: 12% (12.4 GiB of 100.0 GiB) in 12s, read: 2.0 GiB/s, write: 179.4 MiB/s
INFO: 14% (14.4 GiB of 100.0 GiB) in 15s, read: 667.0 MiB/s, write: 391.0 MiB/s
INFO: 15% (15.9 GiB of 100.0 GiB) in 18s, read: 526.8 MiB/s, write: 228.2 MiB/s
INFO: 16% (16.8 GiB of 100.0 GiB) in 21s, read: 300.9 MiB/s, write: 262.8 MiB/s
INFO: 17% (17.5 GiB of 100.0 GiB) in 24s, read: 256.1 MiB/s, write: 240.5 MiB/s
INFO: 18% (18.3 GiB of 100.0 GiB) in 27s, read: 244.1 MiB/s, write: 205.4 MiB/s
INFO: 19% (19.1 GiB of 100.0 GiB) in 31s, read: 203.9 MiB/s, write: 198.0 MiB/s
INFO: 20% (20.9 GiB of 100.0 GiB) in 34s, read: 641.3 MiB/s, write: 391.1 MiB/s
INFO: 21% (21.9 GiB of 100.0 GiB) in 37s, read: 321.7 MiB/s, write: 293.3 MiB/s
INFO: 22% (22.8 GiB of 100.0 GiB) in 40s, read: 300.9 MiB/s, write: 251.3 MiB/s
INFO: 24% (24.2 GiB of 100.0 GiB) in 43s, read: 478.5 MiB/s, write: 330.6 MiB/s
INFO: 25% (25.3 GiB of 100.0 GiB) in 46s, read: 392.7 MiB/s, write: 301.5 MiB/s
INFO: 26% (26.5 GiB of 100.0 GiB) in 50s, read: 300.9 MiB/s, write: 267.1 MiB/s
INFO: 27% (27.4 GiB of 100.0 GiB) in 53s, read: 321.4 MiB/s, write: 299.4 MiB/s
INFO: 28% (28.3 GiB of 100.0 GiB) in 56s, read: 291.2 MiB/s, write: 231.6 MiB/s
INFO: 35% (35.9 GiB of 100.0 GiB) in 59s, read: 2.5 GiB/s, write: 71.4 MiB/s
INFO: 40% (40.5 GiB of 100.0 GiB) in 1m 2s, read: 1.5 GiB/s, write: 163.3 MiB/s
INFO: 42% (42.5 GiB of 100.0 GiB) in 1m 5s, read: 680.3 MiB/s, write: 280.4 MiB/s
INFO: 43% (43.2 GiB of 100.0 GiB) in 1m 8s, read: 241.7 MiB/s, write: 229.5 MiB/s
INFO: 44% (44.3 GiB of 100.0 GiB) in 1m 11s, read: 366.1 MiB/s, write: 307.1 MiB/s
INFO: 45% (45.7 GiB of 100.0 GiB) in 1m 14s, read: 468.9 MiB/s, write: 448.9 MiB/s
INFO: 46% (46.7 GiB of 100.0 GiB) in 1m 17s, read: 351.0 MiB/s, write: 294.4 MiB/s
INFO: 47% (47.9 GiB of 100.0 GiB) in 1m 20s, read: 410.3 MiB/s, write: 306.0 MiB/s
INFO: 48% (48.8 GiB of 100.0 GiB) in 1m 23s, read: 298.5 MiB/s, write: 258.8 MiB/s
INFO: 54% (55.0 GiB of 100.0 GiB) in 1m 26s, read: 2.1 GiB/s, write: 86.2 MiB/s
INFO: 64% (64.6 GiB of 100.0 GiB) in 1m 29s, read: 3.2 GiB/s, write: 0 B/s
INFO: 74% (74.1 GiB of 100.0 GiB) in 1m 32s, read: 3.2 GiB/s, write: 0 B/s
INFO: 83% (83.5 GiB of 100.0 GiB) in 1m 35s, read: 3.1 GiB/s, write: 0 B/s
INFO: 92% (93.0 GiB of 100.0 GiB) in 1m 38s, read: 3.2 GiB/s, write: 0 B/s
INFO: 100% (100.0 GiB of 100.0 GiB) in 1m 41s, read: 2.3 GiB/s, write: 2.7 KiB/s
INFO: backup is sparse: 76.99 GiB (76%) total zero data
INFO: transferred 100.00 GiB in 101 seconds (1013.9 MiB/s)
INFO: stopping kvm after backup task
INFO: archive file size: 10.83GB
INFO: adding notes to backup
INFO: prune older backups with retention: keep-last=4
INFO: removing backup 'local:backup/vzdump-qemu-105-2023_10_06-02_01_22.vma.zst'
INFO: pruned 1 backup(s) not covered by keep-retention policy
INFO: Finished Backup of VM 105 (00:01:43)
INFO: Backup finished at 2023-10-10 02:03:02
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp1249921_106 for temporary files
INFO: Starting Backup of VM 106 (lxc)
INFO: Backup started at 2023-10-10 02:03:02
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: mail
INFO: including mount point rootfs ('/') in backup
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-106-2023_10_10-02_03_02.tar.zst'
INFO: Total bytes written: 954255360 (911MiB, 55MiB/s)
INFO: archive file size: 265MB
INFO: adding notes to backup
INFO: prune older backups with retention: keep-last=4
INFO: removing backup 'local:backup/vzdump-lxc-106-2023_10_06-02_03_07.tar.zst'
INFO: pruned 1 backup(s) not covered by keep-retention policy
INFO: Finished Backup of VM 106 (00:00:17)
INFO: Backup finished at 2023-10-10 02:03:19
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp1249921_107 for temporary files
INFO: Starting Backup of VM 107 (lxc)
INFO: Backup started at 2023-10-10 02:03:19
INFO: status = running
INFO: CT Name: nextcloud
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
 
Hi,
when this happens, can you please capture the output of ps auxwf as well as journalctl --since -1day > journal.txt and post both as attachments here? I assume that your snapshot creation never finishes correctly for some reason. Is the ZFS dataset of your CT's rootfs thin provisioned? Please also post the output of zfs get all <zpool-name>/subvol-107-disk-0 (with the correct zpool name for your setup) and pct config 107 --current.
 
Hi Chris,
Attached is most of what you asked for. Yes, the ZFS dataset is thin provisioned. I get an error with the command below and am not sure what I'm doing wrong:

root@pve1:~# zfs get all local-zfs/subvol-107-disk-0
cannot open 'local-zfs/subvol-107-disk-0': dataset does not exist

Regards,
Gerald
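
One possible cause of the error above: local-zfs is the Proxmox storage ID seen in the backup log, which is not necessarily the name of the ZFS pool. A minimal sketch for looking up the pool behind a storage ID follows. It assumes the usual layout of /etc/pve/storage.cfg (a "zfspool: <id>" header followed by indented properties), and pool_for_storage is a hypothetical helper name.

```shell
#!/bin/sh
# Print the ZFS pool that backs a Proxmox storage ID by reading storage.cfg.
# Usage: pool_for_storage <storage-id> [config-file]
# 'pool_for_storage' is an illustrative helper, not a Proxmox command.
pool_for_storage() {
    awk -v id="$1" '
        $1 == "zfspool:" { in_section = ($2 == id); next }  # header of a zfspool section
        /^[a-z]+:/       { in_section = 0; next }           # header of any other storage type
        in_section && $1 == "pool" { print $2; exit }       # the pool property we want
    ' "${2:-/etc/pve/storage.cfg}"
}
```

On the host, zfs get all "$(pool_for_storage local-zfs)/subvol-107-disk-0" would then query the dataset. On a default ZFS installation the pool is commonly rpool/data, but that is an assumption to verify with zfs list.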
 


No, if you use a FUSE mount in the container, you will have to opt for stop mode backups.

Or, if that is not desired, mount the FUSE filesystem on the host instead and pass it to the container as a bind mount. However, since you seemingly need FUSE to mount snaps via squashfuse, I assume a QEMU VM would be the better option in your case.
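
A minimal sketch of the bind-mount alternative: the FUSE filesystem is mounted on the host and its directory is handed to the container as a mount point. The container ID 107 comes from this thread; the host path /mnt/host-fuse, the container path /mnt/fuse, the slot mp0, and the bind_mount_cmd helper are assumptions for illustration. The function only prints the pct set command.

```shell
#!/bin/sh
# Build (but do not run) the pct command that bind-mounts a host directory
# into a container. All paths are placeholders; 'bind_mount_cmd' is a
# hypothetical helper.
bind_mount_cmd() {
    ctid="$1"; host_dir="$2"; ct_dir="$3"
    # A bind mount takes a host path as the volume and mp= as the target in the CT.
    echo "pct set ${ctid} -mp0 ${host_dir},mp=${ct_dir}"
}

bind_mount_cmd 107 /mnt/host-fuse /mnt/fuse
```

Bind-mounted directories are not included in vzdump backups, so data under that path would need its own backup on the host.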
 
Thanks Chris.
I think I will opt for a stop-mode backup for this container. Since I use a backup job, can I leave the job in snapshot mode but manually set stop mode for this specific container? Also, can Proxmox be configured to skip a problematic backup instead of hanging?

Regards,
Gerald
 
You will have to exclude the container from the existing backup job and create a new, separate job with stop mode configured for this particular container. It is not possible to set this individually for the CTs/VMs of the job.
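
As a rough sketch, the two jobs would correspond to vzdump invocations like the ones printed below. The options are taken from the job line in the log at the top of the thread, with 107 added to the exclude list; the split_jobs helper is hypothetical, and in practice the jobs would normally be edited under Datacenter > Backup rather than run by hand.

```shell
#!/bin/sh
# Print the two vzdump invocations: snapshot mode for all guests except the
# excluded ones, and stop mode for a single container.
# 'split_jobs' is an illustrative helper; it prints and does not run vzdump.
split_jobs() {
    stop_ct="$1"      # container that needs stop mode, e.g. 107
    excluded="$2"     # guests already excluded from the job, e.g. "102,103,104"
    common="--storage local --compress zstd --prune-backups keep-last=4"
    echo "vzdump --all 1 --exclude ${excluded},${stop_ct} --mode snapshot ${common}"
    echo "vzdump ${stop_ct} --mode stop ${common}"
}

split_jobs 107 "102,103,104"
```

Scheduling the stop-mode job at a different time from the main job avoids the two runs competing for the same storage.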
 
