Filesystem in VM has been corrupted after backup failed

mark99

Hello, I am running the latest version of PVE (no cluster) and I am facing filesystem corruption inside a VM whenever a backup task (the zstd process) fails.

I found one mention of this here, but I decided to create a separate topic, as I consider this a critical problem.
I could not find the exact place in the source code where the error occurs, but the backup snapshot should be merged back into the main image in any case, including when the task fails.

The current behavior, where a VM block device can be changed while the VM is running and writing data, is unacceptable: it can lead to irreversible damage to the structure of the guest filesystem.

Bash:
root@pve:~# pveversion --verbose
proxmox-ve: 8.1.0 (running kernel: 6.2.16-19-pve)
pve-manager: 8.1.10 (running version: 8.1.10/4b06efb5db453f29)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
pve-kernel-5.13: 7.1-9
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.13-3-pve-signed: 6.5.13-3
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.3
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.5
libpve-cluster-perl: 8.0.5
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.6
libpve-network-perl: 0.9.6
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve1
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.5-1
proxmox-backup-file-restore: 3.1.5-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.5
proxmox-widget-toolkit: 4.1.5
pve-cluster: 8.0.5
pve-container: 5.0.9
pve-docs: 8.1.5
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.10-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.1.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve1
[attachment: photo_2024-04-20_16-52-56.jpg]

I have encountered this behavior several times; the easiest way to reproduce it is to fill up the backup disk (a rough sketch of how to provoke this is shown after the log below).
While this job was running, the filesystems of VMs 103, 107 and 128 were definitely corrupted:
Code:
INFO: starting new backup job: vzdump 107 109 111 103 102 108 128 106 124 --storage hs-mnt-2 --quiet 1 --prune-backups 'keep-last=2' --mailto mark@* --node pve --compress zstd --mailnotification failure --mode snapshot
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2024-04-19 00:00:02
INFO: status = running
INFO: CT Name: slavik
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
  Logical volume "snap_vm-102-disk-0_vzdump" created.
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-102-2024_04_19-00_00_02.tar.zst'
INFO: zstd: error 70 : Write error : cannot write block : No space left on device
INFO: cleanup temporary 'vzdump' snapshot
  Logical volume "snap_vm-102-disk-0_vzdump" successfully removed.
ERROR: Backup of VM 102 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/mnt/pve/hdd0/tmp/vzdumptmp2489997_102/' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | zstd '--threads=1' >/mnt/pve/hs-mnt-2/dump/vzdump-lxc-102-2024_04_19-00_00_02.tar.dat' failed: exit code 70
INFO: Failed at 2024-04-19 00:00:06
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-102-2024_04_19-00_00_02.log': No space left on device
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2024-04-19 00:00:06
INFO: status = running
INFO: VM Name: blog
INFO: include disk 'virtio0' 'ssd2-lvm:vm-103-disk-0' 25G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-103-2024_04_19-00_00_06.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'a07d8d37-9f51-4c3a-b20b-e63abca924ca'
INFO: resuming VM again
INFO:   6% (1.7 GiB of 25.0 GiB) in 3s, read: 583.6 MiB/s, write: 414.2 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:   6% (1.7 GiB of 25.0 GiB) in 14s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 103 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:00:20
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-103-2024_04_19-00_00_06.log': No space left on device
INFO: Starting Backup of VM 106 (lxc)
INFO: Backup started at 2024-04-19 00:00:20
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: css-server
INFO: including mount point rootfs ('/') in backup
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-106-2024_04_19-00_00_20.tar.zst'
INFO: zstd: error 70 : Write error : cannot write block : No space left on device
ERROR: Backup of VM 106 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/mnt/pve/hdd0/tmp/vzdumptmp2489997_106/' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | zstd '--threads=1' >/mnt/pve/hs-mnt-2/dump/vzdump-lxc-106-2024_04_19-00_00_20.tar.dat' failed: exit code 70
INFO: Failed at 2024-04-19 00:00:42
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-106-2024_04_19-00_00_20.log': No space left on device
INFO: Starting Backup of VM 107 (qemu)
INFO: Backup started at 2024-04-19 00:00:42
INFO: status = running
INFO: VM Name: musicapp
INFO: include disk 'virtio0' 'ssd1-lvm:vm-107-disk-1' 120G
INFO: include disk 'virtio1' 'ssd1-lvm:vm-107-disk-0' 64G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-107-2024_04_19-00_00_42.vma.zst'
INFO: skipping guest-agent 'fs-freeze', agent configured but not running?
INFO: started backup task 'a2627416-d86e-4867-b3a1-124250dc6c94'
INFO: resuming VM again
INFO:   0% (1.2 GiB of 184.0 GiB) in 3s, read: 418.1 MiB/s, write: 389.7 MiB/s
INFO:   1% (2.1 GiB of 184.0 GiB) in 6s, read: 315.4 MiB/s, write: 310.3 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:   1% (3.0 GiB of 184.0 GiB) in 30s, read: 38.4 MiB/s, write: 36.6 MiB/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 107 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:01:15
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-107-2024_04_19-00_00_42.log': No space left on device
INFO: Starting Backup of VM 108 (qemu)
INFO: Backup started at 2024-04-19 00:01:15
INFO: status = running
INFO: VM Name: server2012
INFO: include disk 'sata0' 'ssd1-lvm:vm-108-disk-0' 200G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-108-2024_04_19-00_01_15.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'fb24bff5-66af-437e-8b34-b319d050f5e5'
INFO: resuming VM again
INFO:   0% (1.2 GiB of 200.0 GiB) in 3s, read: 416.9 MiB/s, write: 339.1 MiB/s
INFO:   1% (2.2 GiB of 200.0 GiB) in 6s, read: 317.8 MiB/s, write: 301.5 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:   1% (3.8 GiB of 200.0 GiB) in 30s, read: 68.3 MiB/s, write: 66.1 MiB/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 108 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:01:47
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-108-2024_04_19-00_01_15.log': No space left on device
INFO: Starting Backup of VM 109 (qemu)
INFO: Backup started at 2024-04-19 00:01:47
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: sslawa-work
INFO: include disk 'virtio0' 'ssd1-lvm:vm-109-disk-0' 50G
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-109-2024_04_19-00_01_47.vma.zst'
INFO: starting kvm to execute backup task
WARN: no efidisk configured! Using temporary efivars disk.
INFO: started backup task '0742a479-e5e5-4554-8382-23f806eee48a'
INFO:   1% (1022.2 MiB of 50.0 GiB) in 3s, read: 340.8 MiB/s, write: 307.3 MiB/s
INFO:   3% (1.8 GiB of 50.0 GiB) in 6s, read: 290.0 MiB/s, write: 285.0 MiB/s
INFO:   5% (2.6 GiB of 50.0 GiB) in 9s, read: 253.8 MiB/s, write: 253.3 MiB/s
INFO:   6% (3.5 GiB of 50.0 GiB) in 12s, read: 299.7 MiB/s, write: 298.7 MiB/s
INFO:   8% (4.2 GiB of 50.0 GiB) in 15s, read: 235.6 MiB/s, write: 232.8 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:   8% (4.2 GiB of 50.0 GiB) in 34s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 109 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:02:24
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-109-2024_04_19-00_01_47.log': No space left on device
INFO: Starting Backup of VM 111 (qemu)
INFO: Backup started at 2024-04-19 00:02:24
INFO: status = running
INFO: VM Name: openvpn
INFO: include disk 'virtio0' 'ssd2-lvm:vm-111-disk-0' 10G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-111-2024_04_19-00_02_24.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'a5b70cc9-2ce7-42b7-bfae-17429a42121d'
INFO: resuming VM again
INFO:  15% (1.5 GiB of 10.0 GiB) in 3s, read: 518.4 MiB/s, write: 337.1 MiB/s
INFO:  24% (2.5 GiB of 10.0 GiB) in 6s, read: 333.5 MiB/s, write: 324.4 MiB/s
INFO:  44% (4.4 GiB of 10.0 GiB) in 9s, read: 653.5 MiB/s, write: 287.2 MiB/s
INFO:  59% (5.9 GiB of 10.0 GiB) in 12s, read: 508.9 MiB/s, write: 299.9 MiB/s
INFO:  68% (6.9 GiB of 10.0 GiB) in 15s, read: 340.8 MiB/s, write: 290.9 MiB/s
INFO:  71% (7.2 GiB of 10.0 GiB) in 18s, read: 88.2 MiB/s, write: 82.3 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:  71% (7.2 GiB of 10.0 GiB) in 35s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 111 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:03:00
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-111-2024_04_19-00_02_24.log': No space left on device
INFO: Starting Backup of VM 124 (qemu)
INFO: Backup started at 2024-04-19 00:03:00
INFO: status = running
INFO: VM Name: slavik-vm
INFO: include disk 'virtio0' 'ssd1-lvm:vm-124-disk-0' 20G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-124-2024_04_19-00_03_00.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'd5536f0a-aca3-4031-99c0-35aa015a78bb'
INFO: resuming VM again
INFO:   5% (1.0 GiB of 20.0 GiB) in 3s, read: 347.0 MiB/s, write: 297.1 MiB/s
INFO:  11% (2.2 GiB of 20.0 GiB) in 6s, read: 419.1 MiB/s, write: 341.5 MiB/s
INFO:  16% (3.2 GiB of 20.0 GiB) in 9s, read: 342.0 MiB/s, write: 332.4 MiB/s
INFO:  19% (4.0 GiB of 20.0 GiB) in 12s, read: 256.2 MiB/s, write: 227.5 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:  19% (4.0 GiB of 20.0 GiB) in 31s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 124 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:03:31
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-124-2024_04_19-00_03_00.log': No space left on device
INFO: Starting Backup of VM 128 (qemu)
INFO: Backup started at 2024-04-19 00:03:31
INFO: status = running
INFO: VM Name: kerio
INFO: include disk 'virtio0' 'ssd1-lvm:vm-128-disk-0' 30G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-128-2024_04_19-00_03_31.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'e8baff63-47ec-4099-b53a-8069da49f503'
INFO: resuming VM again
INFO:   7% (2.3 GiB of 30.0 GiB) in 3s, read: 778.2 MiB/s, write: 394.1 MiB/s
INFO:  11% (3.4 GiB of 30.0 GiB) in 6s, read: 397.4 MiB/s, write: 353.3 MiB/s
INFO:  14% (4.4 GiB of 30.0 GiB) in 9s, read: 336.1 MiB/s, write: 314.4 MiB/s
INFO:  17% (5.4 GiB of 30.0 GiB) in 12s, read: 316.6 MiB/s, write: 284.9 MiB/s
INFO:  21% (6.4 GiB of 30.0 GiB) in 15s, read: 369.8 MiB/s, write: 363.0 MiB/s
INFO:  22% (6.7 GiB of 30.0 GiB) in 18s, read: 94.2 MiB/s, write: 93.9 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:  22% (6.7 GiB of 30.0 GiB) in 36s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 128 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:04:08
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-128-2024_04_19-00_03_31.log': No space left on device
INFO: Backup job finished with errors
INFO: notified via target `<mark@*>`
TASK ERROR: job errors
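For reference, a rough sketch of how to provoke the out-of-space condition (the filler file name is arbitrary; any large write onto the backup storage works):
Bash:
# Fill the backup storage (mounted at /mnt/pve/hs-mnt-2) until it runs out of space,
# then start a backup so that zstd hits "No space left on device".
dd if=/dev/zero of=/mnt/pve/hs-mnt-2/filler.img bs=1M
vzdump 103 --storage hs-mnt-2 --mode snapshot --compress zstd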

Is there a fix or a workaround for this behavior?
 
I know nothing about the problem, but a workaround would probably be to not use the Snapshot backup mode and use Stop instead (and maybe Suspend would also work).
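A rough, untested sketch of what that could look like (VMID and storage taken from the log above):
Bash:
# Back up VM 103 in stop mode instead of snapshot mode
vzdump 103 --mode stop --storage hs-mnt-2 --compress zstd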
May I ask you to try to reproduce this behavior and confirm or deny it?

Stopping or suspending may be a workaround, but in that case the problem becomes the unavailability of the resources during the backup.
 
May I ask you to try to reproduce this behavior and confirm or deny it?
I cannot reproduce your issue (after a few attempts), so I cannot test it.
Stopping or suspending may be a workaround, but in that case the problem becomes the unavailability of the resources during the backup.
Indeed. Maybe running out of disk space for your backups might also be something that can be prevented? I cannot fix this problem, sorry.
 
I cannot reproduce your issue (after a few attempts), so I cannot test it.
Could you tell me how you tested? Did you reproduce the lack of space, or did you just cancel the task manually?
If it was just cancelled, then I have no problems with that either; the problem only occurs when space runs out and an error happens during the backup process.
I have not checked, but perhaps the same problem occurs if, for example, the disk that the backup is written to is disconnected.

Indeed. Maybe running out of disk space for your backups might also be something that can be prevented? I cannot fix this problem, sorry.
I'm sorry if I seem rude; I don't need you to fix the problem. :oops:
Yes, the lack of space can be prevented, but that does not change the fact that PVE's behavior in my case is wrong, does it?
 
Could you tell me how you tested? Did you reproduce the lack of space, or did you just cancel the task manually?
I used a 1 GiB directory storage for backups and the backup errored out after compressing several GB. I cannot reproduce the issue with a VM that uses VirtIO SCSI and is not write-heavy during the (short) backup.
Yes, the lack of space can be prevented, but that does not change the fact that PVE's behavior in my case is wrong, does it?
I think running out of space more than once is also something that needs to be addressed. But any problem is made much worse by wrong behavior or by corrupting the source. Personally, I love using PBS for speedy deduplicated backups.
 
BTW, stop mode shuts down the guest, makes a temporary snapshot while the VM is off, starts the backup, and then restarts the VM right away to minimize downtime.
Here, where the backup destination is a slow 2.5" HDD used as PBS, to avoid slowing the VM down too much (Windows services like SQL can't even start during the backup), I shut the VM down at night, run the backup, and then a hook script starts the VM once the backup has finished (a rough sketch of such a hook script is below).
Indeed, fleecing support should fix the problem.
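A rough, untested sketch of such a hook script, assuming the usual vzdump hook interface (phase passed as $1, VMID as $3 for per-guest phases, as in the example hook script shipped with PVE) and a hypothetical path like /usr/local/bin/vzdump-hook.sh registered via the script option in /etc/vzdump.conf:
Bash:
#!/bin/bash
# Called by vzdump for every phase of the backup job.
phase="$1"
vmid="$3"

if [ "$phase" = "backup-end" ]; then
    # Start the guest again once its backup has finished
    # (use pct start instead for containers).
    qm start "$vmid"
fi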
 
https://youtu.be/2FQYSFJCPE4
I installed the latest updates, started the backup task and stopped it immediately. I see some I/O errors inside the VM again.
Moreover, I had to remove the lock manually via qm unlock.
Could it be that you ran a backup job without fleecing enabled, which can (only?) be set in a scheduled backup job under Advanced? Please correct me if I'm wrong.
EDIT: Fleecing can also be enabled when running vzdump manually from the command line: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_vm_backup_fleecing
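A rough sketch of what that could look like on the command line; the --fleecing property string is taken from the linked docs, so please double-check the exact syntax against your version:
Bash:
# Manual backup with fleecing enabled, using storage "local" for the temporary
# fleecing image (VMID, target storage and fleecing storage are examples)
vzdump 103 --mode snapshot --storage hs-mnt-2 --compress zstd --fleecing enabled=1,storage=local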
 
Could it be that you ran a backup job without fleecing enabled, which can (only?) be set in a scheduled backup job under Advanced? Please correct me if I'm wrong.
EDIT: Fleecing can also be enabled when running vzdump manually from the command line: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_vm_backup_fleecing
I tried with the default options, but I didn't find an option in the GUI to enable fleecing. I'll try the same thing later through a scheduled task with fleecing enabled.

But I still don't understand why the backup procedure can break the I/O inside the VM at all.
All other hypervisors and backup software, except PVE, first take a regular/tagged snapshot of the VM and then copy the data at whatever speed is available, with no requirements on the channel or anything else, and with almost no loss of I/O speed.
At the end of the procedure, the snapshot is merged back into the main disk anyway, regardless of the task status.

For what reason is it done differently in PVE?

UPD: In any case, consistency is only respected to the extent that the fs-freeze commands are respected by the software running inside the VM, right?
I do not understand the PVE/PBS algorithms, but I could not even imagine that the VM's I/O speed could depend on the backup in any way; this is a rather strange implementation.
 
I tried with the default options, but I didn't find an option in the GUI to enable fleecing. I'll try the same thing later through a scheduled task with fleecing enabled.
Then fleecing was not enabled. As said, it looks like you need a scheduled job for it (at the moment).

But I still don't understand why the backup procedure can break the I/O inside the VM at all.
All other hypervisors and backup software, except PVE, first take a regular/tagged snapshot of the VM and then copy the data at whatever speed is available, with no requirements on the channel or anything else, and with almost no loss of I/O speed.
At the end of the procedure, the snapshot is merged back into the main disk anyway, regardless of the task status.

For what reason is it done differently in PVE?

UPD: In any case, consistency is only respected to the extent that the fs-freeze commands are respected by the software running inside the VM, right?
I do not understand the PVE/PBS algorithms, but I could not even imagine that the VM's I/O speed could depend on the backup in any way; this is a rather strange implementation.
If the used storage does not support instant snapshots (and I think the underlying QEMU/KVM does not use them anyway), it will take time to copy the consistent data (ensured by fsfreeze). Use the Suspend backup type to prevent changes to the data during this process.

If you want the VM to keep running during a backup, Proxmox (the underlying QEMU/KVM) temporarily stores writes (that happen inside the VM) somewhere else (in memory for example). The number/speed of writes happening inside the VM does influence this process a lot. If those writes get lost, you might have an outdated filesystem inside the VM (because the writes did not happen), while the filesystem cache assumes the writes did happen and this is a recipe for corruption.

Fleecing works the other way around and copies the original data to a separate temporary storage (that you choose), so that the original consistent data is backed up (instead of a mix of old and new data). If this fails or gets lost, the backup fails, but the writes did happen inside the VM, so there is no corruption.
 
If you want the VM to keep running during a backup, Proxmox (the underlying QEMU/KVM) temporarily stores writes (that happen inside the VM) somewhere else (in memory for example). The number/speed of writes happening inside the VM does influence this process a lot. If those writes get lost, you might have an outdated filesystem inside the VM (because the writes did not happen), while the filesystem cache assumes the writes did happen and this is a recipe for corruption.
Are there any reasons why such a solution was chosen?
Why not just disable the snapshot mode option if the VM storage does not support snapshots,
and use the built-in QEMU snapshots?
 
Hi,
Then fleecing was not enabled. As said, it looks like you need a scheduled job for it (at the moment).
You can also configure it as a node-wide default in /etc/vzdump.conf with a line like fleecing: enabled=true,storage=local-lvm.
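For illustration, the relevant part of /etc/vzdump.conf would then look like this (the storage ID local-lvm is just an example; pick one with enough space for the fleecing images):
Code:
# /etc/vzdump.conf - node-wide defaults for vzdump
fleecing: enabled=true,storage=local-lvm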

Are there any reasons why such a solution was chosen?
Why not just disable the snapshot mode option if the VM storage does not support snapshots,
and use the built-in QEMU snapshots?
1. it is independent of the storage, i.e. avoids limiting which storages you can use for the VM without having downtime during backup
2. it allows tracking which parts of the disks are dirty for incremental backups
3. built-in QEMU snapshots are only supported by qcow2 and RBD and can be rather inefficient (e.g. qcow2 on NFS).
 
Hi
You can also configure it as a node-wide default in /etc/vzdump.conf with a line like fleecing: enabled=true,storage=local-lvm.
I checked it out: I manually started the task and stopped it after a few seconds.
There are no more problems with I/O dropping out or filesystem corruption, but there are other problems:
  1. The VM remains in a locked state even after the job has ended (with the error 'stopped: unexpected status'). It requires a manual qm unlock VMID. The same behavior is in the video that I posted here earlier.
  2. The fleecing image does not seem to be deleted. New writes go to the main storage of the VM disk, but I could not determine whether this image contains any data that is still needed and whether it can be deleted. I tried it on two VMs and both images remained on the fleecing storage (local), as if there were some flaw in the handling of an abnormally interrupted backup task.
Code:
root@pve:~# ls -Rhl /var/lib/vz/images/
/var/lib/vz/images/:
total 8.0K
drwxr----- 2 root root 4.0K May  4 00:12 111
drwxr----- 2 root root 4.0K May  4 00:08 128

/var/lib/vz/images/111:
total 948M
-rw-r----- 1 root root 11G May  4 00:14 vm-111-fleece-0.qcow2

/var/lib/vz/images/128:
total 6.0M
-rw-r----- 1 root root 31G May  4 00:09 vm-128-fleece-0.qcow2

1. it is independent of the storage, i.e. avoids limiting which storages you can use for the VM without having downtime during backup
2. it allows tracking which parts of the disks are dirty for incremental backups
3. built-in QEMU snapshots are only supported by qcow2 and RBD and can be rather inefficient (e.g. qcow2 on NFS).
Thank you for the explanation. I find it makes sense.
 
I checked it out: I manually started the task and stopped it after a few seconds.
Why did you stop it? There unfortunately is no "soft abort" at the moment, so the whole task group will be killed 5 seconds after receiving the signal to terminate. Likely the cleanup didn't get the chance to run within those 5 seconds, causing the below issues:
There are no more problems with I/O dropping out or filesystem corruption, but there are other problems:
  1. The VM remains in a locked state even after the job has ended (with the error 'stopped: unexpected status'). It requires a manual qm unlock VMID. The same behavior is in the video that I posted here earlier.
  2. The fleecing image does not seem to be deleted. New writes go to the main storage of the VM disk, but I could not determine whether this image contains any data that is still needed and whether it can be deleted. I tried it on two VMs and both images remained on the fleecing storage (local), as if there were some flaw in the handling of an abnormally interrupted backup task.
It's safe to remove the fleecing images manually if you know they came from a failed backup.
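For the leftovers shown above, a rough cleanup sketch could look like this (verify first that no backup task is still running for the VM; VMID and path are taken from your listing):
Bash:
# Release the stale backup lock
qm unlock 111
# Check whether the fleecing image is still referenced in the VM config
qm config 111 | grep -i fleece
# If it is not referenced and the backup that created it failed, remove it
rm /var/lib/vz/images/111/vm-111-fleece-0.qcow2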
 
Why did you stop it? There unfortunately is no "soft abort" at the moment, so the whole task group will be killed 5 seconds after receiving the signal to terminate. Likely the cleanup didn't get the chance to run within those 5 seconds, causing the below issues:
I want to make sure that the logic works predictably and correctly in various circumstances.
In this case, I see that this is not happening. Is this a problem, and will it be fixed?
 
