Filesystem in VM corrupted after a failed backup

mark99

Hello, I am using the latest version of PVE (no cluster functions) and I am facing file system corruption inside a VM whenever the backup task (the zstd process) fails.

I found one mention of this here, but decided to create a separate topic, as I consider this a critical problem.
I did not find the exact place in the source code where the error occurs, but the backup snapshot should be merged back into the main image in every case, including when the task fails.

The current behavior, where a VM's block device can be changed while the VM is running and writing data, is unacceptable: it can lead to irreversible damage to the structure of the guest file system.

Bash:
root@pve:~# pveversion --verbose
proxmox-ve: 8.1.0 (running kernel: 6.2.16-19-pve)
pve-manager: 8.1.10 (running version: 8.1.10/4b06efb5db453f29)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
pve-kernel-5.13: 7.1-9
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.13-3-pve-signed: 6.5.13-3
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.3
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.5
libpve-cluster-perl: 8.0.5
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.6
libpve-network-perl: 0.9.6
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve1
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.5-1
proxmox-backup-file-restore: 3.1.5-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.5
proxmox-widget-toolkit: 4.1.5
pve-cluster: 8.0.5
pve-container: 5.0.9
pve-docs: 8.1.5
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.10-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.1.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve1
[Attachment: photo_2024-04-20_16-52-56.jpg]

I have encountered this behavior several times, and the easiest way to reproduce it is to fill up the backup disk (a rough sketch of how I exhaust the space on the backup mount follows after the log below).
While this job was running, the file systems of VMs 103, 107 and 128 were definitely corrupted:
Code:
INFO: starting new backup job: vzdump 107 109 111 103 102 108 128 106 124 --storage hs-mnt-2 --quiet 1 --prune-backups 'keep-last=2' --mailto mark@* --node pve --compress zstd --mailnotification failure --mode snapshot
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2024-04-19 00:00:02
INFO: status = running
INFO: CT Name: slavik
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
  Logical volume "snap_vm-102-disk-0_vzdump" created.
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-102-2024_04_19-00_00_02.tar.zst'
INFO: zstd: error 70 : Write error : cannot write block : No space left on device
INFO: cleanup temporary 'vzdump' snapshot
  Logical volume "snap_vm-102-disk-0_vzdump" successfully removed.
ERROR: Backup of VM 102 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/mnt/pve/hdd0/tmp/vzdumptmp2489997_102/' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | zstd '--threads=1' >/mnt/pve/hs-mnt-2/dump/vzdump-lxc-102-2024_04_19-00_00_02.tar.dat' failed: exit code 70
INFO: Failed at 2024-04-19 00:00:06
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-102-2024_04_19-00_00_02.log': No space left on device
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2024-04-19 00:00:06
INFO: status = running
INFO: VM Name: blog
INFO: include disk 'virtio0' 'ssd2-lvm:vm-103-disk-0' 25G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-103-2024_04_19-00_00_06.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'a07d8d37-9f51-4c3a-b20b-e63abca924ca'
INFO: resuming VM again
INFO:   6% (1.7 GiB of 25.0 GiB) in 3s, read: 583.6 MiB/s, write: 414.2 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:   6% (1.7 GiB of 25.0 GiB) in 14s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 103 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:00:20
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-103-2024_04_19-00_00_06.log': No space left on device
INFO: Starting Backup of VM 106 (lxc)
INFO: Backup started at 2024-04-19 00:00:20
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: css-server
INFO: including mount point rootfs ('/') in backup
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-106-2024_04_19-00_00_20.tar.zst'
INFO: zstd: error 70 : Write error : cannot write block : No space left on device
ERROR: Backup of VM 106 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/mnt/pve/hdd0/tmp/vzdumptmp2489997_106/' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | zstd '--threads=1' >/mnt/pve/hs-mnt-2/dump/vzdump-lxc-106-2024_04_19-00_00_20.tar.dat' failed: exit code 70
INFO: Failed at 2024-04-19 00:00:42
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-106-2024_04_19-00_00_20.log': No space left on device
INFO: Starting Backup of VM 107 (qemu)
INFO: Backup started at 2024-04-19 00:00:42
INFO: status = running
INFO: VM Name: musicapp
INFO: include disk 'virtio0' 'ssd1-lvm:vm-107-disk-1' 120G
INFO: include disk 'virtio1' 'ssd1-lvm:vm-107-disk-0' 64G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-107-2024_04_19-00_00_42.vma.zst'
INFO: skipping guest-agent 'fs-freeze', agent configured but not running?
INFO: started backup task 'a2627416-d86e-4867-b3a1-124250dc6c94'
INFO: resuming VM again
INFO:   0% (1.2 GiB of 184.0 GiB) in 3s, read: 418.1 MiB/s, write: 389.7 MiB/s
INFO:   1% (2.1 GiB of 184.0 GiB) in 6s, read: 315.4 MiB/s, write: 310.3 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:   1% (3.0 GiB of 184.0 GiB) in 30s, read: 38.4 MiB/s, write: 36.6 MiB/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 107 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:01:15
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-107-2024_04_19-00_00_42.log': No space left on device
INFO: Starting Backup of VM 108 (qemu)
INFO: Backup started at 2024-04-19 00:01:15
INFO: status = running
INFO: VM Name: server2012
INFO: include disk 'sata0' 'ssd1-lvm:vm-108-disk-0' 200G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-108-2024_04_19-00_01_15.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'fb24bff5-66af-437e-8b34-b319d050f5e5'
INFO: resuming VM again
INFO:   0% (1.2 GiB of 200.0 GiB) in 3s, read: 416.9 MiB/s, write: 339.1 MiB/s
INFO:   1% (2.2 GiB of 200.0 GiB) in 6s, read: 317.8 MiB/s, write: 301.5 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:   1% (3.8 GiB of 200.0 GiB) in 30s, read: 68.3 MiB/s, write: 66.1 MiB/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 108 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:01:47
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-108-2024_04_19-00_01_15.log': No space left on device
INFO: Starting Backup of VM 109 (qemu)
INFO: Backup started at 2024-04-19 00:01:47
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: sslawa-work
INFO: include disk 'virtio0' 'ssd1-lvm:vm-109-disk-0' 50G
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-109-2024_04_19-00_01_47.vma.zst'
INFO: starting kvm to execute backup task
WARN: no efidisk configured! Using temporary efivars disk.
INFO: started backup task '0742a479-e5e5-4554-8382-23f806eee48a'
INFO:   1% (1022.2 MiB of 50.0 GiB) in 3s, read: 340.8 MiB/s, write: 307.3 MiB/s
INFO:   3% (1.8 GiB of 50.0 GiB) in 6s, read: 290.0 MiB/s, write: 285.0 MiB/s
INFO:   5% (2.6 GiB of 50.0 GiB) in 9s, read: 253.8 MiB/s, write: 253.3 MiB/s
INFO:   6% (3.5 GiB of 50.0 GiB) in 12s, read: 299.7 MiB/s, write: 298.7 MiB/s
INFO:   8% (4.2 GiB of 50.0 GiB) in 15s, read: 235.6 MiB/s, write: 232.8 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:   8% (4.2 GiB of 50.0 GiB) in 34s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 109 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:02:24
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-109-2024_04_19-00_01_47.log': No space left on device
INFO: Starting Backup of VM 111 (qemu)
INFO: Backup started at 2024-04-19 00:02:24
INFO: status = running
INFO: VM Name: openvpn
INFO: include disk 'virtio0' 'ssd2-lvm:vm-111-disk-0' 10G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-111-2024_04_19-00_02_24.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'a5b70cc9-2ce7-42b7-bfae-17429a42121d'
INFO: resuming VM again
INFO:  15% (1.5 GiB of 10.0 GiB) in 3s, read: 518.4 MiB/s, write: 337.1 MiB/s
INFO:  24% (2.5 GiB of 10.0 GiB) in 6s, read: 333.5 MiB/s, write: 324.4 MiB/s
INFO:  44% (4.4 GiB of 10.0 GiB) in 9s, read: 653.5 MiB/s, write: 287.2 MiB/s
INFO:  59% (5.9 GiB of 10.0 GiB) in 12s, read: 508.9 MiB/s, write: 299.9 MiB/s
INFO:  68% (6.9 GiB of 10.0 GiB) in 15s, read: 340.8 MiB/s, write: 290.9 MiB/s
INFO:  71% (7.2 GiB of 10.0 GiB) in 18s, read: 88.2 MiB/s, write: 82.3 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:  71% (7.2 GiB of 10.0 GiB) in 35s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 111 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:03:00
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-111-2024_04_19-00_02_24.log': No space left on device
INFO: Starting Backup of VM 124 (qemu)
INFO: Backup started at 2024-04-19 00:03:00
INFO: status = running
INFO: VM Name: slavik-vm
INFO: include disk 'virtio0' 'ssd1-lvm:vm-124-disk-0' 20G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-124-2024_04_19-00_03_00.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'd5536f0a-aca3-4031-99c0-35aa015a78bb'
INFO: resuming VM again
INFO:   5% (1.0 GiB of 20.0 GiB) in 3s, read: 347.0 MiB/s, write: 297.1 MiB/s
INFO:  11% (2.2 GiB of 20.0 GiB) in 6s, read: 419.1 MiB/s, write: 341.5 MiB/s
INFO:  16% (3.2 GiB of 20.0 GiB) in 9s, read: 342.0 MiB/s, write: 332.4 MiB/s
INFO:  19% (4.0 GiB of 20.0 GiB) in 12s, read: 256.2 MiB/s, write: 227.5 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:  19% (4.0 GiB of 20.0 GiB) in 31s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 124 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:03:31
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-124-2024_04_19-00_03_00.log': No space left on device
INFO: Starting Backup of VM 128 (qemu)
INFO: Backup started at 2024-04-19 00:03:31
INFO: status = running
INFO: VM Name: kerio
INFO: include disk 'virtio0' 'ssd1-lvm:vm-128-disk-0' 30G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-128-2024_04_19-00_03_31.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'e8baff63-47ec-4099-b53a-8069da49f503'
INFO: resuming VM again
INFO:   7% (2.3 GiB of 30.0 GiB) in 3s, read: 778.2 MiB/s, write: 394.1 MiB/s
INFO:  11% (3.4 GiB of 30.0 GiB) in 6s, read: 397.4 MiB/s, write: 353.3 MiB/s
INFO:  14% (4.4 GiB of 30.0 GiB) in 9s, read: 336.1 MiB/s, write: 314.4 MiB/s
INFO:  17% (5.4 GiB of 30.0 GiB) in 12s, read: 316.6 MiB/s, write: 284.9 MiB/s
INFO:  21% (6.4 GiB of 30.0 GiB) in 15s, read: 369.8 MiB/s, write: 363.0 MiB/s
INFO:  22% (6.7 GiB of 30.0 GiB) in 18s, read: 94.2 MiB/s, write: 93.9 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO:  22% (6.7 GiB of 30.0 GiB) in 36s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 128 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:04:08
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-128-2024_04_19-00_03_31.log': No space left on device
INFO: Backup job finished with errors
INFO: notified via target `<mark@*>`
TASK ERROR: job errors
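For reference, this is roughly how I fill the backup mount before starting the job; the mount point is the one from the log above, while the commands and the filler size are just a sketch and need to be adjusted to the actual free space:

Bash:
# check the remaining space on the backup storage
df -h /mnt/pve/hs-mnt-2
# allocate a large filler file so that zstd hits "No space left on device" mid-backup
# (the size is arbitrary -- leave only a few GiB free)
fallocate -l 900G /mnt/pve/hs-mnt-2/filler.img
# then start the backup job as usual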

Is there a fix or workaround for this behavior?
 
I know nothing about the problem, but a workaround would probably be to not use the Snapshot backup mode and use Stop instead (and maybe Suspend would work as well).
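Run manually, that would look roughly like this (only a sketch; the VMID and storage name are taken from the log in the first post):

Bash:
# stop mode: orderly shutdown of the guest before the backup starts (highest consistency)
vzdump 103 --storage hs-mnt-2 --compress zstd --mode stop
# suspend mode: suspends the guest before the backup instead of shutting it down
vzdump 103 --storage hs-mnt-2 --compress zstd --mode suspend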
May I ask you to try to reproduce this behavior and confirm or deny it?

Stopping or suspending may be a workaround, but in that case it means the services become unavailable.
 
May I ask you to try to reproduce this behavior and confirm or deny it?
I cannot reproduce your issue (after a few attempts), so I cannot test it.
Stopping or suspending may be a workaround, but in that case it means the services become unavailable.
Indeed. Maybe running out of disk space for your backups might also be something that can be prevented? I cannot fix this problem, sorry.
 
I cannot reproduce your issue (after a few attempts), so I cannot test it.
Could you tell me how you tested? Did you reproduce the lack of space, or did you just cancel the task manually?
If you just cancelled it, then I also have no problem with that; the problem only occurs when space runs out and an error occurs during the backup process.
I have not checked, but perhaps the same problem would occur if, for example, the disk that the backup is written to is disconnected.

Indeed. Maybe running out of disk space for your backups might also be something that can be prevented? I cannot fix this problem, sorry.
I'm sorry if I seem rude; I don't expect you to fix the problem. :oops:
Yes, the lack of space can be prevented, but that does not change the fact that PVE's behavior in my case is wrong, does it?
 
Could you tell me how you tested? Did you reproduce the lack of space, or did you just cancel the task manually?
I used a backup directory storage of 1 GiB and the backup errored out after compressing several GB. I cannot reproduce the issue with a VM that uses VirtIO SCSI and is not write-heavy during the (short) backup.
Yes, the lack of space can be prevented, but that does not change the fact that PVE's behavior in my case is wrong, does it?
I think running out of space more than once also needs to be addressed. But any problem gets much worse when the wrong behavior corrupts the source. Personally, I love using PBS for speedy deduplicated backups.
 
By the way, stop mode shuts the guest down, makes a temporary snapshot while the VM is off, starts the backup, and then boots the VM again right away to minimize downtime.
Here, where the backup destination is a slow 2.5" HDD used for PBS, to avoid slowing the VM down too much (Windows services like SQL can't even start during the backup), I shut the VM down at night, run the backup, and then a hook script starts the VM once the backup has finished.
Indeed, fleecing support should fix the problem.
 
https://youtu.be/2FQYSFJCPE4
I installed the latest updates, started the backup task and stopped it immediately. I see some I/O errors inside the VM again.
Moreover, I had to remove the lock manually via qm unlock.
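For reference, clearing the leftover lock was just this (the VMID here is only illustrative):

Bash:
# remove the stale lock left behind by the aborted backup job
qm unlock 103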
Could it be that you ran a backup job without fleecing enabled? It can (only?) be set in a scheduled backup job under Advanced. Please correct me if I'm wrong.
EDIT: Fleecing can also be enabled when running vzdump manually from the command line: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_vm_backup_fleecing
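A manual run with fleecing would then look roughly like this (a sketch based on the linked docs; the exact property-string syntax can differ between versions, and local-lvm is only an example for the fleecing storage):

Bash:
# with fleecing, the original data is copied to a temporary image on 'local-lvm'
# before guest writes overwrite it, so guest I/O is not throttled by the slow backup target
vzdump 103 --storage hs-mnt-2 --compress zstd --mode snapshot --fleecing enabled=1,storage=local-lvm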
 
Could it be that you ran a backup job without fleecing enabled? It can (only?) be set in a scheduled backup job under Advanced. Please correct me if I'm wrong.
EDIT: Fleecing can also be enabled when running vzdump manually from the command line: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_vm_backup_fleecing
I tried with the default options, but I didn't find an option in the GUI to enable fleecing. I'll try the same thing later through a scheduled task with fleecing enabled.

But I still don't understand why the backup procedure can break the I/O inside the VM at all.
Every other hypervisor and backup software I know of, except PVE, first makes a regular/tagged snapshot of the VM and then copies the data at whatever speed is available, without any requirements on the channel or anything else, and with almost no loss of I/O speed.
At the end of the procedure the snapshot is merged back into the main disk anyway, regardless of the task status.

For what reason is it done differently in PVE?

UPD: In any case, consistency is only as good as how well the software running inside the VM respects the fs-freeze commands, right?
I do not understand the PVE/PBS algorithms, but I could not even imagine that a VM's I/O speed could depend on a backup in any way; this is a rather strange implementation.
 
I tried with the default options, but I didn't find an option in the GUI to enable fleecing. I'll try the same thing later through a scheduled task with fleecing enabled.
Then fleecing was not enabled. As said, it looks like you need a scheduled job for it (at the moment).

But I still don't understand why the backup procedure can break the I/O inside the VM at all.
Every other hypervisor and backup software I know of, except PVE, first makes a regular/tagged snapshot of the VM and then copies the data at whatever speed is available, without any requirements on the channel or anything else, and with almost no loss of I/O speed.
At the end of the procedure the snapshot is merged back into the main disk anyway, regardless of the task status.

For what reason is it done differently in PVE?

UPD: In any case, consistency is only as good as how well the software running inside the VM respects the fs-freeze commands, right?
I do not understand the PVE/PBS algorithms, but I could not even imagine that a VM's I/O speed could depend on a backup in any way; this is a rather strange implementation.
If the used storage does not support instant snapshots (and I think the underlying QEMU/KVM does not use them anyway), it will take time to copy the consistent data (ensured by fsfreeze). Use the Suspend backup type to prevent changes to the data during this process.

If you want the VM to keep running during a backup, Proxmox (the underlying QEMU/KVM) temporarily stores writes (that happen inside the VM) somewhere else (in memory for example). The number/speed of writes happening inside the VM does influence this process a lot. If those writes get lost, you might have an outdated filesystem inside the VM (because the writes did not happen), while the filesystem cache assumes the writes did happen and this is a recipe for corruption.

Fleecing works the other way around and copies the original data to a separate temporary storage (that you chose), so that the original consistent data is backed up (instead of a mix of old and new data). If this fails or gets lost, the backup fails but the writes did happen inside the VM, so no corruption.
 
If you want the VM to keep running during a backup, Proxmox (the underlying QEMU/KVM) temporarily stores writes (that happen inside the VM) somewhere else (in memory for example). The number/speed of writes happening inside the VM does influence this process a lot. If those writes get lost, you might have an outdated filesystem inside the VM (because the writes did not happen), while the filesystem cache assumes the writes did happen and this is a recipe for corruption.
Are there any reasons why such a solution was chosen?
Why not simply disable the snapshot mode option if the VM storage does not support snapshots,
and use the built-in QEMU snapshots instead?
 
Hi,
Then fleecing was not enabled. As said, it looks like you need a scheduled job for it (at the moment).
You can also configure it as a node-wide default in /etc/vzdump.conf with a line like fleecing: enabled=true,storage=local-lvm.
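For example (only the fleecing line itself is needed; the comment lines are just illustration):

Code:
# /etc/vzdump.conf -- node-wide defaults for vzdump/backup jobs
# keep fleecing images on the 'local-lvm' storage
fleecing: enabled=true,storage=local-lvm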

Are there any reasons why such a solution was chosen?
Why not simply disable the snapshot mode option if the VM storage does not support snapshots,
and use the built-in QEMU snapshots instead?
1. It is independent of the storage, i.e. it does not limit which storages you can use for the VM if you want backups without downtime.
2. It allows tracking which parts of the disks are dirty, for incremental backups.
3. Built-in QEMU snapshots are only supported on qcow2 and RBD and can be rather inefficient (e.g. qcow2 on NFS).
 
