Hello, I am running the latest version of PVE on a single node (no cluster functions), and I am seeing file system corruption inside VMs whenever a backup task fails because its zstd process exits with an error.
I found one earlier mention of this here, but I decided to open a separate thread because I consider this a critical problem.
I have not found the exact place in the source code where the error occurs, but the backup snapshot should be merged back into the main image in every case, including when the task fails.
The current behavior, where a VM block device can be changed while the VM is running and writing data, is unacceptable: it can cause irreversible damage to the structure of the guest file system.
Bash:
root@pve:~# pveversion --verbose
proxmox-ve: 8.1.0 (running kernel: 6.2.16-19-pve)
pve-manager: 8.1.10 (running version: 8.1.10/4b06efb5db453f29)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
pve-kernel-5.13: 7.1-9
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.13-3-pve-signed: 6.5.13-3
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.3
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.5
libpve-cluster-perl: 8.0.5
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.6
libpve-network-perl: 0.9.6
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve1
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.5-1
proxmox-backup-file-restore: 3.1.5-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.5
proxmox-widget-toolkit: 4.1.5
pve-cluster: 8.0.5
pve-container: 5.0.9
pve-docs: 8.1.5
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.10-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.1.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve1
I have run into this behavior several times, and the easiest way to reproduce it is to fill up the backup disk.
During the following job, the file systems of VMs 103, 107 and 128 were definitely corrupted:
Code:
INFO: starting new backup job: vzdump 107 109 111 103 102 108 128 106 124 --storage hs-mnt-2 --quiet 1 --prune-backups 'keep-last=2' --mailto mark@* --node pve --compress zstd --mailnotification failure --mode snapshot
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2024-04-19 00:00:02
INFO: status = running
INFO: CT Name: slavik
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
Logical volume "snap_vm-102-disk-0_vzdump" created.
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-102-2024_04_19-00_00_02.tar.zst'
INFO: zstd: error 70 : Write error : cannot write block : No space left on device
INFO: cleanup temporary 'vzdump' snapshot
Logical volume "snap_vm-102-disk-0_vzdump" successfully removed.
ERROR: Backup of VM 102 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/mnt/pve/hdd0/tmp/vzdumptmp2489997_102/' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | zstd '--threads=1' >/mnt/pve/hs-mnt-2/dump/vzdump-lxc-102-2024_04_19-00_00_02.tar.dat' failed: exit code 70
INFO: Failed at 2024-04-19 00:00:06
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-102-2024_04_19-00_00_02.log': No space left on device
INFO: Starting Backup of VM 103 (qemu)
INFO: Backup started at 2024-04-19 00:00:06
INFO: status = running
INFO: VM Name: blog
INFO: include disk 'virtio0' 'ssd2-lvm:vm-103-disk-0' 25G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-103-2024_04_19-00_00_06.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'a07d8d37-9f51-4c3a-b20b-e63abca924ca'
INFO: resuming VM again
INFO: 6% (1.7 GiB of 25.0 GiB) in 3s, read: 583.6 MiB/s, write: 414.2 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO: 6% (1.7 GiB of 25.0 GiB) in 14s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 103 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:00:20
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-103-2024_04_19-00_00_06.log': No space left on device
INFO: Starting Backup of VM 106 (lxc)
INFO: Backup started at 2024-04-19 00:00:20
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: css-server
INFO: including mount point rootfs ('/') in backup
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-106-2024_04_19-00_00_20.tar.zst'
INFO: zstd: error 70 : Write error : cannot write block : No space left on device
ERROR: Backup of VM 106 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/mnt/pve/hdd0/tmp/vzdumptmp2489997_106/' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | zstd '--threads=1' >/mnt/pve/hs-mnt-2/dump/vzdump-lxc-106-2024_04_19-00_00_20.tar.dat' failed: exit code 70
INFO: Failed at 2024-04-19 00:00:42
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-lxc-106-2024_04_19-00_00_20.log': No space left on device
INFO: Starting Backup of VM 107 (qemu)
INFO: Backup started at 2024-04-19 00:00:42
INFO: status = running
INFO: VM Name: musicapp
INFO: include disk 'virtio0' 'ssd1-lvm:vm-107-disk-1' 120G
INFO: include disk 'virtio1' 'ssd1-lvm:vm-107-disk-0' 64G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-107-2024_04_19-00_00_42.vma.zst'
INFO: skipping guest-agent 'fs-freeze', agent configured but not running?
INFO: started backup task 'a2627416-d86e-4867-b3a1-124250dc6c94'
INFO: resuming VM again
INFO: 0% (1.2 GiB of 184.0 GiB) in 3s, read: 418.1 MiB/s, write: 389.7 MiB/s
INFO: 1% (2.1 GiB of 184.0 GiB) in 6s, read: 315.4 MiB/s, write: 310.3 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO: 1% (3.0 GiB of 184.0 GiB) in 30s, read: 38.4 MiB/s, write: 36.6 MiB/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 107 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:01:15
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-107-2024_04_19-00_00_42.log': No space left on device
INFO: Starting Backup of VM 108 (qemu)
INFO: Backup started at 2024-04-19 00:01:15
INFO: status = running
INFO: VM Name: server2012
INFO: include disk 'sata0' 'ssd1-lvm:vm-108-disk-0' 200G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-108-2024_04_19-00_01_15.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'fb24bff5-66af-437e-8b34-b319d050f5e5'
INFO: resuming VM again
INFO: 0% (1.2 GiB of 200.0 GiB) in 3s, read: 416.9 MiB/s, write: 339.1 MiB/s
INFO: 1% (2.2 GiB of 200.0 GiB) in 6s, read: 317.8 MiB/s, write: 301.5 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO: 1% (3.8 GiB of 200.0 GiB) in 30s, read: 68.3 MiB/s, write: 66.1 MiB/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 108 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:01:47
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-108-2024_04_19-00_01_15.log': No space left on device
INFO: Starting Backup of VM 109 (qemu)
INFO: Backup started at 2024-04-19 00:01:47
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: sslawa-work
INFO: include disk 'virtio0' 'ssd1-lvm:vm-109-disk-0' 50G
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-109-2024_04_19-00_01_47.vma.zst'
INFO: starting kvm to execute backup task
WARN: no efidisk configured! Using temporary efivars disk.
INFO: started backup task '0742a479-e5e5-4554-8382-23f806eee48a'
INFO: 1% (1022.2 MiB of 50.0 GiB) in 3s, read: 340.8 MiB/s, write: 307.3 MiB/s
INFO: 3% (1.8 GiB of 50.0 GiB) in 6s, read: 290.0 MiB/s, write: 285.0 MiB/s
INFO: 5% (2.6 GiB of 50.0 GiB) in 9s, read: 253.8 MiB/s, write: 253.3 MiB/s
INFO: 6% (3.5 GiB of 50.0 GiB) in 12s, read: 299.7 MiB/s, write: 298.7 MiB/s
INFO: 8% (4.2 GiB of 50.0 GiB) in 15s, read: 235.6 MiB/s, write: 232.8 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO: 8% (4.2 GiB of 50.0 GiB) in 34s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 109 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:02:24
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-109-2024_04_19-00_01_47.log': No space left on device
INFO: Starting Backup of VM 111 (qemu)
INFO: Backup started at 2024-04-19 00:02:24
INFO: status = running
INFO: VM Name: openvpn
INFO: include disk 'virtio0' 'ssd2-lvm:vm-111-disk-0' 10G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-111-2024_04_19-00_02_24.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'a5b70cc9-2ce7-42b7-bfae-17429a42121d'
INFO: resuming VM again
INFO: 15% (1.5 GiB of 10.0 GiB) in 3s, read: 518.4 MiB/s, write: 337.1 MiB/s
INFO: 24% (2.5 GiB of 10.0 GiB) in 6s, read: 333.5 MiB/s, write: 324.4 MiB/s
INFO: 44% (4.4 GiB of 10.0 GiB) in 9s, read: 653.5 MiB/s, write: 287.2 MiB/s
INFO: 59% (5.9 GiB of 10.0 GiB) in 12s, read: 508.9 MiB/s, write: 299.9 MiB/s
INFO: 68% (6.9 GiB of 10.0 GiB) in 15s, read: 340.8 MiB/s, write: 290.9 MiB/s
INFO: 71% (7.2 GiB of 10.0 GiB) in 18s, read: 88.2 MiB/s, write: 82.3 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO: 71% (7.2 GiB of 10.0 GiB) in 35s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 111 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:03:00
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-111-2024_04_19-00_02_24.log': No space left on device
INFO: Starting Backup of VM 124 (qemu)
INFO: Backup started at 2024-04-19 00:03:00
INFO: status = running
INFO: VM Name: slavik-vm
INFO: include disk 'virtio0' 'ssd1-lvm:vm-124-disk-0' 20G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-124-2024_04_19-00_03_00.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'd5536f0a-aca3-4031-99c0-35aa015a78bb'
INFO: resuming VM again
INFO: 5% (1.0 GiB of 20.0 GiB) in 3s, read: 347.0 MiB/s, write: 297.1 MiB/s
INFO: 11% (2.2 GiB of 20.0 GiB) in 6s, read: 419.1 MiB/s, write: 341.5 MiB/s
INFO: 16% (3.2 GiB of 20.0 GiB) in 9s, read: 342.0 MiB/s, write: 332.4 MiB/s
INFO: 19% (4.0 GiB of 20.0 GiB) in 12s, read: 256.2 MiB/s, write: 227.5 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO: 19% (4.0 GiB of 20.0 GiB) in 31s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 124 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:03:31
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-124-2024_04_19-00_03_00.log': No space left on device
INFO: Starting Backup of VM 128 (qemu)
INFO: Backup started at 2024-04-19 00:03:31
INFO: status = running
INFO: VM Name: kerio
INFO: include disk 'virtio0' 'ssd1-lvm:vm-128-disk-0' 30G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-128-2024_04_19-00_03_31.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'e8baff63-47ec-4099-b53a-8069da49f503'
INFO: resuming VM again
INFO: 7% (2.3 GiB of 30.0 GiB) in 3s, read: 778.2 MiB/s, write: 394.1 MiB/s
INFO: 11% (3.4 GiB of 30.0 GiB) in 6s, read: 397.4 MiB/s, write: 353.3 MiB/s
INFO: 14% (4.4 GiB of 30.0 GiB) in 9s, read: 336.1 MiB/s, write: 314.4 MiB/s
INFO: 17% (5.4 GiB of 30.0 GiB) in 12s, read: 316.6 MiB/s, write: 284.9 MiB/s
INFO: 21% (6.4 GiB of 30.0 GiB) in 15s, read: 369.8 MiB/s, write: 363.0 MiB/s
INFO: 22% (6.7 GiB of 30.0 GiB) in 18s, read: 94.2 MiB/s, write: 93.9 MiB/s
zstd: error 70 : Write error : cannot write block : No space left on device
INFO: 22% (6.7 GiB of 30.0 GiB) in 36s, read: 0 B/s, write: 0 B/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 128 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2024-04-19 00:04:08
cp: failed to close '/mnt/pve/hs-mnt-2/dump/vzdump-qemu-128-2024_04_19-00_03_31.log': No space left on device
INFO: Backup job finished with errors
INFO: notified via target `<mark@*>`
TASK ERROR: job errors
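To cross-check which guests were exposed, the job log above can be reduced to one record per guest. The following is a minimal sketch and not part of vzdump: the names `GuestResult`, `summarize` and `at_risk` are hypothetical, and the regular expressions assume the line format exactly as it appears in this task log.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns assume the vzdump line format seen in the task log above.
START_RE = re.compile(r"^INFO: Starting Backup of VM (\d+) \((\w+)\)")
STATUS_RE = re.compile(r"^INFO: status = (\w+)")
MODE_RE = re.compile(r"^INFO: backup mode: (\w+)")
FAIL_RE = re.compile(r"^ERROR: Backup of VM (\d+) failed - ")


@dataclass
class GuestResult:
    vmid: int
    kind: str                      # "qemu" or "lxc", as printed in the log
    status: Optional[str] = None   # "running" or "stopped"
    mode: Optional[str] = None     # "snapshot" or "stop"
    failed: bool = False


def summarize(log_text: str) -> List[GuestResult]:
    """Collect one GuestResult per 'Starting Backup of VM' section."""
    results: List[GuestResult] = []
    current: Optional[GuestResult] = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = START_RE.match(line)
        if m:
            current = GuestResult(vmid=int(m.group(1)), kind=m.group(2))
            results.append(current)
            continue
        if current is None:
            continue  # job header lines before the first guest
        m = STATUS_RE.match(line)
        if m:
            current.status = m.group(1)
            continue
        m = MODE_RE.match(line)
        if m:
            current.mode = m.group(1)
            continue
        m = FAIL_RE.match(line)
        if m and int(m.group(1)) == current.vmid:
            current.failed = True
    return results


def at_risk(results: List[GuestResult]) -> List[int]:
    """IDs of QEMU guests that were running while a snapshot-mode backup failed."""
    return [
        r.vmid
        for r in results
        if r.kind == "qemu"
        and r.status == "running"
        and r.mode == "snapshot"
        and r.failed
    ]
```

Passing the log above through `summarize` and then `at_risk` should list VMs 103, 107, 108, 111, 124 and 128; the three corrupted guests (103, 107 and 128) are all in that set. This only narrows down candidates and says nothing about the cause.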
Is there a fix or a workaround for this behavior?
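Until the underlying behavior is addressed, one possible stopgap is to avoid the trigger by checking free space on the backup target before the job starts. The sketch below shows only the check itself; `has_headroom`, `required_bytes` and `reserve_bytes` are hypothetical names, and how the check would be wired into a backup schedule is left open. It does not fix the corruption, it only makes the out-of-space failure less likely.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def has_headroom(dump_dir: str, required_bytes: int, reserve_bytes: int = 0) -> bool:
    """Return True if the filesystem holding dump_dir has at least
    required_bytes + reserve_bytes free.

    required_bytes is an estimate supplied by the caller (an assumption of
    this sketch); nothing here is queried from PVE itself.
    """
    free = shutil.disk_usage(dump_dir).free
    return free >= required_bytes + reserve_bytes
```

For example, `has_headroom("/mnt/pve/hs-mnt-2/dump", 200 * GIB)` would compare the dump directory from the log above against the 200 GiB size of the largest disk in this job. The right threshold depends on how well the data compresses.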