zstd: error 25 : Write error : No space left on device (cannot write compressed block)

sshami

New Member
Mar 14, 2021
Some VM backups failed in Proxmox while the rest are working fine.
I am using Bacula for the backup. The error says "No space left on device", which is actually not the case.

INFO: 0% (363.6 MiB of 50.0 GiB) in 3s, read: 121.2 MiB/s, write: 72.3 MiB/s
INFO: 1% (668.1 MiB of 50.0 GiB) in 6s, read: 101.5 MiB/s, write: 87.5 MiB/s
INFO: 2% (1.2 GiB of 50.0 GiB) in 9s, read: 171.2 MiB/s, write: 66.2 MiB/s
zstd: error 25 : Write error : No space left on device (cannot write compressed block)

Any idea where I should check? I even tried running the script on Proxmox for just the VM that failed, and the single-VM run works fine.

script execution steps:
1. Execute backup script on proxmox server.
2. Copy backup from proxmox to backup.
3. Delete backup folder from proxmox.
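
The three steps above could look roughly like the sketch below. The script itself was not posted, so everything here is an assumption: the dump folder, the copy target, the use of rsync and the snapshot mode are placeholders. A free-space check is included because in this workflow the whole archive is written locally before it is copied away.

```shell
#!/bin/sh
# Hypothetical sketch of the three steps; paths and host are placeholders.
DUMPDIR="${DUMPDIR:-/var/lib/vz/dump/manual}"          # assumed local dump folder
REMOTE="${REMOTE:-backup.example.org:/srv/proxmox}"    # assumed copy target

# Print the free space of the filesystem holding $1, in KiB.
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding $1 has at least $2 KiB free.
has_space() {
    [ "$(avail_kib "$1")" -ge "$2" ]
}

# $1 = guest ID, $2 = rough size of the expected archive in KiB.
backup_vm() {
    mkdir -p "$DUMPDIR" || return 1
    if ! has_space "$DUMPDIR" "$2"; then
        echo "not enough space in $DUMPDIR" >&2
        return 1
    fi
    vzdump "$1" --dumpdir "$DUMPDIR" --compress zstd --mode snapshot || return 1  # step 1
    rsync -a "$DUMPDIR/" "$REMOTE/" || return 1                                   # step 2
    rm -rf "${DUMPDIR:?}"                                                         # step 3
}
```

Calling `backup_vm 104 52428800` would then refuse to start unless about 50 GiB are free in the dump folder.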
 
can you post the complete task log, as well as the vm config, and the storage config?
 
I have the same problem, also with containers. I guessed that it was a problem with a temporary file. But since you see it with KVM VMs as well, it may be a problem with zstd.
I think zstd cannot write the data out as quickly as it would like and then generates an error.
Which target system are you backing up to? Is it possible that the data is not written out quickly enough?
I use a cloud service with an rclone-mounted path.

Code:
INFO: Starting Backup of VM 104 (lxc)
INFO: Backup started at 2022-07-13 05:52:19
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: gogs
INFO: including mount point rootfs ('/') in backup
INFO: stopping virtual guest
INFO: creating vzdump archive '/mnt/pcloudbackup/dump/vzdump-lxc-104-2022_07_13-05_52_19.tar.zst'
INFO: zstd: error 25 : Write error : Input/output error (cannot write compressed block)
INFO: restarting vm
closing file '/var/lib/lxc/104/rules.seccomp.tmp.2015093' failed - No space left on device
ERROR: Backup of VM 104 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/var/tmp/vzdumptmp2015093_104' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | zstd --rsyncable '--threads=1' >/mnt/pcloudbackup/dump/vzdump-lxc-104-2022_07_13-05_52_19.tar.dat' failed: exit code 25
INFO: Failed at 2022-07-13 05:55:46
INFO: Backup job finished with errors
TASK ERROR: job errors

I have now changed to gzip and am waiting for the results.
 
It is not zstd. Same with gzip.

Code:
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2022-07-14 06:38:26
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: sn
INFO: including mount point rootfs ('/') in backup
INFO: stopping virtual guest
INFO: creating vzdump archive '/mnt/pcloudbackup/dump/vzdump-lxc-102-2022_07_14-06_38_26.tar.gz'
INFO: gzip: stdout: Input/output error
INFO: restarting vm
closing file '/var/lib/lxc/102/rules.seccomp.tmp.2452089' failed - No space left on device
ERROR: Backup of VM 102 failed - command 'set -o pipefail && lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/var/tmp/vzdumptmp2452089_102/' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | gzip --rsyncable >/mnt/pcloudbackup/dump/vzdump-lxc-102-2022_07_14-06_38_26.tar.dat' failed: exit code 1
INFO: Failed at 2022-07-14 06:49:06

But this time it is another container.
Also strange: the log says it is restarting the VM, but it doesn't.
 
This time I tried a local backup. It also fails.

Code:
INFO: starting new backup job: vzdump 100 --storage local --remove 0 --node pve01 --compress gzip --notes-template '{{guestname}}' --mode stop
INFO: Starting Backup of VM 100 (lxc)
INFO: Backup started at 2022-07-26 07:16:44
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: cloud
INFO: including mount point rootfs ('/') in backup
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-100-2022_07_26-07_16_43.tar.gz'
INFO: gzip: stdout: No space left on device
ERROR: Backup of VM 100 failed - command 'set -o pipefail && tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/var/tmp/vzdumptmp318888_100/' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | gzip --rsyncable >/var/lib/vz/dump/vzdump-lxc-100-2022_07_26-07_16_43.tar.dat' failed: exit code 1
INFO: Failed at 2022-07-26 09:35:04
INFO: Backup job finished with errors
TASK ERROR: job errors

There is enough space:

Code:
root@pve01:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                  2.9G     0  2.9G   0% /dev
tmpfs                 594M  784K  593M   1% /run
/dev/mapper/pve-root   76G   12G   61G  17% /
tmpfs                 2.9G   46M  2.9G   2% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
/dev/fuse             128M   40K  128M   1% /etc/pve
tmpfs                 594M     0  594M   0% /run/user/0

I have no idea now.
 
150 GB. The last working backup was 70 GB. So it's clear: it doesn't fit on the local disk.
If I'm using external (cloud) storage for the backup, does it still need local space for the backup?
 
it shouldn't. does it there fail too? how much space do you have there?
 
are you sure the automount works (correctly)? it sounds to me like vzdump writes to the root partition, since both it (or rather, the compressor it writes with) and, directly afterwards, the start of the container fail with ENOSPC - and the latter doesn't write to the backup storage at all, but to another path very likely stored on /
 
I'm not 100% sure, but 2 other LXC backups are working. All 3 backups are started by the same job.
So how can I be sure that the automount works all the time?
 
are the other two after the failing one or before? the best way would probably be to ensure the mounting happens via a vzdump hook script before the actual backup task starts, and that the corresponding storage has is_mountpoint set.
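
A minimal sketch of such a hook script follows. It only checks the mount instead of performing it, and it assumes that vzdump passes the phase name as the first argument and gives up when the hook exits non-zero. The path is the one from the logs in this thread; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical vzdump hook script: refuse to start if the backup path is not mounted.
BACKUP_MOUNT="${BACKUP_MOUNT:-/mnt/pcloudbackup}"

# Succeed only if $1 is an active mount point (mountpoint is part of util-linux).
is_mounted() {
    mountpoint -q "$1"
}

# $1 is the phase name handed over by vzdump (assumed: job-start, backup-start, ...).
handle_phase() {
    case "${1:-}" in
        job-start|backup-start)
            if ! is_mounted "$BACKUP_MOUNT"; then
                echo "$BACKUP_MOUNT is not mounted, aborting" >&2
                return 1
            fi
            ;;
    esac
    return 0
}

handle_phase "$@" || exit 1
```

The script would be registered with the `script:` line in /etc/vzdump.conf or with `vzdump --script`. For the storage side, `pvesm set pcloudbackup --is_mountpoint yes` should set the flag, assuming pcloudbackup is defined as a directory storage.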
 
is_mountpoint was a good hint. I will try that. I'm not a fan of hook scripts. I like to have the backup storage visible in the GUI B-) .
I have also now protected the mount path with
Code:
chattr +i /mnt/pcloudbackup
So it should give me an earlier error if the automount doesn't work.
 
Still doesn't work.
Code:
INFO: starting new backup job: vzdump 100 --storage pcloudbackup --node pve01 --compress zstd --notes-template '{{guestname}}' --mode stop --remove 0
INFO: Starting Backup of VM 100 (lxc)
INFO: Backup started at 2022-08-01 12:10:47
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: cloud
INFO: including mount point rootfs ('/') in backup
INFO: stopping virtual guest
INFO: creating vzdump archive '/mnt/pcloudbackup/dump/vzdump-lxc-100-2022_08_01-12_10_47.tar.zst'
INFO: zstd: error 25 : Write error : Input/output error (cannot write compressed block)
INFO: restarting vm
closing file '/var/lib/lxc/100/config.tmp.8443' failed - No space left on device
ERROR: Backup of VM 100 failed - command 'set -o pipefail && tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/var/tmp/vzdumptmp8443_100/' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | zstd --rsyncable '--threads=1' >/mnt/pcloudbackup/dump/vzdump-lxc-100-2022_08_01-12_10_47.tar.dat' failed: exit code 25
INFO: Failed at 2022-08-01 13:47:35
INFO: Backup job finished with errors
TASK ERROR: job errors
I started this one by hand, this time with zstd.
 
I tried to simulate this problem:
Code:
mknod full c 1 7
pct create 333 local:vztmpl/alpine-3.16-default_20220622_amd64.tar.xz -hostname testbackup -ostype alpine -storage local-lvm -start 1
vzdump 333 --dumpdir /mnt/pcloudbackup --mode stop --tmpdir /dev/full

result:
Code:
root@pve01:~# mknod full c 1 7
pct create 333 local:vztmpl/alpine-3.16-default_20220622_amd64.tar.xz -hostname testbackup -ostype alpine -storage local-lvm -start 1
vzdump 333 --dumpdir /mnt/pcloudbackup --mode stop --tmpdir /dev/full
mknod: full: File exists
  Logical volume "vm-333-disk-0" created.
Creating filesystem with 1048576 4k blocks and 262144 inodes
Filesystem UUID: 80b75155-9aba-49f1-9025-d3d331487612
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736
extracting archive '/var/lib/vz/template/cache/alpine-3.16-default_20220622_amd64.tar.xz'
Total bytes read: 9185280 (8.8MiB, 33MiB/s)
Detected container architecture: amd64
tmpdir '/dev/full' does not exist

Now I also find this behaviour strange.

This error message is wrong. /dev/full exists.

Is there a verbose option for vzdump?
 
This error message is wrong. /dev/full exists.
Yes, the message could be improved, but tmpdir needs to be a directory, which /dev/full isn't.
Is there a verbose option for vzdump?

Are you sure that /mnt/pcloudbackup/ is mounted properly? It doesn't show up in your earlier df output.
 
Yes, the message could be improved, but tmpdir needs to be a directory, which /dev/full isn't.
I would expect stop mode to ignore the --tmpdir parameter. How can I simulate a full directory?
Are you sure that /mnt/pcloudbackup/ is mounted properly? It doesn't show up in your earlier df output.
Yes, I'm sure. I cut off the last line of the df output last time, sorry.

Code:
root@pve01:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                  2.9G     0  2.9G   0% /dev
tmpfs                 594M  788K  593M   1% /run
/dev/mapper/pve-root   76G   11G   62G  15% /
tmpfs                 2.9G   46M  2.9G   2% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
/dev/fuse             128M   40K  128M   1% /etc/pve
cryptedpcloud:        2.0T  872G  1.2T  43% /mnt/pcloudbackup
tmpfs                 594M     0  594M   0% /run/user/0
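
On the question of simulating a full directory: one common approach, sketched here under assumptions, is a tiny tmpfs that is filled up before the test. The mount needs root, and the directory name passed in is a placeholder. The first function only shows that writes to /dev/full fail with "No space left on device", which helps for testing a single output file but not a --tmpdir.

```shell
#!/bin/sh
# Sketch: two ways to provoke "No space left on device" on purpose.

# 1) Any write to /dev/full fails with ENOSPC; no root needed.
#    Returns 0 if the write failed as expected.
enospc_on_dev_full() {
    ! dd if=/dev/zero of=/dev/full bs=1k count=1 2>/dev/null
}

# 2) Mount a 1 MiB tmpfs on directory $1 and fill it completely (needs root).
make_full_dir() {
    mkdir -p "$1" || return 1
    mount -t tmpfs -o size=1m tmpfs "$1" || return 1
    # dd stops with ENOSPC once the 1 MiB is used up; that failure is intended.
    dd if=/dev/zero of="$1/filler" bs=1k 2>/dev/null
    return 0
}
```

After `make_full_dir /mnt/fulltmp`, the test from above could be repeated as `vzdump 333 --dumpdir /mnt/pcloudbackup --mode stop --tmpdir /mnt/fulltmp`, and `umount /mnt/fulltmp` cleans up afterwards.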
 
I would expect stop mode to ignore the --tmpdir parameter. How can I simulate a full directory?
It should, except for temporarily storing the container (and firewall) configuration which are included in the backup. But they are not big enough to fill the disk of course. By default, the tmpdir will be on the backup storage if we can detect that it's a POSIX filesystem, but in your case that detection doesn't succeed, so it uses /var/tmp/vzdumptmp<PID>_<container ID> as a fallback.

Can you check the temporary directory during the backup? Does the space on / actually run out? Inside the directory or somewhere else?

Are there any messages in /var/log/syslog during the backup?
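
One way to answer these questions is to sample the local disk while the backup runs. The sketch below is only an illustration: the two paths are the fallback tmpdir and the local dump directory that appear in the logs of this thread, and the function names are invented.

```shell
#!/bin/sh
# Sketch: log how full / is while a backup runs.

# Print the Use% of the filesystem holding $1 as a bare number.
used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print one timestamped sample: usage of / plus the size of the
# local places vzdump may write to.
sample() {
    printf '%s root=%s%%\n' "$(date '+%H:%M:%S')" "$(used_pct /)"
    du -sh /var/tmp/vzdumptmp* /var/lib/vz/dump 2>/dev/null || true
}

# Take a sample every $1 seconds (default 10) until interrupted.
watch_space() {
    while :; do
        sample
        sleep "${1:-10}"
    done
}
```

Running `watch_space 30` in a second shell during the backup shows whether / fills up and which of the two directories is growing.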
 
I used
Code:
lsof |grep dump |less
to find the thief ;)
Code:
task\x20U 880416                              root   11wW     REG               0,26           0       1420 /run/vzdump.lock
task\x20U 880416                              root   12w      REG              253,1         414    1836263 /var/log/vzdump/lxc-100.log
tar       880849                              root    3r      DIR              253,1        4096    1835538 /var/tmp/vzdumptmp880416_100
zstd      880850                              root    1w      REG              253,1 10972250112    1836690 /var/lib/vz/dump/vzdump-lxc-100-2022_08_03-10_20_42.tar.dat
zstd      880850 880851 zstd                  root    1w      REG              253,1 10972250112    1836690 /var/lib/vz/dump/vzdump-lxc-100-2022_08_03-10_20_42.tar.dat
Code:
root@pve01:/var/lib/vz/dump# ls -lah /var/lib/vz/dump
total 16G
drwxr-xr-x 2 root root 4.0K Aug 3 10:21 .
drwxr-xr-x 7 root root 4.0K Aug 10 2020 ..
-rw-r--r-- 1 root root 1.2K Jul 20 13:57 vzdump-lxc-100-2022_07_20-12_47_12.log
-rw-r--r-- 1 root root 1.2K Jul 25 13:34 vzdump-lxc-100-2022_07_25-12_05_36.log
-rw-r--r-- 1 root root 1.2K Jul 26 09:35 vzdump-lxc-100-2022_07_26-07_16_43.log
-rw-r--r-- 1 root root  16G Aug  3 10:56 vzdump-lxc-100-2022_08_03-10_20_42.tar.dat

Before this run I had disabled the rclone vfs-cache in the mount options. So rclone also needs a closer look.
 
Status at this point:

With rclone vfs-cache off, vzdump makes a local cache copy of the whole container -> no space left. The backup fails and the container starts again.
With rclone vfs-cache on, rclone makes a local cache copy of the whole container -> no space left. The backup fails and the container doesn't start again.

Conclusion: You need local free space of the size of the biggest VM/container to make an online backup over rclone.
There may be a way with the rclone chunker https://rclone.org/chunker/ , but I'm not sure whether it can be combined with the crypt module.
 