Backup problem (vzdump or gzip?)

werter

Dec 10, 2017
Hi.

I have 10 PVE nodes, and on all of them I have a problem with backups (GZIP compression).

All nodes run the latest software versions:
proxmox-ve: 6.0-2 (running kernel: 5.0.21-2-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-8
pve-kernel-helper: 6.0-8
pve-kernel-5.0.21-2-pve: 5.0.21-3
pve-kernel-5.0.21-1-pve: 5.0.21-2
ceph-fuse: 14.2.2-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.11-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-4
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-8
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2

And it's not a problem with:
- RAM (tried other RAM modules, same result)
- HDDs (SMART is OK)
- free space (there is enough)


In the PVE web GUI, the backup process finishes without any problems:
INFO: starting new backup job: vzdump 100 101 --quiet 1 --mailnotification failure --storage local --mode snapshot --compress gzip
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2019-09-24 22:00:03
INFO: status = running
INFO: update VM 100: -lock backup
INFO: VM Name: VM1
INFO: include disk 'virtio0' 'local-zfs:vm-100-disk-1' 40G
INFO: include disk 'virtio1' 'local-zfs:vm-100-disk-2' 80G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-100-2019_09_24-22_00_03.vma.gz'
INFO: started backup task 'a5b5d999-fdff-4cfa-88a2-e680e99baa3e'
...
INFO: status: 100% (128849018880/128849018880), sparse 9% (12523520000), duration 3067, read/write 32/32 MB/s
INFO: transferred 128849 MB in 3067 seconds (42 MB/s)
INFO: archive file size: 61.98GB
INFO: Finished Backup of VM 100 (00:51:13)
INFO: Backup finished at 2019-09-24 22:51:16

INFO: Starting Backup of VM 101 (qemu)
INFO: Backup started at 2019-09-24 22:51:16
INFO: status = running
INFO: update VM 101: -lock backup
INFO: VM Name: VM2
INFO: include disk 'virtio0' 'local-zfs:vm-101-disk-0' 80G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-101-2019_09_24-22_51_16.vma.gz'
INFO: started backup task 'a926e604-8a88-4c9f-9848-07f368da6961'
...
INFO: status: 100% (85899345920/85899345920), sparse 98% (84260102144), duration 181, read/write 139/0 MB/s
INFO: transferred 85899 MB in 181 seconds (474 MB/s)
INFO: archive file size: 611MB
INFO: Finished Backup of VM 101 (00:03:03)
INFO: Backup finished at 2019-09-24 22:54:19
INFO: Backup job finished successfully
TASK OK

But when I try to check the backups:

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs AMD A10-9700 RADEON R7, 10 COMPUTE CORES 4C+6G (660F51),ASM,AES-NI)

Scanning the drive for archives:
1 file, 66554542746 bytes (62 GiB)

Testing archive: /var/lib/vz/dump/vzdump-qemu-100-2019_09_24-22_00_03.vma.gz
--
Path = /var/lib/vz/dump/vzdump-qemu-100-2019_09_24-22_00_03.vma.gz
Type = gzip
Headers Size = 10


Sub items Errors: 1

Archives with Errors: 1

Sub items Errors: 1

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs AMD A10-9700 RADEON R7, 10 COMPUTE CORES 4C+6G (660F51),ASM,AES-NI)

Scanning the drive for archives:
1 file, 641554994 bytes (612 MiB)

Testing archive: /var/lib/vz/dump/vzdump-qemu-101-2019_09_24-22_51_16.vma.gz
--
Path = /var/lib/vz/dump/vzdump-qemu-101-2019_09_24-22_51_16.vma.gz
Type = gzip
Headers Size = 10

Sub items Errors: 1

Archives with Errors: 1

Sub items Errors: 1


When I try to restore a VM:
gzip: /var/lib/vz/dump/vzdump-qemu-101-2019_09_24-22_51_16.vma.gz: invalid compressed data--crc error

or

gzip: /var/lib/vz/dump/vzdump-qemu-101-2019_09_24-22_51_16.vma.gz: invalid compressed data--format violated
** (process:23913): ERROR **: 12:47:29.844: restore failed - short vma extent (3287040 < 3781120)
/bin/bash: line 1: 23912 Exit 1 zcat /var/lib/vz/dump/vzdump-qemu-101-2019_09_24-22_51_16.vma.gz
23913 Trace/breakpoint trap | vma extract -v -r /var/tmp/vzdumptmp23910.fifo - /var/tmp/vzdumptmp23910
TASK ERROR: command 'set -o pipefail && zcat /var/lib/vz/dump/vzdump-qemu-101-2019_09_24-22_51_16.vma.gz | vma extract -v -r /var/tmp/vzdumptmp23910.fifo - /var/tmp/vzdumptmp23910' failed: exit code 133

My /etc/vzdump.conf:

# vzdump default settings

#tmpdir: DIR
#dumpdir: DIR
#storage: STORAGE_ID
#mode: snapshot|suspend|stop
#bwlimit: KBPS
#ionice: PRI
#lockwait: MINUTES
#stopwait: MINUTES
#size: MB
#stdexcludes: BOOLEAN
#mailto: ADDRESSLIST
#maxfiles: N
#script: FILENAME
#exclude-path: PATHLIST
#pigz: N:
pigz: 4

Reinstalling gzip doesn't help.
Backups with LZO compression are OK.


Is it a vzdump or a gzip problem?
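The "crc error" and "format violated" messages above come from gzip's built-in integrity checks. Every gzip member ends with a CRC-32 and a length of the uncompressed data. The following is a minimal Python sketch of an integrity test in the spirit of `gzip -t`, using the standard `gzip` module rather than the gzip binary. `check_gzip` is an illustrative name and the chunk size is arbitrary. It tells apart a bad checksum or header, invalid deflate data, and a truncated file.

```python
import gzip
import zlib


def check_gzip(path, chunk_size=1 << 20):
    """Stream-decompress a .gz file and report whether it is intact.

    Returns (ok, uncompressed_bytes_read, message). The gzip module checks
    the CRC-32 and length stored in each member's trailer, so a corrupted
    archive raises an exception instead of silently returning bad data.
    """
    total = 0
    try:
        with gzip.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError, zlib.error) as exc:
        # OSError includes gzip.BadGzipFile (bad header or CRC mismatch),
        # EOFError means the stream ended early (truncated file),
        # zlib.error means the deflate data itself is invalid.
        return False, total, f"{type(exc).__name__}: {exc}"
    return True, total, "OK"
```

Usage would look like `check_gzip("/var/lib/vz/dump/<archive>.vma.gz")`, where the path is a placeholder. It reads the whole archive once, so it takes roughly as long as a decompression.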
 
you could try "zcat <PATH/TO/VMA.gz> | vma verify - -v" (warning: this will take a while for bigger archives).

how reproducible is the issue? if it triggers on every backup, can you test old backups and pinpoint some upgrade/change of environment/... when the issue first started appearing? /var/log/apt/history.log might be helpful..
 
@fabian He is trying that and it fails. That is very strange. I've only seen this on the forums with NFS, never with a local disk. Also LZOP is fine, so that's the way to go (for now). Personally, I only compress via LZOP, because gzip is not parallel, so we are often limited by CPU power, not backup bandwidth.

/bin/bash: line 1: 23912 Exit 1 zcat /var/lib/vz/dump/vzdump-qemu-101-2019_09_24-22_51_16.vma.gz
23913 Trace/breakpoint trap | vma extract -v -r /var/tmp/vzdumptmp23910.fifo - /var/tmp/vzdumptmp23910
TASK ERROR: command 'set -o pipefail && zcat /var/lib/vz/dump/vzdump-qemu-101-2019_09_24-22_51_16.vma.gz | vma extract -v -r /var/tmp/vzdumptmp23910.fifo - /var/tmp/vzdumptmp23910' failed: exit code 133
 
Hi

you could try "zcat <PATH/TO/VMA.gz> | vma verify - -v" (warning: this will take a while for bigger archives).

how reproducible is the issue? if it triggers on every backup, can you test old backups and pinpoint some upgrade/change of environment/... when the issue first started appearing? /var/log/apt/history.log might be helpful..

I can't extract the .vma from the archive, and 'zcat ... | vma verify - -v' doesn't work either.

I tried it with NFS storage too: same result :(
The backup problems were discovered by accident. I'm sure they have been going on for at least 2 weeks.

@LnxBil
pigz is a parallel gzip implementation with compatible syntax.
 
@fabian He is trying that and it fails. That is very strange. I've only seen this on the forums with NFS, never with a local disk. Also LZOP is fine, so that's the way to go (for now). Personally, I only compress via LZOP, because gzip is not parallel, so we are often limited by CPU power, not backup bandwidth.

I know - I wanted to see the full output ;)

you can check with "debsums" whether some package's files are corrupt. you could also try manually compressing an uncompressed vma with gzip and see whether that works..
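The second suggestion (compress a known file, decompress it again and compare) can be sketched in Python as follows. Note that this uses Python's built-in zlib-based `gzip` module instead of the gzip/pigz binaries. It therefore exercises the disk, RAM and CPU path but not the installed executables. `gzip_roundtrip_ok` and `sha256_of` are illustrative names. Level 6 mirrors gzip's default level.

```python
import gzip
import hashlib
import shutil


def sha256_of(fileobj, chunk_size=1 << 20):
    """Hash a file object in chunks so large disk images fit in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def gzip_roundtrip_ok(src_path, gz_path, level=6):
    """Compress src_path to gz_path, decompress it again and compare hashes."""
    with open(src_path, "rb") as src, gzip.open(gz_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst, 1 << 20)
    with open(src_path, "rb") as src:
        expected = sha256_of(src)
    with gzip.open(gz_path, "rb") as gz:
        actual = sha256_of(gz)
    return expected == actual
```

If this round trip succeeds repeatedly on the same host while archives written by gzip/pigz fail, that points away from the storage and towards the compression tool. This is a hypothesis to test, not a conclusion.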
 
Start-Date: 2019-09-02 04:00:33
Commandline: /usr/bin/apt -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold full-upgrade -y
Install: pve-kernel-5.0.21-1-pve:amd64 (5.0.21-1, automatic)
Upgrade: pve-kernel-5.0:amd64 (6.0-6, 6.0-7), libnghttp2-14:amd64 (1.36.0-2, 1.36.0-2+deb10u1), zfs-initramfs:amd64 (0.8.1-pve1, 0.8.1-pve2), zfsutils-linux:amd64 (0.$
End-Date: 2019-09-02 04:03:00

Start-Date: 2019-09-04 06:15:50
Commandline: /usr/bin/unattended-upgrade
Upgrade: libwbclient0:amd64 (2:4.9.5+dfsg-5, 2:4.9.5+dfsg-5+deb10u1), samba-libs:amd64 (2:4.9.5+dfsg-5, 2:4.9.5+dfsg-5+deb10u1), samba-common:amd64 (2:4.9.5+dfsg-5, 2$
End-Date: 2019-09-04 06:16:15

Start-Date: 2019-09-16 04:00:03
Commandline: apt remove -y --purge pve-kernel-4.15.18-18-pve
Purge: pve-kernel-4.15.18-18-pve:amd64 (4.15.18-44), pve-kernel-4.15:amd64 (5.4-6)
End-Date: 2019-09-16 04:00:41

Start-Date: 2019-09-16 04:01:33
Commandline: /usr/bin/apt -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold full-upgrade -y
Upgrade: pve-kernel-5.0.21-1-pve:amd64 (5.0.21-1, 5.0.21-2), libcomerr2:amd64 (1.44.5-1, 1.44.5-1+deb10u1), libcom-err2:amd64 (1.44.5-1, 1.44.5-1+deb10u1), libcups2:a$
End-Date: 2019-09-16 04:05:38

Start-Date: 2019-09-23 04:01:03
Commandline: /usr/bin/apt -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold full-upgrade -y
Install: pve-kernel-5.0.21-2-pve:amd64 (5.0.21-3, automatic)
Upgrade: pve-kernel-5.0:amd64 (6.0-7, 6.0-8), libexpat1:amd64 (2.2.6-2, 2.2.6-2+deb10u1), lxc-pve:amd64 (3.1.0-64, 3.1.0-65), pve-kernel-helper:amd64 (6.0-7, 6.0-8), $
End-Date: 2019-09-23 04:03:19

Start-Date: 2019-09-23 04:03:20
Commandline: /usr/bin/apt autoremove --purge -y
Purge: pve-kernel-5.0.18-1-pve:amd64 (5.0.18-3)
End-Date: 2019-09-23 04:03:38

debsums >> check
grep -v 'OK' check
/lib/systemd/system/fail2ban.service FAILED
/lib/systemd/system/rpc-statd.service FAILED
/bin/upssched-cmd FAILED
/lib/systemd/system/nut-monitor.service FAILED
/lib/systemd/system/nut-driver.service FAILED
/usr/share/misc/pci.ids FAILED
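debsums compares installed files against the MD5 checksums shipped with each package (the `*.md5sums` files under `/var/lib/dpkg/info/`). A "FAILED" line means the file's content differs from the packaged checksum. That can be corruption, or a local edit. The following is a simplified Python sketch of that comparison. It assumes the dpkg md5sums line format `<md5>  <path relative to />`. Real debsums does more (for example, conffile handling), and `verify_md5sums` is an illustrative name.

```python
import hashlib
import os


def verify_md5sums(md5sums_path, root="/"):
    """Check the files listed in a dpkg-style md5sums file.

    Each line looks like '<md5-hex>  <path relative to root>'.
    Returns a list of (path, status) for entries that are not OK.
    """
    problems = []
    with open(md5sums_path, encoding="utf-8") as listing:
        for line in listing:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, rel_path = line.split(maxsplit=1)
            full = os.path.join(root, rel_path)
            h = hashlib.md5()
            try:
                with open(full, "rb") as f:
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        h.update(chunk)
            except FileNotFoundError:
                problems.append((rel_path, "MISSING"))
                continue
            if h.hexdigest() != expected:
                problems.append((rel_path, "FAILED"))
    return problems
```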

@fabian
You can reproduce my problem:
Create a VM with a disk >200GB and fill it with big files (more than 75% of the VM's disk space in total).
Run apt install -y pigz, then printf '\npigz: %s\n' "$(getconf _NPROCESSORS_ONLN)" >> /etc/vzdump.conf
Make backups in the web GUI with GZIP compression SEVERAL times, then try to check the backups.
 
cannot reproduce this - can you please also include your storage.cfg and a VM config? does this happen for all VMs? or just some?
 
As I understand it, if I use PIGZ for compression in PVE (by enabling the pigz option in vzdump.conf),
should PIGZ also be used FOR DECOMPRESSION?

No, decompression with pigz uses only one thread anyway; see man pigz:
Code:
       Decompression can't be parallelized, at least not without specially
       prepared deflate streams for that purpose. As a result, pigz uses a
       single thread (the main thread) for decompression, but will create
       three other threads for reading, writing, and check calculation, which
       can speed up decompression under some circumstances. Parallel
       decompression can be turned off by specifying one process ( -dp 1 or
       -tp 1 ).
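The "specially prepared deflate streams" in the man page can be illustrated with Python's `zlib`. In a normal deflate stream, back-references may point into the previous 32 KiB of output, so decoding has to start at the beginning. If the compressor issues a full flush (`zlib.Z_FULL_FLUSH`) at each block boundary, the history is reset and the output is byte-aligned. Each block can then be inflated on its own, and in parallel. This is only a sketch of the concept, not the format vzdump or pigz write. The block size, worker count and function names are arbitrary choices.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_with_full_flush(data, block_size=64 * 1024):
    """Raw-deflate `data`, forcing a Z_FULL_FLUSH after every block.

    A full flush byte-aligns the output and resets the 32 KiB history,
    so each block can later be inflated without the blocks before it.
    Returns (compressed_bytes, list_of_block_start_offsets).
    """
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # -15: raw deflate, no header
    out = bytearray()
    offsets = []
    for start in range(0, len(data), block_size):
        offsets.append(len(out))
        out += comp.compress(data[start:start + block_size])
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush()  # finish the stream with a final block
    return bytes(out), offsets


def inflate_segment(segment):
    # Every segment starts at a full-flush point, so a fresh raw inflater works.
    return zlib.decompressobj(-15).decompress(segment)


def parallel_inflate(compressed, offsets, workers=4):
    """Inflate independently decodable segments concurrently and join them."""
    bounds = offsets + [len(compressed)]
    segments = [compressed[a:b] for a, b in zip(bounds, bounds[1:])]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(inflate_segment, segments))
```

Without the full flushes, slicing the stream at arbitrary offsets would not work, because each segment would depend on data decoded before it. This matches the man page's explanation of why pigz decompresses in a single thread.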
 
Yep?

Restoring the SAME VM backup on the SAME hardware:

GZIP :

Code:
restore vma archive: zcat /mnt/2TB/Backup/dump/vzdump-qemu-103-2019_10_03-13_30_04.vma.gz | vma extract -v -r /var/tmp/vzdumptmp15468.fifo - /var/tmp/vzdumptmp15468
CFG: size: 503 name: qemu-server.conf
DEV: dev_id=1 size: 1048576 devname: drive-efidisk0
DEV: dev_id=2 size: 42949672960 devname: drive-scsi0
CTIME: Thu Oct  3 13:30:19 2019
new volume ID is 'local-zfs:vm-10000-disk-0'
map 'drive-efidisk0' to '/dev/zvol/rpool/data/vm-10000-disk-0' (write zeros = 0)
new volume ID is 'local-zfs:vm-10000-disk-1'
map 'drive-scsi0' to '/dev/zvol/rpool/data/vm-10000-disk-1' (write zeros = 0)
...
progress 100% (read 42950721536 bytes, duration 102 sec)
total bytes read 42950721536, sparse bytes 31440908288 (73.2%)
space reduction due to 4K zero blocks 1.95%
rescan volumes...
VM 10000: update disk 'efidisk0' information.
TASK OK


PIGZ (after running ln -nsf $(command -v pigz) $(command -v gzip)):

Code:
restore vma archive: zcat /mnt/2TB/Backup/dump/vzdump-qemu-103-2019_10_03-13_30_04.vma.gz | vma extract -v -r /var/tmp/vzdumptmp2892.fifo - /var/tmp/vzdumptmp2892
CFG: size: 503 name: qemu-server.conf
DEV: dev_id=1 size: 1048576 devname: drive-efidisk0
DEV: dev_id=2 size: 42949672960 devname: drive-scsi0
CTIME: Thu Oct  3 13:30:19 2019
new volume ID is 'local-zfs:vm-10000-disk-0'
map 'drive-efidisk0' to '/dev/zvol/rpool/data/vm-10000-disk-0' (write zeros = 0)
new volume ID is 'local-zfs:vm-10000-disk-1'
map 'drive-scsi0' to '/dev/zvol/rpool/data/vm-10000-disk-1' (write zeros = 0)
....
progress 100% (read 42950721536 bytes, duration 53 sec)
total bytes read 42950721536, sparse bytes 31440908288 (73.2%)
space reduction due to 4K zero blocks 1.95%
rescan volumes...
VM 10000: update disk 'efidisk0' information.
TASK OK

Did you see that? TWO times faster (53 s instead of 102 s).

And yes. You can simply reproduce it on your own PVE.

Update 1: You don't have to run "ln -nsf $(command -v pigz) $(command -v gzip)". Just run sed -i.bak "s/\(exec\) gzip/\1 pigz/" /bin/zcat
and then try to restore your VM from a vzdump-*.vma.gz file.
 
Did you see it? TWO times faster.

Did you clear the page cache in between? Otherwise your second run just profits from the cache already warmed up by the first run (which is not a realistic scenario in practice):
Code:
echo 1 > /proc/sys/vm/drop_caches

You can simply reproduce it on your own PVE

Cannot reproduce that here with a smaller CT archive; pigz is even slower (gzip 6.6s vs. pigz 7.8s).

For a bigger VM archive:
Code:
# echo 1 > /proc/sys/vm/drop_caches
# time pigz -k -d /var/lib/vz/dump/vzdump-qemu-108-2019_10_03-13_34_54.vma.gz

real    2m40.046s
user    1m21.447s
sys    0m19.010s

# echo 1 > /proc/sys/vm/drop_caches
# time gzip -k -d /var/lib/vz/dump/vzdump-qemu-108-2019_10_03-13_34_54.vma.gz 

real    2m46.577s
user    1m21.338s
sys    0m18.941s

Here gzip is only 6 seconds slower. But it will totally depend on the underlying storage(s); they're probably the limiting factor.
What storage are you using? How fast is it? And were the source (backup) storage and the restore destination storage different?
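The cold-cache comparison described in this post (drop the page cache, then time each decompressor) can be scripted. The following is a minimal Python sketch assuming Linux and root privileges for the cache drop. It calls `sync` first, because the kernel only drops clean pages. `time_command` and `drop_page_cache` are illustrative names, and the sketch measures wall-clock time of a single run only.

```python
import os
import subprocess
import time


def drop_page_cache():
    """Flush dirty pages, then drop the page cache (Linux, needs root).

    Equivalent to `sync; echo 1 > /proc/sys/vm/drop_caches`.
    """
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("1\n")


def time_command(argv, cold_cache=False):
    """Run argv once, discard its stdout, and return wall-clock seconds.

    Raises subprocess.CalledProcessError if the command fails, so a
    corrupted archive is not mistaken for a fast one.
    """
    if cold_cache:
        drop_page_cache()
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start
```

For example, `time_command(["gzip", "-dc", archive], cold_cache=True)` and the same call with `"pigz"` compare the two tools on equal terms. Writing to stdout (`-c`) avoids both the output file and disk writes influencing the result. `archive` stands for a placeholder path.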
 
Restore:

GZIP:
Code:
restore vma archive: zcat /mnt/2TB/Backup/dump/vzdump-qemu-111-2019_01_02-16_05_13.vma.gz | vma extract -v -r /var/tmp/vzdumptmp9867.fifo - /var/tmp/vzdumptmp9867
CFG: size: 398 name: qemu-server.conf
DEV: dev_id=1 size: 34359738368 devname: drive-scsi0
CTIME: Wed Jan  2 16:05:14 2019
new volume ID is 'local-zfs:vm-10000-disk-0'
map 'drive-scsi0' to '/dev/zvol/rpool/data/vm-10000-disk-0' (write zeros = 0)
...
progress 100% (read 34359738368 bytes, duration 43 sec)
total bytes read 34359738368, sparse bytes 30586761216 (89%)
space reduction due to 4K zero blocks 4.78%
rescan volumes...
TASK OK

echo 1 > /proc/sys/vm/drop_caches
sed -i.bak "s/\(exec\) gzip/\1 pigz/" /bin/zcat

PIGZ:
Code:
restore vma archive: zcat /mnt/2TB/Backup/dump/vzdump-qemu-111-2019_01_02-16_05_13.vma.gz | vma extract -v -r /var/tmp/vzdumptmp16034.fifo - /var/tmp/vzdumptmp16034
CFG: size: 398 name: qemu-server.conf
DEV: dev_id=1 size: 34359738368 devname: drive-scsi0
CTIME: Wed Jan  2 16:05:14 2019
new volume ID is 'local-zfs:vm-10000-disk-0'
map 'drive-scsi0' to '/dev/zvol/rpool/data/vm-10000-disk-0' (write zeros = 0)
...
progress 100% (read 34359738368 bytes, duration 27 sec)
total bytes read 34359738368, sparse bytes 30586761216 (89%)
space reduction due to 4K zero blocks 4.78%
rescan volumes...
TASK OK


~1.6 times faster with PIGZ (27 s instead of 43 s)
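Speed comparisons in this thread mix "x times faster" and "% faster". The two durations can be turned into unambiguous numbers with a small helper. The following Python sketch uses illustrative names. It reports the speedup factor, the share of time saved, and the throughput gain.

```python
def compare_durations(slow_s, fast_s):
    """Compare two run times of the same job.

    Returns (speedup factor, % of time saved, % higher throughput).
    """
    speedup = slow_s / fast_s
    time_saved_pct = (slow_s - fast_s) / slow_s * 100
    throughput_gain_pct = (speedup - 1) * 100
    return speedup, time_saved_pct, throughput_gain_pct
```

For the restore above (43 s with gzip, 27 s with pigz), this gives a speedup of about 1.59, about 37.2% less time, and about 59.3% higher throughput. For the earlier 102 s vs. 53 s run, it gives a speedup of about 1.92.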

I ran the restore from the web GUI, not from the CLI.


What storage are you using? How fast is it? And was the source and destination storage from the backup different?

The SAME hardware.

Source: a single SATA3 2TB 7200rpm disk (64MB cache) with ext4, directly attached to the PVE host
Destination: a ZFS RAID10 pool on 4x 1TB 7200rpm disks (64MB cache)

Maybe vzdump has some default limit for the restore process?
 
[Screenshot: 2019-10-03 16_56_48-pve - Proxmox Virtual Environment - Vivaldi.png]


As you can see, PIGZ restores the VM .gz file using multiple cores.


Guys, can someone check this? It's important for ALL of us.
 
@werter did you try the new kernel yet? does it still exhibit this issue or not?