Corrupt backup? Unable to restore, but backups succeed

bguscott

Member
Sep 15, 2020
I have a problem right now: I tried to restore a VM from a backup that had been completing successfully, but upon restore the archive shows as corrupted.

The backup is via gzip to an NFS-mounted directory on a Proxmox node running the latest updates (Virtual Environment 6.2-11, kernel 6.2-5). The VM running the NFS server is on another Proxmox node.

On a regular basis the backup job completes successfully, as shown below:

Code:
INFO: starting new backup job: vzdump --mode snapshot --compress gzip --mailto REDACTED@gmail.com --node kuzotz --mailnotification always --quiet 1 --all 1 --storage giddeus
INFO: Starting Backup of VM 3010 (qemu)
INFO: Backup started at 2020-08-31 02:30:02
INFO: status = running
INFO: VM Name: altepa
INFO: include disk 'scsi0' 'vmdata-kuzotz:vm-3010-disk-0' 300G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma.gz'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'ab0d4e86-2a3f-499c-86e0-bb8806cb153a'
INFO: resuming VM again
INFO: status: 0% (58.0 MiB of 300.0 GiB), duration 3, read: 19.3 MiB/s, write: 19.0 MiB/s
INFO: status: 1% (3.0 GiB of 300.0 GiB), duration 112, read: 27.7 MiB/s, write: 20.3 MiB/s
INFO: status: 2% (6.0 GiB of 300.0 GiB), duration 272, read: 19.2 MiB/s, write: 19.0 MiB/s
INFO: status: 3% (9.0 GiB of 300.0 GiB), duration 432, read: 19.3 MiB/s, write: 19.3 MiB/s
INFO: status: 4% (12.0 GiB of 300.0 GiB), duration 593, read: 19.0 MiB/s, write: 19.0 MiB/s
INFO: status: 5% (15.0 GiB of 300.0 GiB), duration 752, read: 19.4 MiB/s, write: 19.4 MiB/s
(...)
INFO: status: 95% (285.0 GiB of 300.0 GiB), duration 15349, read: 18.5 MiB/s, write: 18.5 MiB/s
INFO: status: 96% (288.0 GiB of 300.0 GiB), duration 15516, read: 18.4 MiB/s, write: 18.4 MiB/s
INFO: status: 97% (291.0 GiB of 300.0 GiB), duration 15679, read: 18.9 MiB/s, write: 18.8 MiB/s
INFO: status: 98% (294.0 GiB of 300.0 GiB), duration 15840, read: 19.1 MiB/s, write: 19.1 MiB/s
INFO: status: 99% (297.0 GiB of 300.0 GiB), duration 16009, read: 18.2 MiB/s, write: 18.2 MiB/s
INFO: status: 100% (300.0 GiB of 300.0 GiB), duration 16176, read: 18.3 MiB/s, write: 18.3 MiB/s
INFO: transferred 300.00 GiB in 16176 seconds (19.0 MiB/s)
INFO: Backup is sparse: 0% (999.82 MiB) zero data
Warning: unable to close filehandle GEN60 properly: No space left on device at /usr/share/perl5/PVE/VZDump/QemuServer.pm line 671.
INFO: archive file size: 235.81GB
INFO: delete old backup '/mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_24-02_30_02.vma.gz'
INFO: Finished Backup of VM 3010 (04:29:40)
INFO: Backup finished at 2020-08-31 06:59:42
INFO: Backup job finished successfully
TASK OK


However, attempting to restore the VM yields the following error, hinting at a corrupt backup:

Code:
restore vma archive: zcat /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma.gz | vma extract -v -r /var/tmp/vzdumptmp609.fifo - /var/tmp/vzdumptmp609
CFG: size: 596 name: qemu-server.conf
DEV: dev_id=1 size: 322122547200 devname: drive-scsi0
CTIME: Mon Aug 31 02:30:02 2020
Formatting '/mnt/pve/giddeus/images/3010/vm-3010-disk-0.raw', fmt=raw size=322122547200
new volume ID is 'giddeus:3010/vm-3010-disk-0.raw'
map 'drive-scsi0' to '/mnt/pve/giddeus/images/3010/vm-3010-disk-0.raw' (write zeros = 0)
progress 1% (read 3221225472 bytes, duration 55 sec)
progress 2% (read 6442450944 bytes, duration 84 sec)
progress 3% (read 9663676416 bytes, duration 117 sec)
progress 4% (read 12884901888 bytes, duration 214 sec)
progress 5% (read 16106127360 bytes, duration 312 sec)
(...)
progress 85% (read 273804165120 bytes, duration 10983 sec)
progress 86% (read 277025390592 bytes, duration 11107 sec)
progress 87% (read 280246616064 bytes, duration 11246 sec)
progress 88% (read 283467841536 bytes, duration 11392 sec)
progress 89% (read 286689067008 bytes, duration 11541 sec)
progress 90% (read 289910292480 bytes, duration 11706 sec)

gzip: /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma.gz: invalid compressed data--format violated
vma: restore failed - short vma extent (1146368 < 3801600)
/bin/bash: line 1:   620 Exit 1                  zcat /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma.gz
       628 Trace/breakpoint trap   | vma extract -v -r /var/tmp/vzdumptmp609.fifo - /var/tmp/vzdumptmp609
temporary volume 'giddeus:3010/vm-3010-disk-0.raw' sucessfuly removed
no lock found trying to remove 'create'  lock
TASK ERROR: command 'set -o pipefail && zcat /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma.gz | vma extract -v -r /var/tmp/vzdumptmp609.fifo - /var/tmp/vzdumptmp609' failed: exit code 133
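
As a side note, the gzip layer can be checked on its own, independently of vma, to see whether the compressed stream itself is damaged. A quick sanity check would be something like this (path taken from the restore log above):

Code:
# test the gzip stream only; a nonzero exit status means the compressed data itself is broken
gzip -t /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma.gz && echo "gzip stream OK"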


I tried repairing the gzip archive (using gzrt), but extracting the resulting .vma file also fails, as shown below (the repair command itself is sketched after the log):

Code:
restore vma archive: vma extract -v -r /var/tmp/vzdumptmp12076.fifo /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma /var/tmp/vzdumptmp12076
CFG: size: 596 name: qemu-server.conf
DEV: dev_id=1 size: 322122547200 devname: drive-scsi0
CTIME: Mon Aug 31 02:30:02 2020
Logical volume "vm-3010-disk-0" created.
new volume ID is 'vmdata-kuzotz:vm-3010-disk-0'
map 'drive-scsi0' to '/dev/pve/vm-3010-disk-0' (write zeros = 0)
progress 1% (read 3221225472 bytes, duration 35 sec)
progress 2% (read 6442450944 bytes, duration 85 sec)
progress 3% (read 9663676416 bytes, duration 130 sec)
progress 4% (read 12884901888 bytes, duration 188 sec)
progress 5% (read 16106127360 bytes, duration 249 sec)
progress 6% (read 19327352832 bytes, duration 296 sec)
progress 7% (read 22548578304 bytes, duration 343 sec)
progress 8% (read 25769803776 bytes, duration 390 sec)
(...)
progress 85% (read 273804165120 bytes, duration 5265 sec)
progress 86% (read 277025390592 bytes, duration 5319 sec)
progress 87% (read 280246616064 bytes, duration 5380 sec)
progress 88% (read 283467841536 bytes, duration 5446 sec)
progress 89% (read 286689067008 bytes, duration 5530 sec)
progress 90% (read 289910292480 bytes, duration 5630 sec)
vma: restore failed - wrong vma extent header chechsum
Logical volume "vm-3010-disk-0" successfully removed
temporary volume 'vmdata-kuzotz:vm-3010-disk-0' sucessfuly removed
no lock found trying to remove 'create' lock
TASK ERROR: command 'set -o pipefail && vma extract -v -r /var/tmp/vzdumptmp12076.fifo /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma /var/tmp/vzdumptmp12076' failed: got signal 5
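
(For reference, the gzrt repair step itself was roughly the following; the exact flags are from memory, so check gzrecover's help for the precise usage.)

Code:
# salvage whatever still gzip-decodes cleanly into a raw .vma, skipping damaged sections
gzrecover -o /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma \
    /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma.gz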


Any thoughts or suggestions on what the issue could be, or logs I could share to assist with troubleshooting?

Also, since the original VM was destroyed in the act of restoring, I'd highly appreciate any tips on how I can extract contents out of the .vma. If all my backups are corrupted (which I suspect they may be; I'm currently pulling an archived backup to confirm), I'm hoping to recover at least some of the data.

Thanks!
 
Warning: unable to close filehandle GEN60 properly: No space left on device at /usr/share/perl5/PVE/VZDump/QemuServer.pm line 671.

That does not sound like the backup went fine. What kind of storage is the backup target? How is it mounted?
 
The backup target is NFS storage, mounted via the PVE GUI (under Datacenter). I agree it seems like it did not back up successfully, but PVE reported that the backup was successful.

I receive mail notifications with the backup job results (set up via the GUI), and I just noticed that the emailed log does not include the warnings shown in the PVE GUI. Here is the email I received with the same logs; note that the warning "unable to close filehandle GEN60 properly" doesn't appear here.


MID    NAME    STATUS  TIME      SIZE      FILENAME
3010   altepa  OK      04:29:40  235.81GB  /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma.gz
TOTAL                  04:29:40  235.81GB

Code:
Detailed backup logs:

vzdump --mode snapshot --compress gzip --mailto REDACTED@gmail.com --node kuzotz --mailnotification always --quiet 1 --all 1 --storage giddeus


3010: 2020-08-31 02:30:02 INFO: Starting Backup of VM 3010 (qemu)
3010: 2020-08-31 02:30:02 INFO: status = running
3010: 2020-08-31 02:30:02 INFO: VM Name: altepa
3010: 2020-08-31 02:30:02 INFO: include disk 'scsi0' 'vmdata-kuzotz:vm-3010-disk-0' 300G
3010: 2020-08-31 02:30:02 INFO: backup mode: snapshot
3010: 2020-08-31 02:30:02 INFO: ionice priority: 7
3010: 2020-08-31 02:30:02 INFO: creating vzdump archive '/mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma.gz'
3010: 2020-08-31 02:30:02 INFO: issuing guest-agent 'fs-freeze' command
3010: 2020-08-31 02:30:02 INFO: issuing guest-agent 'fs-thaw' command
3010: 2020-08-31 02:30:02 INFO: started backup task 'ab0d4e86-2a3f-499c-86e0-bb8806cb153a'
3010: 2020-08-31 02:30:02 INFO: resuming VM again
3010: 2020-08-31 02:30:05 INFO: status: 0% (58.0 MiB of 300.0 GiB), duration 3, read: 19.3 MiB/s, write: 19.0 MiB/s
3010: 2020-08-31 02:31:54 INFO: status: 1% (3.0 GiB of 300.0 GiB), duration 112, read: 27.7 MiB/s, write: 20.3 MiB/s
3010: 2020-08-31 02:34:34 INFO: status: 2% (6.0 GiB of 300.0 GiB), duration 272, read: 19.2 MiB/s, write: 19.0 MiB/s
3010: 2020-08-31 02:37:14 INFO: status: 3% (9.0 GiB of 300.0 GiB), duration 432, read: 19.3 MiB/s, write: 19.3 MiB/s
3010: 2020-08-31 02:39:55 INFO: status: 4% (12.0 GiB of 300.0 GiB), duration 593, read: 19.0 MiB/s, write: 19.0 MiB/s
3010: 2020-08-31 02:42:34 INFO: status: 5% (15.0 GiB of 300.0 GiB), duration 752, read: 19.4 MiB/s, write: 19.4 MiB/s
(...)
3010: 2020-08-31 06:40:18 INFO: status: 93% (279.0 GiB of 300.0 GiB), duration 15016, read: 18.9 MiB/s, write: 18.9 MiB/s
3010: 2020-08-31 06:43:05 INFO: status: 94% (282.0 GiB of 300.0 GiB), duration 15183, read: 18.4 MiB/s, write: 18.4 MiB/s
3010: 2020-08-31 06:45:51 INFO: status: 95% (285.0 GiB of 300.0 GiB), duration 15349, read: 18.5 MiB/s, write: 18.5 MiB/s
3010: 2020-08-31 06:48:38 INFO: status: 96% (288.0 GiB of 300.0 GiB), duration 15516, read: 18.4 MiB/s, write: 18.4 MiB/s
3010: 2020-08-31 06:51:21 INFO: status: 97% (291.0 GiB of 300.0 GiB), duration 15679, read: 18.9 MiB/s, write: 18.8 MiB/s
3010: 2020-08-31 06:54:02 INFO: status: 98% (294.0 GiB of 300.0 GiB), duration 15840, read: 19.1 MiB/s, write: 19.1 MiB/s
3010: 2020-08-31 06:56:51 INFO: status: 99% (297.0 GiB of 300.0 GiB), duration 16009, read: 18.2 MiB/s, write: 18.2 MiB/s
3010: 2020-08-31 06:59:38 INFO: status: 100% (300.0 GiB of 300.0 GiB), duration 16176, read: 18.3 MiB/s, write: 18.3 MiB/s
3010: 2020-08-31 06:59:38 INFO: transferred 300.00 GiB in 16176 seconds (19.0 MiB/s)
3010: 2020-08-31 06:59:38 INFO: Backup is sparse: 0% (999.82 MiB) zero data
3010: 2020-08-31 06:59:41 INFO: archive file size: 235.81GB
3010: 2020-08-31 06:59:41 INFO: delete old backup '/mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_24-02_30_02.vma.gz'
3010: 2020-08-31 06:59:42 INFO: Finished Backup of VM 3010 (04:29:40)

In addition, I tried restoring from an archived backup and that also failed with the same error.

Code:
restore vma archive: zcat /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_24-02_30_02.vma.gz | vma extract -v -r /var/tmp/vzdumptmp22556.fifo - /var/tmp/vzdumptmp22556
CFG: size: 596 name: qemu-server.conf
DEV: dev_id=1 size: 322122547200 devname: drive-scsi0
CTIME: Mon Aug 24 02:30:03 2020
Logical volume "vm-3010-disk-0" created.
new volume ID is 'vmdata-kuzotz:vm-3010-disk-0'
map 'drive-scsi0' to '/dev/pve/vm-3010-disk-0' (write zeros = 0)
progress 1% (read 3221225472 bytes, duration 32 sec)
progress 2% (read 6442450944 bytes, duration 67 sec)
progress 3% (read 9663676416 bytes, duration 100 sec)
progress 4% (read 12884901888 bytes, duration 134 sec)
progress 5% (read 16106127360 bytes, duration 166 sec)
progress 6% (read 19327352832 bytes, duration 200 sec)
(...)
progress 89% (read 286689067008 bytes, duration 2857 sec)
progress 90% (read 289910292480 bytes, duration 2891 sec)
progress 91% (read 293131517952 bytes, duration 2924 sec)
progress 92% (read 296352743424 bytes, duration 2955 sec)
progress 93% (read 299573968896 bytes, duration 2988 sec)
progress 94% (read 302795194368 bytes, duration 3020 sec)
progress 95% (read 306016419840 bytes, duration 3053 sec)
progress 96% (read 309237645312 bytes, duration 3085 sec)
progress 97% (read 312458870784 bytes, duration 3117 sec)
progress 98% (read 315680096256 bytes, duration 3150 sec)

gzip: /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_24-02_30_02.vma.gz: invalid compressed data--format violated
vma: restore failed - short vma extent (2042880 < 3801600)
/bin/bash: line 1: 22558 Exit 1 zcat /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_24-02_30_02.vma.gz
22559 Trace/breakpoint trap | vma extract -v -r /var/tmp/vzdumptmp22556.fifo - /var/tmp/vzdumptmp22556
Logical volume "vm-3010-disk-0" successfully removed
temporary volume 'vmdata-kuzotz:vm-3010-disk-0' sucessfuly removed
no lock found trying to remove 'create' lock
TASK ERROR: command 'set -o pipefail && zcat /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_24-02_30_02.vma.gz | vma extract -v -r /var/tmp/vzdumptmp22556.fifo - /var/tmp/vzdumptmp22556' failed: exit code 133
 
There have been some changes to this part of the code recently, so it would be great to know your exact version numbers. Could you please post the following?
Code:
sed -n '660,680p' /usr/share/perl5/PVE/VZDump/QemuServer.pm
pveversion -v
qm config 3010
cat /etc/pve/storage.cfg
mount | grep nfs

Could you, just to be sure, try to create the backup without compression?
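
For a one-off manual run, that could look roughly like this (VM ID and storage taken from your log; adjust as needed). Leaving out --compress should produce a plain .vma file:

Code:
# manual backup of VM 3010 without compression, to the same NFS storage
vzdump 3010 --mode snapshot --storage giddeus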
 
For sure, please see below and let me know if you'd like me to run anything else.

Regarding backing up without compression: that seemed to be the fix in another thread, so I've changed my backup schedule going forward. Right now, however, I'm trying to recover the VM (if possible), as I hadn't expected the backups to be corrupted.

If I'm not able to recover from one of my recent backups, I will be rebuilding from a cold-storage archive and then I can try backing up without compression.

Code:
root@kuzotz:~# sed -n '660,680p' /usr/share/perl5/PVE/VZDump/QemuServer.pm
        if ($cpid) {
            POSIX::close($outfileno) == 0 ||
                die "close output file handle failed\n";
        }

        die "got no uuid for backup task\n" if !defined($backup_job_uuid);

        $self->loginfo("started backup task '$backup_job_uuid'");

        $self->resume_vm_after_job_start($task, $vmid);

        $query_backup_status_loop->($self, $vmid, $backup_job_uuid);
    };
    my $err = $@;
    if ($err) {
        $self->logerr($err);
        $self->mon_backup_cancel($vmid) if defined($backup_job_uuid);
    }

    $self->restore_vm_power_state($vmid);

root@kuzotz:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.55-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-5
pve-kernel-helper: 6.2-5
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.55-1-pve: 5.4.55-1
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve2
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-2
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-10
pve-cluster: 6.1-8
pve-container: 3.1-12
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-2
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-12
pve-xtermjs: 4.7.0-1
qemu-server: 6.2-11
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1
root@kuzotz:~# qm config 3010
memory: 128
root@kuzotz:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content backup,vztmpl,iso
        maxfiles 2
        shared 0

zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        nodes jeuno
        sparse 1

lvmthin: vmdata-root
        thinpool vmdata_pool
        vgname vmdata
        content rootdir,images
        nodes jeuno

dir: vmdata-images
        path /mnt/thin_vol_images
        content vztmpl,iso,backup
        maxfiles 1
        shared 0

dir: vmdir
        path /mnt/pve/vmdir
        content vztmpl,images,iso,rootdir,backup
        is_mountpoint 1
        maxfiles 0
        nodes jeuno
        shared 0

lvm: vmdata-nfs
        vgname vmdata-nfs
        content images,rootdir
        nodes jeuno
        shared 0

lvmthin: vmdata-kuzotz
        thinpool data
        vgname pve
        content rootdir,images
        nodes kuzotz

nfs: giddeus
        export /opt/share
        path /mnt/pve/giddeus
        server 10.0.10.14
        content backup,snippets,rootdir,iso,images,vztmpl
        maxfiles 1
        nodes kuzotz

root@kuzotz:~# mount | grep nfs
10.0.10.14:/opt/share on /mnt/pve/giddeus type nfs4 (rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.10.11,local_lock=none,addr=10.0.10.14)
 
Thank you!

Could you perform an update of your system? Currently all repositories (including enterprise) should have version 6.2-14 of qemu-server. Otherwise we might be trying to fix something that has already changed.
 
Following up:

My older (1+ month old) backups also appear to be corrupted; Proxmox fails to extract them, again at around 90%. I will be rebuilding the server from my archived 6+ month old backup, and then I will re-run the backup with the new code to see whether it still happens.

Question: how does the CRC/validation work with this backup process? I'm surprised that my logs have been reporting successful backups for months when in fact it seems that none of them were saved correctly.
 
There is no verification after writing; the backups are checksummed in the format to detect bit-rot. There seems to be (or have been) an issue where a failure to sync out the written archive can go undetected (possibly only on some file systems/mount settings).

Please report back whether the issue still occurs with recent versions (and, if possible, also give more details on how your backup storage and vzdump.conf look!)
 
Thanks, fabian. That could be the case. Could a feature request be submitted to detect when such a write issue happens? I imagine it may be tricky to catch.

Regarding the backup storage setup in question: one PVE node has an NFS share mounted via the PVE GUI, and that NFS storage is provided by a VM on the other PVE node, backed by an LVM volume on an SSD.

Essentially:

Code:
PVE1
|- VM1 - NFS server using volume on Storage1
|- Storage1 - LVM on SSD

PVE2
|- VM2 - server to be backed up
|- Storage2 - VM1 NFS mount on PVE2 (via GUI)

As for vzdump.conf:

Code:
root@jeuno:~# cat /etc/vzdump.conf
# vzdump default settings

#tmpdir: DIR
#dumpdir: DIR
#storage: STORAGE_ID
#mode: snapshot|suspend|stop
#bwlimit: KBPS
#ionice: PRI
#lockwait: MINUTES
#stopwait: MINUTES
#size: MB
#stdexcludes: BOOLEAN
#mailto: ADDRESSLIST
#maxfiles: N
#script: FILENAME
#exclude-path: PATHLIST
#pigz: N

Is there some other vzdump.conf or other info you need? Otherwise, I will be conducting a test this upcoming weekend on a new backup to see if the issue persists.
 
Otherwise, I will be conducting a test this upcoming weekend on a new backup to see if the issue persists.

Have you found something? Have you upgraded to the latest version already?

One additional thing to check could be manually verifying the backup, for example as sketched below:
  1. Manually decompress the backup, then
  2. run vma verify on the decompressed .vma file (run vma help or see its wiki page).
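A rough sketch, assuming a location with enough free space for the decompressed archive (roughly the size of the disk, so ~300 GB here; the target path is just a placeholder):

Code:
# decompress the archive somewhere with enough room, then check the VMA-internal checksums
zcat /mnt/pve/giddeus/dump/vzdump-qemu-3010-2020_08_31-02_30_02.vma.gz > /path/with/space/test.vma
vma verify -v /path/with/space/test.vma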
Another detail: the backup log of VM 3010 includes, among other things, a disk ('scsi0'). However, qm config 3010 does not show any disks (only memory). Have you changed anything there?
 
I updated to the latest version and backed up to NFS again (using zstd this time), and a restore of that backup worked. However, it's a lot smaller (~90 GB) than the previous failing gzip backup, which was ~250 GB.

Thanks for the suggestions. Does vma verify get run before the archive is compressed? One of my gzip backups could be decompressed, but it still failed the vma extract process, which hints that something became corrupted before it was gzipped.

Regarding the missing disk: that's due to the restore process wiping everything, so please disregard it.
 
(using zstd this time), and a restore of that backup worked
That's a start.
Does vma verify get run before the archive is compressed?
Had to look in the code myself, but I couldn't find it => I don't think so.

However, it's a lot smaller (~90 GB) than the previous failing gzip backup, which was ~250 GB
It is not surprising that zstd made it smaller than gzip. For Linux, for example, I found a benchmark with 117 vs. 177 MB. Could you maybe create another gzip backup to check whether restoring it works now, too, and also to compare the file size?
 
Following up;

I made a gzip backup and restoring it works again. I'm honestly not sure why the backups were failing so consistently before, but everything seems to be all right now after the PVE update. I've also lowered the size of the VM root disk (from 300 GB down to 150 GB) in case that had some impact before.

As insurance, could a feature request be created to run vma verify before proceeding with the compression? I'm not familiar with the CRC handling in zstd, but I'd also like to ask whether there is a post-compression check as well.
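In the meantime, I'm wondering whether a vzdump hook script (the script option visible in vzdump.conf above) could do such a check manually after each run. A rough, untested sketch; the backup-end phase name and the TARFILE variable are assumptions based on the example hook script shipped with pve-docs (vzdump-hook-script.pl), so please double-check them:

Code:
#!/bin/bash
# /usr/local/bin/vzdump-verify-hook.sh -- rough sketch, not tested;
# enable via "script: /usr/local/bin/vzdump-verify-hook.sh" in /etc/vzdump.conf
phase="$1"
if [ "$phase" = "backup-end" ] && [ -n "$TARFILE" ]; then
    case "$TARFILE" in
        *.gz)  gzip -t "$TARFILE" || { echo "gzip check failed: $TARFILE" >&2; exit 1; } ;;
        *.zst) zstd -t "$TARFILE" || { echo "zstd check failed: $TARFILE" >&2; exit 1; } ;;
    esac
fi
exit 0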

Anyway, thanks so much for the help, Dominic!

Code:
Virtual Machine 108 (altepa) on node 'kuzotz'
Status: stopped, Bootdisk size: 150.00 GiB
scsi1: vmdata-kuzotz:vm-3010-disk-2,backup=0,discard=on,size=200G
restore vma archive: zcat /mnt/pve/backup-vms/dump/vzdump-qemu-3010-2020_10_03-18_19_34.vma.gz | vma extract -v -r /var/tmp/vzdumptmp26628.fifo - /var/tmp/vzdumptmp26628
CFG: size: 525 name: qemu-server.conf
DEV: dev_id=1 size: 161061273600 devname: drive-scsi0
CTIME: Sat Oct  3 18:19:34 2020
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Logical volume "vm-108-disk-0" created.
  WARNING: Sum of all thin volume sizes (500.00 GiB) exceeds the size of thin pool pve/data and the size of whole volume group (<476.44 GiB).
new volume ID is 'vmdata-kuzotz:vm-108-disk-0'
map 'drive-scsi0' to '/dev/pve/vm-108-disk-0' (write zeros = 0)
progress 1% (read 1610612736 bytes, duration 9 sec)
progress 2% (read 3221225472 bytes, duration 20 sec)
progress 3% (read 4831838208 bytes, duration 24 sec)
(...)
progress 96% (read 154618822656 bytes, duration 1818 sec)
progress 97% (read 156229435392 bytes, duration 1843 sec)
progress 98% (read 157840048128 bytes, duration 1870 sec)
progress 99% (read 159450660864 bytes, duration 1883 sec)
progress 100% (read 161061273600 bytes, duration 1883 sec)
total bytes read 161061273600, sparse bytes 45561749504 (28.3%)
space reduction due to 4K zero blocks 0.18%
rescan volumes...
TASK OK
 
