Get more details of failed backup

promoxer

Bash:
INFO: starting new backup job: vzdump 100 --compress zstd --remove 0 --notes-template stable2 --storage local --node pve --mode snapshot
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2023-05-01 12:30:56
INFO: status = running
INFO: VM Name: windows
INFO: include disk 'scsi0' 'local-zfs:vm-100-disk-1' 80G
INFO: exclude disk 'scsi5' '/dev/disk/by-id/nvme-GIGABYTE_GP-GSM2NE3100TNTD_SN200908905007' (backup=no)
INFO: exclude disk 'scsi6' '/dev/disk/by-id/usb-Seagate_BUP_RD_NA9FPY7F-0:0' (backup=no)
INFO: include disk 'efidisk0' 'local-zfs:vm-100-disk-0' 1M
INFO: include disk 'tpmstate0' 'local-zfs:vm-100-disk-2' 4M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-100-2023_05_01-12_30_56.vma.zst'
INFO: attaching TPM drive to QEMU for backup
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'e4c98bd9-200b-44c9-8bf5-5e4f8978fe46'
INFO: resuming VM again
INFO:   0% (721.4 MiB of 80.0 GiB) in 3s, read: 240.5 MiB/s, write: 224.6 MiB/s
INFO:   1% (1.2 GiB of 80.0 GiB) in 6s, read: 175.2 MiB/s, write: 174.7 MiB/s
INFO:   2% (1.8 GiB of 80.0 GiB) in 9s, read: 200.6 MiB/s, write: 196.6 MiB/s
INFO:   3% (2.5 GiB of 80.0 GiB) in 13s, read: 178.5 MiB/s, write: 178.1 MiB/s
INFO:   3% (2.7 GiB of 80.0 GiB) in 15s, read: 124.3 MiB/s, write: 123.2 MiB/s
ERROR: job failed with err -125 - Operation canceled
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 100 failed - job failed with err -125 - Operation canceled
INFO: Failed at 2023-05-01 12:31:14
INFO: Backup job finished with errors
TASK ERROR: job errors


From /var/log/daemon.log
Bash:
May  1 12:45:18 pve pvedaemon[2474]: <root@pam> starting task UPID:pve:0002A71C:00023BE0:644F43DE:vzdump:100:root@pam:
May  1 12:45:18 pve pvedaemon[173852]: INFO: starting new backup job: vzdump 100 --remove 0 --compress zstd --mode snapshot --node pve --storage local --notes-template stable2
May  1 12:45:18 pve pvedaemon[173852]: INFO: Starting Backup of VM 100 (qemu)
May  1 12:45:35 pve zed: eid=15 class=checksum pool='rpool' vdev=nvme-eui.0025385521403c96-part3 algorithm=fletcher4 size=8192 offset=1208656699392 priority=0 err=52 flags=0x380880 bookmark=15585:1:0:389466
May  1 12:45:40 pve zed: eid=16 class=data pool='rpool' priority=0 err=52 flags=0x8881 bookmark=15585:1:0:530503
May  1 12:45:40 pve zed: eid=17 class=checksum pool='rpool' vdev=nvme-eui.0025385521403c96-part3 algorithm=fletcher4 size=8192 offset=1209360121856 priority=0 err=52 flags=0x380880 bookmark=15585:1:0:530503
May  1 12:45:41 pve pvedaemon[173852]: ERROR: Backup of VM 100 failed - job failed with err -125 - Operation canceled
May  1 12:45:41 pve pvedaemon[173852]: INFO: Backup job finished with errors
May  1 12:45:41 pve pvedaemon[173852]: job errors
May  1 12:45:41 pve pvedaemon[2474]: <root@pam> end task UPID:pve:0002A71C:00023BE0:644F43DE:vzdump:100:root@pam: job errors
May  1 12:45:57 pve sniproxy[2069]: Request from [::ffff:193.118.53.210]:39142 did not include a hostname


One of my backups is failing; where can I get more info about it?
There is plenty of storage, as this is a new setup.
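For anyone hitting the same thing: the zed lines in daemon.log already point at the pool, so these are a few places worth checking. A rough sketch only; the pool name and timestamps below are taken from the logs above:

Bash:
# Pool health -- 'status -v' lists any datasets/files with unrecoverable errors
zpool status -v rpool

# Everything the system logged around the failed job
journalctl --since "2023-05-01 12:30" --until "2023-05-01 12:35"

# The per-task vzdump logs are also kept on disk here
ls /var/log/pve/tasks/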
 
Bash:
ZSTD-compressed data is corrupt

System halted

I can't boot PVE now. Rescue Boot from the installer isn't helping either; it says "error: compression algo 68 not supported".
 
Yeah, that's why I was quite reluctant to use ZFS in the first place. The hardware seems fine; it's probably due to my power cycle, since Proxmox could not reboot for over 30 minutes.
 
The "compression algo 68 not supported" error will be because the rescue disk ships an older ZFS version, without support for the newer ZFS version on your pool. I suspect, though, that you're running the most recent version of the Proxmox ISO?

Now, I've suffered this pain before myself; the method I went on to use was to:
1) Mount the old Proxmox system using another Proxmox installation (use any disk for that, as long as it is added to the system, and make sure it's fully updated). [The idea is to access the old Proxmox installation, not the other ZFS partitions for containers/VMs.]
2) Copy the configuration files off the disk (unless you have these already).
3) Replace the config on the rescue copy with the config files you have acquired.
4) Pray you'll be able to mount the ZFS partitions to attempt to copy the data you need; then reboot.

If you're really lucky, you'll be able to see VMs and containers on the ZFS partitions.
Note: I absolutely do not recommend this method, but it worked for me. Also, if you can make a dd'd copy of the disks concerned, and work on the copies, then you don't have to worry about touching the original data.
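As a rough sketch of the dd route (device and mount paths below are only placeholders; check yours with lsblk first):

Bash:
# Clone the whole affected disk to an image file on a separate, healthy drive,
# then do all further rescue work on the copy.
dd if=/dev/nvme0n1 of=/mnt/usb/nvme-clone.img bs=1M status=progress conv=noerror,sync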
 
Oh, the boot disk was 7.3 while I had updated to 7.4; I suppose I need the newest installer.

Anyway, I have moved ahead and reinstalled PVE on another disk and managed to mount the ZFS pool, with two problems left:

1. Is there a way to restore the .conf files from config.db?
2. I have the disks from earlier (rpool/data/vm-1xx-disk-x); is there a way to re-use them in a new VM?
 
Speaking of which, would you mind sharing your PVE disk structure?

I had everything on one NVMe using ZFS.

I'm planning to have PVE on one SSD and keep all VM files on the NVMe, like what I had. Anything good or bad about this?
My main considerations are:

1. Easy to recover, with no extensive knowledge or training required.
2. Recoverable part by part, i.e. I get PVE re-installed first, then bring back all/selected VMs.
 
Yes, the config.db should have the settings for those VMs.
So, you need to mount the old Proxmox installation and acquire the config.db from /var/lib/pve-cluster/config.db.
It's sort of covered in this thread over here: https://forum.proxmox.com/threads/how-to-mount-a-zfs-drive-from-promox.37104/
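For reference, roughly what that mounting step can look like from the new/rescue system. A sketch only; 'rpool', the dataset name and /mnt/oldpve are assumptions, so adjust to your layout:

Bash:
# Import the old pool read-only under an alternate root so it doesn't clash with the running system
zpool import -f -o readonly=on -R /mnt/oldpve rpool

# Mount the old root dataset if the import didn't already do it,
# then grab the cluster config database (the exact path under /mnt/oldpve
# depends on the dataset's mountpoint property -- check with 'zfs get mountpoint').
zfs mount rpool/ROOT/pve-1
cp /mnt/oldpve/var/lib/pve-cluster/config.db /root/config.db.old

# Export the pool again when done
zpool export rpool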

Oh, regarding PVE at home, I keep the main installation on an NVMe [no ZFS there] and have ZFS set up as a 2-disk mirror, plus a ZIL/read-cache device.

In production, I have a hardware RAID 10 of 4 SSDs where LVM-Thin, local and Proxmox live, plus a 2-disk ZFS mirror for bulk data storage, which also has an overlay directory for backup storage.
 
Bash:
root@pve:/pve-1/var/lib/pve-cluster# zfs list
NAME                             USED  AVAIL     REFER  MOUNTPOINT
rpool                            332G  1.43T      104K  /rpool
rpool/ROOT                       141G  1.43T       96K  /rpool/ROOT
rpool/ROOT/pve-1                 141G  1.43T     58.6G  /pve-1
rpool/ROOT/pve-1/vm-100-disk-3  82.5G  1.50T     16.5G  -
rpool/ROOT/pve-1/vm-100-disk-4     6M  1.43T       84K  -
rpool/data                       153G  1.43T       96K  /rpool/data
rpool/data/vm-101-disk-0        14.2G  1.43T     14.2G  -
rpool/data/vm-101-disk-1         106G  1.43T      106G  -
rpool/data/vm-101-disk-2        7.79G  1.43T     7.79G  -
rpool/data/vm-102-disk-0          56K  1.43T       56K  -
rpool/data/vm-102-disk-1        6.79G  1.43T     6.79G  -
rpool/data/vm-103-disk-0         164K  1.43T      164K  -
rpool/data/vm-103-disk-1          68K  1.43T       68K  -
rpool/data/vm-103-disk-2        17.8G  1.43T     17.8G  -
root@pve:/pve-1/var/lib/pve-cluster#

OK, I have the disks from earlier (rpool/data/vm-1xx-disk-x); do you know if there is a way to re-use them in a new VM?
 
It is not really anything intensive, just a development server for coders. There's good and bad either way in terms of ease of recovery/restoration.
 
OK, I have the disks from earlier (rpool/data/vm-1xx-disk-x); do you know if there is a way to re-use them in a new VM?
You have the disks from your earlier config. If you don't have the /etc/pve/qemu-server/$VMID.conf files, just recreate the VMs to use the same disks as before; your data is there.
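If the new installation already has a storage pointing at rpool/data (usually 'local-zfs' on a default setup), something along these lines should work; the VM ID, disk name and storage ID here are only examples:

Bash:
# After recreating an empty VM with the same VMID (GUI or 'qm create'):

# Let Proxmox pick up volumes that exist on the storage but aren't referenced by any config;
# they then appear as "unused disks" on the VM's Hardware tab and can be attached from the GUI.
qm rescan --vmid 101

# Or attach a specific existing volume directly to a bus slot
qm set 101 --scsi0 local-zfs:vm-101-disk-0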
 
Fortunately, those are not too big. You could copy them to the new installation's data directories, or mount and reuse them.
If you have and use the config.db from the old installation [make sure you keep a copy of it], it should be possible to edit the locations of the files for each VM on the new installation; see where I'm going with this?
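If you only want the individual .conf files out of the old config.db rather than the whole database, it can also be read with sqlite3. A sketch, assuming the usual pmxcfs schema (a single 'tree' table with 'name' and 'data' columns) and the copy made earlier at /root/config.db.old:

Bash:
# List the files stored in the old cluster database
sqlite3 /root/config.db.old "SELECT name FROM tree;"

# Dump one VM config so it can be dropped into /etc/pve/qemu-server/ on the new install
sqlite3 /root/config.db.old "SELECT data FROM tree WHERE name = '100.conf';" > 100.conf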

Example structure for qemu:
[screenshot: example directory structure for qemu images]
Example structure for lxc:
[screenshot: example directory structure for lxc images]
 

You have the disks from your earlier config. If you don't have the /etc/pve/qemu-server/$VMID.conf files, just recreate the VMs to use the same disks as before; your data is there.
Thanks, but I have no idea how to do what you are describing. Can it be done from the UI? I can't see any of the previous disks when adding a hard disk.
 
The conf files have a line like scsi0: ss:vm-100-disk-3,iothread=1,size=80G. It doesn't contain the path, though.

Do you mean it is enough to just edit the name to match what I see in my zfs list output and it will magically attach?
 
My ZFS seems to have issues that I can't resolve, i.e. I still can't perform backups. They get interrupted halfway.

I think it's easiest for me to take what I need and reformat everything.

Is there a way to make a copy of the rpool/data/vm-1xx-disk-x so that they can be re-used elsewhere?
 
You can literally copy those files to another location; if you have SFTP access to the Proxmox server, you can download them to your desktop using a tool such as FileZilla.
(I actually use FileZilla extensively to grab copies of local backups. There's no reason why you can't do the same with the target VMs/LXCs: connect, navigate to the folders containing them, and download.)

In my case, during a recovery, I attached a USB drive to the Proxmox system and mounted it, then exported copies of a VM. That was a ZFS rescue, though; copying VM files won't be that much of an issue.

Note: Actually, see the next message.
 
I have some notes from my evil rescue:
Code:
# Rescuing: create an empty qcow2 target first, then salvage-copy the raw zvol into it
qemu-img create -f qcow2 /mnt/USBBACKUP/images/100/vm-100-disk-0.qcow2 6000G
/usr/bin/qemu-img convert --salvage -p -n -T none -f raw -O qcow2 /dev/zvol/rpool/data/vm-100-disk-0 zeroinit:/mnt/USBBACKUP/images/100/vm-100-disk-0.qcow2

Worth learning from.
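An alternative, if the pool will still read the zvols cleanly, is zfs send to a file, which can later be received into another pool. Dataset and file names below are just examples; with checksum errors on the pool, the qemu-img --salvage route above is the more forgiving one:

Bash:
# Snapshot the zvol and stream the snapshot to a file on external storage
zfs snapshot rpool/data/vm-101-disk-0@rescue
zfs send rpool/data/vm-101-disk-0@rescue > /mnt/USBBACKUP/vm-101-disk-0.zfs

# Later, on the new pool:
# zfs receive rpool/data/vm-101-disk-0 < /mnt/USBBACKUP/vm-101-disk-0.zfs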
 
