Error restoring backups

Omega Destroyer

Apr 10, 2023
I backed up an Ubuntu VM on Proxmox 7.3-3.

I'm trying to restore from several of the backups, and they all give me the error below. It's not a lack of drive space: the VM previously lived on the same disks and fit fine. I've emptied all the disks involved (other than local and local-lvm, since I have other VMs on those). The entire backup is less than 30 GB, and all the drives have significantly more free space than that.

() restore vma archive: zstd -q -d -c /mnt/pve/Backups/dump/vzdump-qemu-102-2023_03_05-01_04_37.vma.zst | vma extract -v -r /var/tmp/vzdumptmp92267.fifo - /var/tmp/vzdumptmp92267
CFG: size: 779 name: qemu-server.conf
DEV: dev_id=1 size: 540672 devname: drive-efidisk0
DEV: dev_id=2 size: 75161927680 devname: drive-virtio0
DEV: dev_id=3 size: 2147483648000 devname: drive-virtio1
CTIME: Sun Mar 5 01:04:39 2023
Rounding up size to full physical extent 4.00 MiB
  Logical volume "vm-102-disk-0" created.
new volume ID is 'local-lvm:vm-102-disk-0'
  Logical volume "vm-102-disk-1" created.
new volume ID is 'local-lvm:vm-102-disk-1'
Formatting '/mnt/pve/NextcloudDB/images/102/vm-102-disk-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=off compression_type=zlib size=2147483648000 lazy_refcounts=off refcount_bits=16
new volume ID is 'NextcloudDB:102/vm-102-disk-0.qcow2'
map 'drive-efidisk0' to '/dev/pve/vm-102-disk-0' (write zeros = 0)
map 'drive-virtio0' to '/dev/pve/vm-102-disk-1' (write zeros = 0)
map 'drive-virtio1' to '/mnt/pve/NextcloudDB/images/102/vm-102-disk-0.qcow2' (write zeros = 0)
vma: restore failed - blk_pwrite to failed (-5)
/bin/bash: line 1: 92269 Broken pipe             zstd -q -d -c /mnt/pve/Backups/dump/vzdump-qemu-102-2023_03_05-01_04_37.vma.zst
      92270 Trace/breakpoint trap   | vma extract -v -r /var/tmp/vzdumptmp92267.fifo - /var/tmp/vzdumptmp92267
  Logical volume "vm-102-disk-1" successfully removed
temporary volume 'local-lvm:vm-102-disk-1' sucessfuly removed
  Logical volume "vm-102-disk-0" successfully removed
temporary volume 'local-lvm:vm-102-disk-0' sucessfuly removed
temporary volume 'NextcloudDB:102/vm-102-disk-0.qcow2' sucessfuly removed
no lock found trying to remove 'create' lock
error before or during data restore, some or all disks were not completely restored. VM 102 state is NOT cleaned up.
TASK ERROR: command 'set -o pipefail && zstd -q -d -c /mnt/pve/Backups/dump/vzdump-qemu-102-2023_03_05-01_04_37.vma.zst | vma extract -v -r /var/tmp/vzdumptmp92267.fifo - /var/tmp/vzdumptmp92267' failed: exit code 133
 
Same here. I made some backups today and reinstalled Proxmox on the same hardware. Restoring the smaller VMs is no problem, but the big one (Ubuntu, 1.5 TB, compressed to 8xx GB) always fails at 31% with the same error. Any solutions?
 
Hi,
> vma: restore failed - blk_pwrite to failed (-5)
5 is the error number for EIO (Input/output error), which usually points to an issue with the underlying storage. This error message is printed when writing to the target storage fails. Unfortunately, the target disk name is missing from the message, so it's not clear whether the write to the LVM-thin storage or to the Nextcloud mount failed. Can you check whether the involved disks are healthy? I'd also look in /var/log/syslog from around the time of the restore for any errors.
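For example (a rough sketch; replace the device name and the time window with your own):

# SMART health of one of the disks backing the storage
smartctl -a /dev/sdX
# kernel/storage errors in the system log
grep -iE 'i/o error|blk_update_request|cifs' /var/log/syslog
# or via the journal, narrowed to the restore window
journalctl --since "<restore-start>" --until "<restore-end>" -p warning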

Does selecting another storage during restore work?
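On the CLI that would be something like this (the archive path is from the log above; --storage puts all restored disks on that one storage, so pick one with enough space):

qmrestore /mnt/pve/Backups/dump/vzdump-qemu-102-2023_03_05-01_04_37.vma.zst 102 --storage local-lvm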
 
@fiona Thanks so much for this tip. It set me on the right path after chasing my tail through various other suggestions around the internet.

I went through all the disks involved in this VM and traced it down to the NextcloudDB disk, which is shared via SMB through TrueNAS on a different server.

After a lot of troubleshooting (deleting shares and datasets on TrueNAS, re-adding the shares in Proxmox, and so on), I traced the problem:

1) If preallocation for the share is turned off, I receive the "blk_pwrite to failed (-5)" error.
2) If I enable preallocation, then because it's a network share, the restore starts fine but fails after the timeout period because it can't create a lock.

Proxmox definitely has full access to that SMB share (I can create files and directories on that drive from the Proxmox shell), and when it starts the restore with preallocation turned on, it creates the right files.
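For anyone following along, a write test from the Proxmox shell looks something like this (file names are arbitrary; the oflag=direct variant is worth including, since QEMU can open images with O_DIRECT, which can fail on a CIFS mount even when normal buffered writes work):

# buffered write onto the mounted share
dd if=/dev/zero of=/mnt/pve/NextcloudDB/write-test.img bs=1M count=100
# direct I/O write, closer to how QEMU may access the image
dd if=/dev/zero of=/mnt/pve/NextcloudDB/write-test-direct.img bs=1M count=100 oflag=direct
rm /mnt/pve/NextcloudDB/write-test.img /mnt/pve/NextcloudDB/write-test-direct.img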

Is this a bug? I can't seem to find a way around this issue.
 
I'd check the physical disks used by the SMB server, and the logs on that server, for any I/O errors.

With preallocation on, you probably just run into a timeout because allocating the disk takes too long (it doesn't even begin writing the disk contents for the restore, so you never reach the actual error).
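For reference, preallocation is a per-storage option in /etc/pve/storage.cfg; for a CIFS storage the entry might look like this (server and share here are placeholders; valid values are off, metadata, falloc and full):

cifs: NextcloudDB
        server <truenas-address>
        share <share-name>
        content images
        preallocation off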
 
I think it's extremely unlikely that this is an issue with the SMB share. My thinking:

1) The share is on a mirrored TrueNAS ZFS pool of 4 drives, and I haven't been alerted to any issues with any of the disks. Even if one disk had an error, TrueNAS could still write to the remaining disks with a single disk offline.

2) There are other SMB shares on that same mirrored array that work fine (for other purposes). If there were an I/O problem, the other shares would also be affected.

3) If there were an I/O problem, the restore should give the same error whether preallocation is off or on. However, I only get error 5 with preallocation turned off. When preallocation is on, the restore starts and creates the necessary files but times out, suggesting that writing to the share is not the issue.

4) With preallocation turned off, I can still write to the share via the Proxmox console (create new directories and files), again suggesting that the share works fine.

5) Related to 4: when preallocation is off, the error comes up immediately after an attempted restore, suggesting that nothing (or very little) is written to the disk.
 
> 4) With preallocation turned off, I can still write to the share via the Proxmox console (create new directories and files), again suggesting that the share works fine.
But in this case probably nothing is written, because the restore doesn't get that far.
> 5) Related to 4: when preallocation is off, the error comes up immediately after an attempted restore, suggesting that nothing (or very little) is written to the disk.
Well, maybe the first write already fails. I'm not saying it's necessarily a hardware issue with your share; maybe the local-lvm storage is the issue here. Again, the error message appears when a write during restore fails, but unfortunately it doesn't include the disk name.

Try selecting the LVM-thin storage as the restore target and see if the error happens; then try the Nextcloud mount as the target and see if it happens there. Of course, if a storage isn't big enough, abort the restore once you see that the error doesn't occur, to avoid filling it up. You could also create a dummy VM, back it up, and restore it to the different storages as a test.
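A minimal version of that test (VM ID 999 and the storage names are assumptions; adjust the archive name to what vzdump prints):

# throwaway VM with a small disk on the storage under test
qm create 999 --name restore-test --memory 512 --scsi0 NextcloudDB:4
# back it up
vzdump 999 --storage Backups --mode stop --compress zstd
# restore it onto the same storage, overwriting the test VM
qmrestore /mnt/pve/Backups/dump/vzdump-qemu-999-<timestamp>.vma.zst 999 --storage NextcloudDB --force
# clean up afterwards
qm destroy 999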
 
@fiona, thank you for your continued support.

My apologies, I should have been clearer. When you mentioned in your first post that error 5 was a write error, I tested all the disks, both with SMART tools and by writing to them manually.

I was also able to restore to all the disks except the SMB share labelled NextcloudDB. I was even able to restore to a different network share on the same TrueNAS server; the only one that gave me the error was NextcloudDB.

Over the past day, I also wrote many TB of data onto the SMB share (same set of disks, different share) from a guest OS on the same node. The integrity of all the files was verified after writing, and there were no errors.

Furthermore, after restoring the backup to a different drive, I can manually move the disk image to the NextcloudDB share with no errors.

However, the issue only arises when restoring the backup based on the configuration that was originally used to create it, and only when preallocation is on.

Restoring to a different disk and then manually moving the storage to where it belongs works as a workaround, but I still think this might be a bug rather than a config or other disk issue.
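For anyone else hitting this, the workaround looks roughly like this (virtio1 is the 2 TB disk from the log above; <working-storage> is whichever storage restores cleanly):

# restore to a storage that works
qmrestore /mnt/pve/Backups/dump/vzdump-qemu-102-2023_03_05-01_04_37.vma.zst 102 --storage <working-storage>
# then move the big disk back onto the SMB share, deleting the temporary copy
qm move_disk 102 virtio1 NextcloudDB --format qcow2 --delete 1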
 
> 5) Related to 4: when preallocation is off, the error comes up immediately after an attempted restore, suggesting that nothing (or very little) is written to the disk.
Did you have a successful restore with preallocation off now? Or did you get the earlier error again? In the latter case, it's not clear that the write issue wouldn't also happen with preallocation off (the restore just doesn't get that far).

> Restoring to a different disk and then manually moving the storage to where it belongs works as a workaround, but I still think this might be a bug rather than a config or other disk issue.
I'm not currently aware of other reports of such an issue, and what you wrote does indicate that it's somehow specific to your Nextcloud storage. Can you create a backup of some other VM (I'd test both a smaller VM and a second VM with a large empty disk) and restore it to the Nextcloud storage?
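Something like this for the large-empty-disk case (VM ID 998 and the 2000 GiB size, roughly matching your failing virtio1 disk, are assumptions; an empty disk compresses to almost nothing, so the backup itself stays small):

qm create 998 --name restore-test-big --memory 512 --scsi0 NextcloudDB:2000
vzdump 998 --storage Backups --mode stop --compress zstd
qmrestore /mnt/pve/Backups/dump/vzdump-qemu-998-<timestamp>.vma.zst 998 --storage NextcloudDB --force
qm destroy 998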
 
