Very Serious Backup/Restore Issue in PVE 5.0

vlab

New Member
Oct 16, 2017
Over the weekend I blew away two Proxmox hosts (clustered) and reinstalled the hypervisors from scratch (two single nodes). Because I had backups of everything, I thought I'd be fine to just restore the backup files and continue working.

The hosts have multiple backups of most of the containers/VMs that were on these machines (stored on an NFS share over the network). The containers restored normally; however, not a single VM backup was restorable on the new hosts. In the past I have restored VMs and containers to the original hosts, just never to the new ones.

I've tried everything and am at my wit's end. This is entirely unacceptable... what is the point of making backups if they are totally unusable? I have looked through this forum for advice, and most of the posts seem to say the files are simply corrupted and useless. I have tried unzipping manually, and it still gave me the same error:

** (process:7029): ERROR **: restore failed - wrong vma extent header chechsum
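(By "unzipping manually" I mean decompressing the .lzo archive and then extracting it with the vma tool, roughly like this, using one of my backup files as an example - the output directory name is just an example too:)

Code:
lzop -d vzdump-qemu-101-2017_09_24-01_45_02.vma.lzo
vma extract vzdump-qemu-101-2017_09_24-01_45_02.vma ./extracted-101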

This error has been reported for at least a year, and as far as I can see there is still no fix in production, even across entire distribution upgrades!

Needless to say I am very frustrated and disappointed - I feel as though there was absolutely no point in even backing up my VMs in the first place!

If I am incorrect and these VMs are easily recoverable, please let me know. I apologize if this comes off as nasty, but in this situation all VM data appears to be lost because of the failed checksum, and this has been a known issue for years without a fix. I hope someone proves me wrong, because I love the platform, but this is a dealbreaker if it can't be fixed...



[EDIT 1:]
As suggested, I attempted upgrading to a test version, rebooting the host, and restoring again. This did not make a difference.

https://forum.proxmox.com/threads/c...re-failed-wrong-vma-extent.33820/#post-166109

[UPDATE:]
I was now able to restore 2 of my ~10 machines. However, the rest still give me the same "chechsum" error. I am not sure what certain users mean by "unreliable" storage; the NFS share is on a FreeNAS server, and I don't see how these specific files (only the Proxmox VM backups) would be the only files affected on that server's whole RAID array. I guess it's possible, but if that were the case it seems like more than just the VM backup files would be affected.

I have also used the backup/restore feature many times in the past without issue. My main concern is that the backups were somehow tied to the original host/cluster, since I was able to restore (even some of the same files) multiple times until I reinstalled Proxmox.
 
I am not aware of such a bug. We had one in the 5.0 beta, but that one was fixed quickly.

I assume you missed updating your hosts.
 
I did not miss updating my hosts; I actually downloaded the latest ISO from the Proxmox site, burned it to a USB and installed it. I then immediately updated everything.


Code:
proxmox-ve: 5.0-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.0-33 (running version: 5.0-33/23132090)
pve-kernel-4.13.4-1-pve: 4.13.4-25
pve-kernel-4.10.17-2-pve: 4.10.17-20
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-14
qemu-server: 5.0-16
pve-firmware: 2.0-3
libpve-common-perl: 5.0-19
libpve-guest-common-perl: 2.0-12
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-15
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.1-1
pve-container: 2.0-16
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90
 
Please create a new backup and do a test restore with this version - does that work?
 
If you made a backup with that bad qemu version, it is corrupted no matter what you do. Is that the one that you are trying to restore to the new beta you set up?
 
This kind of error ("chechsum" error) is not an error in our backup system. Instead, it indicates that the backup file was modified somehow - most likely by unreliable backup storage.
 
We have used Proxmox for many years and have backed up and restored a lot of VMs. In every situation, the easiest approach was always: if something is not working or is damaged, restore the VM. And we have really never had a problem restoring a VM.

@dietmar But yes, if there are problems with the backup storage... maybe there is a better way to check backups for consistency? Or is there a verify step in the backup task?

Thanks.
 
@dietmar But yes, if there are problems with the backup storage... maybe there is a better way to check backups for consistency? Or is there a verify step in the backup task?

There is a

Code:
vma verify

for that:

Code:
root@backup /proxmox > lzop -cd vzdump-qemu-1002-2017_10_16-21_49_14.vma.lzo | vma verify /dev/stdin
root@backup /proxmox > echo $?
0
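If you want to check a whole dump directory in one go, something along these lines should work too (the mount path is just an example for an NFS-backed backup store):

Code:
# verify every lzop-compressed VMA backup in a dump directory (example path)
for f in /mnt/pve/backupnfs/dump/*.vma.lzo; do
    if lzop -cd "$f" | vma verify /dev/stdin >/dev/null 2>&1; then
        echo "OK      $f"
    else
        echo "FAILED  $f"
    fi
done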
 
@dietmar But yes, if there are problems with the backup storage... maybe there is a better way to check backups for consistency? Or is there a verify step in the backup task?

I don't think that is a problem during the backup itself. Unreliable storage can/will lose data later ...
 
@vlab. Before you blew away the existing hosts, were you still on Proxmox 4.4? I would verify the backup as LnxBil mentioned and, if possible, try to restore the backup files on another test server running the same version as your original host. Or you can try booting with an older kernel and restoring again.
 
@dietmar But yes, if there are problems with the backup storage... maybe there is a better way to check backups for consistency? Or is there a verify step in the backup task?
A bit flip on the disk/storage at some point in the future cannot be detected at a single point in time; it has to be checked for on a regular basis.
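One simple way to do that regular check is to record checksums right after the backup run finishes and re-verify them periodically, e.g. from a weekly cron job. This is not a built-in feature, just a sketch, and the path is an example:

Code:
# right after the backup run: record checksums of the backup files (example path)
cd /mnt/pve/backupnfs/dump && sha256sum *.vma.lzo > backups.sha256
# later, on a regular basis (e.g. weekly via cron): re-check to catch silent corruption
cd /mnt/pve/backupnfs/dump && sha256sum -c backups.sha256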
 
If you made a backup with that bad qemu version, it is corrupted no matter what you do. Is that the one that you are trying to restore to the new beta you set up?

Yes; however, the installation I used to make that backup had already been upgraded to 5.0 as well.
 
@vlab. Before you blew away the existing hosts, were you still on Proxmox 4.4? I would verify the backup as LnxBil mentioned and, if possible, try to restore the backup files on another test server running the same version as your original host. Or you can try booting with an older kernel and restoring again.
Unfortunately I was also using 5.0 (non-Beta).
 
Unfortunately I was also using 5.0 (non-Beta).

This was fixed in pve-qemu-kvm version 2.9.0-1~rc2+4; the first non-beta release of PVE 5 contained 2.9.0-2.
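To see which pve-qemu-kvm version a host is currently running (the same information as in the full pveversion output earlier in this thread), something like this is enough:

Code:
pveversion -v | grep pve-qemu-kvm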
 
Now would be the best time to add a backup check to your backup routine. Needless to say, you should never trust a backup that you haven't validated yourself!
You're right and I should have done this originally. It just seems unfortunate that the backups I so diligently stored numerous copies of are now unusable.
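Going forward I will probably wire a verify step directly into the backup job via a vzdump hook script (set with the 'script' option in /etc/vzdump.conf). This is just a rough, untested sketch based on the example hook script shipped with pve-manager; it assumes the backup-end phase exposes the new backup file via the TARFILE environment variable, as that example shows:

Code:
#!/bin/sh
# rough, untested sketch of a vzdump hook script that verifies new VMA backups
# assumes $1 is the phase and TARFILE points at the freshly written backup file
if [ "$1" = "backup-end" ] && [ -n "$TARFILE" ]; then
    case "$TARFILE" in
        *.vma.lzo) lzop -cd "$TARFILE" | vma verify /dev/stdin || echo "verify FAILED: $TARFILE" ;;
        *.vma)     vma verify "$TARFILE" || echo "verify FAILED: $TARFILE" ;;
    esac
fi
exit 0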
 
There is a

Code:
vma verify

for that:

Code:
root@backup /proxmox > lzop -cd vzdump-qemu-1002-2017_10_16-21_49_14.vma.lzo | vma verify /dev/stdin
root@backup /proxmox > echo $?
0

Code:
root@proxmox1:/home# vma verify vzdump-qemu-101-2017_09_24-01_45_02.vma

** (process:29773): ERROR **: verify failed - wrong vma extent header chechsum
Trace/breakpoint trap
 
