restore failed - wrong vma extent header chechsum

ispirto

We have been backing up 5 host nodes, including several VMs, to NFS storage. We tested restoring backups around two weeks ago and had no issues.

However, when we needed to restore a VM from backup today, we found that the backups taken on all 5 host nodes were corrupted and not restorable.

Code:
restore vma archive: lzop -d -c /mnt/pve/node29-backup-dalpremium3/dump/vzdump-qemu-35847-2017_03_08-04_45_02.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp18423.fifo - /var/tmp/vzdumptmp18423
CFG: size: 390 name: qemu-server.conf
DEV: dev_id=1 size: 64424509440 devname: drive-virtio0
CTIME: Wed Mar  8 04:45:03 2017
  Logical volume "vm-99999-disk-1" created.
new volume ID is 'vmstore:vm-99999-disk-1'
map 'drive-virtio0' to '/dev/vmdata/vm-99999-disk-1' (write zeros = 0)

** (process:18426): ERROR **: restore failed - wrong vma extent header chechsum
/bin/bash: line 1: 18425 Broken pipe             lzop -d -c /mnt/pve/node29-backup-dalpremium3/dump/vzdump-qemu-35847-2017_03_08-04_45_02.vma.lzo
     18426 Trace/breakpoint trap   | vma extract -v -r /var/tmp/vzdumptmp18423.fifo - /var/tmp/vzdumptmp18423
  Logical volume "vm-99999-disk-1" successfully removed
temporary volume 'vmstore:vm-99999-disk-1' sucessfuly removed
TASK ERROR: command 'lzop -d -c /mnt/pve/node29-backup-dalpremium3/dump/vzdump-qemu-35847-2017_03_08-04_45_02.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp18423.fifo - /var/tmp/vzdumptmp18423' failed: exit code 133

We have tried multiple backup files from all the nodes and all of them return this error.

Manually running lzop -d and then vma extract <file> <dir> also fails with the same checksum error.

Code:
vma extract vzdump-qemu-36200-2017_03_07-06_25_52.vma bla
DEVINFO bla/tmp-disk-drive-virtio0.raw 64424509440
Formatting 'bla/tmp-disk-drive-virtio0.raw', fmt=raw size=64424509440

** (process:21215): ERROR **: restore failed - wrong vma extent header chechsum
Trace/breakpoint trap

So at this point, I'm looking to salvage some data from the somehow-corrupted vma file, but it seems there is no way to skip the checksum validation.

Questions:

- If I recompile the vma binary with the checksum validation removed, would I get a usable disk image with some corruption?

- If so, what would be the fastest way to achieve this?
 
So, I've found the reason why the backups are showing this error.

I've switched to the Jessie Backports kernel: 4.9.0-0.bpo.1-amd64

I've found that backups taken under that kernel show this error. When I take the backup using the latest PVE kernel, they restore fine.

To sum up:
Backups taken under 4.9.0-0.bpo.1-amd64 are not restorable under 4.9.0-0.bpo.1-amd64
Backups taken under 4.9.0-0.bpo.1-amd64 are not restorable under 4.4.40-1-pve
Backups taken under 4.4.40-1-pve are restorable under 4.4.40-1-pve
Backups taken under 4.4.40-1-pve are restorable under 4.9.0-0.bpo.1-amd64

I'm not really sure how these are related. Maybe the md5sum is computed differently under the two kernels? Any insights?
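
In case it helps debugging: here is a minimal sketch that recomputes the MD5 an extent header should carry, the way I understand the vma writer does it (hash the full 512-byte extent header with the 16-byte md5sum field zeroed). The field offset and the hashing scheme are assumptions from my reading of vma.h, not verified against the source:

Code:
/* check_ehead.c - recompute a vma extent header's MD5 and compare.
 * ASSUMPTION: the digest covers the 512-byte extent header with the
 * 16-byte md5sum field (at offset 24) zeroed out.
 * Build: gcc check_ehead.c $(pkg-config --cflags --libs glib-2.0)
 */
#include <glib.h>
#include <stdio.h>
#include <string.h>

#define EHEAD_SIZE 512
#define MD5_OFFSET 24  /* magic(4) + reserved(2) + block_count(2) + uuid(16) */

int main(int argc, char **argv)
{
    guint8 ehead[EHEAD_SIZE], stored[16], calc[16];
    gsize len = sizeof(calc);
    FILE *f;

    if (argc != 2 || !(f = fopen(argv[1], "rb"))) {
        fprintf(stderr, "usage: %s <file-starting-with-extent-header>\n", argv[0]);
        return 1;
    }
    if (fread(ehead, 1, EHEAD_SIZE, f) != EHEAD_SIZE) {
        fprintf(stderr, "short read\n");
        return 1;
    }
    memcpy(stored, ehead + MD5_OFFSET, 16);
    memset(ehead + MD5_OFFSET, 0, 16);   /* hash with md5sum field zeroed */

    GChecksum *cs = g_checksum_new(G_CHECKSUM_MD5);
    g_checksum_update(cs, ehead, EHEAD_SIZE);
    g_checksum_get_digest(cs, calc, &len);
    g_checksum_free(cs);

    printf("stored and computed digest %s\n",
           memcmp(stored, calc, 16) ? "DIFFER" : "match");
    return 0;
}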
 
New findings:

The backup taken under 4.4.40-1-pve is the same size as the one taken under 4.9.0-0.bpo.1-amd64, so at least the file is not truncated.

I've compiled pve-qemu-kvm after commenting out the checksum check and the two checks after it, extracted the vma binary, and used it to extract the contents of the vma file.

Code:
+        if (memcmp(md5sum, ehead->md5sum, 16) != 0) {
+            /* error_setg(errp, "wrong vma extent header chechsum"); */
+            /* return -1; */
+        }
+
+        if (memcmp(h->uuid, ehead->uuid, sizeof(ehead->uuid)) != 0) {
+            /* error_setg(errp, "wrong vma extent uuid"); */
+            /* return -1; */
+        }
+
+        if (ehead->magic != VMA_EXTENT_MAGIC || ehead->reserved1 != 0) {
+            /* error_setg(errp, "wrong vma extent header magic"); */
+            /* return -1; */
+        }

But it didn't work. Here's the error message I got:

Code:
vma extract nonworking.vma out
DEVINFO out/tmp-disk-drive-virtio0.raw 64424509440
Formatting 'out/tmp-disk-drive-virtio0.raw', fmt=raw size=64424509440

** (process:23157): ERROR **: restore failed - short vma extent (3867136 < 107385344)
Trace/breakpoint trap

If I read vma.h right, 3867136 is exactly the maximum extent size (a 512-byte extent header plus 59 x 64 KiB clusters), while the expected 107385344 corresponds to a block_count of 26217 four-KiB blocks, far more than the 944 (59 x 16) an extent can hold. So the extent headers themselves seem to contain garbage, not just bad checksums.

So, I'm currently wondering: if I strip all the headers generated by vma from the file, would I then have the actual disk image binary?
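
For reference, here is a rough sketch of what such a header-stripper would have to do, based on my reading of vma.h: skip the global header, then walk 512-byte extent headers ("VMAE" magic) whose 59 blockinfo entries describe which 4 KiB blocks of each 64 KiB cluster follow in the payload. The big-endian blockinfo layout and the single-disk assumption are unverified guesses, and with extent headers as corrupted as the numbers above suggest, it would lose sync quickly:

Code:
/* vma_strip.c - rough sketch: walk a *decompressed* .vma and write cluster
 * payloads into a raw image, ignoring all checksums.
 * ASSUMPTIONS (my reading of vma.h, unverified): integers are big-endian;
 * extent header = 512 bytes, magic "VMAE", 59 x 8-byte blockinfo entries at
 * offset 40; per entry the high 32 bits are the cluster number and the low
 * 16 bits a bitmask of stored 4 KiB blocks; single disk (dev_id ignored). */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define BLOCK_SIZE        4096
#define CLUSTER_SIZE      (16 * BLOCK_SIZE)   /* 64 KiB */
#define BLOCKS_PER_EXTENT 59
#define EHEAD_SIZE        512

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <in.vma> <header_size> <out.raw>\n", argv[0]);
        return 1;
    }
    FILE *in = fopen(argv[1], "rb"), *out = fopen(argv[3], "wb");
    if (!in || !out) { perror("fopen"); return 1; }

    /* Skip the global VmaHeader; pass its header_size field manually
     * (read it out of the file with a hex editor first). */
    fseeko(in, atoll(argv[2]), SEEK_SET);

    uint8_t ehead[EHEAD_SIZE];
    static uint8_t cluster[CLUSTER_SIZE];

    while (fread(ehead, 1, EHEAD_SIZE, in) == EHEAD_SIZE) {
        if (memcmp(ehead, "VMAE", 4) != 0) {
            fprintf(stderr, "lost sync (no VMAE magic), stopping\n");
            break;
        }
        for (int i = 0; i < BLOCKS_PER_EXTENT; i++) {
            uint64_t bi;
            memcpy(&bi, ehead + 40 + 8 * i, 8);
            bi = __builtin_bswap64(bi);          /* big-endian on disk */
            if (!bi)
                continue;                        /* unused entry */
            uint64_t cluster_num = bi >> 32;
            uint16_t mask = bi & 0xffff;
            memset(cluster, 0, CLUSTER_SIZE);
            for (int b = 0; b < 16; b++) {       /* set bit = block stored */
                if (!(mask & (1u << b)))
                    continue;
                if (fread(cluster + b * BLOCK_SIZE, 1, BLOCK_SIZE, in)
                        != BLOCK_SIZE) {
                    fprintf(stderr, "truncated payload\n");
                    goto done;
                }
            }
            fseeko(out, (off_t)cluster_num * CLUSTER_SIZE, SEEK_SET);
            fwrite(cluster, 1, CLUSTER_SIZE, out);
        }
    }
done:
    fclose(in);
    fclose(out);
    return 0;
}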
 

Your files are damaged - the output clearly shows checksum errors. I suggest you use the kernel provided by our team, because it is well tested.
 

Thank you. Unfortunately, the kernel provided with PVE 4.4 is causing CPU soft lockups here, so I had to use a newer kernel.

I'm wondering what the reason for this could be. Every backup taken on the new kernel has this problem.
 
I also used the 4.9.x kernel from Debian, and it generated corrupt backup archives. (WTF does the kernel have to do with a Linux application that generates the backup archives?? Why that relation? I would say those must be two COMPLETELY DIFFERENT OS LAYERS!!?)

Because of that s**t, I will never use Proxmox again. NEVER EVER!!!!
 

Now, my VM host servers run Win7 Pro x64 and VirtualBoxWeb, with certified drivers for my LSI RAID controller (hardware RAID 10), and it seems to be 10x faster than your Proxmox...
 
The reason for the corrupt archives under the 4.9 kernel is a change in the kernel's handling of a specific mode of pipes (O_DIRECT), introduced in kernel version 4.5; it is fixed in a later version of pve-qemu-kvm, which we use in PVE 5.
So the PVE 4.4 kernel is not affected, but every kernel >= 4.5 is.
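
For illustration, a hypothetical sketch of the kind of guard such a fix implies (this is not the actual pve-qemu-kvm change): only request O_DIRECT when the target is a regular file or block device, never a pipe/FIFO.

Code:
/* Hypothetical sketch: request O_DIRECT only for regular files and block
 * devices, never for pipes/FIFOs, since kernels >= 4.5 handle O_DIRECT
 * on pipes differently (per the explanation above). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>

int open_backup_target(const char *path)
{
    struct stat st;
    int flags = O_WRONLY | O_CREAT;

    /* Only real files/devices get O_DIRECT; a FIFO such as the vzdump
     * restore pipe falls through without it. */
    if (stat(path, &st) == 0 && (S_ISREG(st.st_mode) || S_ISBLK(st.st_mode)))
        flags |= O_DIRECT;

    return open(path, flags, 0644);
}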
 
Thank you for your feedback!

Is there no way to extract the generated vma.lzo file if (maybe) only the checksum is invalid, but not the content? If the content is invalid too, then of course there is no way.. :(

Regards, Jan
 
"There is no way to extract the generated vma.lzo file, if (maybe) only the checksum is invalid, but not the content? If the content is invalid too, there is no way for sure.."

May there is a solution to extract the vma.lzo file? Or not??

Regards, Jan
 
