Partitions lost in a Qcow2 VM

mel128

Hello and my best wishes for 2017.

This morning I had a strange issue with a VM, an Ubuntu 14.04 LTS with a 32 GB hard disk in the default configuration. That VM had been running for months without problems. After a simple dist-upgrade, the reboot failed, saying that the disk was not bootable. PartitionMagic could not detect any partition. I tried to restore a backup, but even the oldest one we had showed the same problem. The partitions and the boot sector were lost.

Fortunately, thanks to TestDisk to retrieve the partitions and Boot-Repair to fix the boot, I was able to recover the VM.
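
For the record, here is roughly how such damage could be confirmed from the host before reaching for TestDisk. This is only a minimal sketch in Python: it assumes qemu-img is available on the node, that the guest uses a classic MBR/DOS partition table, and "vm-disk.qcow2" is just a placeholder for a local copy of the disk image.

import os
import subprocess
import tempfile

QCOW2_IMAGE = "vm-disk.qcow2"   # placeholder path, adjust to your setup

# Create a temporary raw file so the guest's sector 0 ends up at offset 0.
with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as tmp:
    raw_path = tmp.name

try:
    subprocess.run(
        ["qemu-img", "convert", "-O", "raw", QCOW2_IMAGE, raw_path],
        check=True,
    )
    with open(raw_path, "rb") as f:
        mbr = f.read(512)

    # A healthy MBR ends with the 0x55AA boot signature ...
    has_signature = mbr[510:512] == b"\x55\xaa"
    # ... and carries four 16-byte partition entries starting at offset 446.
    entries = [mbr[446 + i * 16:446 + (i + 1) * 16] for i in range(4)]
    used = [e for e in entries if any(e)]

    print("boot signature present:", has_signature)
    print("non-empty partition entries:", len(used))
finally:
    os.remove(raw_path)

For a 32 GB disk you would probably rather expose the image with qemu-nbd (modprobe nbd, then qemu-nbd -c /dev/nbd0 vm-disk.qcow2 and fdisk -l /dev/nbd0) instead of flattening the whole file, but the check is the same.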

Meanwhile, I would like to know what happened.

Best regards.

Michel
 
Hello everyone, happy new year!

I am reviving this one-year-old post, as I faced this very problem on Jan. 01, 2018. Same symptoms as described above; the partition table was repaired with TestDisk and Boot-Repair. Backups of the VM had also lost their partition table.

Any idea as to what could be the cause for this?
 
Upon further investigation, I restored a number of VMs to see whether their partition tables were okay or corrupted. It turned out that one other VM has the same problem, all of the others being fine. Both VMs share a number of characteristics.
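
In case it helps anyone checking their own backups, that batch check can be scripted. Below is a minimal Python sketch; it assumes libguestfs-tools is installed and that the restored images sit as qcow2 files under the directory given below, which is only a placeholder. virt-filesystems lists the filesystems it manages to find inside an image, so an empty result is a quick hint that the partition table is gone.

import glob
import subprocess

RESTORE_DIR = "/var/lib/vz/images/restored"   # placeholder path, adjust to your setup

for image in sorted(glob.glob(f"{RESTORE_DIR}/**/*.qcow2", recursive=True)):
    # virt-filesystems prints one device per line (e.g. /dev/sda1) if it can
    # read the partition table, and nothing at all if the table is damaged.
    result = subprocess.run(
        ["virt-filesystems", "-a", image],
        capture_output=True, text=True,
    )
    filesystems = result.stdout.split()
    status = "OK" if filesystems else "NO PARTITIONS FOUND"
    print(f"{image}: {status} {filesystems}")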

My setup is the following: a cluster of 3 hypervisors, all running Proxmox 4.4, each having RAID 5 local storage for local VMs and also sharing a drive in Ceph. The VMs that are in Ceph are also all managed by HA. Ceph health is OK.

The 2 VMs that have a partition table problem are managed by the same hypervisor (an HP ProLiant -- please don't get me started on HP :o/ ) and are on the Ceph storage.

Machines that are on either of the two other hypervisors are fine and their backups restored perfectly. Machines that are on the HP hypervisor on the local RAID 5 are also fine. For now, the only common denominator I can think of is a problem with this HP ProLiant and some of its VMs on Ceph. Does this ring a bell to anyone?
 
I never use cache for virtual drives. The qcow2 files are stored on Ceph, so they are managed through RBD. There is only one pool; all of the VMs managed by HA are in it, and just 2 of them are having this issue.
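
To rule out corruption of the qcow2 metadata itself (as opposed to the partition table inside the guest), a consistency check with qemu-img can also be scripted. Again, this is only a sketch: it assumes the qcow2 files are reachable as plain files from the node, and the directory below is a placeholder to adapt to your storage layout.

import glob
import subprocess

IMAGE_DIR = "/mnt/pve/ceph-images"   # placeholder path, adjust to your storage

for image in sorted(glob.glob(f"{IMAGE_DIR}/**/*.qcow2", recursive=True)):
    # "qemu-img check" returns 0 when the qcow2 metadata is consistent and a
    # non-zero code when it finds corrupt or leaked clusters.
    result = subprocess.run(
        ["qemu-img", "check", image],
        capture_output=True, text=True,
    )
    verdict = "clean" if result.returncode == 0 else f"check failed (rc={result.returncode})"
    print(f"{image}: {verdict}")
    if result.returncode != 0:
        print(result.stdout.strip())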

What I find striking is that, originally, the backup of the VM would have a corrupted partition table, but by simply migrating the VM to another node, all other things being equal, suddenly everything goes back to normal.