Partitions lost in a Qcow2 VM

mel128

Hello and my best wishes for 2017.

This morning I had a strange issue with a VM, an Ubuntu 14.04 LTS with a 32 GB hard disk in the default configuration. That VM had been running for months without problems. After a simple dist-upgrade, the reboot failed, saying that the disk was not bootable. PartitionMagic could not detect any partition. I tried to restore a backup, but even the oldest one we had showed the same problem. The partitions and the boot sector were lost.

Fortunately, thanks to TestDisk to retrieve the partitions and Boot-Repair to fix the boot, I was able to recover the VM.
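
For the record, here is roughly how such damage could be confirmed from the host before reaching for TestDisk. This is only a minimal sketch in Python: it assumes qemu-img is available on the node, that the guest uses a classic MBR/DOS partition table, and "vm-disk.qcow2" is just a placeholder for a local copy of the disk image.

import os
import subprocess
import tempfile

QCOW2_IMAGE = "vm-disk.qcow2"   # placeholder path, adjust to your setup

# Create a temporary raw file so the guest's sector 0 ends up at offset 0.
with tempfile.NamedTemporaryFile(suffix=".raw", delete=False) as tmp:
    raw_path = tmp.name

try:
    subprocess.run(
        ["qemu-img", "convert", "-O", "raw", QCOW2_IMAGE, raw_path],
        check=True,
    )
    with open(raw_path, "rb") as f:
        mbr = f.read(512)

    # A healthy MBR ends with the 0x55AA boot signature ...
    has_signature = mbr[510:512] == b"\x55\xaa"
    # ... and carries four 16-byte partition entries starting at offset 446.
    entries = [mbr[446 + i * 16:446 + (i + 1) * 16] for i in range(4)]
    used = [e for e in entries if any(e)]

    print("boot signature present:", has_signature)
    print("non-empty partition entries:", len(used))
finally:
    os.remove(raw_path)

For a 32 GB disk you would probably rather expose the image with qemu-nbd (modprobe nbd, then qemu-nbd -c /dev/nbd0 vm-disk.qcow2 and fdisk -l /dev/nbd0) instead of flattening the whole file, but the check is the same.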

Meanwhile, I would like to know what happened.

Best regards.

Michel
 
Hello everyone, happy new year!

I am reviving this one-year-old post, as I faced this very problem on Jan. 01, 2018. Same symptoms as described above; the partition table was repaired with TestDisk and Boot-Repair. Backups of the VM had also lost their partition table.

Any idea as to what could be the cause for this?
 
Upon further investigation, I restored a number of VMs to see whether their partition tables were okay or corrupted. It turned out that one other VM has the same problem, all of the others being fine. Both VMs share a number of characteristics.
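
In case it helps anyone checking their own backups, that batch check can be scripted. Below is a minimal Python sketch; it assumes libguestfs-tools is installed and that the restored images sit as qcow2 files under the directory given below, which is only a placeholder. virt-filesystems lists the filesystems it manages to find inside an image, so an empty result is a quick hint that the partition table is gone.

import glob
import subprocess

RESTORE_DIR = "/var/lib/vz/images/restored"   # placeholder path, adjust to your setup

for image in sorted(glob.glob(f"{RESTORE_DIR}/**/*.qcow2", recursive=True)):
    # virt-filesystems prints one device per line (e.g. /dev/sda1) if it can
    # read the partition table, and nothing at all if the table is damaged.
    result = subprocess.run(
        ["virt-filesystems", "-a", image],
        capture_output=True, text=True,
    )
    filesystems = result.stdout.split()
    status = "OK" if filesystems else "NO PARTITIONS FOUND"
    print(f"{image}: {status} {filesystems}")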

My setup is the following: a cluster of 3 hypervisors, all running Proxmox 4.4, each having RAID 5 local storage for local VMs and also sharing a drive in Ceph. The VMs that are in Ceph are also all managed by HA. Ceph health is OK.

The 2 VMs that have a partition table problem are managed by the same hypervisor (an HP ProLiant -- please don't get me started on HP :o/ ) and are on the Ceph storage.

Machines that are on either of the two other hypervisors are fine and their backups restored perfectly. Machines that are on the HP hypervisor on the local RAID 5 are also fine. For now, the only common denominator I can think of is a problem with this HP ProLiant and some of its VMs on Ceph. Does this ring a bell to anyone?
 
I never use cache for virtual drives. The qcow2 files are stored on Ceph, so they are managed through RBD. There is only one pool; all of the VMs managed by HA are in it, and just 2 of them are having this issue.
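
To rule out corruption of the qcow2 metadata itself (as opposed to the partition table inside the guest), a consistency check with qemu-img can also be scripted. Again, this is only a sketch: it assumes the qcow2 files are reachable as plain files from the node, and the directory below is a placeholder to adapt to your storage layout.

import glob
import subprocess

IMAGE_DIR = "/mnt/pve/ceph-images"   # placeholder path, adjust to your storage

for image in sorted(glob.glob(f"{IMAGE_DIR}/**/*.qcow2", recursive=True)):
    # "qemu-img check" returns 0 when the qcow2 metadata is consistent and a
    # non-zero code when it finds corrupt or leaked clusters.
    result = subprocess.run(
        ["qemu-img", "check", image],
        capture_output=True, text=True,
    )
    verdict = "clean" if result.returncode == 0 else f"check failed (rc={result.returncode})"
    print(f"{image}: {verdict}")
    if result.returncode != 0:
        print(result.stdout.strip())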

What I find striking is that, originally, the backup of the VM would have a corrupted partition table, but by simply migrating the VM to another node, all other things being equal, suddenly everything goes back to normal.