Recover VM-disk from failed drive

dredvard

New Member
Jun 12, 2024
I ran into a superblock failure on my drive and had to run e2fsck on /dev/dm-1. My server boots now, but smartctl gives me plenty of old-age status errors. Essentially I need to get everything off the drive if it's not already too late.

lvs -a provides the following:

[attached screenshot: lvs -a output]

How do I clone my VMs so I can rebuild everything on a new server?


Starting any of my VMs gives me this error: activating LV 'pve/data' failed: Check of pool pve/data failed (status:1). Manual repair required!
 
Do ye no' have backups, mon??

https://github.com/kneutron/ansitest/tree/master/proxmox

Look into the bkpcrit script and set the target to separate non-root media or a NAS. This will at least back up your VM/CT configs.
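Roughly, grabbing the script looks something like this - the exact filename and options are whatever the repo currently ships, so check it before running anything:

Code:
git clone https://github.com/kneutron/ansitest.git
cd ansitest/proxmox
ls | grep -i bkp    # locate the bkpcrit script
# read it and point the backup target at non-root media or the NAS before running it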

If you have a Win10 desktop with sufficient free disk space, set up a Samba shared drive and use Proxmox's integrated backup to that.

Datacenter / Backup in the GUI

You'll need to set up the Samba share in Storage - if the dropdown doesn't work, mount it in /etc/fstab and define it as Directory storage. An alternative is to use sshfs for convenience.
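A rough sketch of the fstab route - the host, the share name "pvebackup", the mount point and the credentials file are all examples, adjust to your setup:

Code:
apt install cifs-utils
mkdir -p /mnt/smb-backup
# /root/.smbcred holds the usual username=/password= lines
echo '//192.168.1.50/pvebackup /mnt/smb-backup cifs credentials=/root/.smbcred,_netdev 0 0' >> /etc/fstab
mount /mnt/smb-backup
# register it as Directory storage that can hold VZDump backups
pvesm add dir smb-backup --path /mnt/smb-backup --content backup
# once things are stable again, the integrated backup (or a manual run) can target it:
vzdump 100 --storage smb-backup --mode stop --compress zstd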

Another way is to attach a fresh disk to the system, partition it, format it with XFS, and put a mount point in /etc/fstab for it.

Define it as storage in PVE with the content types VZDump backup file, Disk image, and Container, then start trying to move your VM disks to it:

Datacenter / Virtual machines / [vmid] / Hardware / [click on the virtual hard disk] / Disk Action dropdown / Move Storage
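For reference, a CLI sketch of the fresh-disk route and the move step - /dev/sdb, the storage name, and the vmid/disk below are examples, and on older releases the last command is spelled qm move_disk:

Code:
# /dev/sdb and the names below are examples - triple-check the device first!
parted -s /dev/sdb mklabel gpt mkpart primary xfs 0% 100%
mkfs.xfs /dev/sdb1
mkdir -p /mnt/xfs-rescue
echo '/dev/sdb1 /mnt/xfs-rescue xfs defaults 0 2' >> /etc/fstab
mount /mnt/xfs-rescue
# VZDump backup file + Disk image + Container = backup,images,rootdir
pvesm add dir xfs-rescue --path /mnt/xfs-rescue --content backup,images,rootdir
# CLI counterpart of Hardware -> Disk Action -> Move Storage
qm disk move 100 scsi0 xfs-rescue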

=====================================

If/when you get back to a stable running state, come back here and read this part later.

The time to salvage your valuable data is BEFORE the disk fails, NOT while it is already falling over, because then you cannot rely on data integrity. You NEED to set up regular backups, preferably to a NAS with redundancy.

I'm getting really tired of trying to help people in the middle of a scramble, ESPECIALLY when Proxmox INCLUDES backup functionality FOR FREE. Learn from other people's mistakes, and BACK UP YOUR VMS AND CRITICAL DATA before SHTF!
 
Yeah, no 3-2-1 backup yet. Never had a complete failure happen so quickly in my life! I did snapshot, but onto the same drive - the 4x4 TB NAS is scheduled to be delivered Monday ... and of course this happens the week before. Sigh. I had only started playing with Proxmox in the past few months (and went on vacation in between, which is why the NAS and a few other things got delayed).

Truthfully, the thing I care most about is the docker-compose files on one of my VMs. A couple of others would save time re-setting up, but they aren't critical.

Will this script be able to access pve/data? lvscan says all the VMs and data are inactive.

Code:
  inactive          '/dev/pve/data' [349.14 GiB] inherit
  ACTIVE            '/dev/pve/swap' [<7.67 GiB] inherit
  ACTIVE            '/dev/pve/root' [96.00 GiB] inherit
  inactive          '/dev/pve/vm-100-disk-0' [4.00 MiB] inherit
  inactive          '/dev/pve/vm-100-disk-1' [32.00 GiB] inherit 
...
 
Code:
 lvchange -ay pve/data
  Check of pool pve/data failed (status:1). Manual repair required!


Code:
lvconvert --repair pve/data
  The following field needs to be provided on the command line due to corruption in the superblock: transaction id
  Repair of thin metadata volume of thin pool pve/data failed (status:1). Manual repair required!
 
So I've run lvchange -an pve/data_meta to mount the metadata read-only.
I installed thin-provisioning-tools from here: https://github.com/jthornber/thin-provisioning-tools/issues.

Ran the check:

Code:
thin_check /dev/mapper/pve-data_tmeta
bad checksum in superblock

then did

Code:
thin_dump --repair /dev/mapper/pve-data_tmeta > repaired.xml
data block size needs to be provided due to corruption in the superblock

I did a bitwise copy (dd if=/dev/pve/data_tmeta of=~/backup/data_tmeta.img bs=4M status=progress).

So my question is: how do I figure out what the block size is? Since I've done a bitwise copy, I'm assuming there's no harm if I screw something up now.
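One way to dig the missing numbers out, assuming the VG metadata itself is still intact: LVM keeps the pool's chunk size and transaction id in its own text metadata, independent of the damaged thin superblock, and the chunk size there is already in 512-byte sectors - the unit thin_dump's override flags expect (the flags depend on your thin-provisioning-tools version, so check thin_dump --help):

Code:
# dump the LVM metadata for the pve VG and pull the pool geometry out of it
vgcfgbackup -f /tmp/pve-vg.txt pve
grep -E 'transaction_id|chunk_size' /tmp/pve-vg.txt
# plug the values in as overrides; working on the dd image keeps the original untouched
TXID=4       # example - use the transaction_id value from the grep above
CHUNK=128    # example - 128 sectors = 64 KiB chunks; use your actual value
thin_dump --repair --transaction-id "$TXID" --data-block-size "$CHUNK" ~/backup/data_tmeta.img > repaired.xml
# some versions also want --nr-data-blocks (pool data size divided by chunk size)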
 
I'm a bit out of my wheelhouse here, but pvdisplay, vgdisplay, and lvdisplay are all giving me an LV size of 4 MB?

If you're not sure then I would consider buying a support subscription and filing a ticket.
 
I've created a repaired.xml but I can't seem to run
Code:
thin_repair -i repaired.xml -o /dev/mapper/pve-data_tmeta
It says it can't find the output file (despite having done the dd from it in the first place - although it's been a while since I last attempted this). How do I regenerate the mapper? I've been waiting until I got my NAS operational.
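For what it's worth, thin_repair expects a metadata device as input, not XML - writing a thin_dump XML back is normally thin_restore's job - and /dev/mapper/pve-data_tmeta only exists while that hidden LV is actually activated. The usual route (a sketch, assuming roughly 1 GiB of free space in the pve VG; the LV name is made up) is to restore onto a fresh LV and swap it into the pool:

Code:
# "data_restore" is an example name; it needs free space in the pve VG
lvcreate -L 1G -n data_restore pve
thin_restore -i repaired.xml -o /dev/pve/data_restore
# swap the restored metadata into the pool (lvconvert asks for confirmation),
# then try to activate the pool and get the VM disks off it immediately
lvconvert --thinpool pve/data --poolmetadata pve/data_restore
lvchange -ay pve/data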

EDIT: I think it's a lost cause. Not looking to file a service ticket (which is $1000) for what is just home use. I had a typo earlier:

Code:
lvchange -an pve/data_tmeta
This is the command I was supposed to use to mount it read-only, but when I executed thin_repair it said no compatible roots were found. Think I've hit my limit.

Thanks for the help. Is it possible to run data recovery and at least recover text files from the VM's hard drive?
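If the pool really can't be repaired, file carving on a raw image is the last resort. A sketch, with /dev/sdX and the NAS paths as placeholders - ddrescue comes from the gddrescue package and photorec from testdisk:

Code:
apt install gddrescue testdisk
# image the failing drive onto the NAS first, then only ever work on the image
ddrescue -d -r3 /dev/sdX /mnt/nas/failing-disk.img /mnt/nas/failing-disk.map
# photorec (interactive) can carve txt/yml and other file types out of the image
photorec /mnt/nas/failing-disk.img

Carving loses filenames and directory structure, so the docker-compose YAMLs would come back as anonymous text files you'd have to grep through - tedious, but often recoverable if the sectors are still readable.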
 
