(disaster recovery howto) Recover VM disks of running VM

Glowsome · Apr 14, 2018

Situation :

- 3 standalone nodes with VMs on XFS storage
- No shared storage.
- NFS shares between all 3 nodes, used to restore backup files from one node to another (added as NFS storage in PVE, with only backup files selected as content)

Task :

- Introduce a new node and set it up as the first cluster node (no shared storage)
- Set up NFS connectivity to pull backup files across and restore them on the new cluster node (added as NFS storage in PVE, with only backup files selected as content)
- Restore the VMs from a freshly taken backup after shutting them down.

Issue:

- The restore goes fine, but it adds 'unused disks' that are linked to the NFS storage on the old host.
- Deleting those unused disks deletes the actual files over NFS on the old node, where they belong to the VM with the same VMID, while that VM is still running.
- Backups of those VMs now fail with a file-not-found error on the disk files.

Challenge:

- Get the disk files back (without having backups)

First of all: this method ONLY works while the files are still in use. As long as you don't power off the VM, its process keeps the deleted files open and the data is still there.

Find the files that are flagged as deleted but are still open:

Code:

find /proc/*/fd -ls | grep  '(deleted)'

In my case the result was :

Code:

 12572685      0 lrwx------   1 root     root           64 Apr 14 01:21 /proc/1722/fd/22 -> /data/vms/images/100/vm-100-disk-3.qcow2\ (deleted)
 12572687      0 lrwx------   1 root     root           64 Apr 14 01:21 /proc/1722/fd/24 -> /data/vms/images/100/vm-100-disk-2.qcow2\ (deleted)
 12572688      0 lrwx------   1 root     root           64 Apr 14 01:21 /proc/1722/fd/25 -> /data/vms/images/100/vm-100-disk-1.qcow2\ (deleted)
 12572689      0 lrwx------   1 root     root           64 Apr 14 01:21 /proc/1722/fd/26 -> /data/vms/images/100/vm-100-disk-4.qcow2\ (deleted)
 12572690      0 lrwx------   1 root     root           64 Apr 14 01:21 /proc/1722/fd/27 -> /data/vms/images/100/vm-100-disk-5.qcow2\ (deleted)
 12572691      0 lrwx------   1 root     root           64 Apr 14 01:21 /proc/1722/fd/28 -> /data/vms/images/100/vm-100-disk-6.qcow2\ (deleted)
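If you already know the PID of the VM process (1722 in the listing above), the same search can be narrowed to that one process. The following is only a minimal sketch; the function name `list_deleted_fds` is made up for illustration, and it relies on Linux appending " (deleted)" to the link target of an unlinked file.

```shell
#!/usr/bin/env bash
# list_deleted_fds PID  (hypothetical helper, not part of PVE)
# Prints "<fd path><TAB><original file path>" for every file descriptor
# of PID whose target file has been deleted but is still held open.
list_deleted_fds() {
    local pid="$1" fd target
    for fd in /proc/"$pid"/fd/*; do
        # readlink fails for descriptors that vanish while we iterate
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)")
                printf '%s\t%s\n' "$fd" "${target%" (deleted)"}"
                ;;
        esac
    done
}
```

Called as `list_deleted_fds 1722`, it should print one line per lost disk image, which makes it easier to see which descriptor number belongs to which file.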

Copy the files (yes, they are open and in use, but it's still better than nothing, and a filesystem repair can do a lot):

Code:

 cp /proc/1722/fd/22 /backup-store/vm-100-disk-3.qcow2
 cp /proc/1722/fd/24 /backup-store/vm-100-disk-2.qcow2
 cp /proc/1722/fd/25 /backup-store/vm-100-disk-1.qcow2
 cp /proc/1722/fd/26 /backup-store/vm-100-disk-4.qcow2
 cp /proc/1722/fd/27 /backup-store/vm-100-disk-5.qcow2
 cp /proc/1722/fd/28 /backup-store/vm-100-disk-6.qcow2
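With many disks, typing one cp per descriptor is error-prone. The copy step above can be sketched as a loop that derives each file name from the deleted link target. This is an illustration only: `rescue_deleted_fds` is a hypothetical name, and it copies every deleted-but-open file of the process, not just qcow2 images.

```shell
#!/usr/bin/env bash
# rescue_deleted_fds PID DEST  (hypothetical helper)
# Copies every deleted-but-still-open file of PID into DEST,
# keeping the original base name of each file.
rescue_deleted_fds() {
    local pid="$1" dest="$2" fd target name
    mkdir -p "$dest" || return 1
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") ;;      # deleted file: handle it below
            *) continue ;;         # anything else: not our concern
        esac
        name=$(basename "${target%" (deleted)"}")
        # Never overwrite an earlier rescue copy
        if [ -e "$dest/$name" ]; then
            echo "skipping $name: already exists in $dest" >&2
            continue
        fi
        cp "$fd" "$dest/$name" && echo "copied $fd -> $dest/$name"
    done
}
```

For the case in this thread the call would be `rescue_deleted_fds 1722 /backup-store`. Make sure the destination has enough free space for all images before starting.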

Test the integrity of the copied files (and repair if needed with -r all):

Code:

 qemu-img check /backup-store/vm-100-disk-3.qcow2
 .....
 .....
 .....
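To run the check over every rescued image in one go, a loop like the one below can help. It is a sketch under assumptions: `check_images` and the `QEMU_IMG` override variable are invented here for illustration, and the reading of the exit codes (0 = clean, 3 = leaked clusters only) follows the qemu-img manual page, so verify it against your version.

```shell
#!/usr/bin/env bash
# check_images DIR  (hypothetical helper)
# Runs "qemu-img check" on every *.qcow2 file in DIR and prints a verdict.
# QEMU_IMG may point to a different binary (assumption, for testing only).
check_images() {
    local dir="$1" img rc failed=0
    for img in "$dir"/*.qcow2; do
        [ -e "$img" ] || continue          # no match: glob stays literal
        rc=0
        "${QEMU_IMG:-qemu-img}" check "$img" >/dev/null 2>&1 || rc=$?
        case "$rc" in
            0) echo "OK       $img" ;;
            3) echo "LEAKS    $img" ;;     # leaked clusters, data intact
            *) echo "DAMAGED  $img (exit code $rc)"; failed=1 ;;
        esac
    done
    return "$failed"
}
```

`check_images /backup-store` returns non-zero if at least one image needs a closer look; rerun `qemu-img check` by hand on that file to see the details before deciding on a repair.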

Now all files have been rescued, as far as that is possible from outside the VM.
This is the hardest decision in the whole procedure: powering down the VM that lost its disks!
- Powering down means the open files are released and really deleted (so make sure the copy of the active files completed correctly first).
After powering down, copy the saved files back to their original location:

Code:

cp /backup-store/vm-100-disk-3.qcow2 /data/vms/images/100/
 cp /backup-store/vm-100-disk-2.qcow2 /data/vms/images/100/
 cp /backup-store/vm-100-disk-1.qcow2 /data/vms/images/100/
 cp /backup-store/vm-100-disk-4.qcow2 /data/vms/images/100/
 cp /backup-store/vm-100-disk-5.qcow2 /data/vms/images/100/
 cp /backup-store/vm-100-disk-6.qcow2 /data/vms/images/100/
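Before starting the VM again it does no harm to confirm that the files copied back are byte-identical to the rescued copies. A minimal sketch using `cmp`; the function name `verify_restore` is made up for this example.

```shell
#!/usr/bin/env bash
# verify_restore SRC DST  (hypothetical helper)
# Compares every *.qcow2 in SRC with the file of the same name in DST.
verify_restore() {
    local src="$1" dst="$2" f name bad=0
    for f in "$src"/*.qcow2; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        # cmp -s is silent and fails on any difference or missing file
        if cmp -s "$f" "$dst/$name"; then
            echo "identical  $name"
        else
            echo "MISMATCH   $name"
            bad=1
        fi
    done
    return "$bad"
}
```

Here that would be `verify_restore /backup-store /data/vms/images/100`. Keep the copies in /backup-store until the VM has booted and been checked.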

Most likely the filesystem inside the disks will be inconsistent, as the VM was still running during the copy, so boot the VM from a live CD and repair the disks (fsck).
When that is done, reboot the VM normally and hope everything is there (in my case I had a full restoration of the deleted disk files and the machine!).

I'm just sharing the steps I took, so that anyone else who runs into this can attempt the same procedure.

- Michael

udo · Apr 14, 2018

Glowsome said:
Issue:

- The restore goes fine, but it adds 'unused disks' that are linked to the NFS storage on the old host.
- Deleting those unused disks deletes the actual files over NFS on the old node, where they belong to the VM with the same VMID, while that VM is still running.
- Backups of those VMs now fail with a file-not-found error on the disk files.

Hi Michael,
that's not really an issue, because you should NEVER use the same storage outside of a cluster (like between clusters, or between a cluster and single nodes).
Unless you really know what you are doing, like recreating a cluster with a new installation.

This is the normal behaviour. If you create a VM, all storages are searched for VM disks with this VMID, and those disks are added to the VM.

Udo
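Given that behaviour, it can be worth checking before a restore whether any reachable storage already holds disk images for the VMID you are about to use. The sketch below assumes the directory layout seen earlier in this thread (`.../images/<VMID>/vm-<VMID>-disk-N.qcow2`) and assumes the NFS storages are mounted under /mnt/pve; `find_vmid_images` is a hypothetical name.

```shell
#!/usr/bin/env bash
# find_vmid_images VMID [ROOT]  (hypothetical helper)
# Lists every file below ROOT that sits in an images/<VMID>/ directory.
# ROOT defaults to /mnt/pve (assumption: where the NFS storages are mounted).
find_vmid_images() {
    local vmid="$1" root="${2:-/mnt/pve}"
    find "$root" -type f -path "*/images/$vmid/*" 2>/dev/null
}
```

If `find_vmid_images 100` prints anything on the new node, those files belong to another host's VM 100, and any 'unused disk' entries that point to them should be left alone.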

Glowsome · Apr 14, 2018

udo said:
This is the normal behaviour. If you create a VM, all storages are searched for VM disks with this VMID, and those disks are added to the VM.
Udo

Then this means the filter I had set on the NFS storage in the PVE web interface (backups only) is not honored by the underlying process.

IMHO, setting that _should_ also have restricted the search for disks in those places.

I have had the storage shared like this for a long time; the only change is that I made it writable so I could drop the backup directly on the other node. By doing that I also made it possible to delete the disks, so it's back to read-only.

Funny thing is, I have migrated VMs before with the NFS storage read-only, and when restoring, it did not add the unassigned disks from the other storage.

So maybe the underlying process notices whether the available storage is RO or RW, and stops searching if it's RO.

- Michael
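To see how the NFS shares are actually mounted on a node right now, the mount options can be read from /proc/mounts. This is only a rough sketch: `nfs_mount_modes` is an invented name, and it reports the client-side mount option, which does not have to match what the server exports.

```shell
#!/usr/bin/env bash
# nfs_mount_modes [FILE]  (hypothetical helper)
# Prints "<ro|rw> <mount point> <source>" for each NFS mount listed in
# FILE, which must be in /proc/mounts format (default: /proc/mounts).
nfs_mount_modes() {
    awk '$3 ~ /^nfs4?$/ {
        n = split($4, opt, ",")
        mode = "?"
        for (i = 1; i <= n; i++)
            if (opt[i] == "ro" || opt[i] == "rw") mode = opt[i]
        print mode, $2, $1
    }' "${1:-/proc/mounts}"
}
```

Running `nfs_mount_modes` without arguments on each node shows at a glance which shares could still be written to, and therefore deleted from.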
