cannot restore vm from snapshot

Carsten (digiCULT) · Feb 26, 2018

Hi there,
after running for about two years, one of my VMs came up with a strange behaviour, which led to a total desaster:
Preamble:
one VM noticed me about an issue with its filesystem and suggested that i should run 'fsck' to repair it.
After rebooting the VM, the filesytem only started in read-only mode. So I ran 'fsck' a second time and again it apperently fixed some things. Since then the VM did not boot anymore, instead the 'grub rescue mode console' came up. I tried to 'ls' the partitions, but they all had an unknown filesystem.

Now I wanted to restore the VM out of one of several snapshots, which are regulary written by the backup function. Unfortunatly the old snapshots show the same error. Then I tried to restore the snapshot to a new VM, but the same error occured. Within the grub rescue console one of the partions was identified as ext2, but trying to 'ls' it, it only showed random strings.

What surprises me most, is the following:

there are two other VMs on that hardware, which are running fine
the snapshots are 1, 2 and 3 weeks old, but the error occured the day before the next snapshot would be taken. So they should not have the filesytem problems.

I used the following command to restore the VM:

Code:

qmrestore /srv/backup-server/dump/vzdump-qemu-102-2018_02_03-05_09_03.vma.gz 104

Are there any further parameter to get a better result?

Here is a screenshot from the grub rescue console:

2018-02-26 11_49_35-digicultvm2_grub_rescue.png

proxmox-ve: 4.4-105 (running kernel: 4.4.21-1-pve)
pve-manager: 4.4-22 (running version: 4.4-22/2728f613)
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.98-5-pve: 4.4.98-105
pve-kernel-4.4.95-1-pve: 4.4.95-99
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-54
qemu-server: 4.0-115
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.1-6~pve4
pve-container: 1.0-104
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: not correctly installed
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80

dcsapak · Feb 26, 2018

maybe the real error is a level below (e.g. your disks/storage?) and got replicated to your backup
also:

Carsten (digiCULT) said:
proxmox-ve: 4.4-105 (running kernel: 4.4.21-1-pve)

you should really try to find time to reboot

Carsten (digiCULT) · Feb 27, 2018

From what I can tell from smartctl the HDD is fine...and the other VMs on the same disk show no sign of problems.

Are there any ways to get at least at the files in the backup? While we usually employ a more redundant backup strategy for our databases and files, on this VM we relied only on the proxmox backups for some reasons.

Could the commercial support help in such a case?

dcsapak · Feb 27, 2018

you can try to boot the vm (or clones) with a rescue live cd and try to get your files out

Carsten (digiCULT) said:
From what I can tell from smartctl the HDD is fine...and the other VMs on the same disk show no sign of problems.

in my experience smart data is not a good indicator that a disk is ok/broken, there are sometimes readings that should be noted and acted upon, but a smart "passed" is not a
confirmation that the disk is ok

Search

Search

cannot restore vm from snapshot

Carsten (digiCULT)

New Member

dcsapak

Proxmox Staff Member

Carsten (digiCULT)

New Member

dcsapak

Proxmox Staff Member

We value your privacy