cannot restore vm from snapshot

Feb 26, 2018
Hi there,
after running for about two years, one of my VMs developed some strange behaviour, which led to a total disaster:
Preamble:
One VM notified me about an issue with its filesystem and suggested that I run 'fsck' to repair it.
After rebooting the VM, the filesystem only came up in read-only mode. So I ran 'fsck' a second time, and again it apparently fixed some things. Since then the VM has not booted anymore; instead the 'grub rescue mode' console came up. I tried to 'ls' the partitions, but they all had an unknown filesystem.
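For reference, what I did in the rescue console looked roughly like this (the partition names here are just examples):
Code:
grub rescue> ls
(hd0) (hd0,msdos1) (hd0,msdos2) (hd0,msdos5)
grub rescue> ls (hd0,msdos1)/
unknown filesystem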

Now I wanted to restore the VM from one of several snapshots, which are regularly written by the backup function. Unfortunately, the old snapshots show the same error. Then I tried to restore the snapshot to a new VM, but the same error occurred. Within the grub rescue console one of the partitions was identified as ext2, but trying to 'ls' it only showed random strings.

What surprises me most is the following:
  1. there are two other VMs on that hardware, which are running fine
  2. the snapshots are 1, 2 and 3 weeks old, but the error occurred the day before the next snapshot would have been taken, so they should not have the filesystem problems (a basic integrity check of the backup archives is sketched below)
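One thing that could be ruled out first is damage to the backup archives themselves; this is only a sketch, but a plain gzip integrity test should at least catch storage corruption of the compressed file:
Code:
# exits 0 and prints OK if the compressed archive itself is readable
gzip -t /srv/backup-server/dump/vzdump-qemu-102-2018_02_03-05_09_03.vma.gz && echo OK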
I used the following command to restore the VM:
Code:
qmrestore /srv/backup-server/dump/vzdump-qemu-102-2018_02_03-05_09_03.vma.gz 104
Are there any further parameters to get a better result?
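As far as I know, qmrestore mainly takes a target storage; restoring to a different storage (the name 'local' below is just an example) is easy to try, though if the corruption is inside the backed-up image itself it presumably won't change the result:
Code:
qmrestore /srv/backup-server/dump/vzdump-qemu-102-2018_02_03-05_09_03.vma.gz 104 --storage local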

Here is a screenshot from the grub rescue console:
2018-02-26 11_49_35-digicultvm2_grub_rescue.png

proxmox-ve: 4.4-105 (running kernel: 4.4.21-1-pve)
pve-manager: 4.4-22 (running version: 4.4-22/2728f613)
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.98-5-pve: 4.4.98-105
pve-kernel-4.4.95-1-pve: 4.4.95-99
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-54
qemu-server: 4.0-115
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.1-6~pve4
pve-container: 1.0-104
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: not correctly installed
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
 
Maybe the real error is a level below (e.g. your disks/storage?) and got replicated to your backup.
Also:
proxmox-ve: 4.4-105 (running kernel: 4.4.21-1-pve)
You should really try to find time to reboot; the running kernel (4.4.21) is much older than the newest installed one (4.4.98).
 
From what I can tell from smartctl the HDD is fine... and the other VMs on the same disk show no sign of problems.

Is there any way to at least get at the files in the backup? While we usually employ a more redundant backup strategy for our databases and files, on this VM we relied only on the Proxmox backups for various reasons.
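One possible way to get at the files without booting the VM at all would be to unpack the raw disk image from the archive and loop-mount it read-only. Just a sketch, assuming the standard vma tool and example paths (the extracted file name depends on the drive name):
Code:
# decompress the backup and unpack the raw disk image(s) from it
zcat /srv/backup-server/dump/vzdump-qemu-102-2018_02_03-05_09_03.vma.gz > /tmp/102.vma
vma extract /tmp/102.vma /tmp/102-extracted
# map the partitions inside the raw image (file name is an example) ...
losetup -fP --show /tmp/102-extracted/disk-drive-virtio0.raw
# ... and try to mount one of them read-only (assuming loop0 was assigned)
mount -o ro /dev/loop0p1 /mnt
Of course, if the filesystem inside the image is really corrupted this will fail the same way, but it at least rules out grub as the problem.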

Could the commercial support help in such a case?
 
You can try to boot the VM (or the clones) with a rescue live CD and try to get your files out.
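A rough sketch of how that could look on the restored clone (ISO name and VMID are examples):
Code:
# attach a rescue ISO to the cloned VM and boot from CD-ROM first
qm set 104 --ide2 local:iso/systemrescuecd.iso,media=cdrom
qm set 104 --boot d
qm start 104
From the live system you can then check whether the partitions are mountable at all.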

From what I can tell from smartctl the HDD is fine... and the other VMs on the same disk show no sign of problems.
In my experience, SMART data is not a good indicator of whether a disk is ok or broken. There are sometimes readings that should be noted and acted upon, but a SMART "passed" is not a confirmation that the disk is ok.
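If you want to look closer than the overall health status, the raw values of a few critical attributes are usually more telling; a sketch (the device name is an example):
Code:
# "PASSED" alone says little; check the critical raw values directly
smartctl -a /dev/sda | grep -Ei 'reallocated|pending|uncorrect|crc'
Non-zero reallocated or pending sector counts would be a reason for distrust even with an overall "PASSED".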
 
