Hi there,
after running for about two years, one of my VMs developed a strange behaviour, which led to a total disaster:
Preamble:
One VM notified me about an issue with its filesystem and suggested that I run 'fsck' to repair it.
After rebooting the VM, the filesystem only came up in read-only mode, so I ran 'fsck' a second time and again it apparently fixed some things. Since then the VM does not boot anymore; instead, the grub rescue console comes up. I tried to 'ls' the partitions, but they all show an unknown filesystem.
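In hindsight I should probably have taken an image of the disk first and only run a non-destructive check before letting 'fsck' change anything, something like this (the device name is just an example from my setup):
Code:
# report problems but answer 'no' to all repair questions, so nothing is modified
fsck -n /dev/vda1
# print the ext2/3/4 superblock and filesystem state without touching the disk
dumpe2fs -h /dev/vda1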
Now I wanted to restore the VM from one of several snapshots, which are regularly written by the backup function. Unfortunately, the old snapshots show the same error. Then I tried to restore a snapshot to a new VM, but the same error occurred. Within the grub rescue console one of the partitions was identified as ext2, but trying to 'ls' it only showed random strings.
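For reference, probing the disks from the grub rescue console looks roughly like this (the device names are just examples for my layout):
Code:
grub rescue> ls
(hd0) (hd0,msdos1) (hd0,msdos2)
grub rescue> ls (hd0,msdos1)
error: unknown filesystem.
grub rescue> ls (hd0,msdos2)/
error: unknown filesystem.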
What surprises me most is the following:
- there are two other VMs on the same hardware, and they are running fine
- the snapshots are 1, 2 and 3 weeks old, but the error occurred the day before the next snapshot would have been taken, so they should not contain the filesystem problems
The command I used to restore the backup to a new VM:
Code:
qmrestore /srv/backup-server/dump/vzdump-qemu-102-2018_02_03-05_09_03.vma.gz 104

Are there any further parameters to get a better result?
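As far as I can tell, 'qmrestore' only takes a few options such as '--storage' and '--unique', which probably make no difference here. So I also thought about checking one of the archives directly with the 'vma' tool and inspecting the extracted disk on the host; a rough sketch (the paths and the disk file name are just examples, and I assume 'vma' needs the archive uncompressed):
Code:
# decompress the backup and verify the archive checksums
zcat /srv/backup-server/dump/vzdump-qemu-102-2018_02_03-05_09_03.vma.gz > /tmp/102.vma
vma verify /tmp/102.vma -v

# extract the raw disk image(s) for inspection
vma extract /tmp/102.vma /tmp/102-extract

# map the image including its partitions and run a read-only filesystem check
losetup -fP --show /tmp/102-extract/disk-drive-virtio0.raw
fsck -n /dev/loop0p1

If the read-only 'fsck' already reports the same corruption on the extracted image, the filesystem must have been damaged before the backups were taken.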
Here is a screenshot from the grub rescue console:

And here is the output of 'pveversion -v':
Code:
proxmox-ve: 4.4-105 (running kernel: 4.4.21-1-pve)
pve-manager: 4.4-22 (running version: 4.4-22/2728f613)
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.98-5-pve: 4.4.98-105
pve-kernel-4.4.95-1-pve: 4.4.95-99
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-54
qemu-server: 4.0-115
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.1-6~pve4
pve-container: 1.0-104
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: not correctly installed
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80