All VM Disks inactive after power failure

lektech

New Member
Feb 17, 2022
Hi all, I've seen that a few other people have had a similar issue, but nothing I've tried so far seems to work.

Sometime during the previous night the system experienced a power failure and rebooted, but none of the virtual machines are able to start.

Run of qm start 103
Code:
root@pve:~# qm start 103
kvm: -drive file=/dev/pve/vm-103-disk-0,if=none,id=drive-sata1,format=raw,cache=none,aio=native,detect-zeroes=on: Could not open '/dev/pve/vm-103-disk-0': No such file or directory
start failed: QEMU exited with code 1

Run of lvs
Code:
root@pve:~# lvs
  LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz--   4.76t             0.00   0.22                           
  data_meta0    pve -wi-a-----  15.81g                                                   
  root          pve -wi-ao----  96.00g                                                   
  swap          pve -wi-ao----   8.00g                                                   
  vm-100-disk-0 pve Vwi---tz--  10.00g data                                              
  vm-101-disk-0 pve Vwi---tz-- 180.00g data                                              
  vm-102-disk-0 pve Vwi---tz-- 180.00g data                                              
  vm-103-disk-0 pve Vwi---tz-- 180.00g data                                              
  vm-104-disk-0 pve Vwi---tz-- 100.00g data

All of the VM disks are inactive.
When I try to run lvchange -ay:
Code:
root@pve:~# lvchange -ay /dev/pve/vm-103-disk-0
  device-mapper: reload ioctl on  (253:5) failed: No data available

When I do an lvscan
Code:
root@pve:~# lvscan
  ACTIVE            '/dev/pve/swap' [8.00 GiB] inherit
  ACTIVE            '/dev/pve/root' [96.00 GiB] inherit
  ACTIVE            '/dev/pve/data' [4.76 TiB] inherit
  inactive          '/dev/pve/vm-101-disk-0' [180.00 GiB] inherit
  inactive          '/dev/pve/vm-100-disk-0' [10.00 GiB] inherit
  inactive          '/dev/pve/vm-102-disk-0' [180.00 GiB] inherit
  inactive          '/dev/pve/vm-103-disk-0' [180.00 GiB] inherit
  inactive          '/dev/pve/vm-150-disk-0' [75.00 GiB] inherit
  inactive          '/dev/pve/vm-150-disk-1' [2.00 TiB] inherit
  inactive          '/dev/pve/vm-120-disk-0' [160.00 GiB] inherit
  ---
  ACTIVE            '/dev/pve/data_meta0' [15.81 GiB] inherit

I don't know what to do next. There are 34 virtual machines in total.
The hardware is 4x 2TB SAS disks in RAID 5 on a hardware RAID controller.
Proxmox is installed on the RAID 5 volume, and local-lvm, which contained the VM disks, was also on the RAID volume.
I doubt there is a hardware failure.

There is critical data on vm-150-disk-1
The rest of the data on all of the other disks is unimportant in comparison.

Please can anyone help? Do you think the data is recoverable?
Would a Proxmox VE Subscription make the chance of success any higher?
 
output of lvchange -ay data
Code:
root@pve:~# lvchange -ay data
  Volume group "data" not found
  Cannot process volume group data

output of vgs
Code:
root@pve:~# vgs
  VG  #PV #LV #SN Attr   VSize VFree 
  pve   1  39   0 wz--n- 4.91t 576.00m

If I try to run lvconvert --repair pve/data
Code:
root@pve:~# lvconvert --repair pve/data
  WARNING: Sum of all thin volume sizes (5.16 TiB) exceeds the size of thin pools and the size of whole volume group (4.91 TiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  WARNING: LV pve/data_meta0 holds a backup of the unrepaired metadata. Use lvremove when no longer required.
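
For reference, the manual repair route for damaged thin-pool metadata uses thin_check / thin_repair from thin-provisioning-tools. This is only a sketch of what I understand the procedure to be, not something I have verified here: pve/meta_repaired is an example name I made up, the metadata LV size should match your pool's, and you should take a full image of the array before attempting any of it.

```shell
# pve/data_meta0 holds a copy of the unrepaired metadata
# (see the lvconvert warning above). First see what is wrong with it:
thin_check /dev/pve/data_meta0

# thin_dump shows how much of the mapping tree is still readable:
thin_dump /dev/pve/data_meta0 > /root/thin_metadata_dump.xml

# If the damage is repairable, write a fixed copy into a fresh LV
# (pve/meta_repaired is a hypothetical name; size it like the original metadata):
lvcreate -L 16G -n meta_repaired pve
thin_repair -i /dev/pve/data_meta0 -o /dev/pve/meta_repaired

# Swap the repaired metadata into the (inactive) pool, then reactivate:
lvchange -an pve/data
lvconvert --thinpool pve/data --poolmetadata pve/meta_repaired
lvchange -ay pve/data
```

These commands need root and a pool in exactly this state, so treat them as a starting point for reading the lvmthin(7) man page, not a recipe.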


Thanks,
 
output of lvchange -ay data/vm-103-disk-0
Code:
root@pve:~# lvchange -ay data/vm-103-disk-0
  Volume group "data" not found
  Cannot process volume group data

but the output of lvchange -ay pve/vm-103-disk-0
Code:
root@pve:~# lvchange -ay pve/vm-103-disk-0
  device-mapper: reload ioctl on  (253:6) failed: No data available
 
after running vgchange -a y
Code:
root@pve:~# vgchange -a y
  device-mapper: reload ioctl on  (253:6) failed: No data available
  device-mapper: reload ioctl on  (253:6) failed: No data available
  device-mapper: reload ioctl on  (253:6) failed: No data available
  device-mapper: reload ioctl on  (253:6) failed: No data available
  device-mapper: reload ioctl on  (253:6) failed: No data available
  ------
  device-mapper: reload ioctl on  (253:6) failed: No data available
  device-mapper: reload ioctl on  (253:6) failed: No data available
  device-mapper: reload ioctl on  (253:6) failed: No data available
  device-mapper: reload ioctl on  (253:6) failed: No data available
  device-mapper: reload ioctl on  (253:6) failed: No data available
  5 logical volume(s) in volume group "pve" now active

after running vchange -ay
Code:
root@pve:~# vchange -ay
-bash: vchange: command not found

pveversion
Code:
root@pve:~# pveversion
pve-manager/6.3-2/22f57405 (running kernel: 5.4.73-1-pve)
 
output of lvchange -ay
Code:
root@pve:~# lvchange -ay
  No command with matching syntax recognised.  Run 'lvchange --help' for more information.
  Nearest similar command has syntax:
  lvchange -a|--activate y|n|ay VG|LV|Tag|Select ...
  Activate or deactivate an LV.
 
I'm starting to feel like all the data is lost and nothing can be done.
The original metadata backup from when I initially tried lvconvert --repair pve/data is gone.

I don't know what caused this. I'm not sure if it was the metadata filling up or external forces.
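
One lesson for anyone who ends up in the same spot: save everything you can before running any repair commands. Something like this (assuming the metadata copy LV is still active, as data_meta0 was in my lvs output above):

```shell
# vgcfgbackup saves the VG layout as a text file
# (it does NOT include the thin-pool mappings themselves):
vgcfgbackup -f /root/pve-vg-backup.txt pve

# Image the unrepaired metadata copy before any further repair attempts:
dd if=/dev/pve/data_meta0 of=/root/data_meta0.img bs=1M
```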
 
Hi Lektech, did you find a solution for this? I am having a similar problem.

Hi, no I did not.
I did reply to your DM as well, but for anybody else who stumbles across this thread in the future: I did not solve it.
We bit the bullet, did a fresh install, and rebuilt what we could.

Another power failure occurred one week later and the same thing happened.
This Proxmox install had suffered countless power failures over more than a year, all of them due to national rotational load shedding (South Africa).
Only after a lightning strike (which seemed to have damaged only some networking equipment) did this whole issue with data loss occur, on the first and second power failures that followed. The physical server is completely undamaged.

All I can say now is that we are not using Proxmox any more- we have switched to another system.

I still have, use and somewhat trust Proxmox in my homelab.
Unfortunately, I will never trust Proxmox to securely store data again.
So now, all of my data is stored on hard drives that have been passed through to my TrueNAS virtual machine.
 
