After a power outage, pve/data cannot be activated or repaired

pixel24

Dec 11, 2019
Hi everyone,

I am still running a very old PVE 7.4-19 setup with thin LVM on an LSI hardware RAID. After a power outage, pve/data can no longer be activated. The volume group is 100% allocated, although the actual data in it (VM disks and snapshots) takes up only about half the space. I have read several times in the forum that a fully allocated volume group is a problem for the repair process, so I connected an external USB hard drive, added it as another physical volume, and expanded the volume group by 10 GB. I then booted the machine with SystemRescue and am now trying to repair the thin pool, but without success. Here is some information about the LVM setup:

Code:
[root@sysrescue ~]# lsblk
NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0                7:0    0     1G  1 loop /run/archiso/sfs/airootfs
sda                  8:0    0   2.7T  0 disk
├─sda1               8:1    0  1007K  0 part
├─sda2               8:2    0   512M  0 part
└─sda3               8:3    0   2.7T  0 part
  ├─pve-swap       253:0    0     8G  0 lvm 
  ├─pve-root       253:1    0    50G  0 lvm 
  ├─pve-data_meta0 253:2    0  15.9G  1 lvm 
  └─pve-data_meta1 253:3    0  15.9G  1 lvm 
sdb                  8:16   0 931.5G  0 disk
├─pve-data_meta0   253:2    0  15.9G  1 lvm 
├─pve-data_meta1   253:3    0  15.9G  1 lvm 
└─pve-data_meta2   253:4    0  15.9G  1 lvm 
sdc                  8:32   1   1.9G  0 disk
├─sdc1               8:33   1   1.2G  0 part /run/archiso/bootmnt
└─sdc2               8:34   1   1.4M  0 part

Note that /dev/sda is the RAID on the LSI controller and was created by the installer. /dev/sdb is the external USB drive.
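
For reference, adding the USB drive to the volume group can be sketched roughly as below. This is a minimal dry-run sketch, not a record of the exact commands that were typed: the `run` helper and the `extend_vg` function are hypothetical names, and nothing is executed unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that add a whole disk to an existing VG.
# Assumption: run and extend_vg are illustrative helpers, not part of LVM.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"                      # really execute the command
  else
    printf '%s\n' "$*"        # default: only show what would be run
  fi
}

extend_vg() {
  vg=$1
  disk=$2
  run pvcreate "$disk"        # initialise the whole disk as an LVM physical volume
  run vgextend "$vg" "$disk"  # add the new PV to the volume group
  run pvs                     # list PVs to confirm the new one shows free space
}

extend_vg pve /dev/sdb
```

Using the whole disk (`/dev/sdb`) rather than a partition matches what `lsblk` shows above; on another machine the device name would have to be checked first.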

Code:
[root@sysrescue ~]# pvs
  PV         VG  Fmt  Attr PSize   PFree 
  /dev/sda3  pve lvm2 a--   <2.73t      0
  /dev/sdb   pve lvm2 a--  931.51g 883.74g

Code:
[root@sysrescue ~]# vgs
  VG  #PV #LV #SN Attr    VSize  VFree 
  pve   2  16   0 wz--n-- <3.64t 883.74g

Code:
[root@sysrescue ~]# lvs
  LV                                VG  Attr       LSize   Pool Origin        Data%  Meta%  Move Log Cpy%Sync Convert
  data                              pve twi---tz--   2.64t                                                           
  data_meta0                        pve -ri-a----- <15.88g                                                           
  data_meta1                        pve -ri-a----- <15.88g                                                           
  data_meta2                        pve -ri-a----- <15.88g                                                           
  root                              pve -wi-a-----  50.00g                                                           
  snap_vm-101-disk-0_Update24_09_16 pve Vri---tz-k 500.00g data vm-101-disk-0                                       
  snap_vm-101-disk-0_Update24_12_15 pve Vri---tz-k 500.00g data vm-101-disk-0                                       
  snap_vm-101-disk-1_Update24_08_27 pve Vri---tz-k   2.50t data vm-101-disk-1                                       
  snap_vm-101-disk-1_Update24_09_16 pve Vri---tz-k   2.50t data vm-101-disk-1                                       
  snap_vm-101-disk-1_Update24_12_15 pve Vri---tz-k   2.50t data vm-101-disk-1                                       
  snap_vm-102-disk-0_Update24_09_16 pve Vri---tz-k  32.00g data vm-102-disk-0                                       
  snap_vm-102-disk-0_Update2_7_0    pve Vri---tz-k  32.00g data vm-102-disk-0                                       
  swap                              pve -wi-a-----   8.00g                                                           
  vm-101-disk-0                     pve Vwi---tz-- 500.00g data                                                     
  vm-101-disk-1                     pve Vwi---tz--   2.50t data                                                     
  vm-102-disk-0                     pve Vwi---tz--  32.00g data
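
As a rough cross-check of how overprovisioned the pool is, the virtual sizes of the thin volumes and snapshots listed above can be added up. The sketch below hard-codes the sizes from that `lvs` output (g = GiB, t = TiB); the function names `thin_sizes` and `sum_thin_tib` are made up for illustration.

```shell
#!/bin/sh
# Sum the virtual sizes of all thin LVs in pve/data, in TiB.
# Assumption: the sizes are copied by hand from the lvs output above.
thin_sizes() {
cat <<'EOF'
snap_vm-101-disk-0_Update24_09_16 500.00g
snap_vm-101-disk-0_Update24_12_15 500.00g
snap_vm-101-disk-1_Update24_08_27 2.50t
snap_vm-101-disk-1_Update24_09_16 2.50t
snap_vm-101-disk-1_Update24_12_15 2.50t
snap_vm-102-disk-0_Update24_09_16 32.00g
snap_vm-102-disk-0_Update2_7_0 32.00g
vm-101-disk-0 500.00g
vm-101-disk-1 2.50t
vm-102-disk-0 32.00g
EOF
}

sum_thin_tib() {
  awk '{
    n = $2 + 0                        # numeric prefix of the size field
    unit = substr($2, length($2))     # trailing unit letter: g or t
    if (unit == "t") n *= 1024        # convert TiB to GiB
    total += n
  } END { printf "%.2f\n", total / 1024 }'
}

thin_sizes | sum_thin_tib   # prints 11.56
```

The total of roughly 11.56 TiB is virtual size only; it says nothing about how many blocks are actually mapped in the 2.64 TiB pool.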

The data_metaX volumes were created during previous repair attempts. I then started the repair again:

Code:
[root@sysrescue ~]# vgchange -an pve
  0 logical volume(s) in volume group "pve" now active
[root@sysrescue ~]# lvconvert -v --repair pve/data
  activation/volume_list configuration setting not defined: Checking only host tags for pve/lvol2_pmspare.
  Creating pve-lvol2_pmspare
  Loading table for pve-lvol2_pmspare (253:0).
  Resuming pve-lvol2_pmspare (253:0).
  activation/volume_list configuration setting not defined: Checking only host tags for pve/data_tmeta.
  Creating pve-data_tmeta
  Loading table for pve-data_tmeta (253:1).
  Resuming pve-data_tmeta (253:1).
  Executing: /usr/bin/thin_repair -i /dev/pve/data_tmeta -o /dev/pve/lvol2_pmspare
  Piping: /usr/bin/thin_dump /dev/pve/lvol2_pmspare
  Removing pve-data_tmeta (253:1)
  Removing pve-lvol2_pmspare (253:0)
  Preparing pool metadata spare volume for Volume group pve.
  Creating logical volume lvol3
  WARNING: Sum of all thin volume sizes (<11.56 TiB) exceeds the size of thin pools and the size of whole volume group (<3.64 TiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Archiving volume group "pve" metadata (seqno 280).
  Activating logical volume pve/lvol3.
  activation/volume_list configuration setting not defined: Checking only host tags for pve/lvol3.
  Creating pve-lvol3
  Loading table for pve-lvol3 (253:0).
  Resuming pve-lvol3 (253:0).
  Initializing <15.88 GiB of logical volume pve/lvol3 with value 0.
  Temporary logical volume "lvol3" created.
  Removing pve-lvol3 (253:0)
  Renaming lvol3 as pool metadata spare volume lvol3_pmspare.
  Archiving volume group "pve" metadata (seqno 281).
  WARNING: LV pve/data_meta2 holds a backup of the unrepaired metadata. Use lvremove when no longer required.
  WARNING: New metadata LV pve/data_tmeta might use different PVs.  Move it with pvmove if required.
  Creating volume group backup "/etc/lvm/backup/pve" (seqno 282).

Even after that, pve/data still won't activate:

Code:
[root@sysrescue ~]# lvchange -ay /dev/pve/data
  Check of pool pve/data failed (status:64). Manual repair required!
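
One read-only step that might help narrow this down is to inspect the backup of the unrepaired metadata that the repair left in pve/data_meta2. The following is only a dry-run sketch under that assumption: `run` and `inspect_meta` are hypothetical helper names, the output path `/tmp/data_meta2.xml` is an arbitrary choice, and the commands are merely printed unless `RUN=1` is set.

```shell
#!/bin/sh
# Dry-run sketch: read-only inspection of a thin-pool metadata backup LV.
# Assumption: run and inspect_meta are illustrative helpers, not LVM tools.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"                      # really execute the command
  else
    printf '%s\n' "$*"        # default: only show what would be run
  fi
}

inspect_meta() {
  lv=$1                                # VG/LV name, e.g. pve/data_meta2
  out=$2                               # where to write the XML dump
  run lvchange -ay "$lv"               # activate the backup LV
  run thin_check "/dev/$lv"            # report metadata errors
  run thin_dump -o "$out" "/dev/$lv"   # dump the mappings as XML for review
}

inspect_meta pve/data_meta2 /tmp/data_meta2.xml
```

None of these commands write to the pool itself, so the messages from `thin_check` could be posted alongside the output above.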

I'm stuck here and need some help. I know the machine and the PVE installation are very old. They will be replaced soon with new hardware and a new PVE, which unfortunately doesn't help me right now. And as if that weren't bad enough, the backup is also outdated. So I'd like to try to recover the thin pool and do better in the future.

With best regards