Recovering from LVM thin metadata exhaustion

djzort

Ok so this happened:
Code:
Jul  3 13:16:24 saito kernel: [131695.910332] device-mapper: space map metadata: unable to allocate new metadata block
Jul  3 13:16:24 saito kernel: [131695.910762] device-mapper: thin: 253:4: metadata operation 'dm_thin_remove_range' failed: error = -28
Jul  3 13:16:24 saito kernel: [131695.911019] device-mapper: thin: 253:4: aborting current metadata transaction
Jul  3 13:16:24 saito kernel: [131695.974977] device-mapper: thin: 253:4: switching pool to read-only mode
Jul  3 13:16:33 saito kernel: [131705.274889] device-mapper: thin: dm_thin_get_highest_mapped_block returned -61
Jul  3 13:16:43 saito kernel: [131715.351896] device-mapper: thin: dm_thin_get_highest_mapped_block returned -61
Jul  3 13:16:53 saito kernel: [131725.446482] device-mapper: thin: dm_thin_get_highest_mapped_block returned -61
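
For anyone reading the log above: the negative numbers are Linux errno values. A minimal sketch that decodes just the two codes seen here (the function name `dm_errno_name` is made up for illustration, and the numbers assume the usual Linux errno numbering):

```shell
#!/bin/sh
# Map the negative error codes from the kernel log to their errno names.
# Only the two codes that appear in the log above are covered.
dm_errno_name() {
    case "$1" in
        -28) echo "ENOSPC (No space left on device)" ;;
        -61) echo "ENODATA (No data available)" ;;
        *)   echo "unknown" ;;
    esac
}

dm_errno_name -28   # error from 'dm_thin_remove_range'
dm_errno_name -61   # error from dm_thin_get_highest_mapped_block
```

So the pool ran out of metadata space first (-28), and the later -61 is the same "No data available" that the reload ioctl reports further down.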

And sure enough, the pool metadata is almost full (Meta% at 96.39):
Code:
root@saito:/var/log# lvs -a
  Failed to parse thin params: Error.
  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data            pve twi-cotzM- 500.00g             37.28  96.39                           
  [data_tdata]    pve Twi-ao---- 500.00g                                                   
  [data_tmeta]    pve ewi-ao---- 100.00m                                                   
  [lvol0_pmspare] pve ewi------- 100.00m                                                   
  root            pve -wi-ao----  93.13g                                                   
  swap            pve -wi-ao----  14.90g                                                   
  vm-100-disk-1   pve Vwi-XXtzX- 200.00g data                                               
  vm-100-disk-2   pve Vwi-a-tz-- 100.00g data        23.25
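
A rough sketch of how the Meta% column could be watched before it gets this far. It assumes the input is two columns, LV name and metadata percentage, such as `lvs --noheadings -o lv_name,metadata_percent pve` should produce; the function name `meta_over_threshold` and the threshold of 80 are arbitrary choices for illustration, not anything LVM provides:

```shell
#!/bin/sh
# Print LVs whose metadata usage is at or above a threshold (percent).
# Input on stdin: lines of "<lv_name> <metadata_percent>".
# Lines without a percentage (LVs that are not thin pools) are skipped.
meta_over_threshold() {
    awk -v limit="$1" 'NF == 2 && $2 + 0 >= limit + 0 { print $1 " " $2 }'
}

# Sample input based on the lvs output above.
printf '  data 96.39\n  root\n  swap\n' | meta_over_threshold 80
```

On a live system the `printf` would be replaced by the `lvs` call, and the output mailed or logged from cron.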

So I added more metadata space:
Code:
root@saito:/var/log# lvextend --poolmetadatasize +1G pve/data
  Size of logical volume pve/data_tmeta changed from 100.00 MiB (25 extents) to 1.10 GiB (281 extents).
  Logical volume pve/data_tmeta successfully resized.
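
The extent counts in that output can be checked with a bit of shell arithmetic. This is only a sanity check on the numbers printed above (100 MiB over 25 extents implies 4 MiB extents); the variable names are made up:

```shell
#!/bin/sh
# Figures taken from the lvextend output above.
old_mib=100
old_extents=25
extent_mib=$((old_mib / old_extents))          # MiB per extent
added_extents=$((1024 / extent_mib))           # +1G is 1024 MiB
new_extents=$((old_extents + added_extents))   # extents after the resize
new_mib=$((new_extents * extent_mib))          # new size in MiB
echo "extent_mib=$extent_mib new_extents=$new_extents new_mib=$new_mib"
```

That gives 281 extents and 1124 MiB, which lvextend rounds to 1.10 GiB.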

Killed off the stuck qemu processes, then deactivated the volumes and the pool:

Code:
root@saito:/var/log# lvchange -an -v /dev/pve/vm-100-disk-1
    Deactivating logical volume pve/vm-100-disk-1.
    Removing pve-vm--100--disk--1 (253:6)
root@saito:/var/log# lvchange -an -v /dev/pve/vm-100-disk-2
    Deactivating logical volume pve/vm-100-disk-2.
    Removing pve-vm--100--disk--2 (253:7)
root@saito:/var/log# lvchange -an -v /dev/pve/data
    Deactivating logical volume pve/data.
    Not monitoring pve/data with libdevmapper-event-lvm2thin.so
    Removing pve-data (253:5)
    Removing pve-data-tpool (253:4)
    Executing: /usr/sbin/thin_check -q --clear-needs-check-flag /dev/mapper/pve-data_tmeta
    /usr/sbin/thin_check failed: 1
  WARNING: Integrity check of metadata for pool pve/data failed.
    Removing pve-data_tdata (253:3)
    Removing pve-data_tmeta (253:2)

Then ran the repair:
Code:
root@saito:/var/log# lvconvert --repair pve/data
  Using default stripesize 64.00 KiB.
  WARNING: recovery of pools without pool metadata spare LV is not automated.
  WARNING: If everything works, remove pve/data_meta0 volume.
  WARNING: Use pvmove command to move pve/data_tmeta on the best fitting PV.

Looks good, so I brought it back up and checked the metadata state:
Code:
root@saito:/var/log# lvchange -ay -v /dev/pve/data
    Activating logical volume pve/data exclusively.
    activation/volume_list configuration setting not defined: Checking only host tags for pve/data.
    Creating pve-data_tmeta
    Loading pve-data_tmeta table (253:2)
    Resuming pve-data_tmeta (253:2)
    Creating pve-data_tdata
    Loading pve-data_tdata table (253:3)
    Resuming pve-data_tdata (253:3)
    Executing: /usr/sbin/thin_check -q --clear-needs-check-flag /dev/mapper/pve-data_tmeta
    Creating pve-data-tpool
    Loading pve-data-tpool table (253:4)
    Resuming pve-data-tpool (253:4)
    Creating pve-data
    Loading pve-data table (253:5)
    Resuming pve-data (253:5)
    Monitoring pve/data
root@saito:/var/log# lvs -a
  LV            VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- 500.00g             4.65   1.19                           
  data_meta0    pve -wi-------   1.15g                                                   
  [data_tdata]  pve Twi-ao---- 500.00g                                                   
  [data_tmeta]  pve ewi-ao----   1.15g                                                   
  root          pve -wi-ao----  93.13g                                                   
  swap          pve -wi-ao----  14.90g                                                   
  vm-100-disk-1 pve Vwi---tz-- 200.00g data                                               
  vm-100-disk-2 pve Vwi---tz-- 100.00g data                                               
root@saito:/var/log# pvdisplay

Good news for disk 2...
Code:
root@saito:/var/log# lvchange -ay -v /dev/pve/vm-100-disk-2
    Activating logical volume pve/vm-100-disk-2 exclusively.
    activation/volume_list configuration setting not defined: Checking only host tags for pve/vm-100-disk-2.
    Loading pve-data_tdata table (253:3)
    Suppressed pve-data_tdata (253:3) identical table reload.
    Loading pve-data_tmeta table (253:2)
    Suppressed pve-data_tmeta (253:2) identical table reload.
    Loading pve-data-tpool table (253:4)
    Suppressed pve-data-tpool (253:4) identical table reload.
    Creating pve-vm--100--disk--2
    Loading pve-vm--100--disk--2 table (253:6)
    Resuming pve-vm--100--disk--2 (253:6)
    pve/data already monitored.

Now bad news for disk 1...
Code:
root@saito:/var/log# lvchange -ay -v /dev/pve/vm-100-disk-1
    Activating logical volume pve/vm-100-disk-1 exclusively.
    activation/volume_list configuration setting not defined: Checking only host tags for pve/vm-100-disk-1.
    Loading pve-data_tdata table (253:3)
    Suppressed pve-data_tdata (253:3) identical table reload.
    Loading pve-data_tmeta table (253:2)
    Suppressed pve-data_tmeta (253:2) identical table reload.
    Loading pve-data-tpool table (253:4)
    Suppressed pve-data-tpool (253:4) identical table reload.
    Creating pve-vm--100--disk--1
    Loading pve-vm--100--disk--1 table (253:7)
  device-mapper: reload ioctl on (253:7) failed: No data available
    Removing pve-vm--100--disk--1 (253:7)

And from dmesg regarding disk 1:
Code:
[481216.385943] device-mapper: table: 253:7: thin: Couldn't open thin internal device
[481216.386433] device-mapper: ioctl: error adding target to table

Any thoughts on how to bring this disk back?
 
I think I may be in serious trouble:
Code:
root@saito:/var/log# thin_dump /dev/pve/data_meta0 > /tmp/foo.txt
root@saito:/var/log# grep superblock /tmp/foo.txt
<superblock uuid="" time="0" transaction="6" data_block_size="128" nr_data_blocks="8192000">
</superblock>
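
To spell out why that dump is worrying: as far as I understand the thin_dump XML format, each thin volume shows up as a `<device ...>` element inside `<superblock>`, and here there are none. A small sketch that counts them and cross-checks the pool size from the superblock attributes (the helper names are made up, and it assumes data_block_size is given in 512-byte sectors):

```shell
#!/bin/sh
# Count thin device entries in a thin_dump XML file ($1).
count_thin_devices() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    grep -c '<device ' "$1" || true
}

# Pool data size in GiB.
# $1 = data_block_size in 512-byte sectors, $2 = nr_data_blocks
pool_size_gib() {
    echo $(( $1 * 512 * $2 / 1024 / 1024 / 1024 ))
}

# Reproduce the two lines from the dump above in a temp file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<superblock uuid="" time="0" transaction="6" data_block_size="128" nr_data_blocks="8192000">
</superblock>
EOF

echo "devices: $(count_thin_devices "$sample")"
echo "pool size: $(pool_size_gib 128 8192000) GiB"
rm -f "$sample"
```

The size works out to 500 GiB, which matches pve/data, so the superblock itself looks sane. It is the device count of 0 that is the problem.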