Good evening,
My server runs Proxmox 5.2.1 with the two default storages (local and local-lvm) and all default settings from the installation phase.
I have 3 VMs and 1 CT. The hardware is a Dell with 4 disks of 10 TB each in hardware RAID 5. Everything was working brilliantly (and we have a lot of servers running under the same conditions) until a few days ago.
None of the VMs or the CT start anymore, while the Proxmox GUI and shell are fine (the two LVM volumes root and swap are both active, whilst data - and all the virtual disks in it - is inactive).
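(For reference, this is roughly how I look at the activation state - a minimal check, with just the columns I find useful:
lvs -a -o lv_name,lv_attr,lv_size pve
In the attr field, only root and swap have the "a" (active) flag set.)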
Looking through the logs, I noticed that after a normal shutdown (correctly triggered by the NUT UPS monitoring software), the following startup logged this entry (I have attached only the sections covering the last two shutdown/startup cycles):
Check of pool pve/data failed (status:1). Manual repair required!
After googling around, I tried to execute:
lvconvert --repair -v /dev/pve/data
This is the output:
WARNING: Not using lvmetad because config setting use_lvmetad=0.
WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
Using default stripesize 64.00 KiB.
Preparing pool metadata spare volume for Volume group pve.
Archiving volume group "pve" metadata (seqno 30).
Creating logical volume lvol0
Creating volume group backup "/etc/lvm/backup/pve" (seqno 31).
Activating logical volume pve/lvol0 locally.
activation/volume_list configuration setting not defined: Checking only host tags for pve/lvol0.
Creating pve-lvol0
Loading pve-lvol0 table (253:4)
Resuming pve-lvol0 (253:4)
Initializing 4.00 KiB of logical volume "pve/lvol0" with value 0.
Temporary logical volume "lvol0" created.
Removing pve-lvol0 (253:4)
Renaming lvol0 as pool metadata spare volume lvol0_pmspare.
activation/volume_list configuration setting not defined: Checking only host tags for pve/lvol0_pmspare.
Creating pve-lvol0_pmspare
Loading pve-lvol0_pmspare table (253:4)
Resuming pve-lvol0_pmspare (253:4)
activation/volume_list configuration setting not defined: Checking only host tags for pve/data_tmeta.
Executing: /usr/sbin/thin_repair -i /dev/mapper/pve-data_tmeta -o /dev/mapper/pve-lvol0_pmspare
truncating metadata device to 4161600 4k blocks
Piping: /usr/sbin/thin_dump /dev/mapper/pve-lvol0_pmspare
Removing pve-data_tmeta (253:2)
Removing pve-lvol0_pmspare (253:4)
WARNING: recovery of pools without pool metadata spare LV is not automated.
WARNING: If everything works, remove pve/data_meta0 volume.
WARNING: Use pvmove command to move pve/data_tmeta on the best fitting PV.
After that, I tried to activate data once again:
lvchange -ay -v /dev/pve/data
This is the output:
WARNING: Not using lvmetad because config setting use_lvmetad=0.
WARNING: To avoid corruption, rescan devices to make changes visible (pvscan --cache).
Activating logical volume pve/data exclusively.
activation/volume_list configuration setting not defined: Checking only host tags for pve/data.
Creating pve-data_tmeta
Loading pve-data_tmeta table (253:2)
Resuming pve-data_tmeta (253:2)
Loading pve-data_tdata table (253:3)
Suppressed pve-data_tdata (253:3) identical table reload.
Executing: /usr/sbin/thin_check -q --clear-needs-check-flag /dev/mapper/pve-data_tmeta
Creating pve-data-tpool
Loading pve-data-tpool table (253:4)
Resuming pve-data-tpool (253:4)
device-mapper: resume ioctl on (253:4) failed: Invalid argument
Unable to resume pve-data-tpool (253:4)
Removing pve-data-tpool (253:4)
In dmesg I noticed this error (it appears only after having executed the "repair" command):
device-mapper: thin: 253:4: metadata device (4145152 blocks) too small: expected 4161600
device-mapper: table: 253:4: thin-pool: preresume failed, error = -22
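If I am reading those numbers correctly, the repaired metadata expects a device of 4161600 * 4 KiB ≈ 15.88 GiB (the same 4161600 figure thin_repair printed in the "truncating metadata device" line above), while the current data_tmeta LV only provides 4145152 * 4 KiB ≈ 15.81 GiB, i.e. it is about 64 MiB short. The actual size of the metadata LV can be double-checked with something like:
lvs -a --units k -o lv_name,lv_size pve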
Unfortunately, during all these tests I removed the "original" metadata volume (pve/data_meta0), so I am no longer able to examine the original metadata.
When I run thin_check and thin_repair standalone, their exit code is 0, so I think the metadata is OK.
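For reference, I ran them roughly like this (the device path is the one lvm itself uses in the output above; the thin_repair output device is just a placeholder here, I used a scratch volume):
thin_check /dev/mapper/pve-data_tmeta
echo $?    # 0
thin_repair -i /dev/mapper/pve-data_tmeta -o /dev/<scratch_device>
echo $?    # 0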
Executing the "lvchange" command with some more verbose, there is a -think useful- section in the output:
#misc/lvm-exec.c:71 Executing: /usr/sbin/thin_check -q --clear-needs-check-flag /dev/mapper/pve-data_tmeta
#misc/lvm-flock.c:38 _drop_shared_flock /run/lock/lvm/V_pve.
#misc/lvm-flock.c:38 _drop_shared_flock /run/lock/lvm/A_AP2TG0zqtMBCO3lFbew51WT8JDfhdvF4UMCIGqcmct8eyFRM7fXm1SuGFxfmi8R.
#mm/memlock.c:642 memlock reset.
#device/dev-io.c:625 Closed /dev/sda3
#libdm-deptree.c:1985 Creating pve-data-tpool
#ioctl/libdm-iface.c:1838 dm create pve-data-tpool LVM-JAP2TG0zqtMBCO3lFbew51WT8JDfhdvF4UMCIGqcmct8eyFRM7fXm1SuGFxfmi8R-tpool [ noopencount flush ] [16384] (*1)
#libdm-deptree.c:2706 Loading pve-data-tpool table (253:4)
#libdm-deptree.c:2650 Adding target to (253:4): 0 58276175872 thin-pool 253:2 253:3 256 0 1 ignore_discard
#ioctl/libdm-iface.c:1838 dm table (253:4) [ opencount flush ] [16384] (*1)
#ioctl/libdm-iface.c:1838 dm reload (253:4) [ noopencount flush ] [16384] (*1)
#libdm-deptree.c:2759 Table size changed from 0 to 58276175872 for pve-data-tpool (253:4).
#libdm-deptree.c:1351 Resuming pve-data-tpool (253:4)
#libdm-common.c:2391
dm resume (253:4) [ noopencount flush ] [16384] (*1)
#ioctl/libdm-iface.c:1876 device-mapper: resume ioctl on (253:4) failed: Invalid argument
#libdm-common.c:2318 Udev cookie 0xd4d13cd (semid 10321956) decremented to 1
#libdm-deptree.c:1379 <backtrace>
#libdm-deptree.c:2877 Unable to resume pve-data-tpool (253:4)
#libdm-deptree.c:1026 Removing pve-data-tpool (253:4)
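To decode that table line: in "0 58276175872 thin-pool 253:2 253:3 256 0 1 ignore_discard", 253:2 is the metadata device, 253:3 the data device, and 256 sectors (128 KiB) the chunk size; the resume is exactly where the kernel compares the size of 253:2 against what the metadata superblock expects, hence the "preresume failed" in dmesg. If the dmesg numbers are right, the tmeta mapping is 4145152 * 8 = 33161216 sectors, while the superblock wants 4161600 * 8 = 33292800. The mapped sizes can be checked with:
dmsetup table | grep pve-data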
I am stuck. Can someone help me with this?