After rebooting my node, I've got the same problem, but the recommended solution isn't working for me:
Code:
root@proxmox19:~# lvs -a
  LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  base-139-disk-0 images Vri---tz-k 128.00g images
  base-148-disk-0 images Vri---tz-k 256.00g images
  base-148-disk-1 images Vri---tz-k 256.00g images
  base-170-disk-0 images Vri---tz-k 128.00g images base-139-disk-0
  images images twi---tz-- 16.34t
  [images_tdata] images Twi-a----- 16.34t
  [images_tmeta] images ewi-a----- <15.88g
  [lvol0_pmspare] images ewi------- <15.88g
  snap_vm-100-disk-0_one images Vri---tz-k 32.00g images vm-100-disk-0
  snap_vm-101-disk-0_base images Vri---tz-k 256.00g images vm-101-disk-0
  snap_vm-101-disk-1_base images Vri---tz-k 256.00g images vm-101-disk-1
  snap_vm-131-disk-0_base images Vri---tz-k 256.00g images vm-131-disk-0
  snap_vm-131-disk-1_base images Vri---tz-k 256.00g images vm-131-disk-1
  snap_vm-132-disk-0_BASE images Vri---tz-k 256.00g images vm-132-disk-0
  snap_vm-132-disk-1_BASE images Vri---tz-k 256.00g images vm-132-disk-1
  snap_vm-137-disk-0_base_main images Vri---tz-k 256.00g images
  snap_vm-137-disk-1_base_main images Vri---tz-k 256.00g images
  snap_vm-150-disk-0_A1-1826-Purva images Vri---tz-k 256.00g images
  snap_vm-150-disk-0_A1-1829-Jignesh images Vri---tz-k 256.00g images
  snap_vm-150-disk-0_BASE images Vri---tz-k 256.00g images
  snap_vm-150-disk-0_Start images Vri---tz-k 256.00g images
  snap_vm-150-disk-1_A1-1826-Purva images Vri---tz-k 256.00g images
  snap_vm-150-disk-1_A1-1829-Jignesh images Vri---tz-k 256.00g images
  snap_vm-150-disk-1_BASE images Vri---tz-k 256.00g images
  snap_vm-150-disk-1_Start images Vri---tz-k 256.00g images
  snap_vm-154-disk-0_BaseUbuntu_camnet_storage images Vri---tz-k 128.00g images vm-154-disk-0
  snap_vm-154-disk-2_BaseUbuntu_camnet_storage images Vri---tz-k 100.00g images vm-154-disk-2
  vm-100-disk-0 images Vwi---tz-- 32.00g images
  vm-100-state-one images Vwi---tz-- <4.49g images
  vm-101-disk-0 images Vwi---tz-- 256.00g images
  vm-101-disk-1 images Vwi---tz-- 256.00g images
  vm-102-disk-0 images Vwi---tz-- 64.00g images
  vm-102-disk-1 images Vwi---tz-- 256.00g images
  vm-104-disk-0 images Vwi---tz-- 128.00g images
  vm-106-disk-0 images Vwi---tz-- 150.00g images
  vm-106-disk-1 images Vwi---tz-- <5.86t images
  vm-106-disk-2 images Vwi---tz-- 4.88t images
  vm-117-disk-0 images Vwi---tz-- 32.00g images
  vm-121-disk-0 images Vwi---tz-- 64.00g images
  vm-121-disk-1 images Vwi---tz-- 256.00g images
  vm-123-disk-0 images Vwi---tz-- 64.00g images
  vm-123-disk-1 images Vwi---tz-- 512.00g images
  vm-123-disk-2 images Vwi---tz-- 512.00g images
  vm-129-disk-0 images Vwi---tz-- 32.00g images
  vm-130-disk-0 images Vwi---tz-- 4.00m images
  vm-130-disk-1 images Vwi---tz-- 171.00g images
  vm-130-disk-2 images Vwi---tz-- 32.00g images
  vm-131-disk-0 images Vwi---tz-- 256.00g images
  vm-131-disk-1 images Vwi---tz-- 256.00g images
  vm-131-state-base images Vwi---tz-- <16.49g images
  vm-132-disk-0 images Vwi---tz-- 256.00g images
  vm-132-disk-1 images Vwi---tz-- 256.00g images
  vm-132-state-BASE images Vwi---tz-- <16.50g images
  vm-137-disk-0 images Vwi---tz-- 256.00g images snap_vm-137-disk-0_base_main
  vm-137-disk-1 images Vwi---tz-- 256.00g images snap_vm-137-disk-1_base_main
  vm-137-state-base_main images Vwi---tz-- <16.49g images
  vm-140-disk-0 images Vwi---tz-- 4.00m images
  vm-140-disk-1 images Vwi---tz-- 170.00g images
  vm-140-disk-2 images Vwi---tz-- 171.00g images
  vm-143-disk-0 images Vwi---tz-- 256.00g images
  vm-143-disk-1 images Vwi---tz-- 256.00g images
  vm-144-disk-0 images Vwi---tz-- 256.00g images
  vm-144-disk-1 images Vwi---tz-- 256.00g images
  vm-149-disk-0 images Vwi---tz-- 200.00g images
  vm-149-disk-1 images Vwi---tz-- 200.00g images
  vm-150-disk-0 images Vwi---tz-- 256.00g images snap_vm-150-disk-0_A1-1829-Jignesh
  vm-150-disk-1 images Vwi---tz-- 256.00g images snap_vm-150-disk-1_A1-1829-Jignesh
  vm-150-state-A1-1826-Purva images Vwi---tz-- <16.50g images
  vm-150-state-A1-1829-Jignesh images Vwi---tz-- <16.50g images
  vm-150-state-BASE images Vwi---tz-- <16.50g images
  vm-150-state-Start images Vwi---tz-- <16.50g images
  vm-152-disk-0 images Vwi---tz-- 256.00g images base-148-disk-0
  vm-152-disk-1 images Vwi---tz-- 256.00g images base-148-disk-1
  vm-154-disk-0 images Vwi---tz-- 128.00g images
  vm-154-disk-1 images Vwi---tz-- 100.00g images
  vm-154-disk-2 images Vwi---tz-- 100.00g images
  vm-154-state-BaseUbuntu_camnet_storage images Vwi---tz-- <16.49g images
  vm-157-disk-0 images Vwi---tz-- 4.00m images
  vm-157-disk-1 images Vwi---tz-- 171.00g images
  vm-157-disk-2 images Vwi---tz-- 32.00g images
  vm-171-disk-0 images Vwi---tz-- 128.00g images base-170-disk-0
  vm-171-disk-1 images Vwi---tz-- 100.00g images
  vm-175-disk-0 images Vwi---tz-- 4.00m images
  vm-175-disk-1 images Vwi---tz-- 171.00g images
  vm-175-disk-2 images Vwi---tz-- 32.00g images
  vm-176-disk-0 images Vwi---tz-- 4.00m images
  vm-176-disk-1 images Vwi---tz-- 170.00g images
  vm-176-disk-2 images Vwi---tz-- 171.00g images
  vm-185-disk-0 images Vwi---tz-- 4.00m images
  vm-185-disk-1 images Vwi---tz-- 171.00g images
  vm-185-disk-2 images Vwi---tz-- 32.00g images
  vm-192-disk-0 images Vwi---tz-- 4.00m images
  vm-192-disk-1 images Vwi---tz-- 171.00g images
  vm-192-disk-2 images Vwi---tz-- 32.00g images
  vm-201-disk-0 images Vwi---tz-- 4.00m images
  vm-201-disk-1 images Vwi---tz-- 171.00g images
  vm-201-disk-2 images Vwi---tz-- 32.00g images
  root pve -wi-ao---- <802.00g
  swap pve -wi-ao---- 128.00g
And then:
Any help is appreciated
( EDIT for readability )
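A note for anyone skimming the output above: the fifth character of the Attr column is the activation state, so Vwi---tz-- means those thin volumes are not active, and only the pool's internal _tdata/_tmeta volumes show an "a". A quick way to list just that state (a sketch; the lv_active report field should be available in the LVM version PVE ships):

Code:
# show name, attribute string and activation state for everything in the "images" VG
lvs -o lv_name,lv_attr,lv_active images
# and for the thin pool itself
lvs -o lv_name,lv_attr,lv_active images/images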
Hi,
the relevant volume group is not named pve, but images, so you need to adapt the lvchange commands.
Did you run the update-initramfs command too? Otherwise you can also try to increase the udev timeout as suggested in the other thread.
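As a concrete illustration of "adapt the lvchange commands": a sketch only, using the VG and pool names from the lvs output above (the exact commands in the linked thread may differ):

Code:
lvchange -ay images/images   # activate the thin pool itself
vgchange -ay images          # activate all logical volumes in the "images" VG
lvs -a images                # verify: the fifth Attr character should now be "a"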
> the relevant volume group is not named pve, but images, so you need to adapt the lvchange commands.

Thank you! I did figure that out eventually, still didn't seem to solve the problem and it wouldn't be permanent anyway. I did go through the suggested thread for a permanent config change, and that didn't work either. After multiple reboots, I managed to get the servers back up, but I feel like it was a timing / luck thing. Still looking for the "right" solution before I have to reboot them again.
> Did you run the update-initramfs command too? Otherwise you can also try to increase the udev timeout as suggested in the other thread.

By "udev timeout" do you mean where they mention: "Add --skip-mappings in the udev rules for LVM"?
Which is mentioned in this thread ... yes?
That's where I see the command update-initramfs -u ... is that what you meant by "Did you run the update-initramfs command too?"

EDIT:
To be clear, that's what I meant in my other thread by "adding / converting / rebooting / removing / rebooting":
Added: thin_check_options = [ "-q", "--skip-mappings" ] (see the sketch after this post for where this goes)
Ran: update-initramfs -u as recommended
Converted with lvconvert --repair images/images
Rebooted (still failed to mount "images")
Removed the thin_check options above
Re-ran update-initramfs -u as recommended
Rebooted

After a few tries, I managed to get the system back up and running stable... but it doesn't feel like I solved it. It feels like I got lucky, and I would like to know what causes it.

EDIT EDIT:
Thank you for your help!
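For anyone retracing these steps, this is roughly where that option lives; a sketch assuming the stock /etc/lvm/lvm.conf layout (thin_check_options belongs in the global section):

Code:
# /etc/lvm/lvm.conf (excerpt): add the option inside the existing "global" section
global {
    # ...
    # let thin_check skip the full mapping scan so activation finishes before any timeout
    thin_check_options = [ "-q", "--skip-mappings" ]
}

followed by re-running update-initramfs -u, so the copy of lvm.conf embedded in the initramfs picks up the change.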
> By "udev timeout" do you mean where they mention: "Add --skip-mappings in the udev rules for LVM"? Which is mentioned in this thread ... yes?

No, the other workaround that is mentioned there, i.e.: https://forum.proxmox.com/threads/l...rnel-update-on-pve-7.97406/page-3#post-558890

> That's where I see the command update-initramfs -u ... is that what you meant by "Did you run the update-initramfs command too?"

Did you boot into the latest installed kernel? Otherwise you'll need to specify update-initramfs -u -k all to rebuild it for all kernels (or specify the version you want to boot).

> Converted with lvconvert --repair images/images

Why did you do this? Did you get a prompt from LVM that a repair is required? If that is the case, your issue is likely not the same as in the other thread, and I would check the disk health.
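A quick way to check that point about the running kernel (a generic sketch, nothing here is specific to this setup):

Code:
uname -r                      # kernel version currently running
ls /boot/initrd.img-*         # initramfs images present for the installed kernels
update-initramfs -u -k all    # rebuild the initramfs for every installed kernel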
> No, the other workaround that is mentioned there, i.e.: https://forum.proxmox.com/threads/l...rnel-update-on-pve-7.97406/page-3#post-558890

Oh, I didn't see that further down, you mean this? https://forum.proxmox.com/threads/local-lvm-not-available-after-kernel-update-on-pve-7.97406/page-3#post-558890:~:text=EDIT: INCREASING THE UDEV TIMEOUT DOES WORK
I will try this, thank you

> Did you boot into the latest installed kernel? Otherwise you'll need to specify update-initramfs -u -k all to rebuild it for all kernels (or specify the version you want to boot).

Valid question, I assumed update-initramfs was updating the current kernel. I'll have to double-check that... thanks

> Why did you do this? Did you get a prompt from LVM that a repair is required? If that is the case, your issue is likely not the same as in the other thread, and I would check the disk health.

Mentioned in the other thread, I was getting this when trying to start the VMs:

Code:
cannot perform fix without a full examination
Usage: thin_check [options] {device|file}
Options:
{-q|--quiet}
{-h|--help}
{-V|--version}
{-m|--metadata-snap}
{--auto-repair}
{--override-mapping-root}
{--clear-needs-check-flag}
{--ignore-non-fatal-errors}
{--skip-mappings}
{--super-block-only}
TASK ERROR: activating LV 'images/images' failed: Check of pool images/images failed (status:1). Manual repair required!
I didn't know if that's related to this problem (given that images/images was not active at all), or something else entirely. I also don't know what is meant by "Manual repair required" and couldn't find anything that directly explained that message (maybe I missed something). After googling frantically and searching the forums, I came up with a post that recommended trying lvconvert to repair it. Didn't help... didn't seem to hurt either. But that's how I got there.

What is the recommended action for resolving the "Manual repair required!" message?

> Oh, I didn't see that further down, you mean this? [...] I will try this, thank you

The above and the skip-mappings option will only help if the reason for activation failure is that there was a timeout.

> Mentioned in the other thread, I was getting this when trying to start the VMs: [...] What is the recommended action for resolving the "Manual repair required!" message?

But in your case, it's likely not a timeout, but corruption or disk errors. First thing I'd check is the disk health with e.g. smartctl. And also the system logs/journal for any messages regarding the underlying device.
If there are indeed disk errors, you should try to salvage the data to another disk.
If not, what does lvconvert --repair images/images -v show? You might need to run lvchange -an images to deactivate the pool first.
There also exists an alternative implementation that has been reported to be able to fix other issues: https://github.com/jthornber/thin-provisioning-tools
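Collected in one place, the checks suggested above could look like this; a sketch where /dev/sdX is a placeholder for whatever physical volume backs the images VG (pvs shows it):

Code:
pvs                                   # which device(s) back the "images" VG
smartctl -a /dev/sdX                  # disk health (smartmontools package)
journalctl -b -p err                  # errors logged during the current boot
lvchange -an images                   # deactivate the pool before a repair attempt
lvconvert --repair images/images -v   # verbose repair output, as requested above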
> The above and the skip-mappings option will only help if the reason for activation failure is that there was a timeout.

Is there a way to know, from the journal or otherwise, if there was a timeout? When I looked in the journal boot messages (journalctl -b) I saw nothing indicative of a problem.

> First thing I'd check is the disk health with e.g. smartctl. And also the system logs/journal for any messages regarding the underlying device.

I have looked through the journal for any errors, and found none. When I ran "lvconvert --repair images/images" I didn't use "-v", but I can tell you it didn't report any errors.

I can't take this cluster down for testing, it's a development system that's heavily in use. I will have to run through some checks when there is system downtime and I can risk not being able to reboot it. Currently we're working on a full system backup. That matches with your recommendation to salvage the data. When that completes, I will have more flexibility in terms of risking taking the system down without being sure I can restart it.

Thanks for the help, I will repost here when I am able to do maintenance on the cluster.

Also, a follow-up question: I have three systems in this cluster, all set up as standalone systems. They are identically configured, but they do not share the "images" volumes. They all have exactly the same problem, with the same messages, and I went through the same processes to get them to restart. Do you feel that all three nodes in the cluster are having identical disk issues, independently?

> Is there a way to know, from the journal or otherwise, if there was a timeout?

IIRC (but disclaimer, it was years ago that I looked into the issue), it's not logged by default.

> I have looked through the journal for any errors, and found none. When I ran "lvconvert --repair images/images" I didn't use "-v", but I can tell you it didn't report any errors.

Should the "manual repair required" message come again, please share the verbose output. Feel free to post the full journal for a problematic boot.

> I can't take this cluster down for testing [...] Currently we're working on a full system backup.

You can still try the method with increasing the udev timeout. I'd not put the --skip-mappings option in then.

> Do you feel that all three nodes in the cluster are having identical disk issues, independently?

Okay, that sounds very unlikely. In rare cases, if the hardware was bought at the same time, identical disks, it could still happen, so I'd not rule it out completely, but yes, very unlikely.
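The linked post describes its own way of increasing that timeout; purely as an illustration (an assumption on my part, not necessarily the mechanism used there), udev's global event timeout can be raised via /etc/udev/udev.conf:

Code:
# /etc/udev/udev.conf
event_timeout=300   # seconds udev waits for an event to finish; the default is 180

If the longer timeout also has to apply during early boot, the initramfs would need rebuilding afterwards as well (again an assumption; check the linked thread for the exact workaround).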