I ran into a fairly serious problem today on our development cluster.
I upgraded my 3-node Proxmox 8.3.4 cluster the other day, and it pulled in an update for dbus. This required a system reboot. After reboot, the VMs failed to start, and I had errors like:
proxmox TASK ERROR: activating LV 'images/images' failed: Activation of logical volume images/images is prohibited while logical volume images/images_tmeta is active.
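That first error says LVM refuses to activate the pool while one of its hidden component LVs (here images_tmeta) is still active on its own. Below is a minimal sketch for spotting that state. It assumes the output of `lvs -a --noheadings -o lv_name,lv_attr images`, where hidden LVs print in square brackets and the fifth lv_attr character is `a` when the LV is active. The function name `list_active_pool_parts` is hypothetical.

```shell
# list_active_pool_parts (hypothetical helper): reads
# `lvs -a --noheadings -o lv_name,lv_attr <vg>` output on stdin and prints
# hidden thin-pool sub-LVs (_tmeta/_tdata) that are currently active.
list_active_pool_parts() {
  awk '{
    name = $1
    attr = $2
    gsub(/\[|\]/, "", name)          # hidden LVs are printed as [name]
    # 5th lv_attr character is the state; "a" means active
    if (name ~ /_t(meta|data)$/ && substr(attr, 5, 1) == "a") print name
  }'
}
```

Usage: `lvs -a --noheadings -o lv_name,lv_attr images | list_active_pool_parts`. Any name it prints is a candidate for `lvchange -an images/<name>` before trying to activate the pool again.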
And VMs failed to start with this message:
cannot perform fix without a full examination
Usage: thin_check [options] {device|file}
Options:
{-q|--quiet}
{-h|--help}
{-V|--version}
{-m|--metadata-snap}
{--auto-repair}
{--override-mapping-root}
{--clear-needs-check-flag}
{--ignore-non-fatal-errors}
{--skip-mappings}
{--super-block-only}
TASK ERROR: activating LV 'images/images' failed: Check of pool images/images failed (status:1). Manual repair required!
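The `cannot perform fix without a full examination` line comes from thin_check, and the usage dump right after it suggests thin_check rejected the options LVM passed to it. My working assumption, not verified against the thin-provisioning-tools source, is that it refuses a fixing flag (`--auto-repair`, `--clear-needs-check-flag`) combined with a partial-check flag (`--skip-mappings`, `--super-block-only`). Below is a small sketch that flags that combination in the configured `thin_check_options`, as printed by `lvmconfig --typeconfig full global/thin_check_options`. The helper `check_thin_opts` is hypothetical.

```shell
# check_thin_opts (hypothetical helper): reads a thin_check_options line on stdin and
# prints "conflict" if a fixing flag is combined with a partial-check flag, else "ok".
# ASSUMPTION: thin_check rejects that combination with
# "cannot perform fix without a full examination".
check_thin_opts() {
  opts=$(cat)
  fix=no
  partial=no
  case "$opts" in
    *--auto-repair*|*--clear-needs-check-flag*) fix=yes ;;
  esac
  case "$opts" in
    *--skip-mappings*|*--super-block-only*) partial=yes ;;
  esac
  if [ "$fix" = yes ] && [ "$partial" = yes ]; then
    echo conflict
  else
    echo ok
  fi
}
```

Usage: `lvmconfig --typeconfig full global/thin_check_options | check_thin_opts`.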
I found multiple threads, the bulk of which said to try things like:
lvchange -an images/images_tdata
lvchange -an images/images_tmeta
lvchange -ay images/images
Or:
lvchange -an images/images
lvconvert --repair images/images
lvchange -ay images/images
But that didn't work for me. I also saw some people saying they thought image storage might be full, which may well be my problem. EDIT: I don't think it is my problem after all...
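The deactivate, repair, activate recipe can be wrapped so that it prints the commands for review before anything runs. This is a minimal sketch that assumes the pool is `images/images`, as in the post. `run` and `repair_thin_pool` are hypothetical helpers. The `vgcfgbackup` step is an extra precaution I added; it is not part of the recipes from the threads.

```shell
# run (hypothetical helper): print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# repair_thin_pool VG/POOL (hypothetical helper): the deactivate, repair, activate
# sequence quoted above, preceded by a VG metadata backup.
repair_thin_pool() {
  pool="$1"
  vg="${pool%%/*}"                 # "images/images" -> "images"
  run vgcfgbackup "$vg"            # extra precaution, not part of the quoted recipe
  run lvchange -an "$pool"
  run lvconvert --repair "$pool"
  run lvchange -ay "$pool"
}
```

Run `repair_thin_pool images/images` once to review the plan, then `DRY_RUN=0 repair_thin_pool images/images` to execute it. `lvconvert --repair` keeps the unrepaired metadata in a separate LV (for example `images_meta0`). You can remove it with `lvremove` once the repaired pool activates cleanly.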
The only thing that I can find in journalctl is:
Feb 28 13:18:55 proxmox19 pvestatd[3992]: activating LV 'images/images' failed: Check of pool images/images failed (status:1). Manual repair required!
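The journal only shows pvestatd's activation failure. To list which pools hit it after a reboot, you can filter the journal for that message. This sketch assumes the message format in the line above. `failed_pools` is a hypothetical name.

```shell
# failed_pools (hypothetical helper): reads journal lines on stdin, keeps only
# "Check of pool VG/LV failed ... Manual repair required!" lines and prints
# the VG/LV part, once per pool.
failed_pools() {
  sed -n 's/.*Check of pool \([^ ]*\) failed.*Manual repair required.*/\1/p' | sort -u
}
```

Usage: `journalctl -b -u pvestatd | failed_pools`.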
I found a link to THIS THREAD that talks about changing parameters for thin_check ... but that didn't work for me either. Eventually I found a NOTE HERE to try this:
lvconvert --repair images/images
But that results in the message:
truncating metadata device to 4161600 4k blocks
Which apparently is problematic.
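For scale, 4161600 x 4096 bytes = 17,045,913,600 bytes, or about 15.88 GiB. That suggests the metadata device is at least that large, which puts it near the roughly 16 GiB ceiling lvmthin(7) gives for thin-pool metadata. My assumption, not confirmed by the post, is that thin_repair is capping the new metadata at the largest size it supports. If so, the message may be informational rather than a sign of damage. Below is a throwaway helper for the conversion (`blocks_to_gib` is hypothetical).

```shell
# blocks_to_gib N (hypothetical helper): convert N blocks of 4 KiB to GiB, 2 decimals.
blocks_to_gib() {
  awk -v n="$1" 'BEGIN { printf "%.2f\n", n * 4096 / (1024 * 1024 * 1024) }'
}

blocks_to_gib 4161600   # prints 15.88
```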
Eventually I even found a post (can't find it now) that said REMOVING the thin_check options solved their problem.
So, after adding / converting / rebooting / removing / rebooting, I finally have the systems back up. I really have no idea why they are back up; my best guess is that it's more of a timing problem than anything else, but I really don't know...
My questions:
- does anyone know definitively how to figure out what causes that, and what the correct steps are to avoid / repair it should it happen again?
- how do you "manually repair" the system in that state?
- is there any way to get a better warning when system space is low, rather than failing to restart?
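On the third question, one option is to poll `lvs` for thin-pool data and metadata usage and warn above a threshold, for example from cron. This is a minimal sketch that assumes comma-separated output from `lvs --noheadings --separator , -o lv_full_name,data_percent,metadata_percent -S segtype=thin-pool`. `pool_usage_warn` and the 80% threshold are illustrative choices, not Proxmox defaults.

```shell
# pool_usage_warn THRESHOLD (hypothetical helper): reads comma-separated
# "VG/LV,data_percent,metadata_percent" lines on stdin and prints a warning for
# every pool whose data or metadata usage is at or above THRESHOLD percent.
pool_usage_warn() {
  awk -F, -v t="$1" '
    { gsub(/ /, "") }   # lvs pads fields with spaces; editing $0 re-splits fields
    $2 != "" && ($2 + 0 >= t + 0 || $3 + 0 >= t + 0) {
      printf "WARN %s data=%s%% meta=%s%%\n", $1, $2, $3
    }'
}
```

The output is empty when every pool is below the threshold, so a cron job that mails non-empty output would only send mail when something is getting full.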
Thanks