Errors after reboot

voidindigo

Well-Known Member
Sep 18, 2018
I ran into a fairly serious problem today on our development cluster.

I upgraded my 3-node Proxmox 8.3.4 cluster the other day, and the upgrade pulled in a new dbus package, which required a system reboot. After the reboot, the VMs failed to start, and I saw errors like:

proxmox TASK ERROR: activating LV 'images/images' failed: Activation of logical volume images/images is prohibited while logical volume images/images_tmeta is active.

The VMs themselves failed to start with this message:
cannot perform fix without a full examination
Usage: thin_check [options] {device|file}
Options:
{-q|--quiet}
{-h|--help}
{-V|--version}
{-m|--metadata-snap}
{--auto-repair}
{--override-mapping-root}
{--clear-needs-check-flag}
{--ignore-non-fatal-errors}
{--skip-mappings}
{--super-block-only}
TASK ERROR: activating LV 'images/images' failed: Check of pool images/images failed (status:1). Manual repair required!
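For anyone hitting the same thing, these are the read-only checks I'd run first to see whether the _tmeta/_tdata sub-LVs got activated ahead of the pool. The `try_cmd` helper and the `images` VG name are just from my setup; the helper skips any tool that isn't installed, so it's harmless to paste anywhere:

```shell
#!/bin/sh
# Sketch: read-only diagnostics for the activation failure. try_cmd only
# runs a tool if it is installed, and never aborts on a failing command.
try_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@" || echo "($1 exited with status $?)"
    else
        echo "($1 not installed here)"
    fi
}

# 'a' in the lv_attr column means the sub-LV is already active
try_cmd lvs -a -o lv_name,lv_attr,pool_lv images
# device-mapper entries that may be holding the pool's components
try_cmd dmsetup ls
```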

I found multiple threads, the bulk of which said to try things like:
lvchange -an images/images_tdata
lvchange -an images/images_tmeta
lvchange -ay images/images
Or:
lvchange -an images/images
lvconvert --repair images/images
lvchange -ay images/images
Neither of those worked for me. I also saw some people suggesting the image storage might be full... which may well be my problem. EDIT: I don't think that's my problem after all...

The only thing that I can find in journalctl is:
Feb 28 13:18:55 proxmox19 pvestatd[3992]: activating LV 'images/images' failed: Check of pool images/images failed (status:1). Manual repair required!

I found a link to THIS THREAD that talks about changing parameters for thin_check ... but that didn't work for me either. Eventually I found a NOTE HERE to try this:
lvconvert --repair images/images
but that results in the message:
truncating metadata device to 4161600 4k blocks
which is apparently problematic.

Eventually I even found a post (can't find it now) that said REMOVING the thin_check options solved their problem.
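For reference, the thin_check knob those threads are talking about lives in /etc/lvm/lvm.conf, in the `global` section. This is just a sketch of the stock setting plus the `--skip-mappings` variant people suggested (that flag appears in the thin_check usage output above), not something I'm recommending:

```
# /etc/lvm/lvm.conf (global section) -- sketch, not a recommendation
global {
    # Stock default:
    thin_check_options = [ "-q", "--clear-needs-check-flag" ]

    # Variant some threads suggest for large _tmeta volumes; it skips
    # the block-mapping scan during activation:
    # thin_check_options = [ "-q", "--clear-needs-check-flag", "--skip-mappings" ]
}
```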

So, after adding, converting, rebooting, removing, and rebooting again, I finally have the systems back up. I honestly have no idea why they came back, except that I suspect it's more a timing problem than anything else. I really don't know...
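For what it's worth, reconstructing what I ran, the repair sequence looked roughly like the following. I've wrapped it in a DRY_RUN guard so it only prints the commands unless you explicitly opt in; the pool name is from my setup, and the _meta0 cleanup reflects the fact that lvconvert --repair keeps the old metadata around as <pool>_meta0:

```shell
#!/bin/sh
# Hedged sketch of the thin-pool repair path. DRY_RUN=1 (the default)
# only prints the commands; set DRY_RUN=0 on the affected node to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    [ "$DRY_RUN" = "0" ] && "$@"
    return 0
}

run lvchange -an images/images        # pool must be inactive before repair
run lvconvert --repair images/images  # swaps repaired metadata into place;
                                      # old metadata kept as images/images_meta0
run lvchange -ay images/images        # reactivate, then check the VMs
# Only after verifying everything works:
# run lvremove -y images/images_meta0
```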

My questions:

  1. Does anyone know definitively what causes this, and what the correct steps are to avoid or repair it should it happen again?
  2. How do you "manually repair" a system in that state?
  3. Is there any way to get a warning when thin pool space is running low, rather than having VMs fail to start after a reboot?
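On question 3, the kind of early warning I have in mind would be something like this from cron: parse data_percent and metadata_percent out of lvs and complain above a threshold. The 80% limit, the pool name, and the canned demo line are placeholders:

```shell
#!/bin/sh
# Sketch: warn when a thin pool's data or metadata usage crosses a threshold.
# The 80% limit is an arbitrary example.
THRESHOLD=80

check_pool_usage() {
    # Reads "name data% meta%" lines on stdin; prints a warning per pool
    # whose data or metadata usage exceeds $THRESHOLD.
    awk -v limit="$THRESHOLD" '
        NF >= 3 {
            gsub(/%/, "", $2); gsub(/%/, "", $3)
            if ($2 + 0 > limit) printf "WARNING: %s data at %s%%\n", $1, $2
            if ($3 + 0 > limit) printf "WARNING: %s metadata at %s%%\n", $1, $3
        }'
}

# On a real node you would pipe in live numbers, e.g.:
#   lvs --noheadings -o lv_name,data_percent,metadata_percent images | check_pool_usage
# Demo with canned output so the sketch runs anywhere:
printf 'images 91.20 85.10\n' | check_pool_usage
```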
I've been managing this cluster for a few years now, but I'm no Proxmox guru by any stretch. I feel like I'm missing some basic steps here; any help is appreciated.
Thanks
 
BUMP: No thoughts on this? This is a mission-critical system for us. Does nobody have ideas on how to predict or resolve this?