local-LVM not available after Kernel update on PVE 7

Hi,
This just started happening to me, but after a power outage, not a software update.

After the outage, the server appeared to hang for a long time, and then finally displayed "Timed out for waiting the udev queue being empty."

It then continued to boot, but without my LVM partition, which caused all my containers to fail to start.

I did the trick of increasing the timeout period, and this did work, but now my server takes an absolute age to boot, and, worse, my containers all take an age to start as well, for no apparent reason.

What actually causes this huge delay, why might it have happened after a power outage, and is there anything I can do to undo the damage?
please check the health of your physical disk, e.g. using smartctl -a /dev/XYZ. Are there any interesting messages in the system logs/journal?
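For example, something along these lines (just a sketch; replace /dev/sdX with your actual device):
```
smartctl -a /dev/sdX            # overall health, reallocated/pending sector counts
smartctl -t short /dev/sdX      # start a short self-test...
smartctl -l selftest /dev/sdX   # ...and read its result a few minutes later
journalctl -b -k | grep -iE 'error|timeout|lvm|udev'   # kernel messages from this boot
```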
 
I agree with @fiona. Maybe test the drives properly and replace the faulty one if any. Don't forget backups. ;)

Also, regarding the "trick", I personally removed it from boot time and let the server start without the LVM partition that holds the VM disks. At least that way I get access to the host and can send the commands manually or through a script, and I know what to expect. I hate being in the dark for 10-20 minutes; I prefer booting without LVM and then "having control". Note that this only happens on one of my servers, and it is definitely the slowest one I have in terms of disks. Others have even bigger disks and higher disk usage, but don't have this issue.
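For reference, "sending the commands manually" looks roughly like this on my box - the pve VG and the data thin pool are the PVE defaults, so adjust the names to whatever vgs/lvs shows on yours:
```
vgs && lvs -a           # see which VGs/LVs exist and which are inactive
vgchange -ay pve        # activate everything in the pve VG...
lvchange -ay pve/data   # ...or just the thin pool itself
pct start 101           # then start the containers/VMs that were skipped at boot
qm start 100
```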

Good luck with that! :)
 
Hi,

please check the health of your physical disk, e.g. using smartctl -a /dev/XYZ. Are there any interesting messages in the system logs/journal?
SMART tests check out just fine, and there's nothing weird in the boot log.

However, I've noticed another symptom: although the containers start and run OK, they can no longer be backed up. When I tell Proxmox to back up any container (I use the "stop" method, and the backup goes to an NFS-mounted directory), the backup process simply runs forever and never completes.
 
SMART tests check out just fine, and there's nothing weird in the boot log.

However, I've noticed another symptom: although the containers start and run OK, they can no longer be backed up. When I tell Proxmox to back up any container (I use the "stop" method, and the backup goes to an NFS-mounted directory), the backup process simply runs forever and never completes.
What if you try to read an LV from start to finish? Does that also hang?
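For example, something like this (the LV name is only an example - pick a real one from lvs, and it has to be active):
```
lvs pve                            # pick a guest disk from the pool
lvchange -ay pve/vm-100-disk-0     # make sure it is active
# read it end-to-end and discard the data; if the pool or the disk underneath
# is bad, this will stall or throw I/O errors (check dmesg):
dd if=/dev/pve/vm-100-disk-0 of=/dev/null bs=1M status=progress
```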
 
Could you please help me understand what’s going on? When I add a new hard drive, the physical machine running PVE shows an error during boot, hangs on it for a long time, and cannot boot into the system. I can only boot normally if I remove the hard drive. I’ve already tried adding `thin_check_options = [ "-q", "--skip-mappings" ]`, as well as running `update-initramfs -u -k all` and `update-initramfs -u`.
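For reference, this is roughly what I changed (the option goes into the global { ... } section of /etc/lvm/lvm.conf, and the initramfs has to be rebuilt afterwards so early boot picks it up):
```
# /etc/lvm/lvm.conf -> inside the global { ... } section:
#     thin_check_options = [ "-q", "--skip-mappings" ]

update-initramfs -u -k all   # rebuild the initramfs for all installed kernels
```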

[Attachment: screenshot of the boot error]
 
Hi,
Could you please help me understand what’s going on? When I add a new hard drive, the physical machine running PVE shows an error during boot, hangs on it for a long time, and cannot boot into the system. I can only boot normally if I remove the hard drive. I’ve already tried adding `thin_check_options = [ "-q", "--skip-mappings" ]`, as well as running `update-initramfs -u -k all` and `update-initramfs -u`.

please open a new thread, and provide the full system logs/journal from the failed boot attempt. If you have a chance to try it: does hotplugging the drive work?
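If the journal is persistent, the previous (failed) boot can be dumped to a file and attached there, roughly like this:
```
mkdir -p /var/log/journal && systemctl restart systemd-journald   # make the journal persistent
# ...reproduce the failed boot, then from the next successful boot:
journalctl -b -1 > failed-boot-journal.txt    # full journal of the previous boot
```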
 
Solved.....
I experienced a power cut without a UPS and had the same problem with my LVM-thin volume. Running the 3 commands (as previously posted) worked for me, but I was worried there was something wrong with the metadata.
My setup: a single-disk 4 TB LVM-thin.

Fix:
1) Stop all VMs and set them all to not auto-start.
2) Using an external hard drive at least the size of the local-lvm, I created a new LVM-thin storage (external-lvm) - see the CLI sketch after this list.
3) For each VM (on the Hardware tab) I moved the storage disks to the new storage target.
4) I deleted the cloud-init drives attached to each VM (if that applies to your setup). They are easily added again later.
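Roughly what steps 2 and 3 look like from the command line, in case you'd rather script it than click through the GUI. The device name, VG/pool/storage names and VMIDs below are only examples - double-check the target disk before running pvcreate:
```
# Step 2: turn the external drive (example: /dev/sdb) into an LVM-thin storage
pvcreate /dev/sdb
vgcreate external /dev/sdb
lvcreate -l 95%FREE --type thin-pool --name externalpool external   # leave room for pool metadata
pvesm add lvmthin external-lvm --vgname external --thinpool externalpool

# Step 3: move each guest disk over (GUI: VM -> Hardware -> Move disk)
qm move_disk 100 scsi0 external-lvm --delete 1         # VM disks
pct move_volume 101 rootfs external-lvm --delete 1     # container volumes
```
The --delete 1 is what removes the moved disk from the faulty pool, which is exactly the point here.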

Once all were moved, 'node -> local-lvm -> VM Disks' was now empty. This is the crucial step: there cannot be any disks left on the faulty LVM.
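You can double-check this from the shell too (names assume the default local-lvm storage on the pve volume group):
```
pvesm list local-lvm   # should not list any vm-* / subvol-* volumes any more
lvs pve                # on a default install only root, swap and the data pool remain
```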

Next I rebooted the host, my lvm errors were gone :)

I then proceeded to move the storage back to the local-lvm.

This was a lengthy process, but I was able to do it all in the web interface (no console). My theory is that it must have been caused by corrupt metadata for one (or more) of the VM disks; removing the VM disks deleted the bad metadata.

Best of luck! I simply wasn't happy having the repair commands run on every boot.
 
