local-LVM not available after Kernel update on PVE 7

Hi,
This just started happening to me, but after a power outage, not a software update.

After the outage, the server appeared to hang for a long time, and then finally displayed "Timed out for waiting the udev queue being empty."

It then continued to boot, but without my LVM partition, which caused all my containers to fail to start.

I did the trick of increasing the timeout period, and this did work, but now my server takes an absolute age to boot, and, worse, my containers all take an age to start as well, for no apparent reason.

What actually causes this huge delay, why might it have happened after a power outage, and is there anything I can do to undo the damage?
please check the health of your physical disk, e.g. using smartctl -a /dev/XYZ. Are there any interesting messages in the system logs/journal?
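For example, something along these lines (just a sketch; replace /dev/sdX with your actual device):
```
smartctl -a /dev/sdX            # overall health, reallocated/pending sector counts
smartctl -t short /dev/sdX      # start a short self-test...
smartctl -l selftest /dev/sdX   # ...and read its result a few minutes later
journalctl -b -k | grep -iE 'error|timeout|lvm|udev'   # kernel messages from this boot
```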
 
I agree with @fiona. Maybe test the drives properly and replace the faulty one if any. Don't forget backups. ;)

Also, regarding the "trick", I personally removed it from boot time and let the server start without the LVM partition that holds the VM disks. At least that way I get access to the host and can send the commands manually or through a script, and I know what to expect. I hate being in the dark for 10-20 minutes; I prefer booting without LVM and then "having control". Note that this only happens on one of my servers, and it is definitely the slowest one I have in terms of disks. Others have even bigger disks and higher disk usage, but don't have this issue.
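For reference, "sending the commands manually" looks roughly like this on my box - the pve VG and the data thin pool are the PVE defaults, so adjust the names to whatever vgs/lvs shows on yours:
```
vgs && lvs -a           # see which VGs/LVs exist and which are inactive
vgchange -ay pve        # activate everything in the pve VG...
lvchange -ay pve/data   # ...or just the thin pool itself
pct start 101           # then start the containers/VMs that were skipped at boot
qm start 100
```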

Good luck with that! :)
 
Hi,

please check the health of your physical disk, e.g. using smartctl -a /dev/XYZ. Are there any interesting messages in the system logs/journal?
SMART tests check out just fine, and there's nothing weird in the boot log.

However, I've noticed another symptom: although the containers start and run OK, they can no longer be backed up. When I tell Proxmox to back up any container (I use the "stop" method, and the backup goes to an NFS-mounted directory), the backup process simply runs forever and never completes.
 
SMART tests check out just fine, and there's nothing weird in the boot log.

However, I've noticed another symptom: although the containers start and run OK, they can no longer be backed up. When I tell Proxmox to back up any container (I use the "stop" method, and the backup goes to an NFS-mounted directory), the backup process simply runs forever and never completes.
What if you try to read an LV from start to finish? Does that also hang?
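For example, something like this (the LV name is only an example - pick a real one from lvs, and it has to be active):
```
lvs pve                            # pick a guest disk from the pool
lvchange -ay pve/vm-100-disk-0     # make sure it is active
# read it end-to-end and discard the data; if the pool or the disk underneath
# is bad, this will stall or throw I/O errors (check dmesg):
dd if=/dev/pve/vm-100-disk-0 of=/dev/null bs=1M status=progress
```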
 
Could you please help me understand what’s going on? When I add a new hard drive, the physical machine running PVE shows an error during boot, hangs on it for a long time, and cannot boot into the system. I can only boot normally if I remove the hard drive. I’ve already tried adding `thin_check_options = [ "-q", "--skip-mappings" ]`, as well as running `update-initramfs -u -k all` and `update-initramfs -u`.
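For reference, this is roughly what I changed (the option goes into the global { ... } section of /etc/lvm/lvm.conf, and the initramfs has to be rebuilt afterwards so early boot picks it up):
```
# /etc/lvm/lvm.conf -> inside the global { ... } section:
#     thin_check_options = [ "-q", "--skip-mappings" ]

update-initramfs -u -k all   # rebuild the initramfs for all installed kernels
```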

[Attachment: screenshot of the boot error]
 
Hi,
Could you please help me understand what’s going on? When I add a new hard drive, the physical machine running PVE shows an error during boot, hangs on it for a long time, and cannot boot into the system. I can only boot normally if I remove the hard drive. I’ve already tried adding `thin_check_options = [ "-q", "--skip-mappings" ]`, as well as running `update-initramfs -u -k all` and `update-initramfs -u`.

please open a new thread, and provide the full system logs/journal from the failed boot attempt. If you have a chance to try it: does hotplugging the drive work?
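If the journal is persistent, the previous (failed) boot can be dumped to a file and attached there, roughly like this:
```
mkdir -p /var/log/journal && systemctl restart systemd-journald   # make the journal persistent
# ...reproduce the failed boot, then from the next successful boot:
journalctl -b -1 > failed-boot-journal.txt    # full journal of the previous boot
```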
 
Solved.....
I experienced a power cut without a UPS and had the same problem with my LVM-thin volume. Running the 3 commands (as previously posted) worked for me, but I was worried there was something wrong with the metadata.
My setup: a single-disk 4 TB LVM-thin.

Fix:
1) Stop all VMs and set them all to not auto-start.
2) Using an external hard drive at least the size of the local-lvm, I created a new LVM-thin storage (external-lvm) - see the CLI sketch after this list.
3) For each VM (on the Hardware tab) I moved the storage disks to the new storage target.
4) I deleted the cloud-init drives attached to each VM (if that applies to your setup). They are easily added again later.
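Roughly what steps 2 and 3 look like from the command line, in case you'd rather script it than click through the GUI. The device name, VG/pool/storage names and VMIDs below are only examples - double-check the target disk before running pvcreate:
```
# Step 2: turn the external drive (example: /dev/sdb) into an LVM-thin storage
pvcreate /dev/sdb
vgcreate external /dev/sdb
lvcreate -l 95%FREE --type thin-pool --name externalpool external   # leave room for pool metadata
pvesm add lvmthin external-lvm --vgname external --thinpool externalpool

# Step 3: move each guest disk over (GUI: VM -> Hardware -> Move disk)
qm move_disk 100 scsi0 external-lvm --delete 1         # VM disks
pct move_volume 101 rootfs external-lvm --delete 1     # container volumes
```
The --delete 1 is what removes the moved disk from the faulty pool, which is exactly the point here.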

Once all were moved, 'node -> local-lvm -> VM Disks' was now empty. This is the crucial step: there cannot be any disks left on the faulty LVM.
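You can double-check this from the shell too (names assume the default local-lvm storage on the pve volume group):
```
pvesm list local-lvm   # should not list any vm-* / subvol-* volumes any more
lvs pve                # on a default install only root, swap and the data pool remain
```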

Next I rebooted the host, my lvm errors were gone :)

I then proceeded to move the storage back to the local-lvm.

This was a lengthy process, but I was able to do it all in the web interface (no console). My theory is that it must have been caused by corrupt metadata for one (or more) of the VM disks; removing the VM disks deleted the bad metadata.

Best of luck! I simply wasn't happy having the repair commands run on every boot.
 
