local-LVM not available after Kernel update on PVE 7

Thanks for this.

I wonder if there is a way to increase the udev timeout, so we can avoid having the pvscan process killed in the first place.
Code:
1. pvscan is started by the udev 69-lvm-metad.rules.
2. pvscan activates XYZ_tmeta and XYZ_tdata.
3. pvscan starts thin_check for the pool and waits for it to complete.
4. The timeout enforced by udev is hit and pvscan is killed.
5. Some time later, thin_check completes, but the activation of the
thin pool never completes.

EDIT: INCREASING THE UDEV TIMEOUT DOES WORK
The boot did take longer, but it did not fail.
I have set the timeout to 600s (10min). Default is 180s (3min).
Some people may need even more time, depending on how many disks and pools they have, but above 10min I would just use --skip-mappings :D

Edited
Code:
# nano /etc/udev/udev.conf
Added
Code:
event_timeout=600
Then I disabled the --skip-mappings option in lvm.conf by commenting it out
(you may skip this step if you haven't changed your lvm.conf file)
Code:
# nano /etc/lvm/lvm.conf
Disabled
Code:
        # thin_check_options = [ "-q", "--skip-mappings" ]
Then updated initramfs again with both changes
Code:
# update-initramfs -u
And rebooted to test and it worked.
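For anyone who would rather script the udev.conf change than edit it by hand, here is a minimal sketch. The function name set_event_timeout is made up for illustration; it only assumes the event_timeout=VALUE line format shown above, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper (not part of udev or Proxmox): set event_timeout
# in a udev.conf-style file. Usage: set_event_timeout FILE SECONDS
set_event_timeout() {
    conf=$1
    secs=$2
    if grep -q '^[[:space:]]*event_timeout=' "$conf"; then
        # An active (uncommented) setting exists: rewrite it.
        sed "s/^[[:space:]]*event_timeout=.*/event_timeout=$secs/" "$conf" > "$conf.tmp" \
            && cat "$conf.tmp" > "$conf" \
            && rm -f "$conf.tmp"
    else
        # No active setting yet (commented lines are ignored): append one.
        printf 'event_timeout=%s\n' "$secs" >> "$conf"
    fi
}
```

On a real host this would be called as set_event_timeout /etc/udev/udev.conf 600, followed by update-initramfs -u as in the steps above.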

I think I prefer it this way. The server should not be rebooted frequently anyways. It is a longer boot with more thorough tests (not sure why it takes so long though).

Testing here, it took about 2m20s for the first pool to appear on the screen as "found" and 3m17s for all the pools to load. Then the boot quickly finished and the system was online. In my case, I was just above the 3min limit.
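To make the arithmetic explicit: 3m17s is 197 seconds, which is over the 180s default but well under 600s. A small sketch for checking your own measured time against a timeout (both function names are made up for illustration, and only durations of the form NmNs are handled):

```shell
# Convert a duration like "3m17s" to seconds (only the NmNs form is handled).
to_seconds() {
    mins=${1%%m*}
    rest=${1#*m}
    secs=${rest%s}
    # expr treats its operands as decimal, so "08" is not read as octal.
    expr "$mins" \* 60 + "$secs"
}

# Report whether a measured activation time fits within a timeout in seconds.
fits_timeout() {
    if [ "$(to_seconds "$1")" -le "$2" ]; then
        echo "within timeout"
    else
        echo "exceeds timeout"
    fi
}
```

For example, fits_timeout 3m17s 180 reports that the default is exceeded, while fits_timeout 3m17s 600 reports that the raised value is enough.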

EDIT2: I wonder if there is some optimization I can do to the metadata of the pool to make this better. Also, one of my pools has two 2TB disks, but the metadata is only in one of them (it was expanded to the second disk). Not sure if this matters, but this seems to be the slow pool to check/mount.

Anyways, hope this helps someone.
Cheers

#1 comment, this is what solved it for me. Permanent fix. It is not "fast", but if I plan 10 minutes of downtime when I update the server, I'm OK with it. Thanks for that info, really helpful!
 
Hi,
updated to proxmox 8 and the issue still exists.

---
Kernel Version Linux 6.2.16-2-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-2 (2023-06-13T13:30Z)
PVE Manager Version pve-manager/8.0.3/bbf3993334bfa916
unfortunately, the issue is likely here to stay for a while. Fixing it would require rather fundamental changes to Proxmox VE's/Debian's initialization or setting certain defaults that would not be ideal for most people. You can use one of the workarounds mentioned in this thread if you are affected by the issue.
 
Just came across this issue after rebooting. I ran the lvchange commands and they succeeded, but the issue was still there after the next reboot.

I tried event_timeout=600 but it didn't seem to work. Some of my vms failed to start with the same error. I can start them manually from the web UI. I then tried adding a startup delay of 600 to a failing vm, but that didn't work either.

--skip-mappings is the only thing that worked.

FWIW I also notice this on the console on bootup when using event_timeout=600. It definitely did not wait 10 minutes before showing that. It came in under 20 seconds. Shouldn't it wait 10 minutes or is this some other timeout?

Code:
Timed out waiting for udev queue to be empty.
 
Hi,
Hello, I have a similar problem: I cannot access the Proxmox panel. After restarting the server, it boots into a VM that I created (VM 104) instead, and I do not understand why. I have already run the commands from this thread, but they don't work.
in the screenshot, it looks like you installed cloud-init on the Proxmox host. In almost all cases, that package is intended to be used within a VM and not on the host. If it was not intentional, remove it and fix your network/hostname configuration.
 
So no, that's not the problem. The problem is that a VM comes up instead of Proxmox, because of an issue with LVM.
 
Proxmox works perfectly with several VMs, except that on reboot (Proxmox 8), instead of booting into Proxmox, the server boots into the VM volume pve-vm--104--cloudinit. It is just an ordinary VM, but the server boots directly into it and not into sda3 -> pve-root/pve-data.
 

EDIT: INCREASING THE UDEV TIMEOUT DOES WORK
The boot did take longer, but it did not fail.
I have set the timeout to 600s (10min). Default is 180s (3min).
Well, actually, not here: yes, it works, and the thin pool is activated in the end. But Proxmox, which systemd had started earlier, had already tried to start the VMs and failed, so in the end the thin pool is active but the guest VMs are not started.
Is it expected that systemd starts Proxmox before udev has finished its work?
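One possible stopgap for this ordering problem (not something suggested in this thread, just a sketch) is to wait until a guest disk's device node shows up and only then start the guest. The function name wait_for_path, the device path and the VM ID below are all hypothetical, and the sketch assumes that the device node for a thin volume only appears once its pool has been activated.

```shell
# Hypothetical helper: poll until a path exists or a timeout (in seconds)
# expires. Returns 0 if the path appeared, 1 on timeout.
wait_for_path() {
    path=$1
    timeout=$2
    waited=0
    while [ ! -e "$path" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Usage could look like wait_for_path /dev/pve/vm-104-disk-0 600 && qm start 104, run from a script after boot. This only papers over the ordering issue and does not make the pool activate any faster.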
 
Hi there,

Got the same issue this night after a reboot.
Is this not yet resolved?

I had to modify my lvm.conf to add --skip-mappings in order to fix the issue.

Code:
# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.53-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-helper: 7.2-12
pve-kernel-5.15: 7.2-10
pve-kernel-5.4: 6.4-17
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.4.189-1-pve: 5.4.189-1
pve-kernel-5.4.162-1-pve: 5.4.162-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksmtuned: 4.20150326
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-8
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.6-1
proxmox-backup-file-restore: 2.2.6-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-2
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1
Hey, I'm back a year or so later and got the same problem today after upgrading to PVE 8. I've tried the udev "event_timeout" but it looks like it is ignored (the timeout comes way before the 600 or 900 seconds I put there).
Do you have any idea about that?

EDIT: I added back the --skip-mappings option to lvm.conf and that worked as expected
 
I upgraded from version 7 to version 8, but when I restarted my nodes, my LVM volume stopped working.

I removed the volume in order to add it again, but then I got this error message:
create storage failed: command '/sbin/pvs --separator : --noheadings --units k --unbuffered --nosuffix --options pv_name,pv_size,vg_name,pv_uuid /dev/disk/by-id/scsi-36006016024984e00cf23665e71e47afa' failed: exit code 5 (500)
 
Hey, I'm back a year or so later and got the same problem today after upgrading to PVE 8. I've tried the udev "event_timeout" but it looks like it is ignored (the timeout comes way before the 600 or 900 seconds I put there).
Do you have any idea about that?

EDIT: I added back the --skip-mappings option to lvm.conf and that worked as expected

Hi, can you explain to us how you solved your problem? To which line did you add the text?
 
Hi,
I upgraded from version 7 to version 8, but when I restarted my nodes, my LVM volume stopped working.

I removed the volume in order to add it again, but then I got this error message:
create storage failed: command '/sbin/pvs --separator : --noheadings --units k --unbuffered --nosuffix --options pv_name,pv_size,vg_name,pv_uuid /dev/disk/by-id/scsi-36006016024984e00cf23665e71e47afa' failed: exit code 5 (500)
Is the device listed, when you run just pvs? What do you get when you run pvs /dev/disk/by-id/scsi-36006016024984e00cf23665e71e47afa?

Is there anything interesting in the system log/journal during boot?
 
It's not an LVM bug, but should rather be considered a bug in Proxmox VE's (and likely Debian's) init configuration/handling. What (likely) happens is that the thin_check during activation takes too long and pvscan is killed (see here for more information).

Another workaround besides the one suggested by @Fidor should be setting
Code:
thin_check_options = [ "-q", "--skip-mappings" ]
in your /etc/lvm/lvm.conf and running update-initramfs -u afterwards.

EDIT3: Yet another alternative is to increase the udev timeout: https://forum.proxmox.com/threads/l...fter-kernel-update-on-pve-7.97406/post-558890

EDIT2: Upstream bug report in Debian

EDIT: The workaround from @Fidor doesn't seem to work when the partial LVs are active:
Code:
Activation of logical volume pve/data is prohibited while logical volume pve/data_tmeta is active.
It would require deactivating XYZ_tmeta and XYZ_tdata first.
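As a rough sketch of that sequence, assuming the workaround in question is a manual lvchange -ay on the pool: the helper below only prints the commands for a given volume group and pool, so they can be reviewed before running anything. The function name thin_reactivate_cmds is made up for illustration; lvchange -an and lvchange -ay are the standard LVM deactivate/activate calls.

```shell
# Hypothetical helper: print (do not run) the lvchange sequence for a
# thin pool whose _tmeta/_tdata sub-LVs were left active.
# Arguments: volume group name, thin pool name.
thin_reactivate_cmds() {
    vg=$1
    pool=$2
    echo "lvchange -an ${vg}/${pool}_tmeta"
    echo "lvchange -an ${vg}/${pool}_tdata"
    echo "lvchange -ay ${vg}/${pool}"
}
```

For the default Proxmox layout this would be thin_reactivate_cmds pve data. Note that this does not survive a reboot; it only gets the pool active for the current boot.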
I have the same issue. My Proxmox instance has had problems booting into any kernel later than 5.x. I recently got it booted into a 6.x kernel and everything seems to work except for the issue described above.

Adding:
Code:
thin_check_options = [ "-q", "--skip-mappings" ]

Seems to give a vague error message about invalid config placement or options. Any solutions? Or is the Bugzilla thread leading?
 
Hi,
where exactly did you add the options? There is a commented default in the global section of /etc/lvm/lvm.conf, you can add it there, e.g.:
Code:
global {

....

        # Configuration option global/thin_check_options.
        # List of options passed to the thin_check command.
        # With thin_check version 2.1 or newer you can add the option
        # --ignore-non-fatal-errors to let it pass through ignorable errors
        # and fix them later. With thin_check version 3.2 or newer you should
        # include the option --clear-needs-check-flag.
        # This configuration option has an automatic default value.
        # thin_check_options = [ "-q", "--clear-needs-check-flag" ]
        thin_check_options = [ "-q", "--skip-mappings" ]
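A quick way to double-check that the line is really active (and not still commented out) is a small grep-based check like the sketch below. The function name has_skip_mappings is made up for illustration, and it only looks at the text of the file; it does not verify that the line sits inside the global section.

```shell
# Hypothetical helper: return 0 if FILE has an uncommented
# thin_check_options line that includes --skip-mappings, 1 otherwise.
has_skip_mappings() {
    grep -E '^[[:space:]]*thin_check_options[[:space:]]*=' "$1" \
        | grep -q -e '--skip-mappings'
}
```

Usage: has_skip_mappings /etc/lvm/lvm.conf && echo "active". To see the value LVM itself has parsed, lvmconfig global/thin_check_options should also work.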
 
Weighing in on an old thread, I got this error at boot time on a couple of lvm-thin provisioned host drives, after upgrading to kernel 6.5.11-8-pve.

None of the above solutions solved it, as most were targeting Proxmox-managed volume sets, not host volumes.

For anyone else who finds this, an initramfs update solved it completely. I guess the kernel update didn't complete that successfully?

Bash:
update-initramfs -u -k all

Good luck on your quest to solve this, folks.
 
Hi,
the solutions do mention that you need to run update-initramfs -u after modifying the settings. That command alone is not enough if you are not booting the latest installed kernel, though. Your command (with -k all) updates the initramfs for all installed kernels.
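To illustrate the difference, here is a small sketch that compares the running kernel with the newest one from a list of installed versions and prints which call would cover it. Both function names are made up for illustration, and the version ordering relies on GNU sort -V.

```shell
# Hypothetical helper: print the highest version from a list of kernel
# version strings (one per argument), using GNU sort's version ordering.
newest_kernel() {
    printf '%s\n' "$@" | sort -V | tail -n 1
}

# Hypothetical helper: print which update-initramfs call covers the
# running kernel. Arguments: running version, then all installed versions.
initramfs_hint() {
    running=$1
    shift
    if [ "$running" = "$(newest_kernel "$@")" ]; then
        echo "update-initramfs -u"
    else
        echo "update-initramfs -u -k all"
    fi
}
```

On a host this could be called as initramfs_hint "$(uname -r)" followed by the installed versions, for example the directory names under /lib/modules. Running update-initramfs -u -k all unconditionally is the simpler option and just takes a bit longer.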
 
This just started happening to me, but after a power outage, not a software update.

After the outage, the server appeared to hang for a long time, and then finally displayed "Timed out for waiting the udev queue being empty."

It then continued to boot, but without my LVM partition, which caused all my containers to fail to start.

I did the trick of increasing the timeout period, and this did work, but now my server takes an absolute age to boot, and, worse, my containers all take an age to start as well, for no apparent reason.

What actually causes this huge delay, why might it have happened after a power outage, and is there anything I can do to undo the damage?
 
