local-LVM not available after Kernel update on PVE 7

Taz-Matt · Jul 10, 2023

Gustavo Neves said:
Thanks for this.

I wonder if there is a way to increase the udev timeout, so we can avoid having the pvscan process killed in the first place.

Code:

1. pvscan is started by the udev rule 69-lvm-metad.rules.
2. pvscan activates XYZ_tmeta and XYZ_tdata.
3. pvscan starts thin_check for the pool and waits for it to complete.
4. The timeout enforced by udev is hit and pvscan is killed.
5. Some time later, thin_check completes, but the activation of the thin pool never completes.

EDIT: INCREASING THE UDEV TIMEOUT DOES WORK
The boot did take longer, but it did not fail.
I have set the timeout to 600s (10min). Default is 180s (3min).
Some people may need even more time, depending on how many disks and pools they have, but above 10min I would just use --skip-mappings.

Edited

Code:

# nano /etc/udev/udev.conf

Added

Code:

event_timeout=600
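If you prefer to script this instead of editing with nano, here is a minimal sketch. set_udev_event_timeout is a hypothetical helper (not part of udev or Proxmox VE); it assumes GNU sed, rewrites any existing, possibly commented-out event_timeout= line, and appends one if none exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of udev or Proxmox VE): set event_timeout
# in a udev.conf-style file. Assumes GNU sed (for -i and -E).
set_udev_event_timeout() {
    local conf="$1" seconds="$2"
    # Only accept a plain number of seconds.
    if ! [[ "$seconds" =~ ^[0-9]+$ ]]; then
        echo "invalid timeout: $seconds" >&2
        return 1
    fi
    if grep -Eq '^[[:space:]]*#?[[:space:]]*event_timeout=' "$conf"; then
        # Rewrite every existing (possibly commented-out) event_timeout= line.
        sed -Ei "s/^[[:space:]]*#?[[:space:]]*event_timeout=.*/event_timeout=${seconds}/" "$conf"
    else
        # No such line yet: append one.
        printf 'event_timeout=%s\n' "$seconds" >> "$conf"
    fi
}
```

For example, run set_udev_event_timeout /etc/udev/udev.conf 600 as root, then rebuild the initramfs with update-initramfs -u as in this post.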

Then I disabled the --skip-mappings option in lvm.conf by commenting it out
(you can skip this step if you haven't changed your lvm.conf file)

Code:

# nano /etc/lvm/lvm.conf

Disabled

Code:

# thin_check_options = [ "-q", "--skip-mappings" ]
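To confirm that the option is really commented out (or really active) before rebuilding the initramfs, a quick text check can help. active_thin_check_options is a hypothetical helper that only looks at the file contents: it prints uncommented thin_check_options lines and prints nothing if the option only exists as a comment, in which case LVM falls back to its automatic default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the uncommented thin_check_options line(s)
# of an lvm.conf-style file. Prints nothing if the option only appears
# in a comment (LVM then uses its automatic default value).
active_thin_check_options() {
    local conf="${1:-/etc/lvm/lvm.conf}"
    grep -E '^[[:space:]]*thin_check_options[[:space:]]*=' "$conf"
}
```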

Then I updated the initramfs again to include both changes

Code:

# update-initramfs -u

Then I rebooted to test it, and it worked.

I think I prefer it this way. The server should not be rebooted frequently anyway. It is a longer boot with more thorough tests (not sure why it takes so long, though).

Testing here, it took about 2m20s for the first pool to appear on the screen as "found" and 3m17s for all the pools to load. Then the boot quickly finished and the system was online. In my case, I was just above the 3min limit.
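A quick sanity check of those numbers against the 180s default timeout mentioned above. This treats the observed times as if they were how long pvscan had been waiting, which is only an approximation, and to_seconds is a hypothetical helper that only understands the NmNs format used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a duration like "3m17s" into seconds.
to_seconds() {
    if [[ "$1" =~ ^([0-9]+)m([0-9]+)s$ ]]; then
        # 10# forces base 10 so values like "08" are not read as octal.
        echo $(( 10#${BASH_REMATCH[1]} * 60 + 10#${BASH_REMATCH[2]} ))
    else
        echo "unsupported format: $1" >&2
        return 1
    fi
}

default_timeout=180   # default udev event_timeout, in seconds
for t in 2m20s 3m17s; do
    s=$(to_seconds "$t")
    if (( s > default_timeout )); then
        echo "$t = ${s}s: exceeds the ${default_timeout}s default"
    else
        echo "$t = ${s}s: within the ${default_timeout}s default"
    fi
done
```

This prints 140s for the first pool (within the default) and 197s for all pools (just over it), which matches the "just above the 3min limit" observation.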

EDIT2: I wonder if there is some optimization I can do to the metadata of the pool to make this better. Also, one of my pools has two 2TB disks, but the metadata is only in one of them (it was expanded to the second disk). Not sure if this matters, but this seems to be the slow pool to check/mount.

Anyway, hope this helps someone.
Cheers

#1 comment, this is what solved it for me. Permanent fix. It is not "fast", but at least if I plan a 10-minute downtime when I update the server, I'm OK with it. Thanks for that info, really helpful!

fiona · Jul 10, 2023

Hi,

gerami said:
updated to proxmox 8 and the issue still exists.

---
Kernel Version Linux 6.2.16-2-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-2 (2023-06-13T13:30Z)
PVE Manager Version pve-manager/8.0.3/bbf3993334bfa916

unfortunately, the issue is likely here to stay for a while. Fixing it would require rather fundamental changes to Proxmox VE's/Debian's initialization or setting certain defaults that would not be ideal for most people. You can use one of the workarounds mentioned in this thread if you are affected by the issue.

valankar · Aug 10, 2023

Just came across this issue after rebooting. I ran the lvchange commands and they succeeded, but after another reboot the issue was back.

I tried event_timeout=600, but it didn't seem to work. Some of my VMs failed to start with the same error. I can start them manually from the web UI. I then tried adding a startup delay of 600 to a failing VM, but that didn't work either.

--skip-mappings is the only thing that worked.

FWIW, I also noticed this on the console at boot when using event_timeout=600. It definitely did not wait 10 minutes before showing it; it appeared in under 20 seconds. Shouldn't it wait 10 minutes, or is this some other timeout?

Code:

Timed out waiting for udev queue to be empty.

CrazyH · Sep 26, 2023

Hello, I have a similar problem: I cannot access the Proxmox panel. After restarting the server, it boots into a VM that I created (VM 104), and I don't understand why. I have already run the commands from this thread, but it doesn't work.

https://gyazo.com/ba070081d751645f06163b301933017f
https://gyazo.com/074c4f22da97e70abe44f89f532a0a99

fiona · Sep 26, 2023

Hi,

CrazyH said:
Hello, I have a similar problem: I cannot access the Proxmox panel. After restarting the server, it boots into a VM that I created (VM 104), and I don't understand why. I have already run the commands from this thread, but it doesn't work.

in the screenshot, it looks like you installed cloud-init on the Proxmox host. In almost all cases, that package is intended to be used within a VM and not on the host. If it was not intentional, remove it and fix your network/hostname configuration.

CrazyH · Sep 26, 2023

fiona said:
Hi,

in the screenshot, it looks like you installed cloud-init on the Proxmox host. In almost all cases, that package is intended to be used within a VM and not on the host. If it was not intentional, remove it and fix your network/hostname configuration.

No, that's not the problem. The problem is that a VM starts instead of Proxmox, because of a problem with LVM.

CrazyH · Sep 26, 2023

Proxmox works perfectly with several VMs, except that on reboot (Proxmox 8), instead of booting into Proxmox, it boots into the VM pve-vm--104--cloudinit. It is just a simple VM, but the server boots directly from it instead of from sda3 -> pve-root/pve-data.

Symbol · Oct 23, 2023

Gustavo Neves said:
EDIT: INCREASING THE UDEV TIMEOUT DOES WORK
The boot did take longer, but it did not fail.
I have set the timeout to 600s (10min). Default is 180s (3min).

Well, actually, not here. Yes, it works in the sense that the thin pool is eventually activated... but systemd had already started Proxmox, which tried to start the VMs and failed. So in the end the thin pool is activated, but the guest VMs aren't started.
Is it expected that systemd starts Proxmox before udev has finished its work?

Maiko · Dec 3, 2023

Maiko said:

Hi there,

I got the same issue last night after a reboot.
Is this not resolved yet?

I had to modify my lvm.conf and add --skip-mappings to fix the issue.

Code:

# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.53-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-helper: 7.2-12
pve-kernel-5.15: 7.2-10
pve-kernel-5.4: 6.4-17
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.4.189-1-pve: 5.4.189-1
pve-kernel-5.4.162-1-pve: 5.4.162-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksmtuned: 4.20150326
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-8
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.6-1
proxmox-backup-file-restore: 2.2.6-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-2
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1

Hey, I'm back a year or so later and got the same problem today after upgrading to PVE 8. I've tried the udev "event_timeout", but it looks like it is ignored (the timeout comes way before the 600 or 900 seconds I put there).
Do you have any idea about that?

EDIT: I added the --skip-mappings option back to lvm.conf and that worked as expected

elipso · Dec 10, 2023

Kernel Version 6.5.11-7-pve

Fixed that problem for me

Egbahan koissi · Dec 10, 2023

I upgraded from version 7 to version 8 but when I restarted my nodes my LVM volume stopped working.

I removed the volume to put it back but then I got this error message
create storage failed: command '/sbin/pvs --separator : --noheadings --units k --unbuffered --nosuffix --options pv_name,pv_size,vg_name,pv_uuid /dev/disk/by-id/scsi-36006016024984e00cf23665e71e47afa' failed: exit code 5 (500)

Egbahan koissi · Dec 10, 2023

elipso said:
Kernel Version 6.5.11-7-pve

Fixed that problem for me

Hi, can you explain how you solved your problem?

Egbahan koissi · Dec 10, 2023

Maiko said:
Hey, I'm back a year or so later and got the same problem today after upgrading to PVE 8. I've tried the udev "event_timeout", but it looks like it is ignored (the timeout comes way before the 600 or 900 seconds I put there).
Do you have any idea about that?

EDIT: I added the --skip-mappings option back to lvm.conf and that worked as

Hi, can you explain how you solved your problem? Which line did you add the text to?

fiona · Dec 11, 2023

Hi,

Egbahan koissi said:
I upgraded from version 7 to version 8 but when I restarted my nodes my LVM volume stopped working.

I removed the volume to put it back but then I got this error message
create storage failed: command '/sbin/pvs --separator : --noheadings --units k --unbuffered --nosuffix --options pv_name,pv_size,vg_name,pv_uuid /dev/disk/by-id/scsi-36006016024984e00cf23665e71e47afa' failed: exit code 5 (500)

Is the device listed when you run just pvs? What do you get when you run pvs /dev/disk/by-id/scsi-36006016024984e00cf23665e71e47afa?

Is there anything interesting in the system log/journal during boot?

DaanSelen · May 13, 2024

fiona said:
It's not an LVM bug, but should rather be considered a bug in Proxmox VE's (and likely Debian's) init configuration/handling. What (likely) happens is that the thin_check during activation takes too long and pvscan is killed (see here for more information).

Another workaround besides the one suggested by @Fidor should be setting

Code:

thin_check_options = [ "-q", "--skip-mappings" ]

in your /etc/lvm/lvm.conf and running update-initramfs -u afterwards.

EDIT3: Yet another alternative is to increase the udev timeout: https://forum.proxmox.com/threads/l...fter-kernel-update-on-pve-7.97406/post-558890

EDIT2: Upstream bug report in Debian

EDIT: The workaround from @Fidor doesn't seem to work when the partial LVs are active:

Code:

Activation of logical volume pve/data is prohibited while logical volume pve/data_tmeta is active.

It would require deactivating XYZ_tmeta and XYZ_tdata first.

I have the same issue. My Proxmox instance has trouble booting into any kernel later than 5.x. Recently I got it to boot into a 6.x kernel, and everything seems to work except the issue described above.

Adding:

Code:

thin_check_options = [ "-q", "--skip-mappings" ]

This seems to give a vague error message about an invalid config placement or options. Any solutions? Or should I follow the Bugzilla thread?

fiona · May 13, 2024

Hi,

celdrith said:
I have the same issue. My Proxmox instance has trouble booting into any kernel later than 5.x. Recently I got it to boot into a 6.x kernel, and everything seems to work except the issue described above.

Adding:

Code:

thin_check_options = [ "-q", "--skip-mappings" ]

This seems to give a vague error message about an invalid config placement or options. Any solutions? Or should I follow the Bugzilla thread?

Where exactly did you add the option? There is a commented-out default in the global section of /etc/lvm/lvm.conf; you can add it there, e.g.:

Code:

global {

....

        # Configuration option global/thin_check_options.
        # List of options passed to the thin_check command.
        # With thin_check version 2.1 or newer you can add the option
        # --ignore-non-fatal-errors to let it pass through ignorable errors
        # and fix them later. With thin_check version 3.2 or newer you should
        # include the option --clear-needs-check-flag.
        # This configuration option has an automatic default value.
        # thin_check_options = [ "-q", "--clear-needs-check-flag" ]
        thin_check_options = [ "-q", "--skip-mappings" ]
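If you want to make this change from a script, here is a minimal sketch. add_skip_mappings is a hypothetical helper; it assumes the section starts with a line reading global { (as in the stock lvm.conf excerpt above), inserts the option right after that line, and refuses to touch the file if an uncommented thin_check_options line already exists anywhere in it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add thin_check_options with --skip-mappings right
# after the "global {" line of an lvm.conf-style file, unless an
# uncommented thin_check_options line already exists.
add_skip_mappings() {
    local conf="${1:-/etc/lvm/lvm.conf}"
    if grep -Eq '^[[:space:]]*thin_check_options[[:space:]]*=' "$conf"; then
        echo "thin_check_options already set in $conf; edit it by hand" >&2
        return 1
    fi
    local tmp
    tmp=$(mktemp) || return 1
    # Print every line; after the first "global {" line, add the option.
    awk '{ print }
         /^[ \t]*global[ \t]*[{]/ && !done {
             print "\tthin_check_options = [ \"-q\", \"--skip-mappings\" ]"
             done = 1
         }' "$conf" > "$tmp" && cat "$tmp" > "$conf"   # keeps owner/permissions
    rm -f "$tmp"
}
```

Run update-initramfs -u afterwards (or update-initramfs -u -k all if you do not boot the newest installed kernel), so that the change also ends up in the initramfs.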

bhansley · Jun 2, 2024

Weighing in on an old thread, I got this error at boot time on a couple of lvm-thin provisioned host drives, after upgrading to kernel 6.5.11-8-pve.

None of the above solutions solved it, as most were targeting Proxmox-managed volume sets, not host volumes.

For anyone else who finds this, an initramfs update solved it completely. I guess the kernel update didn't complete that successfully?

Bash:

update-initramfs -u -k all

Good luck on your quest to solve this, folks.

fiona · Jun 3, 2024

Hi,

bhansley said:
Weighing in on an old thread, I got this error at boot time on a couple of lvm-thin provisioned host drives, after upgrading to kernel 6.5.11-8-pve.

None of the above solutions solved it, as most were targeting Proxmox-managed volume sets, not host volumes.

For anyone else who finds this, an initramfs update solved it completely. I guess the kernel update didn't complete that successfully?

Bash:

update-initramfs -u -k all

Good luck on your quest to solve this, folks.

The solutions do mention that you need to run update-initramfs -u after modifying the settings. However, that command is not enough if you are not booting the latest installed kernel, since without -k it only updates the initramfs of the newest kernel; your command updates it for all kernels.
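To see which installed kernels might still have an initramfs built before a config change, a rough sketch like the following compares file modification times. stale_initrds is hypothetical; it assumes the Debian/Proxmox VE naming /boot/initrd.img-<version>, and treating "config file newer than initrd" as "change not included" is a heuristic, not a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic, not a Proxmox VE tool: print the kernel version
# of every initrd.img-<version> in boot_dir that is older than any of the
# given config files, i.e. images that may predate your config change.
stale_initrds() {
    local boot_dir="${1:-/boot}"
    [[ $# -gt 0 ]] && shift
    local confs=("$@")
    # Default to the two files edited in this thread.
    if (( ${#confs[@]} == 0 )); then
        confs=(/etc/lvm/lvm.conf /etc/udev/udev.conf)
    fi
    local initrd conf
    for initrd in "$boot_dir"/initrd.img-*; do
        [[ -e "$initrd" ]] || continue   # the glob matched nothing
        for conf in "${confs[@]}"; do
            if [[ -e "$conf" && "$conf" -nt "$initrd" ]]; then
                echo "${initrd##*/initrd.img-}"   # strip path and prefix
                break
            fi
        done
    done
}
```

If it prints any versions, update-initramfs -u -k all (or -k <version> for a single kernel) regenerates those images.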

DavidGA · Aug 12, 2024

This just started happening to me, but after a power outage, not a software update.

After the outage, the server appeared to hang for a long time, and then finally displayed "Timed out for waiting the udev queue being empty."

It then continued to boot, but without my LVM partition, which caused all my containers to fail to start.

I did the trick of increasing the timeout period, and this did work, but now my server takes an absolute age to boot, and, worse, my containers all take an age to start as well, for no apparent reason.

What actually causes this huge delay, why might it have happened after a power outage, and is there anything I can do to undo the damage?
