local-LVM not available after Kernel update on PVE 7

Mohave County Library · Oct 4, 2021

We updated our Proxmox VE 7 server this morning, and after the reboot local-lvm was not available and the VMs would not start. The following updates were applied:
libpve-common-perl: 7.0-6 ==> 7.0-9
pve-container: 4.0-9 ==> 4.0-10
pve-kernel-helper: 7.0-7 ==> 7.1-2
qemu-server: 7.0-13 ==> 7.0-14

PVE said reboot was required

Once rebooted, local-lvm showed 0 GB in the web GUI, and I got a "start failed" error when I tried to start VMs. See attachment(s).

I ran pvdisplay, lvdisplay, and vgdisplay (see attachment). lvdisplay shows all the logical volumes as "NOT available".

I ran lvchange -ay pve/data to activate pve/data, so local-lvm now shows as active with a usage percentage, but the VMs still won't start.

I then ran lvchange -ay for the "LV Path" of each VM disk (e.g. lvchange -ay /dev/pve/vm-209002-disk-0). After this, the logical volumes showed as available and the VMs started. I have to repeat this for every VM disk each time the PVE server is rebooted. Why? How can I resolve this permanently?
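
The manual per-disk activation described above can be scripted as a stopgap. The following is only a sketch of the workaround, not a fix for the underlying problem. The function name pending_activations is made up for illustration. It assumes the input comes from `lvs --noheadings -o vg_name,lv_name,lv_attr`, where the fifth character of lv_attr is "a" for an active volume. It only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch: print an "lvchange -ay" command for every LV that is not active.
# Expects "vg_name lv_name lv_attr" lines on stdin, for example from:
#   lvs --noheadings -o vg_name,lv_name,lv_attr
# pending_activations is a hypothetical helper name, not an LVM tool.
pending_activations() {
    while read -r vg lv attr; do
        [ -n "$attr" ] || continue            # skip blank or short lines
        state=$(printf '%s' "$attr" | cut -c5) # 5th attr character = state
        [ "$state" = "a" ] && continue        # already active, nothing to do
        printf 'lvchange -ay %s/%s\n' "$vg" "$lv"
    done
}

# Usage on a real host (prints the commands, does not run them):
#   lvs --noheadings -o vg_name,lv_name,lv_attr | pending_activations
```

If every volume in the group should be activated anyway, a single `vgchange -ay pve` does the same in one call.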

fiona · Oct 11, 2021

Hi,
is there anything interesting in /var/log/syslog about lvm/pvscan? Please also share your /etc/lvm/lvm.conf and the output of pveversion -v.

Mohave County Library · Oct 12, 2021

Hi Fabian

I have attached the syslog and lvm.conf. The only thing that I noted in the syslog was the following line:
Oct 12 06:50:27 pve lvm[825]: pvscan[825] VG pve skip autoactivation.

Here is the output of pveversion -v:
pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-5-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-9
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-11
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-10
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-14
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

fiona · Oct 13, 2021

I also got VG pve skip autoactivation. in my logs and my /etc/lvm/lvm.conf is identical, so those should be fine.

I noticed two things. First, in the output of lvdisplay is an error:

LV Name data
VG Name pve
LV UUID 56Hxkt-8ifA-xPKr-8a5t-qW25-Nung-HEb9Pa
LV Write Access read/write
LV Creation host, time proxmox, 2021-09-09 14:40:33 -0700
Expected thin-pool segment type but got NULL instead.
LV Pool metadata data_tmeta
LV Pool data data_tdata
LV Status NOT available
LV Size 1.67 TiB
Current LE 437892
Segments 1
Allocation inherit
Read ahead sectors auto

Second, and I'm not sure if this is related, lvm tries to read from /dev/urandom before it was even initialized:

Oct 12 06:50:27 pve kernel: [ 3.432021] random: lvm: uninitialized urandom read (4 bytes read)

Please provide the output of lvs -a and lvdisplay pve/data -vvv. Maybe there's more information there.

Mohave County Library · Oct 13, 2021

Do you want the output of lvs -a and lvdisplay pve/data -vvv while the error is occurring or after I activate the logical volumes manually? Activating the logical volumes manually has been the only way that I have been able to get the VMs on the PVE to start. This is a production server so I can't keep it offline.

Mohave County Library · Oct 14, 2021

Could this be an issue between HP and Proxmox 7? Although, these servers worked fine with Proxmox 6. All of our test PVEs are on various Dell Hardware using the no-subscription Package Repository and we haven't had an issue with them. This was the reason I went ahead and upgraded two of our 5 production PVE servers to Proxmox 7. I have since done a clean re-build of these same two PVE servers and the problem persists.

fiona · Oct 14, 2021

Mohave County Library said:
Do you want the output of lvs -a and lvdisplay pve/data -vvv while the error is occurring or after I activate the logical volumes manually? Activating the logical volumes manually has been the only way that I have been able to get the VMs on the PVE to start. This is a production server so I can't keep it offline.

It appears that the Expected thin-pool segment type but got NULL instead. error is only present in the log before the activate commands, so probably it's better to do it after the next reboot.

Mohave County Library said:
Could this be an issue between HP and Proxmox 7? Although, these servers worked fine with Proxmox 6. All of our test PVEs are on various Dell Hardware using the no-subscription Package Repository and we haven't had an issue with them. This was the reason I went ahead and upgraded two of our 5 production PVE servers to Proxmox 7. I have since done a clean re-build of these same two PVE servers and the problem persists.

So both servers show the same symptoms? And the thin pool was re-created from scratch too?

I also noticed in the syslog

Code:

Oct 10 02:38:53 pve smartd[929]: Device: /dev/sda [SAT], SMART Usage Attribute: 7 Seek_Error_Rate changed from 100 to 200

which looks like the drive is not fully healthy anymore.

Mohave County Library · Oct 14, 2021

Yes, both servers are showing the same symptoms. One is a HP Proliant Gen 9 Xeon server and the other is a HP Z620 Workstation. They have both been nuked and paved so yes the thin pools were re-created too. I rebooted the HP Z620 Workstation today so I could run the requested lvs -a and lvdisplay pve/data -vvv commands. The output is attached. As for the SMART attribute, I'm still digging into that but the drive is fairly new and is enterprise grade.

It just seems odd that all of our test PVEs on old Dell PCs are running fine with PVE 7, while the HP production servers are having issues with it. We don't have an HP machine outside production to test this on. Next week I may try moving the HDD into a Dell to see if the issue still happens. The HP servers worked fine with PVE 6, so we may need to go back to it. If so, is there an easy way to do that?

bdfy · Oct 27, 2021

Same issue with pve-manager/7.0-13/7aa7e488 (running kernel: 5.11.22-5-pve) on a Supermicro server.
https://pastebin.com/cExkvJGb
I don't understand why it says "VG pve skip autoactivation."

ShEV · Oct 28, 2021

bdfy said:
Same issue with pve-manager/7.0-13/7aa7e488 (running kernel: 5.11.22-5-pve) on a Supermicro server.
https://pastebin.com/cExkvJGb
I don't understand why it says "VG pve skip autoactivation."

Same issue.

fiona · Oct 28, 2021

@bdfy @ShEV Did you also experience the issues after upgrading to Proxmox VE 7 or did it coincide with a kernel update? In the latter case, can you try booting with an earlier kernel and see if it works?

bdfy · Oct 28, 2021

Fabian_E said:
@bdfy @ShEV Did you also experience the issues after upgrading to Proxmox VE 7 or did it coincide with a kernel update? In the latter case, can you try booting with an earlier kernel and see if it works?

I upgraded a month ago or a little later and didn't run into any problems after the upgrade. Now everything works fine after a manual `vgchange -ay`, but LVM is not auto-activated on boot. I don't have a chance to test with an older kernel right now, sorry.

ShEV · Oct 28, 2021

Fabian_E said:
Did you also experience the issues after upgrading to Proxmox VE 7 or did it coincide with a kernel update? In the latter case, can you try booting with an earlier kernel and see if it works?

I originally had Proxmox 7 installed. The problem arose after updating the system. The kernel changed from version 5.11.22-4 to 5.11.22-5. Booting the previous kernel makes no difference. After boot I have to run the lvchange -ay command by hand and then start the machines.

Update log:

Code:

Start-Date: 2021-10-23  10:18:24
Commandline: apt dist-upgrade
Install: swtpm-libs:amd64 (0.7.0~rc1+2, automatic), swtpm-tools:amd64 (0.7.0~rc1+2, automatic), libopts25:amd64 (1:5.18.16-4, automatic), swtpm:amd64 (0.7.0~rc1+2, automatic), libjson-glib-1.0-common:amd64 (1.6.2-1, automatic), libtpms0:amd64 (0.9.0+1, automatic), gnutls-bin:amd64 (3.7.1-5, automatic), libunbound8:amd64 (1.13.1-1, automatic), libjson-glib-1.0-0:amd64 (1.6.2-1, automatic), pve-kernel-5.11.22-5-pve:amd64 (5.11.22-10, automatic), libgnutls-dane0:amd64 (3.7.1-5, automatic)
Upgrade: reportbug:amd64 (7.10.3, 7.10.3+deb11u1), libperl5.32:amd64 (5.32.1-4+deb11u1, 5.32.1-4+deb11u2), libpam-runtime:amd64 (1.4.0-9, 1.4.0-9+deb11u1), krb5-locales:amd64 (1.18.3-6, 1.18.3-6+deb11u1), libgssapi-krb5-2:amd64 (1.18.3-6, 1.18.3-6+deb11u1), corosync:amd64 (3.1.2-pve2, 3.1.5-pve1), pve-firmware:amd64 (3.3-1, 3.3-2), perl:amd64 (5.32.1-4+deb11u1, 5.32.1-4+deb11u2), tzdata:amd64 (2021a-1, 2021a-1+deb11u1), pve-qemu-kvm:amd64 (6.0.0-3, 6.0.0-4), libproxmox-acme-perl:amd64 (1.3.0, 1.4.0), python3-reportbug:amd64 (7.10.3, 7.10.3+deb11u1), libpve-storage-perl:amd64 (7.0-10, 7.0-12), libvotequorum8:amd64 (3.1.2-pve2, 3.1.5-pve1), libkrb5support0:amd64 (1.18.3-6, 1.18.3-6+deb11u1), libquorum5:amd64 (3.1.2-pve2, 3.1.5-pve1), libcmap4:amd64 (3.1.2-pve2, 3.1.5-pve1), proxmox-backup-file-restore:amd64 (2.0.9-2, 2.0.11-1), libc6:amd64 (2.31-13, 2.31-13+deb11u2), locales:amd64 (2.31-13, 2.31-13+deb11u2), libcfg7:amd64 (3.1.2-pve2, 3.1.5-pve1), libkrb5-3:amd64 (1.18.3-6, 1.18.3-6+deb11u1), libpam-modules:amd64 (1.4.0-9, 1.4.0-9+deb11u1), qemu-server:amd64 (7.0-13, 7.0-16), libpve-access-control:amd64 (7.0-4, 7.0-5), pve-container:amd64 (4.0-9, 4.1-1), libproxmox-acme-plugins:amd64 (1.3.0, 1.4.0), libcpg4:amd64 (3.1.2-pve2, 3.1.5-pve1), pve-i18n:amd64 (2.4-1, 2.5-1), base-files:amd64 (11.1, 11.1+deb11u1), libk5crypto3:amd64 (1.18.3-6, 1.18.3-6+deb11u1), rsync:amd64 (3.2.3-4, 3.2.3-4+deb11u1), proxmox-backup-client:amd64 (2.0.9-2, 2.0.11-1), libpam-modules-bin:amd64 (1.4.0-9, 1.4.0-9+deb11u1), libpve-http-server-perl:amd64 (4.0-2, 4.0-3), pve-manager:amd64 (7.0-11, 7.0-13), libpve-common-perl:amd64 (7.0-6, 7.0-10), perl-base:amd64 (5.32.1-4+deb11u1, 5.32.1-4+deb11u2), libpam0g:amd64 (1.4.0-9, 1.4.0-9+deb11u1), libc-l10n:amd64 (2.31-13, 2.31-13+deb11u2), libc-bin:amd64 (2.31-13, 2.31-13+deb11u2), pve-kernel-5.11.22-4-pve:amd64 (5.11.22-8, 5.11.22-9), pve-kernel-5.11:amd64 (7.0-7, 7.0-8), pve-firewall:amd64 (4.2-2, 4.2-4), libcorosync-common4:amd64 (3.1.2-pve2, 
3.1.5-pve1), perl-modules-5.32:amd64 (5.32.1-4+deb11u1, 5.32.1-4+deb11u2), libnozzle1:amd64 (1.21-pve1, 1.22-pve1), libknet1:amd64 (1.21-pve1, 1.22-pve1), pve-edk2-firmware:amd64 (3.20200531-1, 3.20210831-1), pve-kernel-helper:amd64 (7.0-7, 7.1-2)
End-Date: 2021-10-23  10:19:56

The boot logs contain the following:

Code:

Oct 23 11:03:24 pve1 lvm[961]:   pvscan[961] PV /dev/sdc online, VG vms is complete.
Oct 23 11:03:24 pve1 lvm[961]:   pvscan[961] VG vms skip autoactivation.

fiona · Oct 29, 2021

What might be happening is the following:

pvscan is executed early during boot (initrd/initramfs stage).
1. It creates the /run/lvm/vgs_online/<vgname> file.
2. Autoactivation is attempted (and in your case it fails for some reason).
Later during boot, pvscan is started by systemd.
1. It sees that /run/lvm/vgs_online/<vgname> exists and skips autoactivation (hence the message in the syslog). This happens regardless of whether autoactivation worked or not earlier.
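
If the hypothesis above holds, the presence of the marker file is what decides whether the later pvscan skips autoactivation. A minimal sketch for checking this on a booted system follows. The function name vg_online_marker and its optional directory argument are made up for illustration; only the /run/lvm/vgs_online/<vgname> path comes from the explanation above.

```shell
#!/bin/sh
# Sketch: report whether the pvscan "online" marker exists for a VG.
# vg_online_marker is a hypothetical helper name. The second argument
# (marker directory) is optional and defaults to /run/lvm/vgs_online.
vg_online_marker() {
    vg=$1
    dir=${2:-/run/lvm/vgs_online}
    if [ -e "$dir/$vg" ]; then
        echo "$vg: marker present; a later pvscan would skip autoactivation"
    else
        echo "$vg: no marker; pvscan would attempt autoactivation"
    fi
}

# Usage on an affected host:
#   vg_online_marker pve
```

A marker that is present while the LVs are still inactive would match the "skip autoactivation" message in the syslog.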

Could you try setting the debug parameter for the kernel command line (see here at the bottom for how to edit it), and share the contents of /run/initramfs/initramfs.debug after re-booting?

ShEV · Oct 29, 2021

Fabian_E said:
Could you try setting the debug parameter for the kernel command line (see here at the bottom for how to edit it), and share the contents of /run/initramfs/initramfs.debug after re-booting?

Mohave County Library · Oct 30, 2021

I won't be able to do the debugging on the affected production servers until next week. However, I did run the debug on one of our test servers, which runs PVE 7 with kernel 5.11.22-5 but does not have the LVM issue. Comparing the debug file from our working test server with ShEV's debug file, I found this in ShEV's: "Volume group "PVE" not found-cannot process volume group pve". I have uploaded the debug file from our working test server and will upload the one from the problem server early next week.

fiona · Nov 2, 2021

We are thinking about adding activation in Proxmox VE's storage layer, when a thin pool/volume group is not active for whatever reason. But to my knowledge, autoactivation always worked in the past (and still does for most other people), so it still would be interesting to further debug the issue:

For the log where it doesn't work, I think the device might just not be there/ready yet at the time the script for LVM2 is executed. The question remains why the pvscan (which should trigger once the device shows up) fails.

Could you change the following in your /etc/lvm/lvm.conf file:

Code:

# Configuration section log.
# How LVM log information is reported.
log {
----snip----
        # Configuration option log/file.
        # Write error and debug log messages to a file specified here.
        # This configuration option does not have a default value defined.
        file = "/run/lvm-log"

        # Configuration option log/overwrite.
        # Overwrite the log file each time the program is run.
        overwrite = 0

        # Configuration option log/level.
        # The level of log messages that are sent to the log file or syslog.
        # There are 6 syslog-like log levels currently in use: 2 to 7 inclusive.
        # 7 is the most verbose (LOG_DEBUG).
        level = 7

Afterwards run update-initramfs -u so that the new configuration will actually be available early on. Please share the contents of /run/lvm-log after the next reboot.

Mohave County Library · Nov 2, 2021

Attached are the requested logs

fiona · Nov 3, 2021

Code:

15:04:45.899648 lvchange[471] misc/lvm-flock.c:113  _do_flock /run/lock/lvm/V_pve:aux WB
15:04:45.899673 lvchange[471] misc/lvm-flock.c:113  _do_flock /run/lock/lvm/V_pve WB
15:04:45.986404 pvscan[474] device_mapper/libdm-common.c:2494  Udev cookie 0xd4d8e9e (semid 0) destroyed
15:04:45.986435 pvscan[474] device_mapper/libdm-common.c:1484  pve-swap: Skipping NODE_ADD (253,0) 0:6 0660 [trust_udev]
15:04:45.986448 pvscan[474] device_mapper/libdm-common.c:1495  pve-swap: Processing NODE_READ_AHEAD 256 (flags=1)
15:04:45.986485 pvscan[474] device_mapper/libdm-common.c:1249  pve-swap (253:0): read ahead is 256
15:04:45.986500 pvscan[474] device_mapper/libdm-common.c:1373  pve-swap: retaining kernel read ahead of 256 (requested 256)
15:04:45.986522 pvscan[474] device_mapper/libdm-common.c:1484  pve-root: Skipping NODE_ADD (253,1) 0:6 0660 [trust_udev]
15:04:45.986536 pvscan[474] device_mapper/libdm-common.c:1495  pve-root: Processing NODE_READ_AHEAD 256 (flags=1)
15:04:45.986561 pvscan[474] device_mapper/libdm-common.c:1249  pve-root (253:1): read ahead is 256
15:04:45.986573 pvscan[474] device_mapper/libdm-common.c:1373  pve-root: retaining kernel read ahead of 256 (requested 256)
15:04:45.986583 pvscan[474] device_mapper/libdm-common.c:1484  pve-data_tmeta: Skipping NODE_ADD (253,2) 0:6 0660 [trust_udev]
15:04:45.986596 pvscan[474] device_mapper/libdm-common.c:1495  pve-data_tmeta: Processing NODE_READ_AHEAD 256 (flags=1)
15:04:45.986622 pvscan[474] device_mapper/libdm-common.c:1249  pve-data_tmeta (253:2): read ahead is 256
15:04:45.986634 pvscan[474] device_mapper/libdm-common.c:1373  pve-data_tmeta: retaining kernel read ahead of 256 (requested 256)
15:04:45.986651 pvscan[474] device_mapper/libdm-common.c:1484  pve-data_tdata: Skipping NODE_ADD (253,3) 0:6 0660 [trust_udev]
15:04:45.986661 pvscan[474] device_mapper/libdm-common.c:1495  pve-data_tdata: Processing NODE_READ_AHEAD 256 (flags=1)
15:04:45.986685 pvscan[474] device_mapper/libdm-common.c:1249  pve-data_tdata (253:3): read ahead is 256
15:04:45.986697 pvscan[474] device_mapper/libdm-common.c:1373  pve-data_tdata: retaining kernel read ahead of 256 (requested 256)
15:04:45.986715 pvscan[474] device_mapper/libdm-config.c:986  global/thin_check_executable not found in config: defaulting to /usr/sbin/thin_check
15:04:45.986728 pvscan[474] activate/dev_manager.c:2297  Running check command on /dev/mapper/pve-data_tmeta
15:04:46.5731 pvscan[474] config/config.c:1474  global/thin_check_options not found in config: defaulting to thin_check_options = [ "-q" ]
15:04:46.5756 pvscan[474] misc/lvm-exec.c:71  Executing: /usr/sbin/thin_check -q /dev/mapper/pve-data_tmeta
15:04:46.5966 pvscan[474] misc/lvm-flock.c:37  _drop_shared_flock /run/lock/lvm/V_pve.
15:04:46.6054 pvscan[474] mm/memlock.c:694  memlock reset.
15:07:46.645996 lvchange[471] misc/lvm-flock.c:47  _undo_flock /run/lock/lvm/V_pve:aux

These are the last messages from pvscan with PID 474 in the log. AFAICT based on the source code, it should really print something more. The fact that it doesn't, likely means that it crashes (or is killed). There is also an lvchange process running at the same time, which might be relevant, but there are locks, so not sure.
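
To spot a process that stops logging abruptly, as pvscan[474] appears to do here, it can help to print the last log line of each process. The sketch below assumes the log format shown above, with the process name and PID as the second whitespace-separated field. The function name last_line_per_process is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the final log line of each process[pid] in an LVM debug log.
# Assumes field 2 is "name[pid]" as in the excerpt above.
# last_line_per_process is a hypothetical helper name.
last_line_per_process() {
    awk '
        {
            if (!($2 in last)) order[++n] = $2   # remember first-seen order
            last[$2] = $0                        # keep the newest line
        }
        END { for (i = 1; i <= n; i++) print last[order[i]] }
    '
}

# Usage:
#   last_line_per_process < /run/lvm-log
```

A process whose final line is not a normal completion message is a candidate for having crashed or been killed.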

If it does crash, I'd say that's a bug in LVM2 and might be worth reporting upstream. I tried to reproduce the issue and find a potential root cause, but wasn't successful. As for Proxmox VE, as mentioned earlier, I'll look into adding activation for volume groups/thin pools to our storage library.

ShEV · Nov 4, 2021

Fabian_E said:
Please share the contents of /run/lvm-log after the next reboot.
