[SOLVED] Volume Group not activated after the node reboot

Vladimir Bulgaru

Hello!

I'm experiencing a weird issue on Proxmox 6.1. I added a storage as LVM-thin and created the container storage on it. After the machine is rebooted, the storage shows 0% usage and the container will not launch. After debugging this issue for quite some time, I noticed that the volume group is inactive after reboot:
Code:
  INACTIVE            '/dev/vmdata/lvol0' [104.00 MiB] inherit
  INACTIVE            '/dev/vmdata/lvol1' [100.00 MiB] inherit
  INACTIVE            '/dev/vmdata/lvol2' [100.00 MiB] inherit
  INACTIVE            '/dev/vmdata/vmstore' [1.49 TiB] inherit
  INACTIVE            '/dev/vmdata/vmstore_meta0' [96.00 MiB] inherit
  INACTIVE            '/dev/vmdata/vm-400-disk-0' [30.00 GiB] inherit
  INACTIVE            '/dev/vmdata/vm-100-disk-0' [10.00 GiB] inherit

After performing vgchange -a ay vmdata, the storage came back and the containers would start.
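
For reference, this is the manual workaround I'm using for now (the VG name vmdata matches my setup; adjust to yours):
Code:
lvscan                  # shows the INACTIVE logical volumes listed above
vgchange -a ay vmdata   # activate all LVs in the volume group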

The storage itself is a Fusion-Io ioScale2 (PCIe drive) and the drivers are added as a kernel module. The question is: why is the volume group not activated on boot? And how can I debug why the activation is not happening?

Help is much appreciated!
 
Look through the journal/syslog (journalctl -b for everything since the boot).
Please post the output of pveversion -v and cat /etc/pve/storage.cfg
 
To be honest, I haven't noticed anything suspicious in the journal. In any case, I can provide access to the Proxmox host. Here is the data:

pveversion -v
Code:
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-3 (running version: 6.1-3/37248ce6)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-2
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-14
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-3
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2

cat /etc/pve/storage.cfg
Code:
dir: local
        path /var/lib/vz
        content iso,backup,vztmpl

zfspool: local-zfs
        pool rpool/data
        content rootdir,images
        sparse 1

lvmthin: local-vms
        thinpool vmstore
        vgname vmdata
        content images,rootdir
 
@mira, maybe this will help.
This is what I notice on Proxmox 5 instances:
Code:
Dec 09 05:31:50 dc1 kernel: <6>fioinf ioDrive 0000:41:00.0: Found device fct0 (HP 1205GB MLC PCIe ioDrive2 for ProLiant Servers 0000:41:00.0) on pipeline 0
Dec 09 05:31:54 dc1 kernel: <6>fioinf HP 1205GB MLC PCIe ioDrive2 for ProLiant Servers 0000:41:00.0: probed fct0
Dec 09 05:31:54 dc1 kernel: <6>fioinf HP 1205GB MLC PCIe ioDrive2 for ProLiant Servers 0000:41:00.0: sector_size=512
Dec 09 05:31:54 dc1 kernel: <6>fioinf HP 1205GB MLC PCIe ioDrive2 for ProLiant Servers 0000:41:00.0: setting channel range data to [2 .. 2047]
Dec 09 05:31:54 dc1 kernel: <6>fioinf HP 1205GB MLC PCIe ioDrive2 for ProLiant Servers 0000:41:00.0: Found metadata in EBs 372-372, loading...
Dec 09 05:31:55 dc1 kernel: <6>fioinf HP 1205GB MLC PCIe ioDrive2 for ProLiant Servers 0000:41:00.0: setting recovered append point 372+749518848
Dec 09 05:31:55 dc1 kernel: <6>fioinf HP 1205GB MLC PCIe ioDrive2 for ProLiant Servers 0000:41:00.0: Creating device of size 1205000000000 bytes with 2353515625 sectors of 512 bytes (467381915 mapped).
Dec 09 05:31:55 dc1 kernel: fioinf HP 1205GB MLC PCIe ioDrive2 for ProLiant Servers 0000:41:00.0: Creating block device fioa: major: 252 minor: 0 sector size: 512...
Dec 09 05:31:55 dc1 kernel:  fioa: fioa1
Dec 09 05:31:55 dc1 kernel: <6>fioinf HP 1205GB MLC PCIe ioDrive2 for ProLiant Servers 0000:41:00.0: Attach succeeded.
Dec 09 05:31:55 dc1 systemd[1]: Started udev Wait for Complete Device Initialization.
Dec 09 05:31:55 dc1 systemd[1]: Starting Activation of LVM2 logical volumes...
Dec 09 05:31:57 dc1 systemd[1]: Started Device-mapper event daemon.
Dec 09 05:31:57 dc1 dmeventd[1447]: dmeventd ready for processing.
Dec 09 05:31:57 dc1 lvm[1447]: Monitoring thin pool vmdata-vmstore-tpool.
Dec 09 05:31:57 dc1 lvm[1339]:   14 logical volume(s) in volume group "vmdata" now active
Dec 09 05:31:57 dc1 systemd[1]: Started Activation of LVM2 logical volumes.
Dec 09 05:31:57 dc1 systemd[1]: Reached target Encrypted Volumes.
Dec 09 05:31:57 dc1 systemd[1]: Starting Activation of LVM2 logical volumes...
Dec 09 05:31:57 dc1 systemd[1]: Reached target ZFS pool import target.
Dec 09 05:31:57 dc1 systemd[1]: Starting Mount ZFS filesystems...
Dec 09 05:31:57 dc1 lvm[1570]:   14 logical volume(s) in volume group "vmdata" now active
Dec 09 05:31:57 dc1 systemd[1]: Started Activation of LVM2 logical volumes.
Dec 09 05:31:57 dc1 systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Dec 09 05:31:58 dc1 lvm[1596]:   16 logical volume(s) in volume group "vmdata" monitored
Dec 09 05:31:58 dc1 systemd[1]: Started Mount ZFS filesystems.
Dec 09 05:31:58 dc1 systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Dec 09 05:31:58 dc1 systemd[1]: Reached target Local File Systems (Pre).
Dec 09 05:31:58 dc1 systemd[1]: Reached target Local File Systems.
Dec 09 05:31:58 dc1 systemd[1]: Starting Create Volatile Files and Directories...

This is what I notice on Proxmox 6 instances:
Code:
Dec 15 01:55:18 dc4 kernel: <6>fioinf ioDrive 0000:41:00.0: Found device fct0 (Fusion-io 1.65TB ioScale2 0000:41:00.0) on pipeline 0
Dec 15 01:55:24 dc4 kernel: <6>fioinf Fusion-io 1.65TB ioScale2 0000:41:00.0: probed fct0
Dec 15 01:55:24 dc4 kernel: <6>fioinf Fusion-io 1.65TB ioScale2 0000:41:00.0: sector_size=512
Dec 15 01:55:24 dc4 kernel: <6>fioinf Fusion-io 1.65TB ioScale2 0000:41:00.0: setting channel range data to [2 .. 4095]
Dec 15 01:55:24 dc4 kernel: <6>fioinf Fusion-io 1.65TB ioScale2 0000:41:00.0: Found metadata in EBs 1302-1303, loading...
Dec 15 01:55:25 dc4 kernel: <6>fioinf Fusion-io 1.65TB ioScale2 0000:41:00.0: setting recovered append point 1303+488816640
Dec 15 01:55:25 dc4 kernel: <6>fioinf Fusion-io 1.65TB ioScale2 0000:41:00.0: Creating device of size 1650000000000 bytes with 3222656250 sectors of 512 bytes (118983884 mapped).
Dec 15 01:55:25 dc4 kernel: fioinf Fusion-io 1.65TB ioScale2 0000:41:00.0: Creating block device fioa: major: 252 minor: 0 sector size: 512...
Dec 15 01:55:25 dc4 kernel:  fioa: fioa1
Dec 15 01:55:25 dc4 kernel: <6>fioinf Fusion-io 1.65TB ioScale2 0000:41:00.0: Attach succeeded.
Dec 15 01:55:25 dc4 systemd[1]: Started Helper to synchronize boot up for ifupdown.
Dec 15 01:55:25 dc4 systemd[1]: Started udev Wait for Complete Device Initialization.
Dec 15 01:55:25 dc4 systemd[1]: Condition check resulted in Import ZFS pools by cache file being skipped.
Dec 15 01:55:25 dc4 systemd[1]: Reached target ZFS pool import target.
Dec 15 01:55:25 dc4 systemd[1]: Starting Mount ZFS filesystems...
Dec 15 01:55:25 dc4 systemd[1]: Starting Wait for ZFS Volume (zvol) links in /dev...
Dec 15 01:55:25 dc4 zvol_wait[1968]: No zvols found, nothing to do.
Dec 15 01:55:25 dc4 systemd[1]: Started Wait for ZFS Volume (zvol) links in /dev.
Dec 15 01:55:25 dc4 systemd[1]: Reached target ZFS volumes are ready.
Dec 15 01:55:25 dc4 systemd[1]: Started Mount ZFS filesystems.
Dec 15 01:55:25 dc4 systemd[1]: Reached target Local File Systems.
Dec 15 01:55:25 dc4 systemd[1]: Starting Set console font and keymap...
Dec 15 01:55:25 dc4 systemd[1]: Condition check resulted in Commit a transient machine-id on disk being skipped.
Dec 15 01:55:25 dc4 systemd[1]: Starting Load AppArmor profiles...
Dec 15 01:55:25 dc4 systemd[1]: Starting Preprocess NFS configuration...
Dec 15 01:55:25 dc4 systemd[1]: Starting Commit Proxmox VE network changes...
Dec 15 01:55:25 dc4 systemd[1]: Starting Create Volatile Files and Directories...

Looks like "Activation of LVM2 logical volumes" is skipped for some reason. Why would this happen?
 
What's the output of systemctl list-units | grep lvm2?
 
@mira
Code:
lvm2-monitor.service    loaded active     exited    Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling               
lvm2-lvmpolld.socket    loaded active     listening LVM2 poll daemon socket
 
Looks like you're missing the services that scan and activate the volumes.
Is there anything in your LVM config that enables lvmetad?
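
For example, a quick check (just a suggestion):
Code:
grep -n lvmetad /etc/lvm/lvm.conf     # anything explicitly set?
lvmconfig --type full global          # effective global settings, including defaults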
 
@mira this is a clean Proxmox 6 install. I've no idea how something can be missing.
I don't know what the lvmetad config looks like.
If it'd be easier, I can send you the Proxmox credentials.
Again, this is a clean instance. No idea why it would not have the services enabled.
 
Usually you have this service template, which gets instantiated for each physical volume (block device):
Code:
[Unit]
Description=LVM event activation on device %i
Documentation=man:pvscan(8)
DefaultDependencies=no
StartLimitInterval=0
BindsTo=dev-block-%i.device
Before=shutdown.target
Conflicts=shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/lvm pvscan --cache --activate ay %i
ExecStop=/sbin/lvm pvscan --cache %i
 
You can take a look at /lib/udev/rules.d/69-lvm-metad.rules. That is where the lvm2-pvscan service template is instantiated for the block devices.
Could it be that the Fusion-IO is not yet ready when that rule is run?
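
One quick way to check whether that rule actually fired for your PV (the device name fioa1 is taken from your logs):
Code:
udevadm info /dev/fioa1 | grep -i systemd    # which systemd units udev attached to the device
systemctl list-units 'lvm2-pvscan@*'         # was a pvscan unit instantiated and started?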
 
@mira
Is it possible to adjust the template so that a log record is emitted, so I can check the execution order?
Going through the logs of Proxmox 5 and 6, I notice that the Fusion-IO card is added more or less identically, so I assume that's not the case.
Moreover, the strange thing is that I see no LVM log records on boot at all. No errors, nothing related to failed attempts.
 
@Vladimir Bulgaru you could start by setting debug logging in /etc/udev/udev.conf and rebuilding your initramfs

also unloading the module, executing "udevadm monitor -p" in a shell, and then modprobing the module might give some insight (e.g., which properties udev assigns to the device, how long processing takes, etc.)
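
something along these lines (the module name is a guess - check lsmod for the exact Fusion-io module, often iomemory-vsl):
Code:
# persistent: set udev_log=debug in /etc/udev/udev.conf, then rebuild the initramfs
update-initramfs -u -k all

# interactive: watch the events and properties udev generates when the card attaches
modprobe -r iomemory-vsl   # unload the driver (module name is an assumption)
udevadm monitor -p &       # print kernel/udev events with all properties, or run in a second shell
modprobe iomemory-vsl      # reload and watch fioa/fioa1 show up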
 
Hey, @fabian
It is indeed the case that the LVM processes fire before the device is attached.
What is strange is: why does it not monitor for devices added later?
Is it possible to make sure the device is attached before the LVM processes are triggered?
What is the difference in how LVM works between Proxmox 5 and 6? It seems to work just fine in 5.
 
@fabian

I understand perfectly well that you've no time to waste on support requests arriving randomly, but I think this question is important. It so happens that Fusion-IO drives are a really good alternative to the expensive PCIe drives out there, and until recently I had no hope that these drives would get any support on Debian 10. Luckily, the GitHub community is great and there are drivers that seem to address the update issue. The only problem left before publishing a guide on how to migrate instances to Proxmox 6, for those blocked on Proxmox 5 by the Fusion-IO drives, is the LVM not activating the VG after reboot. It seems a trivial task to automate this via a script, but I'm still confused why it works on Proxmox 5 and not on Proxmox 6. My main concern is that it might be related to:
  • a potential bug within Proxmox, which would need to be addressed before moving production environments to it
  • a potential bug within the Fusion-IO drivers, which would need to be corrected before advising any upgrades or production use
LVM-thin is important since it's one of the storage types that allows VMs and CTs to be stored on the drive. I can provide any assistance I can: access to the server, testing any scenarios. I would really like this to be solved, since there are many of us who are stuck on Proxmox 5 and hope to finally move on to Proxmox 6.
 
it would help if you could provide the debugging output from my previous post ;)

udev rules also cover hotplugged devices, but the LVM activate-on-boot services probably do not. you could either extend them to explicitly wait for your devices to come up (see "man systemd.device"), write your own activation unit and tie it into pve-storage.target, or add some delay in initramfs (e.g., via the "rootdelay" boot parameter - note that this would require the modules to be included in the initramfs ;)) to allow the disks to become visible earlier relative to the rest of the boot process.
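
a rough sketch of the second option (all names are examples - dev-fioa.device assumes the card shows up as /dev/fioa, and the unit name is made up):
Code:
# /etc/systemd/system/activate-vmdata.service (example name)
[Unit]
Description=Activate VG vmdata once the Fusion-io device is present
Requires=dev-fioa.device
After=dev-fioa.device
Before=pve-storage.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/vgchange -aay vmdata

[Install]
WantedBy=pve-storage.target

enable it with "systemctl enable activate-vmdata.service" so pve-storage.target pulls it in on the next boot.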
 
also see "man lvm2-activation-generator"
 
Success! :D

Thank you for the tip. It turns out the answer was hidden in the diff of the lvm.conf I posted earlier. The reason it works on Proxmox 5 and not on Proxmox 6 is a different approach to LVM activation. Activation in Proxmox 6 is based on global/event_activation=1, which, as far as I understand, means that certain events need to be triggered for activation to occur. In my case, I need the default activation to happen at boot, and it works great with global/event_activation=0.
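
For anyone following along, this is the relevant bit of /etc/lvm/lvm.conf (the setting name comes from man lvm2-activation-generator; the comments are mine):
Code:
global {
    # 0 = let lvm2-activation-generator create direct activation units at boot
    #     (lvm2-activation-early.service and friends), instead of relying on
    #     event-based activation via udev and lvm2-pvscan@
    event_activation = 0
}

After a reboot the generated lvm2-activation units should show up in systemctl list-units | grep lvm2, and the vmdata VG comes up active.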

This is fantastic news, since it basically means that Fusion-Io drives from the 2nd and possibly 3rd generation can be used with Proxmox 6.

Thank you again for all the help and attention!
 
