local-LVM not available after Kernel update on PVE 7

fiona · Jun 10, 2022

Hi,

Krypty said:
Hopefully there's no real downside to this.

the downside is that the check at start-up is not as thorough, quoting from man thin_check:

Code:

--skip-mappings
              Skip checking of the block mappings which make up the bulk of the metadata.

So you might want to do the full check manually from time to time. Unfortunately, that can't be done while the pool is active IIRC.

colemorgan · Jun 24, 2022

None of these solutions have worked for me. I just a have console full of :

pvestatd[1869]: activating LV 'pve/data' failed: Use --select vg_uuid=<uuid> in place of the VG name.

fiona · Jun 24, 2022

colemorgan said:
None of these solutions have worked for me. I just a have console full of :

pvestatd[1869]: activating LV 'pve/data' failed: Use --select vg_uuid=<uuid> in place of the VG name.

That's most likely a different issue. What's the full output when you manually run lvchange -ay pve/data? What's the output of lvs?

colemorgan · Jun 25, 2022

Code:

# lvchange -ay pve/data~
  WARNING: VG name pve is used by VGs JCjMnn-c2uP-tB6N-7tOq-4Nab-KNVz-fm8Ftm and Jo703t-z11X-sUen-P3r0-5VvR-D4sW-LXPRHK.
  Fix duplicate VG names with vgrename uuid, a device filter, or system IDs.
  Multiple VGs found with the same name: skipping pve
  Use --select vg_uuid=<uuid> in place of the VG name.

Code:

# lvs
  WARNING: VG name pve is used by VGs JCjMnn-c2uP-tB6N-7tOq-4Nab-KNVz-fm8Ftm and Jo703t-z11X-sUen-P3r0-5VvR-D4sW-LXPRHK.
  Fix duplicate VG names with vgrename uuid, a device filter, or system IDs.
  LV            VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- <338.36g             33.34  2.06                           
  data          pve twi---tz-- <794.79g                                                   
  root          pve -wi-ao----   96.00g                                                   
  root          pve -wi-------   96.00g                                                   
  swap          pve -wi-ao----    8.00g                                                   
  swap          pve -wi-------    8.00g                                                   
  vm-100-disk-0 pve Vwi-a-tz--   32.00g data        32.71                                 
  vm-101-disk-0 pve Vwi-a-tz--   16.00g data        53.78                                 
  vm-102-disk-0 pve Vwi-a-tz--   16.00g data        54.20                                 
  vm-103-disk-0 pve Vwi-a-tz--   32.00g data        46.58                                 
  vm-104-disk-0 pve Vwi-a-tz--   18.00g data        36.77                                 
  vm-105-disk-0 pve Vwi-a-tz--   16.00g data        92.23                                 
  vm-106-disk-0 pve Vwi-a-tz--   32.00g data        26.55                                 
  vm-107-disk-0 pve Vwi-a-tz--   32.00g data        80.63                                 
  vm-108-disk-0 pve Vwi-a-tz--    4.00m data        0.00                                   
  vm-108-disk-1 pve Vwi-a-tz--   32.00g data        45.24

colemorgan · Jun 26, 2022

Update: I removed an NVMe drive that I had installed recently and the system functions normally now. The system did boot ~2-3 times with this drive installed before it got into this bad state.

I would love to know how I can install NVMe without it porking the whole system. Is there something I can do to stop the PCIe identifiers from shifting? I assume it has something to do with that.

fiona · Jun 27, 2022

colemorgan said:
Update: I removed an NVMe drive that I had installed recently and the system functions normally now. The system did boot ~2-3 times with this drive installed before it got into this bad state.

I would love to know how I can install NVMe without it porking the whole system. Is there something I can do to stop the PCIe identifiers from shifting? I assume it has something to do with that.

It sounds like there was an other PVE installation on that drive (therefore the duplicate VG names). Likely the easiest way is to copy any data on the drive you still need and then wipe it clean.

Maiko · Sep 25, 2022

Hi there,

Got the same issue this night after a reboot.
Is this not yet resolve ?

I had to modify my lvm.conf to add the --skip-mappings in order to fix the issue.

Code:

# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.53-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-helper: 7.2-12
pve-kernel-5.15: 7.2-10
pve-kernel-5.4: 6.4-17
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.4.189-1-pve: 5.4.189-1
pve-kernel-5.4.162-1-pve: 5.4.162-2
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksmtuned: 4.20150326
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-8
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.6-1
proxmox-backup-file-restore: 2.2.6-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-2
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1

fiona · Sep 26, 2022

Hi,

Maiko said:
Hi there,

Got the same issue this night after a reboot.
Is this not yet resolve ?

I had to modify my lvm.conf to add the --skip-mappings in order to fix the issue.

no, it's not yet resolved. There was no reaction from Debian developers to the bug report and if the thin_check takes too long on a system, it will run into the timeout of course.

As suggested here, two ways to fix it would be:

Add --skip-mappings in the udev rules for LVM. But since it works for most people without that, I'd argue the current "add it if you're affected" is the better approach
Switch to using systemd for auto-activation during early boot. But Debian doesn't currently do this and it would be a non-trivial change of course.

jhoncasl · Mar 9, 2023

Hello,

Last weekend we had a power failure and our server shut down improperly. It's been a long week, and I feel like I'm close to being able to solve the problem. I am not an LVM expert and this has been a somewhat complex learning experience.

I've been through a lot, so I'll give a brief summary of the situation (someone else might find it useful):

1. Oversized storage. It was not allowed to extend so I was forced to 'use what I had on hand' and I had to link a USB drive to the PV and in turn add it because I couldn't extend the drive and it wouldn't let me do any kind of extend.

Code:

Device     Boot Start        End    Sectors   Size Id Type
/dev/sdb1        2048 1953521663 1953519616 931.5G  7 HPFS/NTFS/exFAT
root@Desarrollo:~#
root@Desarrollo:~# vgextend pve /dev/sdb1
WARNING: ntfs signature detected on /dev/sdb1 at offset 3. Wipe it? [y/n]: y
  Wiping ntfs signature on /dev/sdb1.
  Physical volume "/dev/sdb1" successfully created.
  Volume group "pve" successfully extended
root@Desarrollo:~#
root@Desarrollo:~#
root@Desarrollo:~# vgs
  VG  #PV #LV #SN Attr   VSize  VFree
  pve   2  37   0 wz--n- 13.64t 932.07g
root@Desarrollo:~# lvchange -an /dev/pve/data
root@Desarrollo:~#
root@Desarrollo:~# lvconvert --repair /dev/pve/data
  WARNING: Sum of all thin volume sizes (<30.20 TiB) exceeds the size of thin pools and the size of whole volume group (13.64 TiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.

2. I tried to repair the Thin Pool with this article (https://www.unixrealm.com/repair-a-thin-pool/)

Code:

root@Desarrollo:/usr/sbin# ./thin_check /dev/pve/data_meta1
examining superblock
TRANSACTION_ID=159
METADATA_FREE_BLOCKS=4145151
examining devices tree
examining mapping tree
  thin device 3954164 is missing mappings [0, -]
    block out of bounds (2894699358584874 >= 4145152)

  thin device 3954165 is missing mappings [0, -]
    block out of bounds (2923571252822058 >= 4145152)

  thin device 3954166 is missing mappings [0, -]
    block out of bounds (2923571269599274 >= 4145152)
a many more missing....

3. I ran many times the lvconvert --repair process and generated several "meta" (in my desperate attempt to do tests)

4. Then "Transaction id mismatch" errors appeared, so it should not be manipulated or applied any procedure... I was able to repair it with (https://blog.monotok.org/lvm-transaction-id-mismatch-and-metadata-resize-error/)

From everything I did it allowed me to see the local-lvm online and I was able to debug some checkpoints from the proxmox GUI by lowering the thin-pool to half its size but after that I got this new error when trying to mount

Code:

root@0:~# lvextend -L+1G pve/data_tmeta
  Thin pool pve-data-tpool (254:10) transaction_id is 159, while expected 161.
  Failed to activate pve/data.

root@Desarrollo:~# vgchange -ay
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  device-mapper: reload ioctl on  (253:16) failed: No data available
  13 logical volume(s) in volume group "pve" now active
root@Desarrollo:~# lvscan
  ACTIVE            '/dev/pve/swap' [8.00 GiB] inherit
  ACTIVE            '/dev/pve/root' [96.00 GiB] inherit
  ACTIVE            '/dev/pve/data' [<12.59 TiB] inherit
  inactive          '/dev/pve/vm-101-disk-7' [250.00 GiB] inherit
  inactive          '/dev/pve/vm-102-disk-4' [250.00 GiB] inherit
  inactive          '/dev/pve/vm-102-disk-5' [250.00 GiB] inherit
  inactive          '/dev/pve/vm-102-disk-6' [200.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-8' [1.95 TiB] inherit
  inactive          '/dev/pve/vm-101-disk-0' [500.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-9' [1.95 TiB] inherit
  inactive          '/dev/pve/vm-101-disk-10' [1000.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-12' [500.00 GiB] inherit
  ACTIVE            '/dev/pve/data_meta0' [15.81 GiB] inherit
  inactive          '/dev/pve/vm-119-disk-0' [100.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-1' [1000.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-3' [1.95 TiB] inherit
  inactive          '/dev/pve/vm-101-disk-2' [100.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-4' [<3.91 TiB] inherit
  ACTIVE            '/dev/pve/data_meta1' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta2' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta3' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta4' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta5' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta6' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta7' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta8' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/repaired_01' [24.00 GiB] inherit
root@Desarrollo:~# lsblk
NAME                    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                       8:0    0  12.7T  0 disk
├─sda1                    8:1    0   512M  0 part
└─sda2                    8:2    0  12.7T  0 part
  ├─pve-swap            253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root            253:1    0    96G  0 lvm  /
  ├─pve-data_meta0      253:2    0  15.8G  0 lvm
  ├─pve-data_meta1      253:3    0  15.8G  0 lvm
  ├─pve-data_meta2      253:4    0  15.8G  0 lvm
  └─pve-data_tdata      253:6    0  12.6T  0 lvm
    └─pve-data-tpool    253:7    0  12.6T  0 lvm
      ├─pve-data        253:8    0  12.6T  1 lvm
      └─pve-repaired_01 253:15   0    24G  0 lvm
sdb                       8:16   0 931.5G  0 disk
└─sdb1                    8:17   0 931.5G  0 part
  ├─pve-data_tmeta      253:5    0  15.8G  0 lvm
  │ └─pve-data-tpool    253:7    0  12.6T  0 lvm
  │   ├─pve-data        253:8    0  12.6T  1 lvm
  │   └─pve-repaired_01 253:15   0    24G  0 lvm
  ├─pve-data_meta3      253:9    0  15.8G  0 lvm
  ├─pve-data_meta4      253:10   0  15.8G  0 lvm
  ├─pve-data_meta5      253:11   0  15.8G  0 lvm
  ├─pve-data_meta6      253:12   0  15.8G  0 lvm
  ├─pve-data_meta7      253:13   0  15.8G  0 lvm
  └─pve-data_meta8      253:14   0  15.8G  0 lvm
sr0                      11:0    1  1024M  0 rom

5. Metadata repair (https://charlmert.github.io/blog/2017/06/15/lvm-metadata-repair/)

Code:

root@0:~# lvchange --yes -ay pve/data_tmeta
  Allowing activation of component LV.
  Activation of logical volume pve/data_tmeta is prohibited while logical volume pve/repaired_01 is active.
5 root@0:~# lvchange --yes -an pve/repaired_01

127 root@0:~# lvscan
  ACTIVE            '/dev/pve/swap' [8.00 GiB] inherit
  ACTIVE            '/dev/pve/root' [96.00 GiB] inherit
  ACTIVE            '/dev/pve/data' [<12.59 TiB] inherit
  inactive          '/dev/pve/vm-101-disk-7' [250.00 GiB] inherit
  inactive          '/dev/pve/vm-102-disk-4' [250.00 GiB] inherit
  inactive          '/dev/pve/vm-102-disk-5' [250.00 GiB] inherit
  inactive          '/dev/pve/vm-102-disk-6' [200.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-8' [1.95 TiB] inherit
  inactive          '/dev/pve/vm-101-disk-0' [500.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-9' [1.95 TiB] inherit
  inactive          '/dev/pve/vm-101-disk-10' [1000.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-12' [500.00 GiB] inherit
  ACTIVE            '/dev/pve/data_meta0' [15.81 GiB] inherit
  inactive          '/dev/pve/vm-119-disk-0' [100.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-1' [1000.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-3' [1.95 TiB] inherit
  inactive          '/dev/pve/vm-101-disk-2' [100.00 GiB] inherit
  inactive          '/dev/pve/vm-101-disk-4' [<3.91 TiB] inherit
  ACTIVE            '/dev/pve/data_meta1' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta2' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta3' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta4' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta5' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta6' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta7' [15.81 GiB] inherit
  ACTIVE            '/dev/pve/data_meta8' [15.81 GiB] inherit
  inactive          '/dev/pve/repaired_01' [24.00 GiB] inherit

root@0:~# lvchange --yes -ay pve/data_tmeta
  Allowing activation of component LV.
  Activation of logical volume pve/data_tmeta is prohibited while logical volume pve/data is active.

root@0:~# lvchange --yes -an pve/data

root@0:~# lvchange --yes -ay pve/data_tmeta
  Allowing activation of component LV.
root@0:~# thin_dump  /dev/mapper/pve-data_tmeta -o thin_dump_pve-data_tmeta.xml
root@0:~# more thin_dump_pve-data_tmeta.xml
<superblock uuid="" time="42" transaction="178" flags="0" version="2" data_block_size="128" nr_data_blocks="211163456">
  <device dev_id="87" mapped_blocks="46265" transaction="175" creation_time="42" snap_time="42">
    <range_mapping origin_begin="0" data_begin="0" length="46265" time="42"/>
  </device>
</superblock>
root@0:~# thin_restore -i /root/tmeta.xml -o /dev/mapper/pve-repaired_01
Output file does not exist.

The output file should either be a block device,
or an existing file.  The file needs to be large
enough to hold the metadata.
1 root@0:~# thin_restore -i thin_dump_pve-data_tmeta.xml -o /dev/mapper/pve-repaired_01
Output file does not exist.

The output file should either be a block device,
or an existing file.  The file needs to be large
enough to hold the metadata.

6. I redid the procedure creating a new one called repaired_02 and doing this I was finally able to eliminate the "reload ioctl" errors

Code:

130 root@0:~# lvcreate -an -Zn -L40G --name repaired_02 pve
  WARNING: Sum of all thin volume sizes (13.84 TiB) exceeds the size of thin pools and the size of whole volume group (13.64 TiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  WARNING: Logical volume pve/repaired_02 not zeroed.
  Logical volume "repaired_02" created.

root@0:~# lvchange --yes -ay pve/repaired_02
root@0:~# thin_restore -i thin_dump_pve-data_tmeta.xml -o /dev/mapper/pve-repaired_02
truncating metadata device to 4161600 4k blocks
Restoring: [==================================================]   100%
root@0:~# lvconvert --thinpool pve/data --poolmetadata /dev/mapper/pve-repaired_02
Do you want to swap metadata of pve/data pool with metadata volume pve/repaired_02? [y/n]: y
root@0:~#

7. I ran lvconvert repair again (for the last time) and on reboot I was able to (finally) completely remove the ioctl errors

****Now if the reason why I reopen this thread****
After reboot it gave error that cannot mount partition because "t_data" is active (Dev pvestatd[1639]: activating LV 'pve/data' failed: Activation of logical volume pve/data is prohibited while logical volume pve/data_tdata is active.) .. I tried everything referred to in this thread:

I tried with lvchange

Code:

root@Desarrollo:~# lvchange -an pve/data_tdata
root@Desarrollo:~# lvchange -an pve/data_tmeta
root@Desarrollo:~# lvchange -ay pve/data

I tried Fidor's lvm-fix.service (https://forum.proxmox.com/threads/l...rnel-update-on-pve-7.97406/page-2#post-430860)
I checked and applied the comment in the bug opened by Fiona where I entered the line in the lvm.conf with the option "thin_check_options = [ "-q", "--skip-mappings" ]"

Then in the update-initramfs command it threw error

Code:

root@Desarrollo:/etc/lvm# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.15.74-1-pve

Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
oot@Desarrollo:/etc/lvm# more No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.^C
root@Desarrollo:/etc/lvm# more /etc/kernel/
install.d/  postinst.d/ postrm.d/
root@Desarrollo:/etc/lvm# more /etc/kernel/
install.d/  postinst.d/ postrm.d/
root@Desarrollo:/etc/lvm# more /etc/kernel/^C
root@Desarrollo:/etc/lvm#
root@Desarrollo:/etc/lvm#
root@Desarrollo:/etc/lvm# proxmox-boot-tool
USAGE: /usr/sbin/proxmox-boot-tool <commands> [ARGS]

  /usr/sbin/proxmox-boot-tool format <partition> [--force]
  /usr/sbin/proxmox-boot-tool init <partition>
  /usr/sbin/proxmox-boot-tool reinit
  /usr/sbin/proxmox-boot-tool clean [--dry-run]
  /usr/sbin/proxmox-boot-tool refresh [--hook <name>]
  /usr/sbin/proxmox-boot-tool kernel <add|remove> <kernel-version>
  /usr/sbin/proxmox-boot-tool kernel pin <kernel-version> [--next-boot]
  /usr/sbin/proxmox-boot-tool kernel unpin [--next-boot]
  /usr/sbin/proxmox-boot-tool kernel list
  /usr/sbin/proxmox-boot-tool status [--quiet]
  /usr/sbin/proxmox-boot-tool help
root@Desarrollo:/etc/lvm# proxmox-boot-tool status
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
E: /etc/kernel/proxmox-boot-uuids does not exist.
root@Desarrollo:/etc/lvm#
root@Desarrollo:/etc/lvm#
root@Desarrollo:/etc/lvm# efibootmgr -v
EFI variables are not supported on this system.

I investigated a bit and it happens that as such there is no proxmox-boot-tool since this service is only for UEFI in our case I did an update-grub because the server boots with BIOS/Legacymode but I don't know if it is correct or I am missing something more for being a server with Legacymode...

And here I go... Same problem with t_data starting first and I don't know what else to try or do... I'm running out of ideas and this error is rare.

If you have any suggestions or something else I can try, I would greatly appreciate it. I feel like I'm close

My goal is to be able to remove the disks from the virtual machines to do a clean install of Proxmox.

Kind Regards

Jhon

fiona · Mar 10, 2023

jhoncasl said:

So even after running these commands the status of the thin pool is not as it should be? What does lvs say before and after running these commands?

jhoncasl said:

Code:

root@Desarrollo:/etc/lvm# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.15.74-1-pve

Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.

This is not necessarily an error, just informational.

jhoncasl said:
I investigated a bit and it happens that as such there is no proxmox-boot-tool since this service is only for UEFI in our case I did an update-grub because the server boots with BIOS/Legacymode but I don't know if it is correct or I am missing something more for being a server with Legacymode...

Yes, if you are not using UEFI, you can ignore the message.

jhoncasl · Mar 10, 2023

fiona said:
So even after running these commands the status of the thin pool is not as it should be? What does lvs say before and after running these commands?

This is not necessarily an error, just informational.

Yes, if you are not using UEFI, you can ignore the message.

Thanks Fiona.

I send you the before / after result. I also found interesting information https://access.redhat.com/documenta...al_volume_manager_administration/mdatarecover and tried this procedure https://forum.proxmox.com/threads/unable-to-resume -pve-data.12497/ but it didn't work.

Code:

root@Desarrollo:/etc/lvm/archive# lvs
  LV             VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data           pve twi-aotz--  <12.59t             0.02   0.43
  data_meta0     pve -wi-a-----   15.81g
  data_meta1     pve -wi-a-----   15.81g
  data_meta2     pve -wi-a-----   15.81g
  repaired_02    pve -wi-a-----   40.00g
  root           pve -wi-ao----   96.00g
  swap           pve -wi-ao----    8.00g
  vm-101-disk-0  pve Vwi---tz--  500.00g data
  vm-101-disk-1  pve Vwi---tz-- 1000.00g data
  vm-101-disk-10 pve Vwi---tz-- 1000.00g data
  vm-101-disk-12 pve Vwi---tz--  500.00g data
  vm-101-disk-2  pve Vwi---tz--  100.00g data
  vm-101-disk-3  pve Vwi---tz--    1.95t data
  vm-101-disk-4  pve Vwi---tz--   <3.91t data
  vm-101-disk-7  pve Vwi---tz--  250.00g data
  vm-101-disk-8  pve Vwi---tz--    1.95t data
  vm-101-disk-9  pve Vwi---tz--    1.95t data
  vm-102-disk-4  pve Vwi---tz--  250.00g data
  vm-102-disk-5  pve Vwi---tz--  250.00g data
  vm-102-disk-6  pve Vwi---tz--  200.00g data
  vm-119-disk-0  pve Vwi---tz--  100.00g data
root@Desarrollo:/etc/lvm/archive#
root@Desarrollo:/etc/lvm/archive# lvchange -an pve/data_tdata
  Device pve-data_tdata (253:6) is used by another device.
root@Desarrollo:/etc/lvm/archive# lvchange -an pve/data_tmeta
  Device pve-data_tmeta (253:5) is used by another device.
root@Desarrollo:/etc/lvm/archive# lvchange -ay pve/data
root@Desarrollo:/etc/lvm/archive#
root@Desarrollo:/etc/lvm/archive#
root@Desarrollo:/etc/lvm/archive# lvs
  LV             VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data           pve twi-aotz--  <12.59t             0.02   0.43
  data_meta0     pve -wi-a-----   15.81g
  data_meta1     pve -wi-a-----   15.81g
  data_meta2     pve -wi-a-----   15.81g
  repaired_02    pve -wi-a-----   40.00g
  root           pve -wi-ao----   96.00g
  swap           pve -wi-ao----    8.00g
  vm-101-disk-0  pve Vwi---tz--  500.00g data
  vm-101-disk-1  pve Vwi---tz-- 1000.00g data
  vm-101-disk-10 pve Vwi---tz-- 1000.00g data
  vm-101-disk-12 pve Vwi---tz--  500.00g data
  vm-101-disk-2  pve Vwi---tz--  100.00g data
  vm-101-disk-3  pve Vwi---tz--    1.95t data
  vm-101-disk-4  pve Vwi---tz--   <3.91t data
  vm-101-disk-7  pve Vwi---tz--  250.00g data
  vm-101-disk-8  pve Vwi---tz--    1.95t data
  vm-101-disk-9  pve Vwi---tz--    1.95t data
  vm-102-disk-4  pve Vwi---tz--  250.00g data
  vm-102-disk-5  pve Vwi---tz--  250.00g data
  vm-102-disk-6  pve Vwi---tz--  200.00g data
  vm-119-disk-0  pve Vwi---tz--  100.00g data

From what I have been able to see, it seems that the damage to the LVM was serious.

Greetings.

Jhon.

fiona · Mar 13, 2023

jhoncasl said:
root@Desarrollo:/etc/lvm/archive# lvchange -an pve/data_tdata
Device pve-data_tdata (253:6) is used by another device.

The next step is to find out what that other device is. You can run lvchange -an pve/data_tdata -vv to make the output more verbose.

pve_sepp · Mar 19, 2023

fiona said:
It's not an LVM bug, but should rather be considered a bug in Proxmox VE's (and likely Debian's) init configuration/handling. What (likely) happens is that the thin_check during activation takes too long and pvscan is killed (see here for more information).

Another workaround besides the one suggested by @Fidor should be setting

Code:

thin_check_options = [ "-q", "--skip-mappings" ]

in your /etc/lvm/lvm.conf and running update-initramfs -u afterwards.

Many Thanks!
I had the "activating LV 'pve/data' failed: Activation of logical volume pve/data is prohibited while logical volume pve/data_tmeta is active." error after upgrading from PVE 6.4 to 7.3-6.
The additional entry solved the issue.

elipso · May 11, 2023

fiona said:
It's not an LVM bug, but should rather be considered a bug in Proxmox VE's (and likely Debian's) init configuration/handling. What (likely) happens is that the thin_check during activation takes too long and pvscan is killed (see here for more information).

Another workaround besides the one suggested by @Fidor should be setting

Code:

thin_check_options = [ "-q", "--skip-mappings" ]

in your /etc/lvm/lvm.conf and running update-initramfs -u afterwards.

EDIT2: Upstream bug report in Debian

EDIT: The workaround from @Fidor doesn't seem to work when the partial LVs are active:

Code:

Activation of logical volume pve/data is prohibited while logical volume pve/data_tmeta is active.

It would require deactivating XYZ_tmeta and XYZ_tdata first.

Still problem in 7.4-3

Linux 5.15.107-1-pve #1 SMP PVE 5.15.107-1 (2023-04-20T10:05Z)

Thanks works fine.

Gustavo Neves · May 23, 2023

fiona said:
Another workaround besides the one suggested by @Fidor should be setting

Code:

thin_check_options = [ "-q", "--skip-mappings" ]

in your /etc/lvm/lvm.conf and running update-initramfs -u afterwards.

Thanks for this.

I wonder if there is a way to increase the udev timeout, so we can avoid having the pvscan process killed in the first place.

Code:

1. pvscan is started by the udev 69-lvm-metad.rules.
2. pvscan activates XYZ_tmeta and XYZ_tdata.
3. pvscan starts thin_check for the pool and waits for it to complete.
4. The timeout enforced by udev is hit and pvscan is killed.
5. Some time later, thin_check completes, but the activation of the
thin pool never completes.

EDIT: INCREASING THE UDEV TIMEOUT DOES WORK
The boot did take longer, but it did not fail.
I have set the timeout to 600s (10min). Default is 180s (3min).
Some people may need even more time, depending on how many disks and pools they have, but above 10min I would just use --skip-mappings

Edited

Code:

# nano /etc/udev/udev.conf

Added

Code:

event_timeout=600

Then I disabled the --skip-mappings option in lvm.conf, by commenting it
(you may skip this step if you haven't changed your lvm.conf file)

Code:

# nano /etc/lvm/lvm.conf

Disabled

Code:

        # thin_check_options = [ "-q", "--skip-mappings" ]

Then updated initramfs again with both changes

Code:

# update-initramfs -u

And rebooted to test and it worked.

I think I prefer it this way. The server should not be rebooted frequently anyways. It is a longer boot with more through tests (not sure why it takes so long though).

Testing here, it took about 2m20s for the first pool to appear on the screen as "found" and 3m17s for all the pools to load. Then the boot quickly finished and the system was online. In my case, I was just above the 3min limit.

EDIT2: I wonder if there is some optimization I can do to the metadata of the pool to make this better. Also, one of my pools has two 2TB disks, but the metadata is only in one of them (it was expanded to the second disk). Not sure if this matters, but this seems to be the slow pool to check/mount.

Anyways, hope this helps someone.
Cheers"

gerami · Jun 21, 2023

Hi, wish you all great day

Increasing event_timeout that @Gustavo Neves suggested did not work for me. I tried to set it to 600, 1200 and 1800 but none of them fixed this bug for me.
Adding --skip-mappings to thin_check_options did not work either!

But manually running these commands works:

Code:

lvchange -an pve/data_tdata
lvchange -an pve/data_tmeta
vgchange -ay

vgchange -ay takes a looooong time to finish. maybe there's a problem in here.

I was using pve 7.2 then upgraded to 7.4. Tested Linux kernel 5.13 to 5.15.107. And this is happening on all my HP DL360 G8 servers. 4 so far. I have NVMe connected to them via PCI port and 1 or more disk via raid controller.

How can I check if after running update-initramfs -u command event_timeout and --skip-mappings options have been applied? Is there any command to verify the options in initramfs?
Any other suggestion to fix this problem permanently? now each time I reboot the server I have to run these three commands.

here I post the output of some commands showing problem:
# lvscan
ACTIVE '/dev/pve/swap' [8.00 GiB] inherit
ACTIVE '/dev/pve/root' [96.00 GiB] inherit
inactive '/dev/pve/data' [<7.13 TiB] inherit
inactive '/dev/pve/base-9003-disk-0' [2.00 GiB] inherit
inactive '/dev/pve/vm-9003-cloudinit' [4.00 MiB] inherit
inactive '/dev/pve/vm-119-cloudinit' [4.00 MiB] inherit
inactive '/dev/pve/vm-119-disk-0' [2.00 TiB] inherit
inactive '/dev/pve/base-9023-disk-0' [<2.20 GiB] inherit
...

# vgchange -ay
Activation of logical volume pve/data is prohibited while logical volume pve/data_tmeta is active.
Activation of logical volume pve/vm-9003-cloudinit is prohibited while logical volume pve/data_tmeta is active.
Activation of logical volume pve/vm-119-cloudinit is prohibited while logical volume pve/data_tmeta is active.
Activation of logical volume pve/vm-119-disk-0 is prohibited while logical volume pve/data_tmeta is active.
Activation of logical volume pve/vm-9023-cloudinit is prohibited while logical volume pve/data_tmeta is active.
Activation of logical volume pve/vm-1003-cloudinit is prohibited while logical volume pve/data_tmeta is active.
...

muner · Jun 22, 2023

L1243 said:
Now I updated to 5.13.19-2-pve
After that both workarounds arent working anymore.

Does anyone has an Idea?

Following error appears, even when I am using one of the workarounds
Activation of logical volume vm/vm is prohibited while logical volume vm/vm_tmeta is active.

SOLVED: By running these commands manually after a reboot

Code:

lvchange -a n lvm/lvm_tmeta lvchange -a n lvm/lvm_tdata lvchange -ay (can take some time to run)

root@pve1:~# lvchange -a n lvm/lvm_tmeta
Volume group "lvm" not found
Cannot process volume group lvm
root@pve1:~#

muner · Jun 23, 2023

fiona said:
It's not an LVM bug, but should rather be considered a bug in Proxmox VE's (and likely Debian's) init configuration/handling. What (likely) happens is that the thin_check during activation takes too long and pvscan is killed (see here for more information).

Another workaround besides the one suggested by @Fidor should be setting

Code:

thin_check_options = [ "-q", "--skip-mappings" ]

in your /etc/lvm/lvm.conf and running update-initramfs -u afterwards.

EDIT3: Yet another alternative is to increase the udev timeout: https://forum.proxmox.com/threads/l...fter-kernel-update-on-pve-7.97406/post-558890

EDIT2: Upstream bug report in Debian

EDIT: The workaround from @Fidor doesn't seem to work when the partial LVs are active:

Code:

Activation of logical volume pve/data is prohibited while logical volume pve/data_tmeta is active.

It would require deactivating XYZ_tmeta and XYZ_tdata first.

muner · Jun 23, 2023

gerami · Jul 9, 2023

updated to proxmox 8 and the issue still exists.

---
Kernel Version Linux 6.2.16-2-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-2 (2023-06-13T13:30Z)
PVE Manager Version pve-manager/8.0.3/bbf3993334bfa916

local-LVM not available after Kernel update on PVE 7

Proxmox Staff Member

New Member

Proxmox Staff Member

New Member

New Member

Proxmox Staff Member

Member

Proxmox Staff Member

New Member

Attachments

Proxmox Staff Member

New Member

Proxmox Staff Member

New Member

Member

Active Member

Member

New Member

New Member

New Member

Member

We value your privacy