[SOLVED] Bug in LVM tools? Not really

hase

New Member
Mar 17, 2025
Hi,
Something happened and now my PVE will no longer start any VMs or containers.
Two of the four PVs of my VG "pve" are still present and apparently still working, and yet pvscan outputs:

root@pve2:~# pvscan
WARNING: Couldn't find device with uuid U2JOLe-2G7S-MtuH-zp3p-8gbP-Awxi-pr2OpX.
WARNING: Couldn't find device with uuid ZbrUMQ-wEnk-IHH5-lolX-GLrg-Xm5d-02hyVi.
WARNING: VG pve is missing PV U2JOLe-2G7S-MtuH-zp3p-8gbP-Awxi-pr2OpX (last written to /dev/sde).
WARNING: VG pve is missing PV ZbrUMQ-wEnk-IHH5-lolX-GLrg-Xm5d-02hyVi (last written to /dev/sdc).
PV /dev/nvme1n1p3 VG pve lvm2 [237.47 GiB / 0 free]
PV /dev/nvme0n1 VG pve lvm2 [238.47 GiB / 0 free]
PV [unknown] VG pve lvm2 [<223.57 GiB / 0 free]
PV [unknown] VG pve lvm2 [<1.82 TiB / 0 free]
Total: 4 [2.50 TiB] / in use: 4 [2.50 TiB] / in no VG: 0 [0 ]

The disks /dev/sdc and /dev/sde are present and carry exactly the UUIDs that LVM cannot find:

root@pve2:~# blkid
/dev/mapper/pve-root: UUID="9b845542-e254-48f2-87c1-dd5c27b201b7" BLOCK_SIZE="4096" TYPE="ext4"
/dev/nvme0n1: UUID="GJjzAX-T9XA-rbkq-0f7Z-aFB2-OZ2H-8FXXAN" TYPE="LVM2_member"
/dev/sdd1: UUID="625af590-2d8c-4685-9c91-4491f162e9ad" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="6d431942-5d5c-469e-85af-696c96e4e282"
/dev/mapper/pve-swap: UUID="ca27ecec-32d8-4220-99e5-f67356ba1564" TYPE="swap"
/dev/sde: UUID="U2JOLe-2G7S-MtuH-zp3p-8gbP-Awxi-pr2OpX" TYPE="LVM2_member"
/dev/sdc: UUID="ZbrUMQ-wEnk-IHH5-lolX-GLrg-Xm5d-02hyVi" TYPE="LVM2_member"
/dev/nvme1n1p2: UUID="6FD2-E474" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="29e59fcb-46bf-4e91-8c4f-93fb7e27ee55"
/dev/nvme1n1p3: UUID="nVbdwp-7vlk-lx4z-tgBK-ZlYg-ZQXZ-ZzBtCP" TYPE="LVM2_member" PARTUUID="5976bd78-f420-4002-a8b7-d25b2c603ff0"
/dev/sda: UUID="be555ad3-fb38-4fda-8e00-a5e1e177f678" TYPE="crypto_LUKS"
/dev/sdg1: LABEL_FATBOOT="EFI" LABEL="EFI" UUID="67E3-17ED" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="9129b7ad-3f79-458b-a5a8-230a97308331"
/dev/sdg2: UUID="db93f7bb-2e92-4949-9aa2-4d26eadcada0" UUID_SUB="0798f179-9c46-48a3-9880-d909c0b1acb6" BLOCK_SIZE="4096" TYPE="btrfs" PARTLABEL="stick" PARTUUID="f086c4db-9194-475f-b248-a08c69bafda6"
/dev/nvme1n1p1: PARTUUID="ad3ce44d-8aed-46f1-a976-db56245da2c4"

I did find a pointer to
https://www.learnitguide.net/2017/12/couldnt-find-device-with-uuid-recover.html
but I cannot make heads or tails of that post.
First it greps for the UUID in files under /etc/lvm/archive, but then somehow files under /etc/lvm/backup are used instead -- I am confused.
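As far as I can tell, the procedure in that post boils down to something like the following (a sketch only -- the archive file name is a placeholder; /etc/lvm/archive keeps the metadata as it looked before each change, while /etc/lvm/backup keeps the most recent copy):

# find an archive file that still references the missing PV
grep -l "U2JOLe-2G7S-MtuH-zp3p-8gbP-Awxi-pr2OpX" /etc/lvm/archive/pve_*.vg
# recreate the PV with its old UUID from that metadata file
pvcreate --uuid "U2JOLe-2G7S-MtuH-zp3p-8gbP-Awxi-pr2OpX" --restorefile /etc/lvm/archive/pve_NNNNN.vg /dev/sde
# then restore the VG metadata from the same file
vgcfgrestore -f /etc/lvm/archive/pve_NNNNN.vg pve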

When I try pvcreate --uuid [...] --restore /etc/lvm/backup/pve /dev/sde, it does not work:

root@pve2:~# pvcreate --uuid "U2JOLe-2G7S-MtuH-zp3p-8gbP-Awxi-pr2OpX" --restore /etc/lvm/backup/pve /dev/sde
WARNING: Couldn't find device with uuid nVbdwp-7vlk-lx4z-tgBK-ZlYg-ZQXZ-ZzBtCP.
WARNING: Couldn't find device with uuid GJjzAX-T9XA-rbkq-0f7Z-aFB2-OZ2H-8FXXAN.
WARNING: Couldn't find device with uuid U2JOLe-2G7S-MtuH-zp3p-8gbP-Awxi-pr2OpX.
WARNING: Couldn't find device with uuid ZbrUMQ-wEnk-IHH5-lolX-GLrg-Xm5d-02hyVi.
Cannot use /dev/sde: device is rejected by filter config

Since the pvcreate command complains about "not finding" all 4 physical volumes - including the two that pvscan considers fine - I am thinking that all PVs currently online and in use are "not found" by pvcreate.

Is there a way to fix this without losing all data?

Thanks,
hase
 
Digging a little deeper, and learning about LVM along the way.

Observation 1: The Proxmox VE installer ISO's rescue mode boots the installed system. This does not help with the problem.
Observation 2: My stupid motherboard will not boot a Debian LiveCD, but the old Knoppix live stick works.

Knoppix (kernel 5.10.10-64) can see the PVs and the volume group just fine.

root@Microknoppix:/home/knoppix# uname -a
Linux Microknoppix 5.10.10-64 #3 SMP PREEMPT Sun Feb 7 09:26:54 CET 2021 x86_64 GNU/Linux
root@Microknoppix:/home/knoppix# pvscan
PV /dev/nvme1n1p3 VG pve lvm2 [237,47 GiB / 0 free]
PV /dev/nvme0n1 VG pve lvm2 [238,47 GiB / 0 free]
PV /dev/sde VG pve lvm2 [<223,57 GiB / 0 free]
PV /dev/sdc VG pve lvm2 [<1,82 TiB / 0 free]
Total: 4 [2,50 TiB] / in use: 4 [2,50 TiB] / in no VG: 0 [0 ]
root@Microknoppix:/home/knoppix# vgscan
Found volume group "pve" using metadata type lvm2
root@Microknoppix:/home/knoppix#

So I would conclude: all data is there.

Booting the PVE server again, the problem is reproduced: the pve-root and pve-swap volumes work, as evidenced by PVE actually booting fine.

But LVM sees the VG as degraded/inoperative:

root@pve2:~# pvscan
WARNING: Couldn't find device with uuid U2JOLe-2G7S-MtuH-zp3p-8gbP-Awxi-pr2OpX.
WARNING: Couldn't find device with uuid ZbrUMQ-wEnk-IHH5-lolX-GLrg-Xm5d-02hyVi.
WARNING: VG pve is missing PV U2JOLe-2G7S-MtuH-zp3p-8gbP-Awxi-pr2OpX (last written to /dev/sde).
WARNING: VG pve is missing PV ZbrUMQ-wEnk-IHH5-lolX-GLrg-Xm5d-02hyVi (last written to /dev/sdc).
PV /dev/nvme1n1p3 VG pve lvm2 [237.47 GiB / 0 free]
PV /dev/nvme0n1 VG pve lvm2 [238.47 GiB / 0 free]
PV [unknown] VG pve lvm2 [<223.57 GiB / 0 free]
PV [unknown] VG pve lvm2 [<1.82 TiB / 0 free]

How do I get the PVE LVM to assemble the volume group correctly again?

Thanks,
hase
 
Still laboring on this issue.
I finally got some Debian live system to boot (my BIOS rejects all the Debian LiveCD images, but I can start a rescue system from a Debian-12 installer).

The installer uses kernel 6.1.66-1.
As rescue options, I can start a shell in /dev/pve/root (the root of my PVE installation) or in the installer environment.

A shell in the PVE root shows the same warnings about the volume group: "couldn't find device with uuid ..." twice, for the two SATA physical volumes.
A shell in the installer environment simply shows the VG as online and all PVs found.

When I add another (SATA) disk in the installer environment
pvcreate /dev/sdb
vgextend pve /dev/sdb

my volume group gets extended as expected.
I can also pvmove the non-thin volumes off the two PVs that the pve/root environment reports as missing. Sadly, the thin-provisioned volumes cannot be moved, as the installer lacks the dm-thin-pool module :-(

But the kicker is: when I switch to the pve/root environment again after extending the VG, the added PV is also not found.
And again: blkid clearly lists the newly added disk with the proper UUID, yet the LVM tools in the pve/root environment do not recognize it.

I am getting a picture here: the PVE-supplied LVM tooling is - uh - defective?

Still missing in that picture: a way to get my data back...
 
Try these steps:
pvscan
vgscan
vgcfgrestore --list
vgcfgrestore pve
vgscan
vgs
lvscan
lvs
thin_check /dev/pve/data
thin_check --auto-repair /dev/pve/data
thin_check /dev/pve/data
maybe also this, if things are not OK after the above commands: vgchange -ay pve
lvscan
lvs
 
Thanks for the hints.

I tried them in order.
The first one to fail is pvscan, which still warns me about the missing SATA disks by UUID.

vgcfgrestore --list pve
gives me a long list of backup files.

vgcfgrestore pve
first warns about restoring a config for a VG with active volumes (the PVE root is active there).
Ignoring the warning, the command fails - again unable to locate the SATA disks that blkid can find:
root@pve2:~# vgcfgrestore pve
Volume group pve has active volume: root.
Volume group pve has active volume: swap.
WARNING: Found 2 active volume(s) in volume group "pve".
Restoring VG with active LVs, may cause mismatch with its metadata.
Do you really want to proceed with restore of volume group "pve", while 2 volume(s) are active? [y/n]: y
WARNING: Couldn't find device with uuid U2JOLe-2G7S-MtuH-zp3p-8gbP-Awxi-pr2OpX.
WARNING: Couldn't find device with uuid ZbrUMQ-wEnk-IHH5-lolX-GLrg-Xm5d-02hyVi.
Consider using option --force to restore Volume Group pve with thin volumes.
Restore failed.
root@pve2:~#

What really puzzles me: the Ubuntu 24.04 live system can mount/read/manipulate the VG without any problems.
I even tried adding another PV to the VG and moving the data off the two affected disks (pvmove /dev/sdc and pvmove /dev/sde, respectively), but I did not remove the "missing" PVs from my VG pve (so no vgreduce).
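For the record, what I did from the live system was roughly this (a sketch; /dev/sdX stands in for the disk I added):

pvcreate /dev/sdX
vgextend pve /dev/sdX
pvmove /dev/sdc   # move the extents off the two "missing" SATA PVs
pvmove /dev/sde
# deliberately no vgreduce afterwards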

Result after booting into PVE again:
Now three PVs are listed as "missing" by their UUID: the newly added SATA disk is now also "missing", while blkid again finds it.

I am guessing there is something wrong with the filters employed by the LVM tools installed on my PVE.

Well, it is a rainy Easter weekend in Berlin, so why not manually scrape all the affected VM images off the server (running the Ubuntu live system) and see if I can recover some data...

This is the second time that my PVE ran just fine for a while and then suddenly started "missing" the SATA PVs in the VG "pve".
This may be a problem with my hardware or my setup (I have a pretty aggressive spindown timer on the magnetic disks).
Let's see how to recover and then engineer around the problem. The NVMe PVs seem to be working just fine...
 
Another datapoint to consider.

I booted into a live system (Debian-12 LiveCD) and the VG assembles fine; all LVs are shown as ACTIVE.
I can read the data from the vm-xxx-disk-n files in /dev/pve.

In this live environment, I also created a new PV
pvcreate /dev/sdb

And put a VG and an LV on that
vgcreate sata-disk /dev/sdb
lvcreate -L 4G --name satadisk /dev/sata-disk
mkfs.ext4 /dev/sata-disk/satadisk

This new LV is usable in the Debian live system.

When I now reboot into PVE, the SATA members of VG pve are still missing and the new sata-disk VG seems not to exist.
pvscan does not show /dev/sdb as a PV.
vgscan and lvscan also do not find the new volumes (sata-disk and satadisk, respectively).

So this is a problem with the LVM tooling filtering out SATA disks.
That would also explain why the NVMe members of VG pve are not affected.

pvscan -vvv confirms this hypothesis:
[...]
/dev/loop7: Skipping: Too small to hold a PV
/dev/sdb: Skipping (regex)
/dev/sdc: Skipping (regex)
/dev/sdd: Skipping (regex)
/dev/sdd1: Skipping (regex)
/dev/sde: Skipping (regex)
/dev/sdf: Skipping (regex)
[...]

But where the heck is this filter configured?
This is probably also the filter that, at boot, makes pvscan skip the "missing" members of VG pve...
 
All LVM logic is in /etc/lvm/*, and you are living dangerously by building your single pve VG (until sata-disk came along) out of more and more single, unraided disks.
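To see which filter is actually in effect, something like this should work (lvmconfig is part of the lvm2 package and only prints the current configuration, it changes nothing):

lvmconfig devices/global_filter
lvmconfig devices/filter
# or look at the configuration file directly
grep -n filter /etc/lvm/lvm.conf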
 
Yes, it's a bit risky, but...
And by the way: the PVE worked fine until I ran a backup job - afterwards the VG stopped working.

I'll be damned.

Out of sheer desperation I commented out the line
global_filter=["r|/dev/zd.*|","r|/dev/rbd.*|", "r|/dev/sda*|", "r|/dev/sdd*|", "r|/dev/sdf*|"]
in my lvm.conf.

And now pvscan finds all drives.

root@pve2:/etc/lvm# pvscan
PV /dev/sdb VG sata-disk lvm2 [<3.64 TiB / 3.63 TiB free]
PV /dev/nvme0n1p3 VG pve lvm2 [237.47 GiB / 0 free]
PV /dev/nvme1n1 VG pve lvm2 [238.47 GiB / 0 free]
PV /dev/sde VG pve lvm2 [<223.57 GiB / 0 free]
PV /dev/sdc VG pve lvm2 [<1.82 TiB / 0 free]
Total: 5 [6.14 TiB] / in use: 5 [6.14 TiB] / in no VG: 0 [0 ]

I am somewhat new to regular expressions - I have been using them for a mere 35 years now - so I clearly do not understand how /dev/sdb or /dev/sdf matches the line above...

And I am also a bit puzzled why the PVE worked fine through a couple of reboots over months before suddenly losing interest in the SATA drives...

Anyway, it seems fixed now.
I am now using
global_filter=["r|/dev/zd.*|","r|/dev/rbd.*|", "r|/dev/sda|", "r|/dev/sdd|", "r|/dev/sdf|"]

Thanks for the hints.

hase
PS: If anyone is wondering why this noob was modifying the global_filter in the first place:
I got the hint from another forum thread while trying to stop the periodic pvscan from waking up my sleeping hard disks...
 
Hi,
so I clearly do not understand how /dev/sdb or /dev/sdf matches the line above...
I guess the match happens because of the following:

* is a special modifier and means: zero or more occurrences of the preceding item.
/dev/sda* therefore means: match /dev/sd followed by zero or more occurrences of a. This matches /dev/sdb, because /dev/sd (i.e. with zero occurrences of a) is a substring of it.

Note that the pre-existing regexes use the * modifier in combination with the special character . (which matches an arbitrary character).

Note that /dev/sda.* can also be problematic if you have too many devices, because then you might have /dev/sdaa, /dev/sdab etc.
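A quick way to see both effects is to feed the same patterns to grep -E (the | characters in lvm.conf are only delimiters; the pattern body is an ordinary regex):

echo /dev/sdb  | grep -E '/dev/sda*'    # matches: "a*" allows zero a's, so "/dev/sd" alone is enough
echo /dev/sdb  | grep -E '/dev/sda.*'   # no match: a literal "a" is required after "sd"
echo /dev/sdaa | grep -E '/dev/sda.*'   # matches: the /dev/sdaa case mentioned above

Anchoring the pattern at the start, e.g. "r|^/dev/sda|", would at least rule out the zero-occurrence match; the /dev/sdaa case would still need care.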
 
Now that you mention it...
Indeed, I still get + and * confused.
As they say: regex is the prime example of a language that is much easier to write than to read.

What sent me down the wrong path was the delay between cause and effect.
I started my power tuning about 3 months ago, and exempting the magnetic disks from pvscan was about 2 months ago.
That is probably the point where the volume group broke - but everything kept running fine.
Only after I changed the mode for the daily backups from "snapshot" to "stop" (I had consistency issues in some VMs) did I see the full effect: snapshotting VMs and containers still worked with the VG missing two members.
But as soon as such a VM or container got stopped, its disk image was no longer accessible.
Probably the same as the "a deleted file is still there as long as it is open in some process" effect we know and love...

hase