lvm filter configured but multipath underlying device still in use

oasis9

New Member
Jan 20, 2024
Hi all, I have three server machines, each connected to a SAS array. One server has two paths; the other two have only one, as my DS3524 has just four SAS ports in total. I'd like to expand the setup in the future so that every machine has two paths, but for now I'm configuring all three servers in essentially the same way: the single-path servers still create a multipath device, it just relies on a single path.
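
For reference, a quick way to confirm what each node's maps look like and how many paths back each one (a generic check, nothing specific to my problem):
Code:
# list all multipath maps with their path groups and per-path state
~# multipath -ll
# or just the topology, without probing path state
~# multipath -l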

With this in mind, when I boot any of these machines, three of my four multipath mappings are created successfully on top of the underlying path devices. The fourth reliably fails: device mappings named after the LVM LVs inside that storage are created directly on the underlying path device, instead of multipath claiming the device first and the LVs being referenced through the multipath map.

The following output is from one of the single-path machines, since I'm using a temporary workaround on my primary node to keep the storage accessible to the services that need it. The behaviour is nearly identical across all three machines.

Code:
~# dmsetup ls --tree -o blkdevname
ceph--2f6caf24--554d--480f--a8c3--bf78a5f5b59d-osd--block--2c1f05b4--200e--417f--9a7b--7cae620d2fbd <dm-5> (252:5)
 └─ <sda> (8:0)
ceph--44d1ad0f--7d8f--4f20--993f--71673edef60c-osd--block--b12d69fc--44a6--4d77--a89d--2845d25af166 <dm-1> (252:1)
 └─ <sdg> (8:96)
ceph--5f34a207--e507--4c33--ae7c--06049bd9e049-osd--block--4e9fdc99--8524--4c0e--a6b9--4d08ce78a131 <dm-2> (252:2)
 └─ <sdb> (8:16)
ceph--7c2c0ebc--0e94--4e3d--be82--2852fe2c5468-osd--block--a17b2583--bd12--4368--bb2b--7b7498ea2be7 <dm-0> (252:0)
 └─ <sdh> (8:112)
ceph--88c1c487--404b--41bd--9d0e--35d76da424d7-osd--block--df4cca48--08a8--44b9--8ebd--091202228374 <dm-3> (252:3)
 └─ <sdd> (8:48)
ceph--de105c41--bf35--4c97--8183--ddb1595ba27e-osd--block--b9c81e38--e012--4d57--b590--a5ba7de0ac3d <dm-4> (252:4)
 └─ <sdc> (8:32)
vg0-lv0 <dm-9> (252:9)
 └─ds0 <dm-8> (252:8)
    └─ <sdi> (8:128)
vg1-lv0 <dm-11> (252:11)
 └─ds1 <dm-10> (252:10)
    └─ <sdj> (8:144)
vg2-lv0 <dm-13> (252:13)
 └─ds2 <dm-12> (252:12)
    └─ <sdk> (8:160)
vg99-vm--201--disk--0 <dm-7> (252:7)
 └─ <sdl1> (8:177)
vg99-vm--251--disk--0 <dm-6> (252:6)
 └─ <sdl1> (8:177)

Code:
~# lsscsi
[0:0:14:0]   disk    IBM-ESXS ST91000640SS     BD2E  /dev/sda
[0:0:15:0]   disk    IBM-ESXS ST91000640SS     BD2E  /dev/sdb
[0:0:16:0]   disk    IBM-ESXS ST91000640SS     BD2K  /dev/sdc
[0:0:17:0]   disk    IBM-ESXS ST91000640SS     BD2K  /dev/sdd
[0:0:18:0]   disk    LENOVO-X HUC101830CSS20   K2HA  /dev/sde
[0:0:19:0]   disk    LENOVO-X HUC101830CSS20   K2HA  /dev/sdf
[0:0:20:0]   disk    IBM-207x ST600MM0006      B56J  /dev/sdg
[0:0:21:0]   disk    IBM-207x ST600MM0006      B56J  /dev/sdh
[1:0:0:0]    disk    IBM      1746      FAStT  1070  /dev/sdi
[1:0:0:3]    disk    IBM      1746      FAStT  1070  /dev/sdj
[1:0:0:4]    disk    IBM      1746      FAStT  1070  /dev/sdk
[1:0:0:99]   disk    IBM      1746      FAStT  1070  /dev/sdl
[3:0:0:0]    cd/dvd  Lenovo   SATA ODD 81Y3691 IB00  /dev/sr0

Code:
~# cat /etc/lvm/lvm.conf
devices {
  filter = [ "a|/dev/mapper/ds.*|", "a|/dev/vg.*|", "a|/dev/disk/by-id/scsi-35000c500567ae2bb|", "a|/dev/disk/by-id/scsi-35000c5005705cb9f|", "a|/dev/disk/by-id/scsi-35000c5005771890f|", "a|/dev/disk/by-id/scsi-35000c5005786874b|", "a|/dev/disk/by-id/scsi-35000c5006c271efb|", "a|/dev/disk/by-id/scsi-35000c5006c272f37|", "r|.*|" ]
  global_filter = [ "a|/dev/mapper/ds.*|", "a|/dev/vg.*|", "a|/dev/disk/by-id/scsi-35000c500567ae2bb|", "a|/dev/disk/by-id/scsi-35000c5005705cb9f|", "a|/dev/disk/by-id/scsi-35000c5005771890f|", "a|/dev/disk/by-id/scsi-35000c5005786874b|", "a|/dev/disk/by-id/scsi-35000c5006c271efb|", "a|/dev/disk/by-id/scsi-35000c5006c272f37|", "r|.*|" ]
  external_device_info_source = "udev"
  preferred_names = [ "^/dev/mapper/ds" ]
}
global {
  system_id_source = "none"
}

I'm not sure what exactly is creating the device mappings "vg99-vm--251--disk--0" and "vg99-vm--201--disk--0": these are LVs from inside a disk that should never have been scanned on the host.
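
To narrow down what created those maps, I'd expect checks along these lines to help (generic device-mapper/udev queries, nothing Proxmox-specific); a dm UUID starting with "LVM-" would point at an LVM activation rather than at multipath.
Code:
# the UUID prefix says which subsystem created a dm device
# ("LVM-" for LVM activations, "mpath-" for multipath maps)
~# dmsetup info -c -o name,uuid vg99-vm--201--disk--0 vg99-vm--251--disk--0
# udev properties recorded for the underlying partition
~# udevadm info --query=property /dev/sdl1 | grep -iE 'lvm|multipath|fs_type|systemd'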

/dev/disk/by-id/ symlinks related to sdl1:
Code:
~# ls -l /dev/disk/by-id/ | grep sdl1
lrwxrwxrwx 1 root root 10 Jan 20 10:59 lvm-pv-uuid-RqZgcj-P6L7-gve5-sT3x-gbGz-PqCh-PFoA7x -> ../../sdl1
lrwxrwxrwx 1 root root 10 Jan 20 10:59 scsi-360080e500037e5b400000b836588a081-part1 -> ../../sdl1
lrwxrwxrwx 1 root root 10 Jan 20 10:59 scsi-SIBM_1746_FAStT_SV31319238-part1 -> ../../sdl1
lrwxrwxrwx 1 root root 10 Jan 20 10:59 wwn-0x60080e500037e5b400000b836588a081-part1 -> ../../sdl1

Code:
~# pvs /dev/sdl1
Cannot use /dev/sdl1: device is rejected by filter config

So LVM itself reports that the device is rejected by the filter. Is LVM, or something else like udev, responsible for creating the symlink and the mapping?

It concerns me that there is a symlink called "lvm-pv-uuid-...." pointing at the underlying multipath path device. I thought the rules in /lib/udev/rules.d/ would ignore this device because it belongs to multipath? I'm not sure exactly how that detection works, but from reading 60-multipath.rules and 69-lvm.rules it looks like the LVM rules try to skip devices that are multipath components. I couldn't find anything useful about LVM in dmesg; do I need to change any logging settings?
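
For anyone else digging into the same thing, these are the checks I'd start with to see whether udev or pvscan touched the partition at boot (I haven't confirmed the exact rule and unit names on every release, so treat them as a starting point):
Code:
# dry-run the udev rules against the partition and see which properties get set
~# udevadm test /sys/block/sdl/sdl1 2>&1 | grep -iE 'lvm|multipath|dm_'
# look for event-activation (pvscan) messages that may have auto-activated the VG
~# journalctl -b | grep -iE 'pvscan|lvm2'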

Code:
~# dmesg | grep -i device-mapper
[    1.367188] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[    1.367220] device-mapper: uevent: version 1.0.3
[    1.367275] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@redhat.com
[   11.563678] device-mapper: multipath round-robin: version 1.2.0 loaded
[   30.598427] device-mapper: table: 252:14: multipath: error getting device (-EBUSY)
[   30.608241] device-mapper: ioctl: error adding target to table

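That -EBUSY looks consistent with the two vg99 mappings already sitting directly on the partition; something like this should show what is holding the path device (plain sysfs/lsblk, nothing setup-specific):
Code:
# see what is stacked on top of the partition multipath is trying to claim
~# ls /sys/block/sdl/sdl1/holders/
~# lsblk /dev/sdl
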
Code:
~# dmesg | grep -i lvm
[   28.258641] systemd[1]: Listening on lvm2-lvmpolld.socket - LVM2 poll daemon socket.
[   28.857560] systemd[1]: Starting lvm2-monitor.service - Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...

Code:
~# journalctl -u multipathd
-- Boot 2713e91348134d278db08dc446b0ee75 --
Jan 20 10:59:38 h3 systemd[1]: Starting multipathd.service - Device-Mapper Multipath Device Controller...
Jan 20 10:59:43 h3 multipathd[965]: multipathd v0.9.4: start up
Jan 20 10:59:43 h3 multipathd[965]: reconfigure: setting up paths and maps
Jan 20 10:59:43 h3 multipathd[965]: ds0: reload [0 1160585216 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:128 1]
Jan 20 10:59:43 h3 multipathd[965]: ds1: reload [0 1160585216 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:144 1]
Jan 20 10:59:43 h3 multipathd[965]: ds2: reload [0 1160585216 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:160 1]
Jan 20 10:59:43 h3 multipathd[965]: ds99: addmap [0 2297610240 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:176 1]
Jan 20 10:59:43 h3 multipathd[965]: libdevmapper: ioctl/libdm-iface.c(1980): device-mapper: reload ioctl on ds99 (252:14) failed: Device or resource busy
Jan 20 10:59:43 h3 multipathd[965]: dm_addmap: libdm task=0 error: Success
Jan 20 10:59:43 h3 multipathd[965]: ds99: ignoring map
Jan 20 10:59:39 h3 systemd[1]: Started multipathd.service - Device-Mapper Multipath Device Controller.

Code:
~# ls /dev/vg99
vm-201-disk-0  vm-251-disk-0
The /dev/vg99 directory is populated with the LVs I'd expect to find there, yet LVM itself does not list the PV, the VG or the LVs.
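
For completeness, these are the commands I'd use to compare LVM's view with what device-mapper actually has (output omitted, since LVM simply doesn't show the PV/VG/LVs here):
Code:
~# pvs -a | grep -i sdl
~# vgs vg99
~# lvs vg99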

So with all of this in mind, if I run the following:
Code:
~# dmsetup remove vg99-vm--201--disk--0
~# dmsetup remove vg99-vm--251--disk--0
~# systemctl restart multipathd
...everything starts working: the VG shows as accessible in the GUI and the VMs are able to start. I'm not sure what is detecting the underlying disk as a PV; my LVM filter says it should be excluded, yet something is behaving as though it isn't.

This also only started happening after a mishandled cluster shutdown. Could that have anything to do with it? My other multipath devices behave fine across reboots. What should I check next? I could easily write a crontab @reboot entry or a service that runs before pve-guests and hacks the problem away, but I'd prefer to understand what has actually gone wrong.
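
If I do end up scripting the hack, this is roughly what I have in mind: a one-shot unit ordered before pve-guests.service (the unit name is made up, the LV names are hard-coded, and the "-" prefixes just tell systemd to ignore failures if the mappings aren't there, so it's very much a band-aid rather than a fix):
Code:
# /etc/systemd/system/vg99-multipath-workaround.service  (hypothetical name)
[Unit]
Description=Remove stray vg99 mappings so multipath can claim the path device
After=multipathd.service
Before=pve-guests.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=-/sbin/dmsetup remove vg99-vm--201--disk--0
ExecStart=-/sbin/dmsetup remove vg99-vm--251--disk--0
ExecStart=/bin/systemctl restart multipathd

[Install]
WantedBy=multi-user.target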

Some additional details:

Code:
~# cat /etc/multipath.conf
defaults {
        user_friendly_names yes
}

devices {
        device {
                vendor "IBM"
                product "^1746"
                product_blacklist "Universal Xport"
                path_grouping_policy "group_by_prio"
                path_selector "round-robin 0"
                failback "immediate"
                no_path_retry 5
        }
}

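(Side note: to double-check how that device section merges with the built-in defaults for the 1746/DS3500 family, the effective configuration can be dumped from the running daemon; just a sanity check, not something that has pointed at the cause here.)
Code:
# show the effective multipath configuration currently in use
~# multipathd show config | less
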
Code:
~# cat /etc/multipath/wwids
/360080e500037e5b400000b5565724309/
/360080e500037e5b400000b5865724316/
/360080e500037e5b400000b5b65724324/
/360080e500037e5b400000b836588a081/

Code:
~# cat /etc/multipath/bindings
ds0 360080e500037e5b400000b5565724309
ds1 360080e500037e5b400000b5865724316
ds2 360080e500037e5b400000b5b65724324
ds99 360080e500037e5b400000b836588a081
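
And a quick cross-check that /dev/sdl really is the WWID bound to ds99 above (that path to scsi_id is the usual Debian location; adjust if yours differs):
Code:
# print the WWID of the raw path device and compare it with wwids/bindings
~# /lib/udev/scsi_id -g -u -d /dev/sdl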

Cheers :)
~ Lia
 
I'd appreciate any suggestions for what to check next. I'm perplexed that my other multipath devices don't have the same issue. How do I find out what is creating this mapping and symlink for the LVM PV, and make it ignore the device?
 
I think an update fixed it; I was getting ready to go through the udev rules to work out what order things were being processed in and track down the cause. Sorry to hear it's happening to you too. I'd suggest updating, but I imagine you already have. I just checked `lsblk -f` and `multipath -l` output and both are correct. I'm sorry I don't have a better answer for you.
 
I think I finally got it fixed. It seems to have been a combination of three things.

1. I updated my lvm.conf filter so the LVs belonging to my VMs are no longer picked up. These are the ones you removed via dmsetup as a workaround to get things running again. (A rough sketch of this and of step 3 is below the list.)

2. I installed the multipath-tools-boot package.

3. I noticed update-initramfs was not updating the kernel I was actually booting. I forced it to rebuild the correct one after updating my lvm.conf.
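
Roughly what steps 1 and 3 looked like (the paths and patterns are from my own setup, so adapt them to yours; the reject rule is only an example of keeping the guest LVs out of the host's scan):
Code:
# /etc/lvm/lvm.conf (excerpt) - accept the multipath maps, reject everything
# else, including the guest LVs that live inside them
global_filter = [ "a|/dev/mapper/ds.*|", "r|.*|" ]

# rebuild the initramfs for the kernel actually being booted, so the new
# filter and the multipath-tools-boot hooks end up in early userspace
~# update-initramfs -u -k $(uname -r)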

After that, things seem to be working well. I can reboot all three of my nodes and multipath comes up correctly every time. I no longer get the dreaded message from multipath:

libdevmapper: ... reload ioctl on ... failed: Device or resource busy
 
Glad to hear it's working! Thanks for sharing your findings; they'll be helpful if I or anyone else runs into this again.
 
