Hi all, I have three server machines, each connected to a SAS array. One server has two paths, two servers only have one, as my DS3524 only has four SAS ports in total. I'd like to expand its setup in the future to support two paths from each machine, but for the time being I'm setting up each server's configuration in pretty much the same way; the servers with one path will still create a multipath device, but it only relies on one path.
With this in mind, when I boot any of these machines, three of my four multipath mappings are created successfully on top of the underlying path devices. The fourth reliably fails: device mappings named after the LVM LVs inside that storage get created directly on the underlying path device, instead of multipath mapping the device first and the LVs being activated on top of it.
The output below is from one of the machines with just one path, since I'm running a temporary workaround on my primary to keep the storage accessible to the services that need it. The behaviour looks nearly identical across all three machines.
Code:
~# dmsetup ls --tree -o blkdevname
ceph--2f6caf24--554d--480f--a8c3--bf78a5f5b59d-osd--block--2c1f05b4--200e--417f--9a7b--7cae620d2fbd <dm-5> (252:5)
 └─ <sda> (8:0)
ceph--44d1ad0f--7d8f--4f20--993f--71673edef60c-osd--block--b12d69fc--44a6--4d77--a89d--2845d25af166 <dm-1> (252:1)
 └─ <sdg> (8:96)
ceph--5f34a207--e507--4c33--ae7c--06049bd9e049-osd--block--4e9fdc99--8524--4c0e--a6b9--4d08ce78a131 <dm-2> (252:2)
 └─ <sdb> (8:16)
ceph--7c2c0ebc--0e94--4e3d--be82--2852fe2c5468-osd--block--a17b2583--bd12--4368--bb2b--7b7498ea2be7 <dm-0> (252:0)
 └─ <sdh> (8:112)
ceph--88c1c487--404b--41bd--9d0e--35d76da424d7-osd--block--df4cca48--08a8--44b9--8ebd--091202228374 <dm-3> (252:3)
 └─ <sdd> (8:48)
ceph--de105c41--bf35--4c97--8183--ddb1595ba27e-osd--block--b9c81e38--e012--4d57--b590--a5ba7de0ac3d <dm-4> (252:4)
 └─ <sdc> (8:32)
vg0-lv0 <dm-9> (252:9)
 └─ds0 <dm-8> (252:8)
    └─ <sdi> (8:128)
vg1-lv0 <dm-11> (252:11)
 └─ds1 <dm-10> (252:10)
    └─ <sdj> (8:144)
vg2-lv0 <dm-13> (252:13)
 └─ds2 <dm-12> (252:12)
    └─ <sdk> (8:160)
vg99-vm--201--disk--0 <dm-7> (252:7)
 └─ <sdl1> (8:177)
vg99-vm--251--disk--0 <dm-6> (252:6)
 └─ <sdl1> (8:177)
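For reference, these are the commands I'd use to double-check which block device each of those vg99 mappings is built on (map names taken from the tree above):
Code:
# show the dependency (underlying block device) of each stray vg99 map
~# dmsetup deps -o blkdevname vg99-vm--201--disk--0
~# dmsetup deps -o blkdevname vg99-vm--251--disk--0
# and the layering as lsblk sees it on the raw SAS disk
~# lsblk -o NAME,TYPE /dev/sdl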
Code:
~# lsscsi
[0:0:14:0] disk IBM-ESXS ST91000640SS BD2E /dev/sda
[0:0:15:0] disk IBM-ESXS ST91000640SS BD2E /dev/sdb
[0:0:16:0] disk IBM-ESXS ST91000640SS BD2K /dev/sdc
[0:0:17:0] disk IBM-ESXS ST91000640SS BD2K /dev/sdd
[0:0:18:0] disk LENOVO-X HUC101830CSS20 K2HA /dev/sde
[0:0:19:0] disk LENOVO-X HUC101830CSS20 K2HA /dev/sdf
[0:0:20:0] disk IBM-207x ST600MM0006 B56J /dev/sdg
[0:0:21:0] disk IBM-207x ST600MM0006 B56J /dev/sdh
[1:0:0:0] disk IBM 1746 FAStT 1070 /dev/sdi
[1:0:0:3] disk IBM 1746 FAStT 1070 /dev/sdj
[1:0:0:4] disk IBM 1746 FAStT 1070 /dev/sdk
[1:0:0:99] disk IBM 1746 FAStT 1070 /dev/sdl
[3:0:0:0] cd/dvd Lenovo SATA ODD 81Y3691 IB00 /dev/sr0
Code:
~# cat /etc/lvm/lvm.conf
devices {
    filter = [ "a|/dev/mapper/ds.*|", "a|/dev/vg.*|", "a|/dev/disk/by-id/scsi-35000c500567ae2bb|", "a|/dev/disk/by-id/scsi-35000c5005705cb9f|", "a|/dev/disk/by-id/scsi-35000c5005771890f|", "a|/dev/disk/by-id/scsi-35000c5005786874b|", "a|/dev/disk/by-id/scsi-35000c5006c271efb|", "a|/dev/disk/by-id/scsi-35000c5006c272f37|", "r|.*|" ]
    global_filter = [ "a|/dev/mapper/ds.*|", "a|/dev/vg.*|", "a|/dev/disk/by-id/scsi-35000c500567ae2bb|", "a|/dev/disk/by-id/scsi-35000c5005705cb9f|", "a|/dev/disk/by-id/scsi-35000c5005771890f|", "a|/dev/disk/by-id/scsi-35000c5005786874b|", "a|/dev/disk/by-id/scsi-35000c5006c271efb|", "a|/dev/disk/by-id/scsi-35000c5006c272f37|", "r|.*|" ]
    external_device_info_source = "udev"
    preferred_names = [ "^/dev/mapper/ds" ]
}
global {
    system_id_source = "none"
}
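For what it's worth, these are the checks I intend to run to confirm that this filter is what LVM actually sees, both at runtime and at early boot (the initrd path below assumes the standard Debian layout):
Code:
# what the running LVM tools report as the effective filter settings
~# lvmconfig devices/filter devices/global_filter
# verbose trace of how pvs treats the partition, filter decisions included
~# pvs -vvvv /dev/sdl1 2>&1 | grep -i filter
# check whether the initramfs carries its own lvm.conf / multipath bits
~# lsinitramfs /boot/initrd.img-$(uname -r) | grep -E 'lvm|multipath'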
I'm not sure what exactly is creating the device mappings "vg99-vm--251--disk--0" and "vg99-vm--201--disk--0", as these are LVs inside a disk that should never have been scanned directly.
/dev/disk/by-id/ symlinks related to sdl1:
Code:
~# ls -l /dev/disk/by-id/ | grep sdl1
lrwxrwxrwx 1 root root 10 Jan 20 10:59 lvm-pv-uuid-RqZgcj-P6L7-gve5-sT3x-gbGz-PqCh-PFoA7x -> ../../sdl1
lrwxrwxrwx 1 root root 10 Jan 20 10:59 scsi-360080e500037e5b400000b836588a081-part1 -> ../../sdl1
lrwxrwxrwx 1 root root 10 Jan 20 10:59 scsi-SIBM_1746_FAStT_SV31319238-part1 -> ../../sdl1
lrwxrwxrwx 1 root root 10 Jan 20 10:59 wwn-0x60080e500037e5b400000b836588a081-part1 -> ../../sdl1
Code:
~# pvs /dev/sdl1
Cannot use /dev/sdl1: device is rejected by filter config
So LVM itself rejects the device because of the filter. Is LVM, or something else like udev, responsible for creating the symlink and the mapping?
It concerns me that there is a symlink called "lvm-pv-uuid-..." pointing at the underlying path device rather than at the multipath map. I thought the rules in /lib/udev/rules.d/ would ignore this device because it belongs to multipath. I'm not sure exactly how that is determined, but from reading 60-multipath.rules and 69-lvm.rules it looks like udev tries to skip devices that are multipath paths when it processes LVM devices. I couldn't find anything useful about LVM in dmesg; do I need to change any logging settings?
Code:
~# dmesg | grep -i device-mapper
[ 1.367188] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[ 1.367220] device-mapper: uevent: version 1.0.3
[ 1.367275] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@redhat.com
[ 11.563678] device-mapper: multipath round-robin: version 1.2.0 loaded
[ 30.598427] device-mapper: table: 252:14: multipath: error getting device (-EBUSY)
[ 30.608241] device-mapper: ioctl: error adding target to table
Code:
~# dmesg | grep -i lvm
[ 28.258641] systemd[1]: Listening on lvm2-lvmpolld.socket - LVM2 poll daemon socket.
[ 28.857560] systemd[1]: Starting lvm2-monitor.service - Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
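Since dmesg isn't telling me anything, I assume the next place to look is udev itself. Something like the following should show how the partition is being classified (the property names are my guess from reading the rules files, so they may differ on other versions):
Code:
# how udev currently classifies the partition
~# udevadm info --query=property /dev/sdl1 | grep -E 'ID_FS_TYPE|DM_MULTIPATH_DEVICE_PATH|SYSTEMD_READY'
# dry-run the rules for the partition and see what multipath/LVM rules produce
~# udevadm test /sys/class/block/sdl1 2>&1 | grep -iE 'multipath|lvm'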
Code:
~# journalctl -u multipathd
-- Boot 2713e91348134d278db08dc446b0ee75 --
Jan 20 10:59:38 h3 systemd[1]: Starting multipathd.service - Device-Mapper Multipath Device Controller...
Jan 20 10:59:43 h3 multipathd[965]: multipathd v0.9.4: start up
Jan 20 10:59:43 h3 multipathd[965]: reconfigure: setting up paths and maps
Jan 20 10:59:43 h3 multipathd[965]: ds0: reload [0 1160585216 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:128 1]
Jan 20 10:59:43 h3 multipathd[965]: ds1: reload [0 1160585216 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:144 1]
Jan 20 10:59:43 h3 multipathd[965]: ds2: reload [0 1160585216 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:160 1]
Jan 20 10:59:43 h3 multipathd[965]: ds99: addmap [0 2297610240 multipath 3 pg_init_retries 50 queue_if_no_path 1 rdac 1 1 round-robin 0 1 1 8:176 1]
Jan 20 10:59:43 h3 multipathd[965]: libdevmapper: ioctl/libdm-iface.c(1980): device-mapper: reload ioctl on ds99 (252:14) failed: Device or resource busy
Jan 20 10:59:43 h3 multipathd[965]: dm_addmap: libdm task=0 error: Success
Jan 20 10:59:43 h3 multipathd[965]: ds99: ignoring map
Jan 20 10:59:39 h3 systemd[1]: Started multipathd.service - Device-Mapper Multipath Device Controller.
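If I'm reading the EBUSY right, something already has sdl1 open when multipathd tries to create ds99. Checking the holders in sysfs should confirm it's those two vg99 mappings (dm-6 and dm-7, per the dmsetup tree above):
Code:
# which dm devices hold the partition open
~# ls /sys/class/block/sdl1/holders/
# resolve those dm nodes back to their map names
~# cat /sys/class/block/dm-6/dm/name /sys/class/block/dm-7/dm/name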
The /dev/vg99 folder is populated with the LVs I would expect to find in it, yet LVM does not list the PV, VG or LVs:
Code:
~# ls /dev/vg99
vm-201-disk-0 vm-251-disk-0
So with all of this in mind, if I run the following:
Code:
~# dmsetup remove vg99-vm--201--disk--0
~# dmsetup remove vg99-vm--251--disk--0
~# systemctl restart multipathd
...everything starts working: the VG shows as accessible in the GUI and the VMs are able to start. I'm not sure what is detecting the underlying disk as a PV, because my LVM filter says it's excluded, yet it behaves as if it isn't.
This also only started occurring after a mishandled cluster shutdown. Could that have anything to do with it? My other multipath devices behave fine across reboots. What should I check next? I could easily write a crontab @reboot directive, or a service that runs before pve-guests (something like the sketch below), to hack the problem away, but I'd prefer to understand what has gone wrong.
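For completeness, the hack I have in mind would be something along these lines (untested sketch; the unit name is made up and the ordering/paths would need verifying):
Code:
# /etc/systemd/system/vg99-unstick.service -- hypothetical unit, not deployed
[Unit]
Description=Remove stray vg99 mappings so multipath can claim ds99
After=multipathd.service
Before=pve-guests.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/dmsetup remove vg99-vm--201--disk--0
ExecStart=/sbin/dmsetup remove vg99-vm--251--disk--0
ExecStart=/sbin/multipath -r

[Install]
WantedBy=multi-user.target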
Some additional details:
Code:
~# cat /etc/multipath.conf
defaults {
    user_friendly_names yes
}
devices {
    device {
        vendor "IBM"
        product "^1746"
        product_blacklist "Universal Xport"
        path_grouping_policy "group_by_prio"
        path_selector "round-robin 0"
        failback "immediate"
        no_path_retry 5
    }
}
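In case it helps, the merged configuration (built-in hardware table plus /etc/multipath.conf) can be dumped for comparison, and a single path can be checked against the blacklist:
Code:
# dump the effective merged multipath configuration
~# multipath -t
# verbosely check whether /dev/sdl is considered a valid multipath path
~# multipath -c -v3 /dev/sdl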
Code:
~# cat /etc/multipath/wwids
/360080e500037e5b400000b5565724309/
/360080e500037e5b400000b5865724316/
/360080e500037e5b400000b5b65724324/
/360080e500037e5b400000b836588a081/
Code:
~# cat /etc/multipath/bindings
ds0 360080e500037e5b400000b5565724309
ds1 360080e500037e5b400000b5865724316
ds2 360080e500037e5b400000b5b65724324
ds99 360080e500037e5b400000b836588a081
Cheers
~ Lia