[SOLVED] SAN Multipath/LVM // systemd-udev-settle hangs

kokel

Member
Mar 9, 2021
Hi,

I'm struggling with my multipath and LVM configuration. I know it is not directly related to PBS, but this server runs PBS 1.0.11 (Enterprise subscription).

I want to use a SAN LUN (multipath with 2 links) as a physical volume, run LVM on it, and use it as a datastore for PBS.

Server: HPE ProLiant DL360 G9
FC-Adapter: HP StorageWorks 82Q PCI-E FC HBA Dual Port (Qlogic)
SAN: HP MSA 2040 with D2700 Expansion

At boot, the systemd unit systemd-udev-settle times out:
Bash:
● systemd-udev-settle.service - udev Wait for Complete Device Initialization
   Loaded: loaded (/lib/systemd/system/systemd-udev-settle.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2021-04-07 08:56:06 CEST; 12min ago
     Docs: man:udev(7)
           man:systemd-udevd.service(8)
 Main PID: 1413 (code=exited, status=1/FAILURE)

Apr 07 08:54:05 xxx systemd[1]: Starting udev Wait for Complete Device Initialization...
Apr 07 08:56:06 xxx systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE
Apr 07 08:56:06 xxx systemd[1]: systemd-udev-settle.service: Failed with result 'exit-code'.
Apr 07 08:56:06 xxx systemd[1]: Failed to start udev Wait for Complete Device Initialization.

After that, the server comes up, but the network does not. I have to restart the networking unit to get networking up and running again.

Bash:
~# systemd-analyze blame
      2min 638ms systemd-udev-settle.service
          6.205s networking.service
           862ms postfix@-.service
           775ms systemd-udev-trigger.service
           648ms systemd-journald.service
           536ms systemd-remount-fs.service
           535ms dev-mqueue.mount

I can mask the systemd-udev-settle unit and everything seems to run again, but I consider this just a workaround, not a solution. So maybe I have some multipath or LVM misconfiguration, and I'm asking for help here. I have read in other threads that a failed hard drive can lead to a timeout in the systemd-udev-settle unit, but my (internal) drives and the other server components are healthy; I have double-checked that.
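For reference, the workaround mentioned above is simply masking the unit:

Bash:
systemctl mask systemd-udev-settle.service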

I have created one partition on /dev/mapper/mpath0 for PV use, so the PV should be /dev/mapper/mpath0-part1 (sdc1 or sde1).
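Roughly, the partition and the LVM stack on top of it were created like this (reconstructed from memory; exact options and sizes may differ):

Bash:
# one partition spanning the whole multipath device
parted -s /dev/mapper/mpath0 mklabel gpt mkpart primary 0% 100%
# udev/kpartx should then create /dev/mapper/mpath0-part1
pvcreate /dev/mapper/mpath0-part1
vgcreate vg_san_01 /dev/mapper/mpath0-part1
lvcreate -L 5T -n lv_backup_01 vg_san_01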

SAN devices:
Code:
[1:0:0:0]    disk    HP       MSA 2040 SAN     G22x  /dev/sdc <- 3600c0ff00025a9232bc3506001000000
[1:0:0:1]    disk    HP       MSA 2040 SAN     G22x  /dev/sdd
[2:0:0:0]    disk    HP       MSA 2040 SAN     G22x  /dev/sde <- 3600c0ff00025a9232bc3506001000000
[2:0:0:1]    disk    HP       MSA 2040 SAN     G22x  /dev/sdf

Only the devices sdc and sde are relevant here; the second LUN isn't used yet.
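For reference, the WWIDs annotated above can be read directly from the block devices with something like:

Bash:
/lib/udev/scsi_id -g -u -d /dev/sdc
/lib/udev/scsi_id -g -u -d /dev/sde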

So here is my multipath config; for the configuration I followed this article: https://pve.proxmox.com/wiki/ISCSI_Multipath

/etc/multipath.conf
Code:
defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        uid_attribute           ID_SERIAL
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
        user_friendly_names     yes
        find_multipaths         yes
}

blacklist {
        wwid .*
}

blacklist_exceptions {
    wwid "3600c0ff00025a9232bc3506001000000"
    wwid "3600c0ff00025ab622cc3506001000000"
}

multipaths {
        multipath {
                wwid  "3600c0ff00025a9232bc3506001000000"
                alias mpath0
        }
        multipath {
                wwid  "3600c0ff00025ab622cc3506001000000"
                alias mpath1
        }
}
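For reference, after editing /etc/multipath.conf the maps can be reloaded and checked with something like:

Bash:
systemctl restart multipathd
multipath -r     # reload the multipath maps
multipath -ll    # verify the result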

Bash:
~# lvmconfig --typeconfig diff
devices {
    preferred_names=["^/dev/dm-*","^/dev/mapper/mpath*"]
    filter=["r|/dev/sd[cdef]|","a/.*/"]
}
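For reference, that diff corresponds to roughly the following stanza in /etc/lvm/lvm.conf (a sketch; LVM filter patterns are matched first-hit-wins):

Code:
devices {
    # prefer device-mapper/multipath names over the raw sdX paths
    preferred_names = [ "^/dev/dm-*", "^/dev/mapper/mpath*" ]
    # reject the raw SAN paths, accept everything else
    filter = [ "r|/dev/sd[cdef]|", "a/.*/" ]
}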

Bash:
~# pvs
  PV         VG        Fmt  Attr PSize  PFree
  /dev/sdc1  vg_san_01 lvm2 a--  10.82t 5.82t
 
~# vgs
  VG        #PV #LV #SN Attr   VSize  VFree
  vg_san_01   1   1   0 wz--n- 10.82t 5.82t
 
~# lvs
  LV           VG        Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_backup_01 vg_san_01 -wi-ao---- 5.00t

Bash:
~# multipath -ll -v3
Apr 07 09:24:02 | set open fds limit to 1048576/1048576
Apr 07 09:24:02 | loading //lib/multipath/libchecktur.so checker
Apr 07 09:24:02 | checker tur: message table size = 3
Apr 07 09:24:02 | loading //lib/multipath/libprioconst.so prioritizer
Apr 07 09:24:02 | foreign library "nvme" loaded successfully
Apr 07 09:24:02 | sda: udev property ID_WWN whitelisted
Apr 07 09:24:02 | sda: mask = 0x7
Apr 07 09:24:02 | sda: dev_t = 8:0
Apr 07 09:24:02 | sda: size = 781422768
Apr 07 09:24:02 | sda: vendor = HP
Apr 07 09:24:02 | sda: product = MO0400JFFCF
Apr 07 09:24:02 | sda: rev = HPD9
Apr 07 09:24:02 | sda: h:b:t:l = 0:0:1:0
Apr 07 09:24:02 | sda: tgt_node_name = 0x50000f0a06b4c922
Apr 07 09:24:02 | sda: path state = running
Apr 07 09:24:02 | sda: 48641 cyl, 255 heads, 63 sectors/track, start at 0
Apr 07 09:24:02 | sda: serial =       S2GYNA0HB08371
Apr 07 09:24:02 | sda: get_state
Apr 07 09:24:02 | sda: detect_checker = yes (setting: multipath internal)
Apr 07 09:24:02 | failed to issue vpd inquiry for pgc9
Apr 07 09:24:02 | sda: path_checker = tur (setting: multipath internal)
Apr 07 09:24:02 | sda: checker timeout = 30 s (setting: kernel sysfs)
Apr 07 09:24:02 | sda: tur state = up
Apr 07 09:24:02 | sdb: udev property ID_WWN whitelisted
Apr 07 09:24:02 | sdb: mask = 0x7
Apr 07 09:24:02 | sdb: dev_t = 8:16
Apr 07 09:24:02 | sdb: size = 781422768
Apr 07 09:24:02 | sdb: vendor = HP
Apr 07 09:24:02 | sdb: product = MO0400JFFCF
Apr 07 09:24:02 | sdb: rev = HPD9
Apr 07 09:24:02 | sdb: h:b:t:l = 0:0:2:0
Apr 07 09:24:02 | sdb: tgt_node_name = 0x50000f0a06b4c9e2
Apr 07 09:24:02 | sdb: path state = running
Apr 07 09:24:02 | sdb: 48641 cyl, 255 heads, 63 sectors/track, start at 0
Apr 07 09:24:02 | sdb: serial =       S2GYNA0HB08383
Apr 07 09:24:02 | sdb: get_state
Apr 07 09:24:02 | sdb: detect_checker = yes (setting: multipath internal)
Apr 07 09:24:02 | failed to issue vpd inquiry for pgc9
Apr 07 09:24:02 | sdb: path_checker = tur (setting: multipath internal)
Apr 07 09:24:02 | sdb: checker timeout = 30 s (setting: kernel sysfs)
Apr 07 09:24:02 | sdb: tur state = up
Apr 07 09:24:02 | sdc: udev property ID_WWN whitelisted
Apr 07 09:24:02 | sdc: mask = 0x7
Apr 07 09:24:02 | sdc: dev_t = 8:32
Apr 07 09:24:02 | sdc: size = 23242186752
Apr 07 09:24:02 | sdc: vendor = HP
Apr 07 09:24:02 | sdc: product = MSA 2040 SAN
Apr 07 09:24:02 | sdc: rev = G22x
Apr 07 09:24:02 | sdc: h:b:t:l = 1:0:0:0
Apr 07 09:24:02 | SCSI target 1:0:0 -> FC rport 1:0-0
Apr 07 09:24:02 | sdc: tgt_node_name = 0x208000c0ff25b092
Apr 07 09:24:02 | sdc: path state = running
Apr 07 09:24:02 | sdc: 65535 cyl, 255 heads, 63 sectors/track, start at 0
Apr 07 09:24:02 | sdc: serial = 00c0ff25a92300002bc3506001000000
Apr 07 09:24:02 | sdc: get_state
Apr 07 09:24:02 | sdc: detect_checker = yes (setting: multipath internal)
Apr 07 09:24:02 | failed to issue vpd inquiry for pgc9
Apr 07 09:24:02 | sdc: path_checker = tur (setting: storage device autodetected)
Apr 07 09:24:02 | sdc: checker timeout = 30 s (setting: kernel sysfs)
Apr 07 09:24:02 | sdc: tur state = up
Apr 07 09:24:02 | sdd: udev property ID_WWN whitelisted
Apr 07 09:24:02 | sdd: mask = 0x7
Apr 07 09:24:02 | sdd: dev_t = 8:48
Apr 07 09:24:02 | sdd: size = 19509764096
Apr 07 09:24:02 | sdd: vendor = HP
Apr 07 09:24:02 | sdd: product = MSA 2040 SAN
Apr 07 09:24:02 | sdd: rev = G22x
Apr 07 09:24:02 | sdd: h:b:t:l = 1:0:0:1
Apr 07 09:24:02 | SCSI target 1:0:0 -> FC rport 1:0-0
Apr 07 09:24:02 | sdd: tgt_node_name = 0x208000c0ff25b092
Apr 07 09:24:02 | sdd: path state = running
Apr 07 09:24:02 | sdd: 65535 cyl, 255 heads, 63 sectors/track, start at 0
Apr 07 09:24:02 | sdd: serial = 00c0ff25ab6200002cc3506001000000
Apr 07 09:24:02 | sdd: get_state
Apr 07 09:24:02 | sdd: detect_checker = yes (setting: multipath internal)
Apr 07 09:24:02 | failed to issue vpd inquiry for pgc9
Apr 07 09:24:02 | sdd: path_checker = tur (setting: storage device autodetected)
Apr 07 09:24:02 | sdd: checker timeout = 30 s (setting: kernel sysfs)
Apr 07 09:24:02 | sdd: tur state = up
Apr 07 09:24:02 | sde: udev property ID_WWN whitelisted
Apr 07 09:24:02 | sde: mask = 0x7
Apr 07 09:24:02 | sde: dev_t = 8:64
Apr 07 09:24:02 | sde: size = 23242186752
Apr 07 09:24:02 | sde: vendor = HP
Apr 07 09:24:02 | sde: product = MSA 2040 SAN
Apr 07 09:24:02 | sde: rev = G22x
Apr 07 09:24:02 | sde: h:b:t:l = 2:0:0:0
Apr 07 09:24:02 | SCSI target 2:0:0 -> FC rport 2:0-0
Apr 07 09:24:02 | sde: tgt_node_name = 0x208000c0ff25b092
Apr 07 09:24:02 | sde: path state = running
Apr 07 09:24:02 | sde: 65535 cyl, 255 heads, 63 sectors/track, start at 0
Apr 07 09:24:02 | sde: serial = 00c0ff25a92300002bc3506001000000
Apr 07 09:24:02 | sde: get_state
Apr 07 09:24:02 | sde: detect_checker = yes (setting: multipath internal)
Apr 07 09:24:02 | failed to issue vpd inquiry for pgc9
Apr 07 09:24:02 | sde: path_checker = tur (setting: storage device autodetected)
Apr 07 09:24:02 | sde: checker timeout = 30 s (setting: kernel sysfs)
Apr 07 09:24:02 | sde: tur state = up
Apr 07 09:24:02 | sdf: udev property ID_WWN whitelisted
Apr 07 09:24:02 | sdf: mask = 0x7
Apr 07 09:24:02 | sdf: dev_t = 8:80
Apr 07 09:24:02 | sdf: size = 19509764096
Apr 07 09:24:02 | sdf: vendor = HP
Apr 07 09:24:02 | sdf: product = MSA 2040 SAN
Apr 07 09:24:02 | sdf: rev = G22x
Apr 07 09:24:02 | sdf: h:b:t:l = 2:0:0:1
Apr 07 09:24:02 | SCSI target 2:0:0 -> FC rport 2:0-0
Apr 07 09:24:02 | sdf: tgt_node_name = 0x208000c0ff25b092
Apr 07 09:24:02 | sdf: path state = running
Apr 07 09:24:02 | sdf: 65535 cyl, 255 heads, 63 sectors/track, start at 0
Apr 07 09:24:02 | sdf: serial = 00c0ff25ab6200002cc3506001000000
Apr 07 09:24:02 | sdf: get_state
Apr 07 09:24:02 | sdf: detect_checker = yes (setting: multipath internal)
Apr 07 09:24:02 | failed to issue vpd inquiry for pgc9
Apr 07 09:24:02 | sdf: path_checker = tur (setting: storage device autodetected)
Apr 07 09:24:02 | sdf: checker timeout = 30 s (setting: kernel sysfs)
Apr 07 09:24:02 | sdf: tur state = up
Apr 07 09:24:02 | sdg: blacklisted, udev property missing
Apr 07 09:24:02 | loop0: blacklisted, udev property missing
Apr 07 09:24:02 | loop1: blacklisted, udev property missing
Apr 07 09:24:02 | loop2: blacklisted, udev property missing
Apr 07 09:24:02 | loop3: blacklisted, udev property missing
Apr 07 09:24:02 | loop4: blacklisted, udev property missing
Apr 07 09:24:02 | loop5: blacklisted, udev property missing
Apr 07 09:24:02 | loop6: blacklisted, udev property missing
Apr 07 09:24:02 | loop7: blacklisted, udev property missing
Apr 07 09:24:02 | dm-0: blacklisted, udev property missing
Apr 07 09:24:02 | dm-1: blacklisted, udev property missing
===== paths list =====
uuid hcil    dev dev_t pri dm_st chk_st vend/prod/rev   dev_st
     0:0:1:0 sda 8:0   -1  undef undef  HP,MO0400JFFCF  unknown
     0:0:2:0 sdb 8:16  -1  undef undef  HP,MO0400JFFCF  unknown
     1:0:0:0 sdc 8:32  -1  undef undef  HP,MSA 2040 SAN unknown
     1:0:0:1 sdd 8:48  -1  undef undef  HP,MSA 2040 SAN unknown
     2:0:0:0 sde 8:64  -1  undef undef  HP,MSA 2040 SAN unknown
     2:0:0:1 sdf 8:80  -1  undef undef  HP,MSA 2040 SAN unknown
Apr 07 09:24:02 | libdevmapper version 1.02.155 (2018-12-18)
Apr 07 09:24:02 | DM multipath kernel driver v1.13.0
Apr 07 09:24:02 | params = 1 queue_if_no_path 1 alua 2 1 round-robin 0 1 1 8:48 1 round-robin 0 1 1 8:80 1
Apr 07 09:24:02 | status = 2 0 1 0 2 1 A 0 1 0 8:48 A 0 E 0 1 0 8:80 A 0
Apr 07 09:24:02 | mpath1: disassemble map [1 queue_if_no_path 1 alua 2 1 round-robin 0 1 1 8:48 1 round-robin 0 1 1 8:80 1 ]
Apr 07 09:24:02 | sdd: udev property ID_WWN whitelisted
Apr 07 09:24:02 | sdd: mask = 0x8
Apr 07 09:24:02 | sdd: path state = running
Apr 07 09:24:02 | sdd: detect_prio = yes (setting: multipath internal)
Apr 07 09:24:02 | failed to issue vpd inquiry for pgc9
Apr 07 09:24:02 | loading //lib/multipath/libprioalua.so prioritizer
Apr 07 09:24:02 | sdd: prio = alua (setting: storage device autodetected)
Apr 07 09:24:02 | sdd: prio args = "" (setting: storage device autodetected)
Apr 07 09:24:02 | sdd: reported target port group is 1
Apr 07 09:24:02 | sdd: aas = 80 [active/optimized] [preferred]
Apr 07 09:24:02 | sdd: alua prio = 50
Apr 07 09:24:02 | sdf: udev property ID_WWN whitelisted
Apr 07 09:24:02 | sdf: mask = 0x8
Apr 07 09:24:02 | sdf: path state = running
Apr 07 09:24:02 | sdf: detect_prio = yes (setting: multipath internal)
Apr 07 09:24:02 | failed to issue vpd inquiry for pgc9
Apr 07 09:24:02 | sdf: prio = alua (setting: storage device autodetected)
Apr 07 09:24:02 | sdf: prio args = "" (setting: storage device autodetected)
Apr 07 09:24:02 | sdf: reported target port group is 0
Apr 07 09:24:02 | sdf: aas = 01 [active/non-optimized]
Apr 07 09:24:02 | sdf: alua prio = 10
Apr 07 09:24:02 | mpath1: disassemble status [2 0 1 0 2 1 A 0 1 0 8:48 A 0 E 0 1 0 8:80 A 0 ]
mpath1 (3600c0ff00025ab622cc3506001000000) dm-1 HP,MSA 2040 SAN
size=9.1T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 1:0:0:1 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 2:0:0:1 sdf 8:80 active ready running
Apr 07 09:24:02 | alua prioritizer refcount 2
Apr 07 09:24:02 | alua prioritizer refcount 1
Apr 07 09:24:02 | unloading alua prioritizer
Apr 07 09:24:02 | unloading const prioritizer
Apr 07 09:24:02 | unloading tur checker

What I'm wondering about: multipath no longer shows the mpath0 device, and pvs reports the PV as /dev/sdc1 instead of /dev/mapper/mpath0-part1.

Any help appreciated. Thanks!
 

Attachments

  • dmesg.txt
    112 KB
Hello. I have exactly the same problem and have been trying to solve it for 3 days already. When I create an LVM volume, the system stops booting. Did you find a solution to the problem?
No, we haven't found a solution yet; we are running the workaround.
Are you booting from SAN? Or what do you mean by "system stops loading"?
 
Hello, I have the same problem. Do we know anything more?
I am using the latest version, 6.4-4.
 
This is an issue with a udev rule and FC multipath LUNs.

We have done the following to resolve it:

cp /lib/udev/rules.d/69-lvm-metad.rules /etc/udev/rules.d/69-lvm-metad.rules

and then remove or comment out lines 116-119 in /etc/udev/rules.d/69-lvm-metad.rules like this:

#ACTION!="remove", ENV{LVM_PV_GONE}=="1", RUN+="/bin/systemd-run /sbin/lvm pvscan --cache $major:$minor", GOTO="lvm_end"
#ENV{SYSTEMD_ALIAS}="/dev/block/$major:$minor"
#ENV{SYSTEMD_WANTS}+="lvm2-pvscan@$major:$minor.service"
#GOTO="lvm_end"
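After editing the copied rule file, the udev rules may need to be reloaded (and the initramfs regenerated, if these rules end up in it); on Debian that would be something like:

Bash:
udevadm control --reload-rules
update-initramfs -u -k all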


Then you do not need to mask systemd-udev-settle.service.
 
Finally I tried Proxmox 7.1, where this solution is not enough. Googling further, I came across a solution: setting children_max=8 in /etc/udev/udev.conf solved the problem.
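For clarity, that is a single key/value line in /etc/udev/udev.conf (value as reported above), limiting the number of parallel udev worker processes:

Code:
# /etc/udev/udev.conf
children_max=8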
 
Same problem here. Sometimes the network is missing, sometimes the host can't boot because the root device (LVM) is missing.
Config: 3PAR, 19 LUNs * 8 active FC paths

I had to:

* mask half of the paths on the FC switch (that was essential)
* systemctl mask systemd-udev-settle
* /etc/udev/udev.conf: children_max=32 (it didn't boot with 2 children, for example)
* GRUB_CMDLINE_LINUX_DEFAULT="rootdelay=20 rootwait" (bonus only, see the sketch below)

Unfortunately I suppose the problem may reappear after adding more LUNs :-(
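For completeness, the last point goes into /etc/default/grub and takes effect after regenerating the GRUB config:

Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="rootdelay=20 rootwait"

Bash:
update-grub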
 
