multipath: reinstating/failing path

jerry

Dear Proxmox'ers,
We've got a problem with multipath, because one of the paths has failed:
Code:
root@pve:~# multipath -ll
360050763808102622f00000000000006 dm-3 IBM,2145
size=4.9T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=enabled
| `- 7:0:0:1  sdc 8:32 failed ready running
`-+- policy='service-time 0' prio=10 status=active
  `- 16:0:0:1 sde 8:64 active ready running
360050763808102622f00000000000005 dm-2 IBM,2145
size=4.9T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 16:0:0:0 sdd 8:48 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 7:0:0:0  sdb 8:16 active ready running
It's very strange, because the second LUN is working properly.

The log shows that the path is being reinstated and then failed again every 5 seconds:
Code:
Jun 21 14:51:23 pve kernel: [ 3167.394219] sd 16:0:0:1: alua: port group 01 state N non-preferred supports tolusna
Jun 21 14:51:23 pve kernel: [ 3167.394325] sd 16:0:0:1: alua: port group 01 state N non-preferred supports tolusna
Jun 21 14:51:28 pve kernel: [ 3172.350846] device-mapper: multipath: Reinstating path 8:32.
Jun 21 14:51:28 pve kernel: [ 3172.359335] device-mapper: multipath: Failing path 8:32.
Jun 21 14:51:28 pve kernel: [ 3172.390122] sd 16:0:0:1: alua: port group 01 state N non-preferred supports tolusna
Jun 21 14:51:28 pve kernel: [ 3172.390203] sd 16:0:0:1: alua: port group 01 state N non-preferred supports tolusna
Jun 21 14:51:33 pve kernel: [ 3177.351432] device-mapper: multipath: Reinstating path 8:32.
Jun 21 14:51:33 pve kernel: [ 3177.360429] device-mapper: multipath: Failing path 8:32.
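The 5-second cycle seems to match multipathd's default polling_interval (5 seconds, if I'm not mistaken); the effective value can be checked against the running daemon:
Code:
root@pve:~# multipathd show config | grep polling_interval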

Here is the output of the 'multipathd show paths' command:
Code:
hcil     dev dev_t pri dm_st  chk_st dev_st  next_check
1:0:0:0  sda 8:0   1   undef  ready  running orphan
7:0:0:0  sdb 8:16  10  active ready  running XXXXXXXXX. 19/20
7:0:0:1  sdc 8:32  50  failed faulty running XXXXXX.... 3/5
16:0:0:0 sdd 8:48  50  active ready  running XXXXXXX... 15/20
16:0:0:1 sde 8:64  10  active ready  running XXXXXXXX.. 16/20
which every 5 seconds briefly reports the path as healthy:
Code:
hcil     dev dev_t pri dm_st  chk_st dev_st  next_check
1:0:0:0  sda 8:0   1   undef  ready  running orphan
7:0:0:0  sdb 8:16  10  active ready  running XXXXXXXXXX 20/20
7:0:0:1  sdc 8:32  50  active ready  running XXXXXXXXXX 5/5
16:0:0:0 sdd 8:48  50  active ready  running XXXXXXXX.. 16/20
16:0:0:1 sde 8:64  10  active ready  running XXXXXXXX.. 17/20
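To watch this live, the checker state and the kernel log can be followed side by side (standard commands, just as an illustration):
Code:
# follow the path checker state once per second
watch -n 1 'multipathd show paths'
# and, in another terminal, follow the kernel's multipath messages
journalctl -kf | grep -i multipath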

We experienced this on a fresh install of Proxmox VE 5.4-1 with kernel 4.15.18-12-pve and the multipath-tools package (0.6.4-5+deb9u1) installed with default settings; we don't use a multipath.conf.
The HBAs are QLogic QLE2690 16Gb FC single-port HBAs.
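For completeness, if we were to add an explicit /etc/multipath.conf instead of relying on the built-in defaults, a sketch based on IBM's commonly documented settings for 2145 (SVC/Storwize) arrays might look like the following; the exact values (no_path_retry, dev_loss_tmo, etc.) are assumptions and would need to be checked against the storage documentation:
Code:
defaults {
    find_multipaths  yes
    polling_interval 5
}
devices {
    device {
        vendor               "IBM"
        product              "2145"
        path_grouping_policy "group_by_prio"
        path_selector        "service-time 0"
        prio                 "alua"
        path_checker         "tur"
        hardware_handler     "1 alua"
        failback             "immediate"
        no_path_retry        5
        rr_min_io_rq         1
        dev_loss_tmo         120
    }
}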

Any idea?
 
After another Proxmox installation I discovered a strange behavior of the multipath package. When I install the package and do NOT reboot the server, all paths are active, but hwhandler isn't alua; it has the value "0". When I reboot the server, one of the paths becomes faulty as shown above.
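To check whether the alua device handler is actually attached to the path devices before and after the reboot (which, as far as I understand, is what the hwhandler field reflects), the scsi_dh state can be read from sysfs; the sd* names below are just the ones from my output above:
Code:
# print the attached SCSI device handler ("alua" or "detached") per path device
for d in sdb sdc sdd sde; do
    printf '%s: ' "$d"
    cat /sys/block/$d/device/dh_state
done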