multipath: Could not failover the device

Yuri.JZ
Jan 24, 2019
Hello!

We are using a SAN from Infortrend (DS S24F-R2851), and our servers are connected to it via QLogic FC HBAs (ISP2532-based).
A server running Proxmox 5.2-2 is connected to it, with multipathd set up.
Recently this server remounted its root filesystem as read-only and refused to remount it read-write.

Any ideas how to fix it?

Logs and basic info about this server are attached below.
BTW, another similar server with the same version of Proxmox (connected to the same SAN) works like a charm.

Code:
[Thu Jan 24 16:52:04 2019] device-mapper: multipath: Reinstating path 8:0.
[Thu Jan 24 16:52:04 2019] device-mapper: multipath: Could not failover the device: Handler scsi_dh_alua Error 16.
[Thu Jan 24 16:52:04 2019] device-mapper: multipath: Failing path 8:0.
[Thu Jan 24 16:52:04 2019] sd 8:0:0:0: alua: port group 02 state N non-preferred supports tolusNA
[Thu Jan 24 16:52:04 2019] sd 8:0:0:0: alua: port group 02 state N non-preferred supports tolusNA
[Thu Jan 24 16:52:06 2019] device-mapper: multipath: Reinstating path 8:0.
[Thu Jan 24 16:52:06 2019] device-mapper: multipath: Could not failover the device: Handler scsi_dh_alua Error 16.
[Thu Jan 24 16:52:06 2019] device-mapper: multipath: Failing path 8:0.
[Thu Jan 24 16:52:06 2019] sd 8:0:0:0: alua: port group 02 state N non-preferred supports tolusNA
[Thu Jan 24 16:52:06 2019] sd 8:0:0:0: alua: port group 02 state N non-preferred supports tolusNA
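(As a side note, the ALUA target port group state the array is actually reporting can be inspected directly with sg_rtpg from the sg3-utils package; a minimal sketch, assuming sg3-utils is installed and sda is the failing path:)

Code:
# Decode the ALUA target port group states reported for this path
# (sda is assumed here to be the path that keeps failing)
sg_rtpg --decode -v /dev/sda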

Code:
uname -a

Linux hypervisor8 4.15.17-1-pve #1 SMP PVE 4.15.17-9 (Wed, 9 May 2018 13:31:43 +0200) x86_64 GNU/Linux

Code:
root@hypervisor8:~# dpkg -l |grep -i  proxmox
ii  libpve-access-control                5.0-8                          amd64        Proxmox VE access control library
ii  libpve-apiclient-perl                2.0-4                          all          Proxmox VE API client library
ii  libpve-common-perl                   5.0-31                         all          Proxmox VE base library
ii  libpve-guest-common-perl             2.0-16                         all          Proxmox VE common guest-related modules
ii  libpve-http-server-perl              2.0-8                          all          Proxmox Asynchrounous HTTP Server Implementation
ii  libpve-storage-perl                  5.0-23                         all          Proxmox VE storage management library
ii  proxmox-ve                           5.2-2                          all          The Proxmox Virtual Environment
ii  proxmox-widget-toolkit               1.0-18                         all          ExtJS Helper Classes for Proxmox
ii  pve-cluster                          5.0-27                         amd64        Cluster Infrastructure for Proxmox Virtual Environment
ii  pve-container                        2.0-23                         all          Proxmox VE Container management tool
ii  pve-docs                             5.2-3                          all          Proxmox VE Documentation
ii  pve-firewall                         3.0-8                          amd64        Proxmox VE Firewall
ii  pve-ha-manager                       2.0-5                          amd64        Proxmox VE HA Manager
ii  pve-i18n                             1.0-5                          all          Internationalization support for Proxmox VE
ii  pve-kernel-4.15                      5.2-1                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-4.15.17-1-pve             4.15.17-9                      amd64        The Proxmox PVE Kernel Image
ii  pve-manager                          5.2-1                          amd64        Proxmox Virtual Environment Management Tools

Code:
root@hypervisor8:~# dpkg -l |egrep -i "lvm|multipath"
ii  liblvm2app2.2:amd64                  2.02.168-pve6                  amd64        LVM2 application library
ii  liblvm2cmd2.02:amd64                 2.02.168-pve6                  amd64        LVM2 command library
ii  lvm2                                 2.02.168-pve6                  amd64        Linux Logical Volume Manager
ii  multipath-tools                      0.6.4-5+deb9u1                 amd64        maintain multipath block device access
ii  multipath-tools-boot                 0.6.4-5+deb9u1                 all          Support booting from multipath devices

Code:
cat multipath.conf

blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "3600d02310001760b091fe78528088a12"
        wwid "3600d02310001760b23f070981bbbc68d"
}
multipaths {
  multipath {
        wwid "3600d02310001760b091fe78528088a12"
        alias mpath0
  }
  multipath {
        wwid "3600d02310001760b23f070981bbbc68d"
        alias mpath1
  }
}

defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        uid_attribute           ID_SERIAL
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
        user_friendly_names     yes
}
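(To see the configuration the daemon actually compiled - the built-in per-device defaults merged with this file - you can query the running daemon; a sketch using multipathd's interactive interface:)

Code:
# Dump the merged, effective configuration from the running daemon
multipathd -k'show config'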

Code:
cat /etc/multipath/bindings

mpath0 3600d02310001760b091fe78528088a12
mpath1 3600d02310001760b23f070981bbbc68d

Code:
cat /etc/multipath/wwids

/3600d02310001760b091fe78528088a12/
/3600d02310001760b23f070981bbbc68d/
 
Hi Udo and LnxBil!

This is the output of the non-working server
(mpath0 has the problem):
Code:
root@hypervisor8:~# multipath -ll
mpath1 (3600d02310001760b23f070981bbbc68d) dm-4 IFT,DS S24F-R2851
size=100G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 8:0:0:1  sdd 8:48 active ready  running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 1:0:0:1  sdb 8:16 active ready  running
mpath0 (3600d02310001760b091fe78528088a12) dm-0 IFT,DS S24F-R2851
size=50G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 1:0:0:0  sda 8:0  failed ready  running
`-+- policy='round-robin 0' prio=10 status=active
  `- 8:0:0:0  sdc 8:32 active ready  running

And this one is from the working one:

Code:
root@hypervisor7:~# multipath -ll
mpath2 (3600d02310001760b092902600c90356d) dm-7 IFT,DS S24F-R2851
size=3.2T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 11:0:0:2 sdh 8:112 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 4:0:0:2  sdc 8:32  active ready running
mpath1 (3600d02310001760b1745e951548fda45) dm-4 IFT,DS S24F-R2851
size=400G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 11:0:0:1 sdg 8:96  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 4:0:0:1  sdb 8:16  active ready running
mpath0 (3600d02310001760b6d1578ad6a8df000) dm-0 IFT,DS S24F-R2851
size=100G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 4:0:0:0  sda 8:0   active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 11:0:0:0 sdf 8:80  active ready running
mpath4 (3600d02310001760b431b2af567564045) dm-28 IFT,DS S24F-R2851
size=5.0T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 4:0:0:4  sde 8:64  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 11:0:0:4 sdj 8:144 active ready running
mpath3 (3600d02310001760b2351932836fc46df) dm-8 IFT,DS S24F-R2851
size=40G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 11:0:0:3 sdi 8:128 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 4:0:0:3  sdd 8:48  active ready running
 
Hi,
you use different configs - perhaps this is the problem?
hwhandler='1 alua'
hwhandler='0'
Udo
 
I didn't specify hwhandler on either of these servers.
And the configs are mostly identical on both servers (by "mostly" I mean the "defaults" section is definitely equal; only the wwids/mpathX names differ).

I've read the multipath.conf(5) manual page on the hardware handler, and it's not clear whether it's possible to specify hwhandler='0' (or only hwhandler='1 xxx').

Any suggestions?
 
I've found a mention out there of this line in the config:
hardware_handler "0"
I'm going to apply it this evening to test.

But I realised I have no idea how to do it - my /etc/multipath.conf is read-only (just like the whole root file system)...
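(If the underlying multipath device is accepting writes again, the root filesystem can usually be remounted in place without a reboot; a minimal sketch:)

Code:
# Remount the root filesystem read-write once the paths are healthy again
mount -o remount,rw /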
 
After a reboot it works without ALUA (which was the problematic mode for the Infortrend SAN hardware).
Now hwhandler='0', but there's no guarantee that multipathd will not choose ALUA mode again on the next restart.
So I'm going to ask the SAN vendor's support team how to configure it properly.
(One example I've found is to specify hardware_handler "0" in a device section, as mentioned here:
http://download.vikis.lt/doc/device-mapper-multipath-0.4.9/multipath.conf.defaults. However, I don't know how to configure the device section properly for my exact SAN hardware, so it's better to ask.)
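For illustration only, a device section along those lines might look like the sketch below. The vendor/product strings are an assumption taken from the "IFT,DS S24F-R2851" identity shown by multipath -ll; the exact values (and the remaining settings) should be confirmed with Infortrend:

Code:
devices {
    device {
        vendor               "IFT"              # assumed from multipath -ll output
        product              "DS S24F-R2851"    # assumed; confirm the exact string with the vendor
        hardware_handler     "0"                # pin the handler so ALUA is not auto-selected
        path_grouping_policy multibus
        path_selector        "round-robin 0"
        failback             immediate
    }
}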

Code:
root@hypervisor8:~# multipath -ll
mpath1 (3600d02310001760b23f070981bbbc68d) dm-4 IFT,DS S24F-R2851
size=100G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 8:0:0:1 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 0:0:0:1 sdb 8:16 active ready running
mpath0 (3600d02310001760b091fe78528088a12) dm-0 IFT,DS S24F-R2851
size=50G features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:0:0 sda 8:0  active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 8:0:0:0 sdc 8:32 active ready running
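(Once a device section like that is in place, the running daemon can be told to re-read the config without a reboot; a sketch using multipathd's interactive interface:)

Code:
multipathd -k'reconfigure'   # make the daemon re-read /etc/multipath.conf
multipath -r                 # reload the multipath maps
multipath -ll                # verify the hwhandler reported for each map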
 
Hi. Here is my config. Storage: Dell SC5020. HBAs: QLogic FC Mezzanine 2B (QME2662). All seems to work fine.

Code:
/etc/multipath.conf
defaults {
    user_friendly_names no
    find_multipaths yes
    rr_min_io 10
    no_path_retry fail
}

blacklist {
}

devices {
    device {
        vendor               "COMPELNT"
        product              "Compellent *"
        path_checker         tur
        prio                 alua
        path_selector        "service-time 0"
        path_grouping_policy group_by_prio
        no_path_retry        fail
        hardware_handler     "1 alua"
        failback             immediate
        rr_weight            priorities
    }
}

multipath -ll
36000d310058282000000000000000049 dm-2 COMPELNT,Compellent Vol
size=7.8T features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
|- 1:0:1:1 sdc 8:32 active ready running
|- 1:0:3:1 sdd 8:48 active ready running
|- 17:0:0:1 sde 8:64 active ready running
`- 17:0:3:1 sdf 8:80 active ready running

With the default config it had a similar issue:

36000d310058282000000000000000049 dm-2 COMPELNT,Compellent Vol
size=7.8T features='2 queue_if_no_path retain_attached_hw_handler' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=active
|- 1:0:1:1 sdc 8:32 failed faulty running
|- 1:0:3:1 sdd 8:48 failed faulty running
|- 17:0:0:1 sde 8:64 active faulty running
`- 17:0:3:1 sdf 8:80 active faulty running

Mar 9 14:16:01 srv1 multipathd[1187]: 8:32: reinstated
Mar 9 14:16:01 srv1 multipathd[1187]: 36000d310058282000000000000000049: remaining active paths: 3
Mar 9 14:16:01 srv1 multipathd[1187]: 8:48: reinstated
Mar 9 14:16:01 srv1 multipathd[1187]: 36000d310058282000000000000000049: remaining active paths: 4
Mar 9 14:16:01 srv1 multipathd[1187]: sdc: mark as failed
Mar 9 14:16:01 srv1 multipathd[1187]: 36000d310058282000000000000000049: remaining active paths: 3
Mar 9 14:16:01 srv1 multipathd[1187]: sdd: mark as failed
Mar 9 14:16:01 srv1 multipathd[1187]: 36000d310058282000000000000000049: remaining active paths: 2
Mar 9 14:16:01 srv1 kernel: [11431.942436] device-mapper: multipath: Reinstating path 8:32.
Mar 9 14:16:01 srv1 kernel: [11431.942443] device-mapper: multipath: Could not failover the device: Handler scsi_dh_alua Error 16.
Mar 9 14:16:01 srv1 kernel: [11431.942686] device-mapper: multipath: Failing path 8:32.
Mar 9 14:16:01 srv1 kernel: [11431.942758] device-mapper: multipath: Reinstating path 8:48.
Mar 9 14:16:01 srv1 kernel: [11431.942763] device-mapper: multipath: Could not failover the device: Handler scsi_dh_alua Error 16.
Mar 9 14:16:01 srv1 kernel: [11431.942974] device-mapper: multipath: Failing path 8:48.
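(While testing, the daemon's live view of each path can be watched through the same interactive interface; a minimal sketch:)

Code:
multipathd -k'show paths'   # per-path checker state as the daemon sees it
multipathd -k'show maps'    # summary of the active multipath maps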
 
