Proxmox 4.2-11 multipath issue

coppola_f

hi everybody,

we have a 3-node cluster running on v4.x, using shared storage based on an HP MSA2040 FC...

our issue started after updating one of the nodes....
we upgraded the packages using the enterprise repository (on June 7th 2016!)

the upgrade touched the kernel and a lot of submodules....

now, when we run:

multipath -ll

we don't see any multipath configured; the daemon is running (some warnings at boot, but the daemon starts!)

we kept one of the nodes without upgrades: there, the same command reports all the configured multipath volumes running fine!!

we're also unable to see any multipath-related device (mpath*) in the /dev/mapper dir...
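
for reference, the basic checks we're comparing between the updated and the non-updated node are along these lines (nothing exotic, just the standard tools):

Code:
multipath -ll                     # list the configured multipath maps
ls -l /dev/mapper/                # look for the mpath* device nodes
service multipath-tools status    # check that the daemon is running
lsmod | grep dm_multipath         # check that the kernel module is loaded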


this is the config report for the updated node:

proxmox-ve: 4.2-52 (running kernel: 4.2.6-1-pve)
pve-manager: 4.2-11 (running version: 4.2-11/2c626aa1)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-40
qemu-server: 4.0-79
pve-firmware: 1.1-8
libpve-common-perl: 4.0-67
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-51
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-67
pve-firewall: 2.0-29
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie


this is the config report from the non-updated node:

proxmox-ve: 4.1-26 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-1 (running version: 4.1-1/2f9650d4)
pve-kernel-4.2.6-1-pve: 4.2.6-26
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-41
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-17
pve-container: 1.0-32
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie

as you can see, there are some major differences....

digging deeper, we found that on the updated node a new package was added:

thin-provisioning-tools v0.3.2-1

we're happy to provide any other info if requested!!

waiting for your suggestions....

regards,
francesco
 
Hi Jandro,

thanks for your attention!

well, yes, i've checked it: i haven't changed the file since the restart.....

here is the multipath.conf dump:

blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "3600c0ff000271f5d02ff105701000000"
        wwid "3600c0ff00027217d28ff105701000000"
}

multipaths {
        multipath {
                wwid "3600c0ff000271f5d02ff105701000000"
                alias mpath0
        }
        multipath {
                wwid "3600c0ff00027217d28ff105701000000"
                alias mpath1
        }
}

defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
}
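
in case it helps: the two WWIDs in the exceptions can be re-read directly from the LUNs with the same scsi_id call used in getuid_callout, for example (sdc here is just one of the MSA path devices on our node):

Code:
/lib/udev/scsi_id -g -u -d /dev/sdc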

i receive an error when multipath loads:

Jun 07 11:03:13 piva-pve1 multipath-tools-boot[1326]: Discovering and coalescing multipaths...Jun 07 11:03:13 | error parsing config file
Jun 07 11:03:13 piva-pve1 systemd[1]: multipath-tools-boot.service: control process exited, code=exited status=1
Jun 07 11:03:13 piva-pve1 kernel: device-mapper: multipath: version 1.10.0 loaded

i've searched the web: supposedly this error is solved by removing the 'blacklist' section at the very beginning of the multipath.conf file...
(i've tested it, but it didn't solve anything, so i've reverted the .conf file to its original state!)
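
(a handy way to check whether the file parses at all, without touching the running maps: the -d flag makes multipath only print what it would do, e.g.:)

Code:
multipath -d -v3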

but the result is still the same:
on the 'updated' machine, when i run:

multipath -ll

or

look inside the /dev/mapper folder...
i don't see any mpath* device....

on the other node, the one without the updates applied, everything is running fine!!

i suspect that either some incorrect setting inside the .conf file is preventing multipath from loading correctly and then mapping the multipathed volumes,

or

there is some conflict between kernel modules/drivers/libraries.
as i wrote in a previous post, i've found one package that wasn't present before the updates (thin-provisioning-tools); i think it works on the same chain, because thin provisioning is a storage-related feature, so my suspicions point in this direction!
i'm unable to find any other difference (excluding kernel versions and related libraries, such as drivers and kernel objects with the .ko extension)
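
if useful, we can also post the full file list of that new package; it's just a standard dpkg query:

Code:
dpkg -L thin-provisioning-tools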

waiting for any other suggestions....

regards,

francesco
 
Try setting "multipath.conf" exactly as follows:

Code:
defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        failback                immediate
        no_path_retry           queue
        rr_min_io               100
}
multipaths {
  multipath {
        wwid "wwid"
        alias "disk name"
  }
}

Remember to restart the multipathd daemon after applying the changes.
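
For example, something like:

Code:
service multipath-tools restart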
 
Jandro,

done exactly as you requested (see below!):

defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        failback                immediate
        no_path_retry           queue
        rr_min_io               100
}
multipaths {
        multipath {
                wwid "3600c0ff000271f5d02ff105701000000"
                alias mpath0
        }
        multipath {
                wwid "3600c0ff00027217d28ff105701000000"
                alias mpath1
        }
}

now, only the 'defaults' and 'multipaths' sections are present....

then i've run

service multipath-tools restart

here is a dump of the syslog immediately after the command execution:

Jun 08 16:27:32 piva-pve1 multipath-tools[9020]: Stopping multipath daemon: multipathd.
Jun 08 16:27:32 piva-pve1 multipathd[2061]: --------shut down-------
Jun 08 16:27:32 piva-pve1 multipath-tools[9032]: Starting multipath daemon: multipathd.
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sda: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sdb: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sdc: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sdd: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sde: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sdf: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:32 piva-pve1 multipathd[9037]: 3600508b1001030393632423838300700: ignoring map
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:32 piva-pve1 multipathd[9037]: 3600508b1001030393632423838300800: ignoring map
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:33 piva-pve1 multipathd[9037]: mpath0: ignoring map
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:33 piva-pve1 multipathd[9037]: mpath1: ignoring map
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:33 piva-pve1 multipathd[9037]: mpath0: ignoring map
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:33 piva-pve1 multipathd[9037]: mpath1: ignoring map
Jun 08 16:27:33 piva-pve1 multipathd[9037]: path checkers start up
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)

results:

multipath -ll

returns nothing!!

ls /dev/mapper

no mpath* devices are listed....

any other suggestions?!?!

many thanks again,

francesco
 
Jandro,

i've replaced the multipath.conf row as you specified, then ran:

service multipath-tools restart

here is a syslog dump:

Jun 08 16:54:54 piva-pve1 multipath-tools[902]: Stopping multipath daemon: multipathd.
Jun 08 16:54:54 piva-pve1 multipathd[9037]: --------shut down-------
Jun 08 16:54:54 piva-pve1 multipath-tools[914]: Starting multipath daemon: multipathd.
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:54 piva-pve1 multipathd[921]: 3600508b1001030393632423838300700: ignoring map
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:54 piva-pve1 multipathd[921]: 3600508b1001030393632423838300800: ignoring map
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:54 piva-pve1 multipathd[921]: mpath0: ignoring map
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:54 piva-pve1 multipathd[921]: mpath1: ignoring map
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:54 piva-pve1 multipathd[921]: mpath0: ignoring map
Jun 08 16:54:55 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:55 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:55 piva-pve1 multipathd[921]: mpath1: ignoring map
Jun 08 16:54:55 piva-pve1 multipathd[921]: path checkers start up
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)

it seems this modification has removed the deprecation warnings related to the getuid tool....

again:

multipath -ll

returns no output, and ls /dev/mapper is not reporting any mpath* devs!!

waiting......

francesco
 
Jandro,

i'm not onsite at present,
but our customer's IT manager will provide this info ASAP!

many thanks again

regards,
francesco
 
Hi Jandro, hi everybody! I'm Matteo, Francesco's colleague.
I ran the command "lsmod | grep dm_multipath" on both nodes,
and the output is different:

on upgraded node (pve 4.2-11 ; 4.4.8-1-pve)
dm_multipath 24576 1 dm_round_robin

on not-upgraded node (pve 4.1.1; 4.2.6-1-pve)
dm_multipath 24576 3 dm_round_robin
scsi_dh 16384 1 dm_multipath

Then I checked whether the "scsi_dh" module file is present in the folder "/lib/modules/`uname -r`/kernel/drivers/scsi/device_handler"

on upgraded node (pve 4.2-11 ; 4.4.8-1-pve)
ls -la /lib/modules/4.4.8-1-pve/kernel/drivers/scsi/device_handler
-rw-r--r-- 1 root root 17104 May 31 07:18 scsi_dh_alua.ko
-rw-r--r-- 1 root root 15904 May 31 07:18 scsi_dh_emc.ko
-rw-r--r-- 1 root root 10152 May 31 07:18 scsi_dh_hp_sw.ko
-rw-r--r-- 1 root root 17504 May 31 07:18 scsi_dh_rdac.ko

on not-upgraded node (pve 4.1.1; 4.2.6-1-pve)
ls -la /lib/modules/4.2.6-1-pve/kernel/drivers/scsi/device_handler
-rw-r--r-- 1 root root 17504 Dec 9 2015 scsi_dh_alua.ko
-rw-r--r-- 1 root root 16744 Dec 9 2015 scsi_dh_emc.ko
-rw-r--r-- 1 root root 11448 Dec 9 2015 scsi_dh_hp_sw.ko
-rw-r--r-- 1 root root 19360 Dec 9 2015 scsi_dh.ko
-rw-r--r-- 1 root root 19904 Dec 9 2015 scsi_dh_rdac.ko

On the upgraded node I tried the command "modprobe scsi_dh".
The result?
modprobe: FATAL: Module scsi_dh not found.
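
(Just a hypothesis on our side: scsi_dh.ko is missing from the 4.4 module tree listed above, so maybe in the newer kernel scsi_dh is built into the kernel itself instead of being a separate module, which would explain why modprobe cannot find it. Assuming the standard Debian /boot/config-<version> files are present, this could be checked with:)

Code:
grep CONFIG_SCSI_DH /boot/config-4.4.8-1-pve
# "=y" would mean built in, "=m" that it is expected as a loadable module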

Any suggestions?

Thanks, waiting ...

Matteo
 
mmm I see in your config report that you have proxmox-ve 4.2-52, but your running kernel is 4.2.6-1-pve.

Try booting your system with kernel 4.4.8-1-pve to see if the problem persists with the newer kernel version.
 
Hi Jandro,
the same problem with 4.4.8-1; here is the system report:

proxmox-ve: 4.2-52 (running kernel: 4.4.8-1-pve)
pve-manager: 4.2-11 (running version: 4.2-11/2c626aa1)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-40
qemu-server: 4.0-79
pve-firmware: 1.1-8
libpve-common-perl: 4.0-67
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-51
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-67
pve-firewall: 2.0-29
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
 
guys,

we're still in trouble here!!!
is anyone looking into a solution, or able to help us pin down the origin of this issue?!?

i'm looking around the web without being able to find a quick solution,

it seems someone is having similar issues in other environments:

https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1429356.html

my suspicions are still on the low-level kernel modules....

note: any other info about configs, checks or test operations can be requested from me or from Matteo (he's always on site, near the physical servers!)
we currently have a 3-node cluster; node1 and node3 have been updated and are currently showing the same issue!
node2 was untouched and is currently working fine...

waiting...


regards,
francesco
 
Having a system on which you can test updates is key imo. We don't apply updates until they are very well tested in-house for at least a few weeks, and it's for reasons like this.

I would start by confirming that you actually have 2 disks being presented. What is the output of fdisk -l on the hosts?

You could also try increasing the verbosity of the multipath output with "-v 3".
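
Something along these lines (illustrative only):

Code:
fdisk -l          # confirm every path to the MSA LUNs shows up as a block device
multipath -v 3    # verbose run, shows how each path is evaluated and grouped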
 
Adam,

many thanks for your interest in our issue.....
i'm currently off-site....
i think Matteo (he's the IT manager at the customer's site) will answer you soon with these two pieces of info (from node1, which has already been updated, and from node2, not yet updated!)

i can confirm that:

we're presenting 2 volumes to the cluster... (as you can see in multipath.conf)
on the working node, they're mapped as mpath0 and mpath1
on the updated nodes (node1 and node3) the mapping fails, and the volumes are only visible as /dev/sd(c-d-e-f)

please note:
if we run

multipath -d -v3

the multipath configuration seems to work fine: it reports all the paths and the correct configuration...
but no actions are saved, since -d makes it a dry run....
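
(to be explicit about the dry run: with -d, multipath only prints what it would create; if i understand the man page correctly, running it without -d should actually try to create the maps, i.e.:)

Code:
multipath -v3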

waiting for Matteo's reply to this thread....

thank you again for your time!

regards,

francesco
 
@adamb

as you requested:

i've uploaded some files; attached you can find the execution results of the requested commands

node1 --> updated --> multipath not working as expected
node2 --> not updated yet --> multipath working fine!

i've omitted the dry-run multipath command on the node that is working fine

many thanks again for your time and...
please excuse me for the delay in reporting the info you requested..
(i'm still off-site and Matteo was unable to answer you yesterday, he was stuck on some other heavy office issues!!)

regards,
francesco
 

Attachments

  • node1-multipath-v3.txt (22.3 KB)
  • node1-multipath-v2-ll.txt (874 bytes)
  • node2-multipath-v3.txt (12.8 KB)
  • node2-multipath-v2-ll.txt (939 bytes)
@Jamacdon_hwy97.com,

our storage is directly connected using fibre patch cables;
we don't have any switch between the servers and the storage....
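
(if useful as additional info, we can also report the FC link state as seen from sysfs on each node, e.g.:)

Code:
cat /sys/class/fc_host/host*/port_state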

for us, everything was running fine prior to the kernel updates....

we're still waiting for other suggestions;
our issue is still present...

regards,
francesco
 
