Proxmox 4.2-11 multipath issue

coppola_f

hi everybody,

we have a 3-node cluster running on v4.x, using shared storage based on an HP MSA2040 FC...

our issue started after updating one of the nodes....
we upgraded the packages using the enterprise repository (on June 7th 2016!)

the upgrade touched the kernel and a lot of submodules....

now, when we run:

multipath -ll

we don't see any multipath configured; the daemon is running (some warnings at boot, but the daemon starts!)

we kept one of the nodes without upgrades: there, the same command reports all the configured multipath volumes running fine!!

we're also unable to see any multipath-related device (mpath*) in the /dev/mapper dir...
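
for reference, the basic checks we're comparing between the updated and the non-updated node are along these lines (nothing exotic, just the standard tools):

Code:
multipath -ll                     # list the configured multipath maps
ls -l /dev/mapper/                # look for the mpath* device nodes
service multipath-tools status    # check that the daemon is running
lsmod | grep dm_multipath         # check that the kernel module is loaded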


this is the config report for the updated node:

proxmox-ve: 4.2-52 (running kernel: 4.2.6-1-pve)
pve-manager: 4.2-11 (running version: 4.2-11/2c626aa1)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-40
qemu-server: 4.0-79
pve-firmware: 1.1-8
libpve-common-perl: 4.0-67
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-51
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-67
pve-firewall: 2.0-29
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie


this is the config report from the non-updated node:

proxmox-ve: 4.1-26 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-1 (running version: 4.1-1/2f9650d4)
pve-kernel-4.2.6-1-pve: 4.2.6-26
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-41
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-17
pve-container: 1.0-32
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie

as you can see, there are some major differences....

digging deeper, we found that on the updated node a new package was added:

thin-provisioning-tools v0.3.2-1

we're happy to provide any other info if requested!!

waiting for your suggestions....

regards,
francesco
 
Hi Jandro,

thanks for your attention!

well, yes, i've checked it: i haven't changed the file since the restart.....

here is the multipath.conf dump:

blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "3600c0ff000271f5d02ff105701000000"
        wwid "3600c0ff00027217d28ff105701000000"
}

multipaths {
        multipath {
                wwid "3600c0ff000271f5d02ff105701000000"
                alias mpath0
        }
        multipath {
                wwid "3600c0ff00027217d28ff105701000000"
                alias mpath1
        }
}

defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        rr_min_io               100
        failback                immediate
        no_path_retry           queue
}
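
in case it helps: the two WWIDs in the exceptions can be re-read directly from the LUNs with the same scsi_id call used in getuid_callout, for example (sdc here is just one of the MSA path devices on our node):

Code:
/lib/udev/scsi_id -g -u -d /dev/sdc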

i receive an error when multipath loads:

Jun 07 11:03:13 piva-pve1 multipath-tools-boot[1326]: Discovering and coalescing multipaths...Jun 07 11:03:13 | error parsing config file
Jun 07 11:03:13 piva-pve1 systemd[1]: multipath-tools-boot.service: control process exited, code=exited status=1
Jun 07 11:03:13 piva-pve1 kernel: device-mapper: multipath: version 1.10.0 loaded

i've searched the web: supposedly this error is solved by removing the 'blacklist' section at the very beginning of the multipath.conf file...
(i've tested it, but it didn't solve anything, so i've reverted the .conf file to its original state!)
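
(a handy way to check whether the file parses at all, without touching the running maps: the -d flag makes multipath only print what it would do, e.g.:)

Code:
multipath -d -v3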

but the result is still the same:
on the 'updated' machine, when i run:

multipath -ll

or

look inside the /dev/mapper folder...
i don't see any mpath* device....

on the other node, the one without the updates applied, everything is running fine!!

i suspect that either some incorrect setting inside the .conf file is preventing multipath from loading correctly and then mapping the multipathed volumes,

or

there is some conflict between kernel modules/drivers/libraries.
as i wrote in a previous post, i've found one package that wasn't present before the updates (thin-provisioning-tools); i think it works on the same chain, because thin provisioning is a storage-related feature, so my suspicions point in this direction!
i'm unable to find any other difference (excluding kernel versions and related libraries, such as drivers and kernel objects with the .ko extension)
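
if useful, we can also post the full file list of that new package; it's just a standard dpkg query:

Code:
dpkg -L thin-provisioning-tools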

waiting for any other suggestions....

regards,

francesco
 
Try setting "multipath.conf" exactly as follows:

Code:
defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        failback                immediate
        no_path_retry           queue
        rr_min_io               100
}
multipaths {
  multipath {
        wwid "wwid"
        alias "disk name"
  }
}

Remember to restart the multipathd daemon after applying the changes.
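
For example, something like:

Code:
service multipath-tools restart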
 
Jandro,

done exactly as you requested (see below!):

defaults {
        polling_interval        2
        path_selector           "round-robin 0"
        path_grouping_policy    multibus
        getuid_callout          "/lib/udev/scsi_id -g -u -d /dev/%n"
        failback                immediate
        no_path_retry           queue
        rr_min_io               100
}
multipaths {
        multipath {
                wwid "3600c0ff000271f5d02ff105701000000"
                alias mpath0
        }
        multipath {
                wwid "3600c0ff00027217d28ff105701000000"
                alias mpath1
        }
}

now, only the 'defaults' and 'multipaths' sections are present....

then i've run

service multipath-tools restart

here is a dump of the syslog immediately after the command execution:

Jun 08 16:27:32 piva-pve1 multipath-tools[9020]: Stopping multipath daemon: multipathd.
Jun 08 16:27:32 piva-pve1 multipathd[2061]: --------shut down-------
Jun 08 16:27:32 piva-pve1 multipath-tools[9032]: Starting multipath daemon: multipathd.
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sda: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sdb: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sdc: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sdd: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sde: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 multipathd[9037]: sdf: using deprecated getuid callout
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:32 piva-pve1 multipathd[9037]: 3600508b1001030393632423838300700: ignoring map
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:32 piva-pve1 multipathd[9037]: 3600508b1001030393632423838300800: ignoring map
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:32 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:33 piva-pve1 multipathd[9037]: mpath0: ignoring map
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:33 piva-pve1 multipathd[9037]: mpath1: ignoring map
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:33 piva-pve1 multipathd[9037]: mpath0: ignoring map
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:27:33 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:27:33 piva-pve1 multipathd[9037]: mpath1: ignoring map
Jun 08 16:27:33 piva-pve1 multipathd[9037]: path checkers start up
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)
Jun 08 16:27:33 piva-pve1 multipathd[9037]: dm-21: remove map (uevent)

results:

multipath -ll

returns nothing!!

ls /dev/mapper

no mpath* devices are listed....

any other suggestions?!?!

many thanks again,

francesco
 
Jandro,

i've replaced the multipath.conf row as you specified, then ran:

service multipath-tools restart

here is a syslog dump:

Jun 08 16:54:54 piva-pve1 multipath-tools[902]: Stopping multipath daemon: multipathd.
Jun 08 16:54:54 piva-pve1 multipathd[9037]: --------shut down-------
Jun 08 16:54:54 piva-pve1 multipath-tools[914]: Starting multipath daemon: multipathd.
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:54 piva-pve1 multipathd[921]: 3600508b1001030393632423838300700: ignoring map
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:54 piva-pve1 multipathd[921]: 3600508b1001030393632423838300800: ignoring map
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:54 piva-pve1 multipathd[921]: mpath0: ignoring map
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:54 piva-pve1 multipathd[921]: mpath1: ignoring map
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:54 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:54 piva-pve1 multipathd[921]: mpath0: ignoring map
Jun 08 16:54:55 piva-pve1 kernel: device-mapper: table: 251:21: multipath: error getting device
Jun 08 16:54:55 piva-pve1 kernel: device-mapper: ioctl: error adding target to table
Jun 08 16:54:55 piva-pve1 multipathd[921]: mpath1: ignoring map
Jun 08 16:54:55 piva-pve1 multipathd[921]: path checkers start up
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)
Jun 08 16:54:55 piva-pve1 multipathd[921]: dm-21: remove map (uevent)

it seems this modification has removed the deprecation warnings related to the getuid tool....

again:

multipath -ll

returns no output, and ls /dev/mapper is not reporting any mpath* devs!!

waiting......

francesco
 
Jandro,

i'm not onsite at present,
but our customer's IT manager will provide this info ASAP!

many thanks again

regards,
francesco
 
Hi Jandro, hi everybody! I'm Matteo, Francesco's colleague.
I ran the command "lsmod | grep dm_multipath" on both nodes,
and the output is different:

on upgraded node (pve 4.2-11 ; 4.4.8-1-pve)
dm_multipath 24576 1 dm_round_robin

on not-upgraded node (pve 4.1.1; 4.2.6-1-pve)
dm_multipath 24576 3 dm_round_robin
scsi_dh 16384 1 dm_multipath

Then I checked whether the "scsi_dh" module file is present in the folder "/lib/modules/`uname -r`/kernel/drivers/scsi/device_handler"

on upgraded node (pve 4.2-11 ; 4.4.8-1-pve)
ls -la /lib/modules/4.4.8-1-pve/kernel/drivers/scsi/device_handler
-rw-r--r-- 1 root root 17104 May 31 07:18 scsi_dh_alua.ko
-rw-r--r-- 1 root root 15904 May 31 07:18 scsi_dh_emc.ko
-rw-r--r-- 1 root root 10152 May 31 07:18 scsi_dh_hp_sw.ko
-rw-r--r-- 1 root root 17504 May 31 07:18 scsi_dh_rdac.ko

on not-upgraded node (pve 4.1.1; 4.2.6-1-pve)
ls -la /lib/modules/4.2.6-1-pve/kernel/drivers/scsi/device_handler
-rw-r--r-- 1 root root 17504 Dec 9 2015 scsi_dh_alua.ko
-rw-r--r-- 1 root root 16744 Dec 9 2015 scsi_dh_emc.ko
-rw-r--r-- 1 root root 11448 Dec 9 2015 scsi_dh_hp_sw.ko
-rw-r--r-- 1 root root 19360 Dec 9 2015 scsi_dh.ko
-rw-r--r-- 1 root root 19904 Dec 9 2015 scsi_dh_rdac.ko

On the upgraded node I tried the command "modprobe scsi_dh".
The result?
modprobe: FATAL: Module scsi_dh not found.
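
(Just a hypothesis on our side: scsi_dh.ko is missing from the 4.4 module tree listed above, so maybe in the newer kernel scsi_dh is built into the kernel itself instead of being a separate module, which would explain why modprobe cannot find it. Assuming the standard Debian /boot/config-<version> files are present, this could be checked with:)

Code:
grep CONFIG_SCSI_DH /boot/config-4.4.8-1-pve
# "=y" would mean built in, "=m" that it is expected as a loadable module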

Any suggestions?

Thanks, waiting ...

Matteo
 
mmm I see in your config report that you have proxmox-ve 4.2-52, but your running kernel is 4.2.6-1-pve.

Try booting your system with kernel 4.4.8-1-pve to see if the problem persists with the newer kernel version.
 
Hi Jandro,
the same problem with 4.4.8-1; here is the system report:

proxmox-ve: 4.2-52 (running kernel: 4.4.8-1-pve)
pve-manager: 4.2-11 (running version: 4.2-11/2c626aa1)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-40
qemu-server: 4.0-79
pve-firmware: 1.1-8
libpve-common-perl: 4.0-67
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-51
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-67
pve-firewall: 2.0-29
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
 
guys,

we're still in trouble here!!!
is anyone looking into a solution, or able to help us pin down the origin of this issue?!?

i'm looking around the web without being able to find a quick solution,

it seems someone is having similar issues in other environments:

https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1429356.html

my suspicions are still on the low-level kernel modules....

note: any other info about configs, checks or test operations can be requested from me or from Matteo (he's always on site, near the physical servers!)
we currently have a 3-node cluster; node1 and node3 have been updated and are currently showing the same issue!
node2 was untouched and is currently working fine...

waiting...


regards,
francesco
 
Having a system on which you can test updates is key imo. We don't apply updates until they are very well tested in-house for at least a few weeks, and it's for reasons like this.

I would start by confirming that you actually have 2 disks being presented. What is the output of fdisk -l on the hosts?

You could also try increasing the verbosity of the multipath output with "-v 3".
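
Something along these lines (illustrative only):

Code:
fdisk -l          # confirm every path to the MSA LUNs shows up as a block device
multipath -v 3    # verbose run, shows how each path is evaluated and grouped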
 
Adam,

many thanks for your interest in our issue.....
i'm currently off-site....
i think Matteo (he's the IT manager at the customer's site) will answer you soon with these two pieces of info (from node1, which has already been updated, and from node2, not yet updated!)

i can confirm that:

we're presenting 2 volumes to the cluster... (as you can see in multipath.conf)
on the working node, they're mapped as mpath0 and mpath1
on the updated nodes (node1 and node3) the mapping fails, and the volumes are only visible as /dev/sd(c-d-e-f)

please note:
if we run

multipath -d -v3

the multipath configuration seems to work fine: it reports all the paths and the correct configuration...
but no actions are saved, since -d makes it a dry run....
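
(to be explicit about the dry run: with -d, multipath only prints what it would create; if i understand the man page correctly, running it without -d should actually try to create the maps, i.e.:)

Code:
multipath -v3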

waiting for Matteo's reply to this thread....

thank you again for your time!

regards,

francesco
 
@adamb

as you requested:

i've uploaded some files; attached you can find the execution results of the requested commands

node1 --> updated --> multipath not working as expected
node2 --> not updated yet --> multipath working fine!

i've omitted the dry-run multipath command on the node that is working fine

many thanks again for your time and...
please excuse me for the delay in reporting the info you requested..
(i'm still off-site and Matteo was unable to answer you yesterday, he was stuck on some other heavy office issues!!)

regards,
francesco
 

Attachments

  • node1-multipath-v3.txt (22.3 KB)
  • node1-multipath-v2-ll.txt (874 bytes)
  • node2-multipath-v3.txt (12.8 KB)
  • node2-multipath-v2-ll.txt (939 bytes)
@Jamacdon_hwy97.com,

our storage is directly connected using fibre patch cables;
we don't have any switch between the servers and the storage....
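
(if useful as additional info, we can also report the FC link state as seen from sysfs on each node, e.g.:)

Code:
cat /sys/class/fc_host/host*/port_state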

for us, everything was running fine prior to the kernel updates....

we're still waiting for other suggestions;
our issue is still present...

regards,
francesco
 
