[SOLVED] Multipathing: Huawei OceanStor Dorado 8000 V6 with HyperMetro

Verulam

New Member
Jul 29, 2024
Hi there

I've searched through the forums and found some info, but I'd like to run this past people to check whether my understanding is correct.

I've got a 3-node PVE 8.3.5 cluster connected over FC to two Huawei OceanStor SANs at two separate sites. I've presented a 20TB LUN, configured for HyperMetro (the same LUN mirrored/synced across the two SANs), to the cluster. All three nodes can see the LUN fine; however, the output of multipath -ll differs across the hosts.

Two of the hosts output this:

mpathb (366413ab100268d63bcb0d25b0000011f) dm-0 HUAWEI,XSG1
size=20T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 0:0:2:2 sdd 8:48 active ready running
| |- 1:0:0:2 sdh 8:112 active ready running
| |- 0:0:3:2 sdf 8:80 active ready running
| `- 1:0:2:2 sdk 8:160 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 0:0:0:2 sda 8:0 active ready running
  |- 1:0:1:2 sdi 8:128 active ready running
  |- 0:0:1:2 sdb 8:16 active ready running
  `- 1:0:3:2 sdl 8:176 active ready running

Which, if I understand correctly, shows that the host has its preferred (active) paths to the LUN on one SAN and its non-preferred (enabled) paths to the LUN on the other SAN.

However one of the hosts outputs this:

mpathb (366413ab100268d63bcb0d25b0000011f) dm-4 HUAWEI,XSG1
size=20T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 0:0:1:2 sdc 8:32 active ready running
  |- 2:0:0:2 sdg 8:96 active ready running
  |- 0:0:2:2 sde 8:64 active ready running
  `- 2:0:2:2 sdi 8:128 active ready running

Just the (active) paths, no (enabled) paths.
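To make the difference between hosts easier to spot, the path groups can be counted rather than read by eye. The following is a minimal sketch, assuming the `multipath -ll` layout shown above; the function name `summarise_path_groups` is made up for illustration and it only reads text from stdin.

```shell
#!/usr/bin/env bash
# Summarise path groups from `multipath -ll` text read on stdin.
# Prints one line per path group: "<map> prio=<n> status=<s> paths=<count>".
# summarise_path_groups is a hypothetical helper name, not part of multipath-tools.
summarise_path_groups() {
    awk '
        # Print the group collected so far, if any.
        function emit() {
            if (in_group) printf "%s prio=%s status=%s paths=%d\n", map, prio, status, count
            in_group = 0
        }
        # Map header line, e.g. "mpathb (3664...) dm-0 HUAWEI,XSG1"
        /^[^ |`]+ \(/ { emit(); map = $1; next }
        # Path group line: carries prio= and status=
        /prio=[0-9]+ status=/ {
            emit()
            match($0, /prio=[0-9]+/);   prio   = substr($0, RSTART + 5, RLENGTH - 5)
            match($0, /status=[a-z]+/); status = substr($0, RSTART + 7, RLENGTH - 7)
            in_group = 1; count = 0
            next
        }
        # Path line: an H:C:T:L tuple followed by an sd device name
        in_group && /[0-9]+:[0-9]+:[0-9]+:[0-9]+ +sd[a-z]+/ { count++ }
        END { emit() }
    '
}
```

Run as `multipath -ll | summarise_path_groups` on each node. A host that sees both SANs should report two groups for the HyperMetro LUN (one per priority), while a host that sees only one SAN reports a single group.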

All three hosts share the same multipath.conf file, and the LUN is presented to them by all being in the same group on the SAN. I've triple-checked all three hosts' configs on the SAN and cannot see any difference between them.

Would this indicate that the third host cannot see the second paths as the others can? Something must be different about this host, I just can't seem to see what it is.

Any thoughts would be much appreciated.

Thanks.
 
Thanks for getting back to me.

The odd host can see the LUN (sdc, sdg, sde, sdi):

Bash:
sdc                      8:32   0    20T  0 disk
└─mpathb               252:4    0    20T  0 mpath
sdd                      8:48   0   200G  0 disk
└─mpatha               252:0    0   200G  0 mpath
  ├─mpatha-part1       252:1    0  1007K  0 part
  ├─mpatha-part2       252:2    0     1G  0 part  /boot/efi
  └─mpatha-part3       252:3    0   199G  0 part
    ├─pve-swap         252:5    0     8G  0 lvm   [SWAP]
    ├─pve-root         252:6    0  59.7G  0 lvm   /
    ├─pve-data_tmeta   252:7    0   1.2G  0 lvm
    │ └─pve-data-tpool 252:9    0 112.9G  0 lvm
    │   └─pve-data     252:10   0 112.9G  1 lvm
    └─pve-data_tdata   252:8    0 112.9G  0 lvm
      └─pve-data-tpool 252:9    0 112.9G  0 lvm
        └─pve-data     252:10   0 112.9G  1 lvm
sde                      8:64   0    20T  0 disk
└─mpathb               252:4    0    20T  0 mpath
sdf                      8:80   0   200G  0 disk
└─mpatha               252:0    0   200G  0 mpath
  ├─mpatha-part1       252:1    0  1007K  0 part
  ├─mpatha-part2       252:2    0     1G  0 part  /boot/efi
  └─mpatha-part3       252:3    0   199G  0 part
    ├─pve-swap         252:5    0     8G  0 lvm   [SWAP]
    ├─pve-root         252:6    0  59.7G  0 lvm   /
    ├─pve-data_tmeta   252:7    0   1.2G  0 lvm
    │ └─pve-data-tpool 252:9    0 112.9G  0 lvm
    │   └─pve-data     252:10   0 112.9G  1 lvm
    └─pve-data_tdata   252:8    0 112.9G  0 lvm
      └─pve-data-tpool 252:9    0 112.9G  0 lvm
        └─pve-data     252:10   0 112.9G  1 lvm
sdg                      8:96   0    20T  0 disk
└─mpathb               252:4    0    20T  0 mpath
sdh                      8:112  0   200G  0 disk
└─mpatha               252:0    0   200G  0 mpath
  ├─mpatha-part1       252:1    0  1007K  0 part
  ├─mpatha-part2       252:2    0     1G  0 part  /boot/efi
  └─mpatha-part3       252:3    0   199G  0 part
    ├─pve-swap         252:5    0     8G  0 lvm   [SWAP]
    ├─pve-root         252:6    0  59.7G  0 lvm   /
    ├─pve-data_tmeta   252:7    0   1.2G  0 lvm
    │ └─pve-data-tpool 252:9    0 112.9G  0 lvm
    │   └─pve-data     252:10   0 112.9G  1 lvm
    └─pve-data_tdata   252:8    0 112.9G  0 lvm
      └─pve-data-tpool 252:9    0 112.9G  0 lvm
        └─pve-data     252:10   0 112.9G  1 lvm
sdi                      8:128  0    20T  0 disk
└─mpathb               252:4    0    20T  0 mpath

Bash:
root@hdc-m37-pve-prod-01:~# lsscsi
[0:0:0:0]    disk    HUAWEI   XSG1             6000  -
[0:0:1:0]    disk    HUAWEI   XSG1             6000  -
[0:0:1:1]    disk    HUAWEI   XSG1             6000  /dev/sdb
[0:0:1:2]    disk    HUAWEI   XSG1             6000  /dev/sdc
[0:0:2:0]    disk    HUAWEI   XSG1             6000  -
[0:0:2:1]    disk    HUAWEI   XSG1             6000  /dev/sdd
[0:0:2:2]    disk    HUAWEI   XSG1             6000  /dev/sde
[0:0:3:0]    disk    HUAWEI   XSG1             6000  -
[1:0:0:0]    disk    HP iLO   Internal SD-CARD 2.10  /dev/sda
[2:0:0:0]    disk    HUAWEI   XSG1             6000  -
[2:0:0:1]    disk    HUAWEI   XSG1             6000  /dev/sdf
[2:0:0:2]    disk    HUAWEI   XSG1             6000  /dev/sdg
[2:0:1:0]    disk    HUAWEI   XSG1             6000  -
[2:0:1:1]    disk    HUAWEI   XSG1             6000  /dev/sdh
[2:0:1:2]    disk    HUAWEI   XSG1             6000  /dev/sdi
[2:0:2:0]    disk    HUAWEI   XSG1             6000  -
[2:0:3:0]    disk    HUAWEI   XSG1             6000  -

I've not tried to configure it yet, but the OS can see it?
 
Thank you.

This is the multipath.conf used by all hosts:

Code:
defaults {
        find_multipaths yes
        user_friendly_names yes
}

blacklist {
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]"
        devnode "^hd[a-z]"
        devnode "^cciss!c[0-9]d[0-9][p[0-9]]"
}

devices {
        device {
                vendor                  "HUAWEI"
                product                 "XSG1"
                path_grouping_policy    group_by_prio
                path_checker            tur
                prio                    alua
                path_selector           "round-robin 0"
                failback                immediate
                no_path_retry           15
        }
}

Trouble host:

Bash:
root@hdc-m37-pve-prod-01:~# lsscsi -ss
[0:0:0:0]    disk    HUAWEI   XSG1             6000  -               -
[0:0:1:0]    disk    HUAWEI   XSG1             6000  -               -
[0:0:1:1]    disk    HUAWEI   XSG1             6000  /dev/sdb   200GiB
[0:0:1:2]    disk    HUAWEI   XSG1             6000  /dev/sdc   20.0TiB
[0:0:2:0]    disk    HUAWEI   XSG1             6000  -               -
[0:0:2:1]    disk    HUAWEI   XSG1             6000  /dev/sdd   200GiB
[0:0:2:2]    disk    HUAWEI   XSG1             6000  /dev/sde   20.0TiB
[0:0:3:0]    disk    HUAWEI   XSG1             6000  -               -
[1:0:0:0]    disk    HP iLO   Internal SD-CARD 2.10  /dev/sda   29.1GiB
[2:0:0:0]    disk    HUAWEI   XSG1             6000  -               -
[2:0:0:1]    disk    HUAWEI   XSG1             6000  /dev/sdf   200GiB
[2:0:0:2]    disk    HUAWEI   XSG1             6000  /dev/sdg   20.0TiB
[2:0:1:0]    disk    HUAWEI   XSG1             6000  -               -
[2:0:1:1]    disk    HUAWEI   XSG1             6000  /dev/sdh   200GiB
[2:0:1:2]    disk    HUAWEI   XSG1             6000  /dev/sdi   20.0TiB
[2:0:2:0]    disk    HUAWEI   XSG1             6000  -               -
[2:0:3:0]    disk    HUAWEI   XSG1             6000  -               -


Not seeing the other paths.


Happy Host:

Code:
root@hdc-p12-pve-prod-01:~# lsscsi -ss
[0:0:0:0]    disk    HUAWEI   XSG1             6000  -               -
[0:0:0:2]    disk    HUAWEI   XSG1             6000  /dev/sda   20.0TiB
[0:0:1:0]    disk    HUAWEI   XSG1             6000  -               -
[0:0:1:2]    disk    HUAWEI   XSG1             6000  /dev/sdb   20.0TiB
[0:0:2:0]    disk    HUAWEI   XSG1             6000  -               -
[0:0:2:1]    disk    HUAWEI   XSG1             6000  /dev/sdc   200GiB
[0:0:2:2]    disk    HUAWEI   XSG1             6000  /dev/sdd   20.0TiB
[0:0:3:0]    disk    HUAWEI   XSG1             6000  -               -
[0:0:3:1]    disk    HUAWEI   XSG1             6000  /dev/sde   200GiB
[0:0:3:2]    disk    HUAWEI   XSG1             6000  /dev/sdf   20.0TiB
[1:0:0:0]    disk    HUAWEI   XSG1             6000  -               -
[1:0:0:1]    disk    HUAWEI   XSG1             6000  /dev/sdg   200GiB
[1:0:0:2]    disk    HUAWEI   XSG1             6000  /dev/sdh   20.0TiB
[1:0:1:0]    disk    HUAWEI   XSG1             6000  -               -
[1:0:1:2]    disk    HUAWEI   XSG1             6000  /dev/sdi   20.0TiB
[1:0:2:0]    disk    HUAWEI   XSG1             6000  -               -
[1:0:2:1]    disk    HUAWEI   XSG1             6000  /dev/sdj   200GiB
[1:0:2:2]    disk    HUAWEI   XSG1             6000  /dev/sdk   20.0TiB
[1:0:3:0]    disk    HUAWEI   XSG1             6000  -               -
[1:0:3:2]    disk    HUAWEI   XSG1             6000  /dev/sdl   20.0TiB

I have noticed that the trouble host is running slightly older firmware on the FC HBA so will update them all to the level I know definitely works.

Have rescanned & restarted multipath as well as restarting the host. Fortunately these are all empty at the moment so can kick them about.

Then will take a look over the SAN config again - but it's all group based, so if it works for 2 of the 3 members I don't think it would be a SAN side issue. But maybe I've missed something.
 
Your output confirms that while the "good" hosts see 8 paths for the 20TB LUN, the "bad" one sees only 4.
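The same count can be taken straight from the `lsscsi` listings instead of by hand. This is a rough sketch, assuming the column layout shown in the output above and that the vendor string has no spaces (true for HUAWEI, not for the "HP iLO" entry, which is simply skipped); `count_paths_per_lun` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count block-device paths per LUN number from `lsscsi` (or `lsscsi -ss`)
# text read on stdin. Only rows whose vendor column equals $1 are counted
# (default: HUAWEI). count_paths_per_lun is a hypothetical helper name.
count_paths_per_lun() {
    local vendor="${1:-HUAWEI}"
    awk -v vendor="$vendor" '
        $3 == vendor {
            # Find the /dev/... column; rows showing "-" have no block device.
            dev = ""
            for (i = 4; i <= NF; i++) if ($i ~ /^\/dev\//) dev = $i
            if (dev == "") next
            # First column is [H:C:T:L]; the last component is the LUN number.
            hctl = $1
            gsub(/^\[|\]$/, "", hctl)
            n = split(hctl, part, ":")
            paths[part[n]]++
        }
        END { for (lun in paths) printf "lun=%s paths=%d\n", lun, paths[lun] }
    ' | sort
}
```

Run as `lsscsi | count_paths_per_lun` on each host and compare the lines for the same LUN number.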

Keep in mind that the "/dev/*" devices are handled by the Linux kernel. The PVE Linux kernel is Ubuntu-derived and does not have any storage subsystem modifications that could affect FC connectivity.

As such, if your kernel does not see the raw device, it's not there. It could be a cable, FC card, switch port, etc. What it is not is a PVE issue.

Check the "dmesg" output, compare between "good" and "bad".
You always have an option of reaching out to your Storage/Switch vendor for help with basic device presentation to a Linux host.
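Alongside dmesg, the kernel's FC transport class exposes which remote ports each HBA has discovered, which helps separate a fabric problem from a LUN mapping problem. Below is a small sketch that lists them from sysfs. The function name and the `SYSFS_ROOT` override are illustrative assumptions (the override exists only so the function can be pointed at a test tree); the `fc_remote_ports` attributes `port_name` and `port_state` are standard sysfs files.

```shell
#!/usr/bin/env bash
# List FC remote ports known to the kernel: "<rport> <WWPN> <state>".
# SYSFS_ROOT is a hypothetical override for testing; leave it unset on a real host.
list_fc_remote_ports() {
    local root="${SYSFS_ROOT:-/sys}"
    local rport
    for rport in "$root"/class/fc_remote_ports/rport-*; do
        # Skip the unexpanded glob (no FC remote ports) or unreadable entries.
        [ -r "$rport/port_name" ] || continue
        printf '%s %s %s\n' \
            "$(basename "$rport")" \
            "$(cat "$rport/port_name")" \
            "$(cat "$rport/port_state" 2>/dev/null || echo unknown)"
    done
}
```

Running `list_fc_remote_ports` on a "good" and a "bad" host and diffing the WWPN lists shows whether both hosts reach the same target ports. In the lsscsi output earlier in the thread, the trouble host already lists targets 0 to 3 on both HBAs but gets block devices on only some of them, which hints that the target ports may be reachable and that the LUNs are simply not mapped on the others.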

Cheers


 
Understood, thanks for the pointers. I think I've been approaching this from the wrong direction. I'll check the fabric connectivity, perhaps it's not zoned correctly.
 
Sorted.

Bash:
mpathb (366413ab100268d63bcb0d25b0000011f) dm-4 HUAWEI,XSG1
size=20T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 0:0:0:2 sdd 8:48  active ready running
| |- 3:0:1:2 sdh 8:112 active ready running
| |- 0:0:1:2 sdf 8:80  active ready running
| `- 3:0:2:2 sdj 8:144 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  |- 3:0:0:2 sdn 8:208 active ready running
  |- 0:0:2:2 sdl 8:176 active ready running
  |- 3:0:3:2 sdm 8:192 active ready running
  `- 0:0:3:2 sdk 8:160 active ready running

Yup, the host entry on the "missing" SAN had the wrong initiators assigned to it.
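For anyone hitting the same thing, a quick way to check this is to print each node's own HBA WWPNs and compare them with the initiators assigned to that host on each array. A minimal sketch, assuming the standard `fc_host` sysfs class; the function name and the `SYSFS_ROOT` override are hypothetical, and the array may display WWPNs in a different format (case, separators).

```shell
#!/usr/bin/env bash
# Print the local FC initiator WWPNs: "<hostN> <WWPN without 0x prefix>".
# SYSFS_ROOT is a hypothetical override for testing; leave it unset on a real host.
list_fc_initiators() {
    local root="${SYSFS_ROOT:-/sys}"
    local host wwpn
    for host in "$root"/class/fc_host/host*; do
        # Skip the unexpanded glob (no FC HBAs) or unreadable entries.
        [ -r "$host/port_name" ] || continue
        wwpn="$(cat "$host/port_name")"
        # sysfs reports the WWPN as 0x-prefixed hex; drop the prefix for comparison.
        printf '%s %s\n' "$(basename "$host")" "${wwpn#0x}"
    done
}
```

Run `list_fc_initiators` on each node and check that every WWPN it prints is assigned to the matching host object on both SANs, not just one.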

Definitely not a PVE thing!

Thanks.