Shared Multipath LVM (FC SAN) No Longer Connects on 1 of 3 Nodes

JeffNordy

New Member
Feb 8, 2024
I am running a Proxmox POC to possibly replace VMware in our enterprise. I built a three-node cluster and installed multipath-tools to present my Pure Storage LUNs to the cluster, following the guide here to get it working properly. Everything was working fine, and I was testing VM migrations between the two shared volumes I created and across the three hosts.

I then decided to test scenarios like a host crashing while running a VM. While the VM was running, I simply hit "Reboot" from the Proxmox UI (which I now realize isn't a great test, since it sent a shutdown command to the VM). Either way, once the host came back up, the shared LVM storage that the VM was running on now has a question mark next to it in the UI. The VM cannot start because of the following error:
TASK ERROR: can't activate LV '/dev/shared_s001/vm-100-disk-0'

I can recover the VM by migrating it to another host, then powering it on there. However, the storage remains down on the original host.

If I SSH into the host, and run kpartx -a /dev/mapper/mpatha, it fails saying failed to stat() /dev/mapper/mpatha.

I noticed there is a lock file for the VM in /var/lock/qemu-server; however, I tried running qm unlock 100 and even eventually deleting the empty file lock-100.conf with no improvement.

I tried completely uninstalling multipath-tools and reinstalling, but the same situation exists.

I'm at a loss and am hoping I'm overlooking something simple; otherwise I'll have to conclude that Proxmox may not be a good choice for multipath FC SAN.
 
Keep in mind that Proxmox is a set of packages/an appliance running on top of Debian Linux with an Ubuntu-derived kernel (maintained by the PVE developers).

You have a few layers involved here; each one needs to be checked for correct functionality (see the command sketch after the list).
a) FC connectivity. This is handled by the OS/kernel; PVE does not introduce anything custom here. Things to check: are the disks in place, and are they accessible (lsblk, lsscsi, a dd read test, logs, etc.)?
b) Multipath. Again, standard Linux tooling. Is the multipathd service running? Does "multipath -ll" return the expected results? Are there any errors in the logs?
c) LVM. Standard Linux tooling. Can you list Physical Volumes, Volume Groups, and Logical Volumes? Are there errors during boot?
d) Finally we get to the PVE layer, where PVE arbitrates which host has access to a specific LVM slice by activating/deactivating it as needed. This is very different from ESXi's VMFS; at a 10,000-foot view it is somewhat similar to vVols.
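
A minimal sketch of these per-layer checks, using the mpatha and shared_s001 names from your posts; adjust the names to your environment:
Bash:
# (a) FC/SCSI layer: are the LUNs visible to the kernel?
lsblk
lsscsi                          # from the lsscsi package, if installed
dmesg | grep -iE 'fc|scsi'      # look for link or LUN errors

# (b) Multipath layer: is multipathd healthy and are all paths up?
systemctl status multipathd
multipath -ll
journalctl -b -u multipathd

# (c) LVM layer: are PVs/VGs/LVs visible on top of the multipath device?
pvs
vgs
lvs shared_s001

# (d) PVE layer: does the storage show up as active?
pvesm status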

If any of layers (a)-(c) is not working correctly, then layer (d) will fail. The lock you found is at layer (d); clearing it is not going to fix anything below it. The fact that kpartx fails indicates that at least layer (b) is not healthy. The physical devices/multipath maps are always expected to be present and healthy on each host.

When properly set up, the stack is usually stable. You have a few options: engage a trained PVE partner to help run the POC and pull in PVE as needed, purchase a subscription so you can work directly with support, or use a storage company that supports Proxmox and will provide assistance.

You might get help in the forum from volunteers. You'd need to provide your entire configuration, logs, and command outputs so that someone might spot the problem.

Good luck


Blockbridge: Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
When properly set up, the stack is usually stable.
Yes, I've been using it with PVE for almost a decade, and general Linux multipath on >100 hosts with RHEL-based systems for even longer.


In addition to the good answer from @bbgeek17:
I can recover the VM by migrating it to another host, then powering it on there. However, the storage remains down on the original host.
You may also want to configure the VM for HA, so that it will automatically be started on another node if one node goes down; no manual intervention is needed.
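
For reference, a minimal sketch of doing that from the CLI, assuming VM ID 100 from your error message (the same can be configured under Datacenter -> HA in the UI):
Bash:
# add VM 100 as an HA resource and request that it be kept running
ha-manager add vm:100 --state started
# check HA resource and node state
ha-manager status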
 
If I SSH into the host, and run kpartx -a /dev/mapper/mpatha, it fails saying failed to stat() /dev/mapper/mpatha.
This is very likely because you're using relative paths, and possibly because you have different multipath naming policies set on your nodes. Set your multipath policy to use WWIDs instead (and do it on all three of your nodes). Check multipath -ll on all of them to make sure all volumes are seen and named the same.
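
As a quick way to compare, something like the following (pve1/pve2/pve3 are placeholder node names) shows the output side by side; the WWID-based names should be identical on every node:
Bash:
for node in pve1 pve2 pve3; do      # placeholder hostnames
    echo "== $node =="
    ssh root@$node multipath -ll
done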
 
I ended up figuring this out, and it seems the issue was with multipath. I suspect that Proxmox is very stable in certain circumstances, but I am concerned about using it specifically with multipath (because, as you mentioned, it adds another layer, and Proxmox doesn't seem to be involved in managing that layer).

Also, yes, I realize I can use VM HA, and I have configured that. It is a bit slow to restart the VM on another host, so resetting my Cisco UCS server wasn't a good test, since it came back online before HA decided to reboot the VM. I did test it by powering off the UCS server, and HA did eventually boot the VM on another host as expected. However, the host where it was running still hits the issue where the multipath device is no longer accessible.

Here's what I did to troubleshoot and fix this:
  1. Ran dmsetup info -c on the affected host, and looked for any devices pertaining to my faulty volume.
  2. Removed the faulty device with dmsetup remove shared_s001-vm--100--disk--0.
  3. Reloaded the multipath devmap using multipath -r.
  4. Validated that I now see all my multipath devices and paths using multipath -ll.
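For anyone hitting the same thing, here are the same recovery steps as a shell sketch, using the volume names from my setup (shared_s001, vm-100-disk-0); adapt them to your own:
Bash:
# list device-mapper devices and look for the stale LV mapping
dmsetup info -c | grep shared_s001
# remove the stale mapping left over from the crash
dmsetup remove shared_s001-vm--100--disk--0
# reload the multipath device maps
multipath -r
# confirm all multipath devices and paths are back
multipath -ll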
So my conclusion is that a lock of some kind got placed on the multipath device, since the VM was running there (on shared storage) when it crashed. I wish there were a multipath setting to clear that automatically, but it seems I will need to intervene manually and do some cleanup any time a host crashes.

Thank you all for your replies!
 
This is very likely because you're using relative paths, and possibly because you have different multipath naming policies set on your nodes. Set your multipath policy to use WWIDs instead (and do it on all three of your nodes). Check multipath -ll on all of them to make sure all volumes are seen and named the same.
I do indeed have the policy set to use WWIDs via /etc/multipath/wwids. When working properly, multipath -ll shows all 16 of my paths over two LUNs.

I also have some exclusions set up for the physical drives attached to the host. Here's my /etc/multipath.conf file for reference:
Bash:
defaults {
#    udev_dir                /dev
    polling_interval        2
    path_selector           "round-robin 0"
    path_grouping_policy    multibus
#    prio_callout            none
    path_checker            tur
    rr_min_io               100
    rr_weight               priorities
    failback                immediate
    find_multipaths         yes
    user_friendly_names     yes
}


blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^(hd|xvd)[a-z]*"
    wwid    "3618e72837283d8002d4eeca313c66ebd"
}
 
If there is a better way for me to configure multipath and my LVM volumes, I'm very open to suggestions. Perhaps there are *.conf options that would handle this for me, and I'm just not aware.
 
My recommendation is to follow your vendor's documentation for best practices when configuring multipath on Linux, rather than a random person's GitHub write-up.

https://support.purestorage.com/Solutions/Linux/Linux_Reference/Linux_Recommended_Settings
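
Purely as an illustration of the shape of such a device stanza (the option values below are my assumptions, not copied from Pure's page; treat the linked document as authoritative):
Bash:
devices {
    device {
        vendor                  "PURE"
        product                 "FlashArray"
        path_selector           "service-time 0"
        path_grouping_policy    group_by_prio
        hardware_handler        "1 alua"
        prio                    alua
        path_checker            tur
        failback                immediate
        no_path_retry           0
    }
}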


That's fair. Thanks for the link. I did update my settings to Pure's spec, but unfortunately that didn't affect the behavior mentioned above.

Also, when I originally set up multipath following the standard documentation, I couldn't create the volumes in Proxmox, so I found that GitHub write-up, which explained how to set up the volumes manually, since Proxmox can't.
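
For context, the manual part was roughly along these lines (a sketch only; the WWID and the storage ID shared_s001 are placeholders I chose, not necessarily what the write-up uses): create the PV/VG on top of the multipath device, then register the VG as shared LVM storage in PVE.
Bash:
# placeholder WWID, replace with the actual device name shown by "multipath -ll"
WWID=3624a937000000000000000000000001
# create an LVM physical volume and volume group on top of the multipath device
pvcreate /dev/mapper/$WWID
vgcreate shared_s001 /dev/mapper/$WWID
# register the VG in PVE as shared LVM storage (also possible via the web UI)
pvesm add lvm shared_s001 --vgname shared_s001 --shared 1 --content images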
 
user_friendly_names yes
When the user_friendly_names configuration option is set to 'yes', the name of the multipath device is set to mpathN. This can cause problems in some cases on a SAN, since the mpathN names are not guaranteed to be consistent across nodes, so it should be set to "no", which will result in your device names reverting to the WWIDs.
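
In multipath.conf terms that is a single line in the defaults section; a minimal sketch (after changing it on all three nodes, reload the maps with multipath -r, or reboot, and re-check multipath -ll):
Bash:
defaults {
    # name multipath devices by their WWID so they are identical on every node
    user_friendly_names    no
}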
If there is a better way for me to configure multipath and my LVM volumes
Better than...? Just make sure that the device your VG sits on top of is present before your system attempts to activate volume groups, which normally happens by default.
 
When the user_friendly_names configuration option is set to 'yes', the name of the multipath device is set to mpathN. This can cause problems in some cases on a SAN, since the mpathN names are not guaranteed to be consistent across nodes, so it should be set to "no", which will result in your device names reverting to the WWIDs.
Pure's best practices agree with you.

