[SOLVED] Problems with Proxmox 5.4 with FC multipath and lvm backend

May 17, 2019
Hi,

yesterday the manufacturer of our SAN performed a firmware upgrade which, according to them, should have been non-disruptive. Sadly, that was not the case. A lot of VMs running on Proxmox ended up with read-only filesystems, which we could mostly fix by running fsck on them. We are running Fibre Channel with multipath and an LVM backend.

But looking closer, we now have issues with our LVM volume groups. For example:
Code:
WARNING: PV MyCoVk-DYuY-vPWc-Tbbo-l3Ay-CNfH-tix0f3 on /dev/mapper/pm-cluster01-online was already found on /dev/mapper/pm-cluster01-voice.
WARNING: PV MyCoVk-DYuY-vPWc-Tbbo-l3Ay-CNfH-tix0f3 prefers device /dev/mapper/pm-cluster01-online because device size is correct.

I'm seeing this in pvs on all cluster nodes.

The Proxmox web UI can't access all of them and only shows a question mark for them. Some VMs don't even boot after shutting them down, and I get:
"Error: can't activate LV '/dev/vg-cluster01-online/vm-190-disk0': cannot process volume group vg-cluser01-online"

Also, some VGs are missing when running pvs:
Code:
  /dev/mapper/pm-cluster01-online                                                    lvm2 ---    1,50t    1,50t

Is it possible to recover from this, or am I f*cked?
Would a vgchange -ay help, or would it make things even worse?

Update: vgchange -ay didn't change anything.

Thanks & Cheers,
Daniel
 
Code:
WARNING: PV MyCoVk-DYuY-vPWc-Tbbo-l3Ay-CNfH-tix0f3 on /dev/mapper/pm-cluster01-online was already found on /dev/mapper/pm-cluster01-voice.
The duplicate PV warnings are usually related to a wrong multipath config. Check your multipath.conf and ask the SAN manufacturer for the correct settings.
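
Not specific to your SAN, but a few standard commands that usually help to see where the duplicates come from, i.e. whether LVM is picking up the raw sd* paths in addition to the multipath maps:
Code:
# Check that all paths of a LUN end up in a single multipath map
multipath -ll

# Show the effective multipath configuration (built-in defaults merged with /etc/multipath.conf)
multipathd -k"show config"

# Show which device LVM picked for each PV - duplicates show up here with their UUIDs
pvs -o pv_name,vg_name,pv_uuid,dev_size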

I hope this helps!
 
Hi,

thanks for the reply. We fixed this. Somehow, during the upgrade, one of the PVs received a wrong header/PV UUID, so we had a duplicate. Luckily it was only the header of the PV that was affected. The problem is fixed now. I've never seen this before and hope we never see it again. Fixing PV headers with dd ...
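
For anyone who ends up here with a similar problem: the usual LVM way of rewriting a broken PV header (instead of dd) is pvcreate --uuid together with the metadata backups under /etc/lvm/archive. A rough sketch, assuming the archive files are intact; the UUID and archive file name below are placeholders, not the exact commands we ran:
Code:
# List the metadata backups for the affected VG
ls /etc/lvm/archive/vg-cluster01-online_*

# Rewrite the PV label with its original UUID, taken from the matching archive file
# (UUID and archive file name are placeholders)
pvcreate --uuid "MyCoVk-DYuY-vPWc-Tbbo-l3Ay-CNfH-tix0f3" \
         --restorefile /etc/lvm/archive/vg-cluster01-online_00042-1234567890.vg \
         /dev/mapper/pm-cluster01-online

# Restore the VG metadata from the same backup and reactivate
vgcfgrestore -f /etc/lvm/archive/vg-cluster01-online_00042-1234567890.vg vg-cluster01-online
vgchange -ay vg-cluster01-online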

Cheers,
Daniel
 
Thanks for reporting back - this will help others who end up in a similarly weird situation!

Please mark the thread as 'SOLVED'.

Thanks!
 
We are still trying to understand what really happened and how the wrong PV header got there, because /dev/mapper/pm-cluster01-online is 1.5T in size while the /dev/mapper/pm-cluster01-voice volume is 5T. Something went wrong with multipath during the storage update. I have an open call with the vendor, but they do not support Debian and only have experience with and support for RHEL/SLES/Oracle Linux.

By any chance, do you know which RHEL kernel/multipath version is compatible with the Proxmox 5.4 kernel/multipath (and with Proxmox 6.1, since we plan to upgrade)? We are currently using the multipath.conf for RHEL 7.3 from https://support.purestorage.com/Solutions/Linux/Reference/Linux_Recommended_Settings. I'm re-evaluating the udev rules shown there later today.
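
For reference, the versions that would need to be matched against Pure's matrix can be read directly off the node:
Code:
# Kernel currently running on the PVE node
uname -r

# Installed multipath-tools version (Debian package)
dpkg -s multipath-tools | grep -E '^(Package|Version)'

# Effective multipath settings for the PURE device section after merging /etc/multipath.conf
multipathd -k"show config" | grep -A 20 '"PURE"'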
 
Could anyone confirm that the following multipath.conf is supported by Proxmox 5.4 (and 6.1)?

For a single Pure Storage array:
Code:
devices {
    device {
        vendor               "PURE"
        product              "FlashArray"
        path_grouping_policy "multibus"
        path_selector        "queue-length 0"
        path_checker         "tur"
        features             "0"
        hardware_handler     "0"
        prio                 "const"
        failback             immediate
        fast_io_fail_tmo     10
        dev_loss_tmo         60
        user_friendly_names  no
    }
}
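
This is how I reload and verify the configuration after changing it (standard multipath-tools commands, nothing Pure- or Proxmox-specific):
Code:
# Let multipathd re-read /etc/multipath.conf and rebuild its maps
multipathd -k"reconfigure"
# (alternatively restart the service: systemctl restart multipath-tools)

# Force a reload of the multipath maps and verify policy, priorities and path states
multipath -r
multipath -ll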

For Pure Storage with active sync:
Code:
#defaults {
#   polling_interval      10
#}
#devices {
#   device {
#       vendor                "PURE"
#       path_selector         "queue-length 0"
#       path_grouping_policy  group_by_prio
#       path_checker          tur
#       fast_io_fail_tmo      10
#       dev_loss_tmo          60
#       no_path_retry         0
#       hardware_handler      "1 alua"
#       prio                  alua
#       failback              immediate
#   }
#}

For completeness, here are the udev rules (I verified that they work):
Code:
cat /lib/udev/rules.d/99-pure-storage.rules
# Recommended settings for Pure Storage FlashArray.

# Use noop scheduler for high-performance solid-state storage
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/scheduler}="noop"

# Reduce CPU overhead due to entropy collection
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/add_random}="0"

# Spread CPU load by redirecting completions to originating CPU
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/rq_affinity}="2"

# Set the HBA timeout to 60 seconds
ACTION=="add|change", SUBSYSTEMS=="scsi", ATTRS{model}=="FlashArray      ", ATTR{timeout}="60"

# Set max_sectors_kb to 4096 KB (was already the default)
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/max_sectors_kb}="4096"
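
For anyone who wants to double-check such rules, the values can be read back from sysfs after reloading (sdh is just an example path device):
Code:
# Re-run the rules for the existing block devices
udevadm control --reload
udevadm trigger --subsystem-match=block

# Read back the values on one of the Pure path devices (sdh is an example)
cat /sys/block/sdh/queue/scheduler
cat /sys/block/sdh/queue/add_random
cat /sys/block/sdh/queue/rq_affinity
cat /sys/block/sdh/queue/max_sectors_kb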

Edit: a short update:
I unplugged an FC cable and plugged it back in, but multipath did not recover:
Code:
pm-cluster01-demovolume (3624a9370b9f225dcede645970001142e) dm-10 PURE,FlashArray
size=200G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
  |- 18:0:1:250 sdh  8:112  failed ready running
  |- 18:0:3:250 sds  65:32  failed ready running
  |- 19:0:3:250 sdao 66:128 active ready running
  `- 19:0:0:250 sdad 65:208 active ready running
This test failed.
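
For the record, these are the generic steps I'd try to bring failed paths back (host numbers 18/19 are taken from the output above; this is not Pure-specific advice):
Code:
# Rescan the SCSI hosts so the kernel re-discovers the paths
echo "- - -" > /sys/class/scsi_host/host18/scan
echo "- - -" > /sys/class/scsi_host/host19/scan

# Optionally trigger a loop initialization on the FC HBAs
echo 1 > /sys/class/fc_host/host18/issue_lip

# Let multipathd re-check the paths and look at the state again
multipathd -k"reconfigure"
multipath -ll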

Thanks,
Daniel
 
Just a heads up:

For Proxmox 5.4, the multipath.conf for RHEL 6.2+ from https://support.purestorage.com/Solutions/Linux/Reference/Linux_Recommended_Settings works:

Single-node Pure Storage:
Code:
defaults {
   polling_interval      10
   find_multipaths       yes
}
devices {
   device {
       vendor                "PURE"
       path_selector         "queue-length 0"
       path_grouping_policy  group_by_prio
       path_checker          tur
       fast_io_fail_tmo      10
       dev_loss_tmo          60
       no_path_retry         0
       hardware_handler      "1 alua"
       prio                  alua
       failback              immediate
   }
}

Update: the above multipath.conf is also valid for Pure Storage active cluster; combined with the following udev rules it works for Proxmox 5.4:
Code:
cat /lib/udev/rules.d/99-pure-storage.rules
# Recommended settings for Pure Storage FlashArray.

# Use noop scheduler for high-performance solid-state storage
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/scheduler}="noop"

# Reduce CPU overhead due to entropy collection
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/add_random}="0"

# Spread CPU load by redirecting completions to originating CPU
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/rq_affinity}="2"

# Set the HBA timeout to 60 seconds
ACTION=="add|change", SUBSYSTEMS=="scsi", ATTRS{model}=="FlashArray      ", ATTR{timeout}="60"

# Set max_sectors_kb to 4096 KB (was already the default)
ACTION=="add|change", KERNEL=="sd*[!0-9]", SUBSYSTEM=="block", ENV{ID_VENDOR}=="PURE", ATTR{queue/max_sectors_kb}="4096"

I'll add SOLVED to the title.

Cheers,
Daniel
 
Great that the config works in your environment!

Sadly, multipath configurations depend very much on the vendor, model, and firmware version of a particular SAN, hence it's difficult for us to reproduce them. I would always recommend trying new kernels and major distribution upgrades in a virtualized PVE first, and getting the config right there, before going to a bare-metal node.

One thing to keep in mind for the PVE 5 -> 6 upgrade is that the version of multipath-tools changed, and in many cases some modification of the config is necessary, as pointed out by @LnxBil in the following post: https://forum.proxmox.com/threads/m...e-is-not-created-proxmox-6.57272/#post-264048
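
A generic way to see what actually changes is to dump the effective configuration on both versions and compare them (independent of the specific change discussed in that post):
Code:
# On the PVE 5.x node, before the upgrade
multipathd -k"show config" > /root/multipath-config-pve5.txt

# On the same node after the upgrade to PVE 6.x
multipathd -k"show config" > /root/multipath-config-pve6.txt

# Compare the two
diff -u /root/multipath-config-pve5.txt /root/multipath-config-pve6.txt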

I hope this helps!
 
