multipath.conf setup makes volumes appear as duplicates, crashes NAS?

FuriousGeorge

Renowned Member
Sep 25, 2012
I tried to set up multipath today on a cluster of old servers that I've been using for testing.

I have a four-bay NAS with two RAID-1 targets/LUNs. Currently there is one LUN per target, and both network interfaces are enabled for each target.

The multipath.conf file I came up with is as follows:

Code:
defaults {
        polling_interval                2
        path_selector                   "round-robin 0"
        path_grouping_policy            multibus
        uuid_attribute                  ID_SERIAL
        rr_min_io                       100
        failback                        immediate
        no_path_retry                   queue
        user_friendly_names             yes
}

blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "35000c500a33316d0"
        wwid "35000c500a33313e8"
}
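
To sanity-check the file, something like the following should show how multipath parses the config and whether the two WWIDs make it past the blacklist (it's a dry run, so it shouldn't change any maps):

Code:
# Dry run with verbose output: shows how the config is parsed and
# which devices end up blacklisted or accepted (no maps are created)
multipath -d -v3 2>&1 | grep -Ei 'wwid|blacklist'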

I pulled the WWIDs using /lib/udev/scsi_id -g -u -d /dev/sdX; those are the two targets on my NAS. I copied this file to all the servers and restarted the multipath and iSCSI related services, and when that didn't work I restarted the servers.
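
For reference, this is roughly what I ran on each node; device names and service names are examples and may differ per install:

Code:
# Print the WWID of each underlying iSCSI disk (device names are examples)
for dev in /dev/sdb /dev/sdc; do
        /lib/udev/scsi_id -g -u -d "$dev"
done

# Restart the multipath and iSCSI services after copying the config
systemctl restart multipathd        # may be packaged as multipath-tools
systemctl restart open-iscsi iscsid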

The only thing I can say for sure is that this setup does not work. I get errors suggesting that LVs are showing up on multiple PVs, and Proxmox refuses to start them for that reason. I also start seeing block devices as high as sdm, and I don't have nearly that many disks in my system, so the disks may be getting duplicated.
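
To see whether the extra sdX entries are really just duplicate paths to the same disks, these standard checks should show how the block devices, multipath maps, and PVs line up:

Code:
# List all block devices and any device-mapper maps stacked on them
lsblk

# Show the multipath topology: each WWID should appear once,
# with one entry per path underneath it
multipath -ll

# Show which block device LVM thinks each PV belongs to
pvs -o pv_name,vg_name,pv_uuid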

Unfortunately, the corresponding entry in the PM GUI log is now blank, and I'm not able to find it in any of the server logs. Renaming the multipath.conf file to .bak and restarting the servers solved that problem. If someone sees the problem with my config, please advise.

Also, and most confusingly, my NAS started crashing around the same time I made this change, and it keeps crashing after I reverted it. Reverting the change did allow me to start my VMs, but the NAS kept going offline. Right now I have shut down 3 of 5 nodes and am running two nodes with a minimal set of VMs, with some success. I have 30 minutes of uptime, which is the most I've had all day.

I realize this is probably just a fantastic coincidence, but on the off chance that my change to the PM multipath setup is somehow related, I figured I'd mention it here. It seems that shutting down 3/5 nodes has helped my NAS stay online, but it could just be that the device is failing under load. There is nothing in the NAS's logs to indicate what is going on. The problem manifests as the NAS simply becoming unreachable until it is cold restarted. Nothing is connected to it aside from PM VMs.

So, not having a known good NAS or a known good multipath config, I'm not sure what the problem might be.

This is my storage.cfg file:

Code:
dir: local
        path /var/lib/vz
        content backup,vztmpl,iso,images,rootdir
        maxfiles 25
        shared 0

zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        sparse 1

iscsi: iscsi_backend
        portal 10.5.0.20
        target iqn.2000-01.com.synology:ADSNAS.Target-1.72d5c4b37e
        content images

lvm: hapool
        vgname HA
        base iscsi_backend:0.0.0.scsi-3600140529fca684d358ad4823daeabd7
        content images,rootdir
        shared 1

zfspool: local-vmdata
        pool local-vmdata
        content rootdir,images
        sparse 0

iscsi: iscsi_backend_2
        portal 10.5.0.21
        target iqn.2000-01.com.synology:ADSNAS.Target-2.72d5c4b37e
        content images

lvm: hapool_2
        vgname HA_2
        base iscsi_backend_2:0.0.1.scsi-36001405e757b1a8d9d5ed47badaf55d3
        content images,rootdir
        shared 1
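
Both LVM storages sit on top of the iSCSI LUNs above. A quick way to see which underlying block device each VG is actually bound to (standard LVM tooling, nothing Proxmox-specific) would be something like:

Code:
# With multipath working, I would expect /dev/mapper/* devices here
# rather than raw /dev/sdX paths
pvs -o pv_name,vg_name,dev_size
vgs -o vg_name,pv_count,vg_size HA HA_2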

Any thoughts or advice are much appreciated.


UPDATE: I SSH'd into the NAS to see if I could identify anything in the device's logs. One thing I noticed is a lot of this in dmesg:

Code:
[ 4817.857939] iSCSI:iscsi_target_parameters.c:44:iscsi_login_rx_data rx_data returned 0, expecting 48.
[ 4817.867149] iSCSI_F:iscsi_target_login.c:1253:iscsi_target_login_sess_out iSCSI Login negotiation failed - I[][10.5.0.252:43228], T[][10.5.0.20:3260]
[ 4820.581714] iSCSI:iscsi_target_parameters.c:44:iscsi_login_rx_data rx_data returned 0, expecting 48.
[ 4820.590903] iSCSI_F:iscsi_target_login.c:1253:iscsi_target_login_sess_out iSCSI Login negotiation failed - I[][10.5.0.254:36212], T[][10.5.0.20:3260]
[ 4820.803964] iSCSI:iscsi_target_parameters.c:44:iscsi_login_rx_data rx_data returned 0, expecting 48.
[ 4820.813143] iSCSI_F:iscsi_target_login.c:1253:iscsi_target_login_sess_out iSCSI Login negotiation failed - I[][10.5.0.254:59680], T[][10.5.0.21:3260]
[ 4821.632146] iSCSI:iscsi_target_parameters.c:44:iscsi_login_rx_data rx_data returned 0, expecting 48.
[ 4821.641338] iSCSI_F:iscsi_target_login.c:1253:iscsi_target_login_sess_out iSCSI Login negotiation failed - I[][10.5.0.250:35730], T[][10.5.0.21:3260]
[ 4821.697788] iSCSI:iscsi_target_parameters.c:44:iscsi_login_rx_data rx_data returned 0, expecting 48.
[ 4821.706977] iSCSI_F:iscsi_target_login.c:1253:iscsi_target_login_sess_out iSCSI Login negotiation failed - I[][10.5.0.250:34676], T[][10.5.0.20:3260]
[ 4827.004090] iSCSI:iscsi_target_parameters.c:44:iscsi_login_rx_data rx_data returned 0, expecting 48.
[ 4827.013269] iSCSI_F:iscsi_target_login.c:1253:iscsi_target_login_sess_out iSCSI Login negotiation failed - I[][10.5.0.252:58856], T[][10.5.0.21:3260]
[ 4827.038026] iSCSI:iscsi_target_parameters.c:44:iscsi_login_rx_data rx_data returned 0, expecting 48.
[ 4827.047227] iSCSI_F:iscsi_target_login.c:1253:iscsi_target_login_sess_out iSCSI Login negotiation failed - I[][10.5.0.252:43232], T[][10.5.0.20:3260]

The odd thing about that is that all my running nodes are successfully connected to their targets.
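
To cross-check that from the initiator side, this is the kind of thing I can run on each node (assuming the stock open-iscsi tools):

Code:
# List the active iSCSI sessions and their state on this node
iscsiadm -m session -P 1

# Show which portals/targets this node is configured to log in to
iscsiadm -m node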

Speaking of running nodes, I have 3 nodes on right now, and the NAS still hasn't crashed. One node has always been offline, so that means 3 of 4 available nodes are online now, albeit only running a couple of VMs. I plan to methodically start VMs, then finally start the last node, and see if the NAS starts crashing again.
 
I hope this is resolved. If not, then please open up a new thread with a specific question and use this one as a reference. This will make your post more visible and more likely to be replied to.
 
