Proxmox Cluster and Nimble iSCSI issues

Christopheric1

Jun 3, 2025
Hello all,

Hoping to get some help with an issue I have been having with my Nimble and setting up multipathing. I have probably spent at least 10-12 hours researching and trying things on my own, and I am finally ready to ask for help. I have followed a number of multipath tutorials and they have helped to a point, but I am stuck with the following issue:

My Nimble iSCSI discovery address is 10.32.0.28, which then points to 10.32.0.25 and 10.32.0.26. When I try to set up multipathing and discover on the .28 address, it sometimes duplicates the target or even adds 3 or 4 different iSCSI targets.

[Screenshot: iSCSI connection list from the Nimble side]

This picture is from the Nimble side. As you can see, it splits and picks which one it wants, but every time I try to do round robin, it never works correctly. If I do a multipath -ll, this is what I get:

Code:
size=10T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 14:0:0:0 sdi 8:128 active ready running
  |- 15:0:0:0 sdj 8:144 active ready running
  |- 16:0:0:0 sdk 8:160 active ready running
  |- 17:0:0:0 sdl 8:176 active ready running
  |- 18:0:0:0 sdm 8:192 active ready running
  |- 19:0:0:0 sdn 8:208 active ready running
  `- 20:0:0:0 sdo 8:224 active ready running

Here are the iSCSI sessions (redacted a little bit):

Code:
root@proxmox-3:/# iscsiadm -m session
tcp: [28] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [29] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [30] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [31] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [32] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [33] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [34] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [35] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [36] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [37] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [38] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [39] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [40] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [41] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-g773bb7315XXXXXXX (non-flash)
tcp: [42] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-v773bb7315aXXXXXX (non-flash)
tcp: [43] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-v773bb7315aXXXXXX (non-flash)
tcp: [44] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-v773bb7315aXXXXXX (non-flash)
tcp: [45] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-v773bb7315aXXXXXX (non-flash)
tcp: [46] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-v773bb7315aXXXXXX (non-flash)
tcp: [47] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-v773bb7315aXXXXXX (non-flash)
tcp: [48] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-v773bb7315aXXXXXX (non-flash)


I have tried clearing all of the connections out and re-adding them, but it just keeps doing this. I would love to set up multipathing, but it is proving to be really difficult. I will take any help that anyone can give me. Thanks!
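
In case it matters, by "clearing the connections" I mean roughly the following (from memory, so the exact flags may differ slightly):

Code:
# log out of every iSCSI session and remove the node/discovery records
iscsiadm -m node --logoutall=all
iscsiadm -m node -o delete
iscsiadm -m discoverydb -t sendtargets -p 10.32.0.28:3260 -o delete
# flush the stale multipath map before re-discovering
multipath -F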
 
Hi @Christopheric1 , welcome to the forum.

Your submission is a bit confusing.

My Nimble iSCSI Discovery address is: 10.32.0.28 which then points to 10.32.0.25 and 10.30.0.26.
There is no further mention of 10.30.0 in your post.

You appear to have multiple/all IPs on the same subnet, i.e. 10.32.0.x.
The best practice from almost any vendor is to use different subnets for the paths. Have you checked with Nimble support on what their recommendation is?

Other than that, I would perform manual iSCSI discoveries and study the output they return. Then, if things are still not clear, I would post technical details of your host (IPs, interfaces, connection diagrams) and your storage (same as for the host), along with the results of your iSCSI discovery commands.
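
For example, something along these lines, run against the discovery IP and then against each portal IP from your post (a sketch, adjust to your setup):

Code:
iscsiadm -m discovery -t sendtargets -p 10.32.0.28:3260
iscsiadm -m discovery -t sendtargets -p 10.32.0.25:3260
iscsiadm -m discovery -t sendtargets -p 10.32.0.26:3260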


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I adjusted the IP address; it was supposed to be 10.32. Sorry about that. I have spent too many hours on this, so I was hoping someone on here with specific Nimble experience could help, since I have seen some posts about it. I will see if I can reach out to Nimble support too.
 
You may want to review this article, and in particular the section linked: https://kb.blockbridge.com/technote...nderstand-multipath-reporting-for-your-device

Running those commands will help you understand what your storage reports.
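
If it helps, these are the kinds of generic checks I mean (not necessarily the exact commands from the article; map and path names will differ on your system):

Code:
# overall topology and per-path state as multipathd sees it
multipathd show topology
multipathd show paths
multipathd show maps status
# verbose view of how multipath groups and prioritizes the paths
multipath -v3 -ll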


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
according to https://support.hpe.com/hpesc/publi...6-45BB-9F58-B89C0CEEE262.html&docLocale=en_US

The recommended method is group_by_prio. Post your multipath.conf for further analysis.

You didn't ask, but best practice suggests keeping the controllers' host ports on separate subnets (the article @bbgeek17 linked shows this explicitly). Depending on how many initiators you have, maybe each host port on each controller as well. You should have (initiator ports x controllers) paths, so 4, or up to 16 if all ports are on the same subnet.
 
Thank you for this; however, Nimble does not require them to be on separate VLANs/subnets. Their documentation says you just have to put in a few settings to adjust this:


Code:
# All interfaces used for iSCSI traffic need to be added to the 99-sysctl.conf file located in /etc/sysctl.d/
# ARP adjustments required for each interface:
net.ipv4.conf.ens192.arp_ignore=1
net.ipv4.conf.ens192.arp_announce=2
net.ipv4.conf.ens192.rp_filter=2
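
(ens192 is just the example interface name from the Nimble doc; on my Proxmox hosts the NIC names are different. Applying it looks roughly like this, as a sketch:)

Code:
# repeat the three lines for every NIC that carries iSCSI traffic,
# substituting the real interface name, then reload the sysctl settings
cat >> /etc/sysctl.d/99-sysctl.conf <<'EOF'
net.ipv4.conf.ens192.arp_ignore=1
net.ipv4.conf.ens192.arp_announce=2
net.ipv4.conf.ens192.rp_filter=2
EOF
sysctl --system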

I have added this to all my hosts. I finally figured out that the NICs were not bound correctly (a rough sketch of the iface binding I mean is after the config below). Once I got that sorted, multipath started working, with a caveat: it all looks active and fine until I move something from one storage to this multipath device. The path is currently /dev/mapper/mpatha, and I can see multiple iSCSI connections, which looks great. I am able to move a disk from my Synology to the Nimble storage, but after that I can't do anything with it. The VM will stay on, but I can't move it back to the Synology or migrate the VM to a different node; the multipath paths go to "faulty". I have used the recommended multipath.conf that came directly from Nimble support and added the Synology storage to it as well:


Code:
defaults {
    user_friendly_names yes
    find_multipaths yes
}

blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    device {
        vendor ".*"
        product ".*"
    }
}

blacklist_exceptions {
    device {
        vendor "Nimble"
        product "Server"
    }
    device {
        vendor "SYNOLOGY"
        product "Storage"
    }
}

devices {
    device {
        vendor "Nimble"
        product "Server"
        path_grouping_policy group_by_prio
        prio "alua"
        hardware_handler "1 alua"
        path_selector "service-time 0"
        path_checker tur
        no_path_retry 30
        failback immediate
        fast_io_fail_tmo 5
        dev_loss_tmo infinity
        rr_min_io_rq 1
        rr_weight uniform
    }
    device {
        vendor "Synology"
        product "*"
        path_grouping_policy multibus
        prio "const"
        path_checker readsector0
        no_path_retry queue
    }
}
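
For reference, the iface binding I mentioned above was along these lines (the iface-record and interface names, iscsi-nic1/ens192 etc., are placeholders for my actual NICs):

Code:
# create an open-iscsi iface record per NIC and bind it to the physical interface
iscsiadm -m iface -I iscsi-nic1 --op new
iscsiadm -m iface -I iscsi-nic1 --op update -n iface.net_ifacename -v ens192
iscsiadm -m iface -I iscsi-nic2 --op new
iscsiadm -m iface -I iscsi-nic2 --op update -n iface.net_ifacename -v ens224
# re-run discovery through the bound ifaces and log in
iscsiadm -m discovery -t sendtargets -p 10.32.0.28:3260 -I iscsi-nic1 -I iscsi-nic2
iscsiadm -m node --loginall=all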


Here is the output of multipath -ll (without the Synology entries):


Code:
mpatha (xxxxxxxxxxxxxxxxxxxxxxxx) dm-16 Nimble,Server
size=10T features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=0 status=enabled
  |- 13:0:0:0 sdh 8:112 failed faulty running
  |- 14:0:0:0 sdi 8:128 failed faulty running
  |- 15:0:0:0 sdj 8:144 failed faulty running
  `- 16:0:0:0 sdk 8:160 failed faulty running

Here is the iscsiadm -m session output (trimmed):

Code:
root@proxmox-2:/dev/mapper# iscsiadm -m session
tcp: [30] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-xxxxxxxxxxx (non-flash)
tcp: [31] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-xxxxxxxxxxx (non-flash)
tcp: [32] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-xxxxxxxxxxx (non-flash)
tcp: [33] 10.32.0.28:3260,2460 iqn.2007-11.com.nimblestorage:proxmox-xxxxxxxxxxx (non-flash)
tcp: [34] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-xxxxxxxxxxxxx (non-flash)
tcp: [35] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-xxxxxxxxxxxxx (non-flash)
tcp: [36] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-xxxxxxxxxxxxx (non-flash)
tcp: [37] 10.32.0.26:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-xxxxxxxxxxxxx (non-flash)
tcp: [38] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-xxxxxxxxxxxxx (non-flash)
tcp: [39] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-xxxxxxxxxxxxx (non-flash)
tcp: [40] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-xxxxxxxxxxxxx (non-flash)
tcp: [41] 10.32.0.25:3260,2461 iqn.2007-11.com.nimblestorage:nimble-g02-xxxxxxxxxxxxx (non-flash)

I will take any help I can get here, and if you need any other information, just let me know.
 
What exactly are you obscuring in your report? Are all 12 entries using the same suffix? Three different ones? Or are they all unique?

Typically, I/O errors result in a path being marked as down or faulty, and these events are usually logged in the system log. Have you checked there?
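
For example (generic commands; the device names in the grep pattern are just the ones from your output and will differ over time):

Code:
# follow multipathd's view of path events
journalctl -u multipathd -f
# look for SCSI/iSCSI I/O errors in the kernel log
dmesg -T | grep -iE 'i/o error|iscsi|sd[h-o]'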

While it's not strictly required for each path to be on a separate VLAN, it’s strongly recommended, precisely to avoid dealing with unpredictable routing behavior.

Try this: bring down three out of the four paths (e.g., ip link set <iface> down) and verify whether the system continues to function normally with just a single path.
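
Roughly like this (interface names are placeholders, and how many links you take down depends on how your sessions map to NICs):

Code:
ip link set ens192 down   # repeat for all but one iSCSI NIC
multipath -ll             # the remaining path should stay active/ready
# test VM disk moves, then bring the links back up
ip link set ens192 up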

You can also run multipathd in the foreground with extra debugging enabled.
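
Something like this (a sketch; stop the service first so the foreground instance can take over, and ignore the socket unit if your system does not have it):

Code:
systemctl stop multipathd.socket multipathd.service
multipathd -d -v3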

Most likely, this isn't a multipath issue in itself; rather, it is a network misconfiguration somewhere in the path stack.
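
A quick per-interface reachability check can help here too (interface names are placeholders, portal IPs are the ones from your post):

Code:
# confirm each iSCSI NIC can reach each portal on its own
ping -c 3 -I ens192 10.32.0.25
ping -c 3 -I ens192 10.32.0.26
ping -c 3 -I ens224 10.32.0.25
ping -c 3 -I ens224 10.32.0.26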


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
That's what I noticed too. When dealing with more than one interface in one subnet, I try to configure policy-based routing, but if one interface fails, the whole connection can still fail (that depends on how quickly the failed interface gets unconfigured).
If there is no policy routing active, it is normal behavior that the connection fails. With only one interface online, it should work.
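
For what it's worth, the policy-based routing I mean looks roughly like this (addresses and interface names are just examples):

Code:
# one routing table per iSCSI NIC, selected by source address
ip route add 10.32.0.0/24 dev ens192 src 10.32.0.101 table 101
ip route add 10.32.0.0/24 dev ens224 src 10.32.0.102 table 102
ip rule add from 10.32.0.101 table 101
ip rule add from 10.32.0.102 table 102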
 
I didn't know if there was any info in there I shouldn't post, so that's why I redacted it. The .25 and .26 are the same IQN, and .28 is its own.

I have checked the system log and all it ever talks about is ALUA and how it is working. I know the VLAN setup can be a deal breaker, but this is exactly how it was set up with VMware and it worked just fine. I will try bringing a path down and see how it works.
 
The target names are not important. The only people who care are those trying to comprehend the data you are supplying. It turns out you had a 4th variant not covered by my guess. In the future, if you feel the need to obscure the data, making the same IQN "xxx" and the unique one "yyy" would be somewhat clearer.

It is curious that your Discovery IP presents a target that is not present on your Portal IPs. Are you sure the Nimble is properly configured? I'd ping their support for a configuration review of iSCSI presented to a Debian Linux host.
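
To compare what each IP actually advertises, something like this (a sketch; portal IPs taken from your post) makes the difference easy to see:

Code:
for p in 10.32.0.28 10.32.0.25 10.32.0.26; do
  echo "== $p =="
  iscsiadm -m discovery -t sendtargets -p "$p":3260
done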


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox