Quick HOWTO on setting up iSCSI Multipath

Spoke with Pure, they ran an `arping` from the array. Looks like there's no flapping (after the initial ping) on the array side.
[screenshots: arping output from the array]

Going to see if support knows of any logic that prevents it from flapping.

It looks like when arping either interface, it returns the MAC for the first interface
[screenshot: arping output]
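
For reference, the same check can be repeated from any other Linux box on the iSCSI subnet (a rough sketch, assuming the iputils arping; the interface and addresses are placeholders). If ARP is behaving, each of the node's iSCSI IPs should answer with the MAC of its own interface:
Bash:
# run from another host on the iSCSI subnet; replace the placeholders
arping -I <storage-iface> -c 4 <node-iscsi-ip-1>
arping -I <storage-iface> -c 4 <node-iscsi-ip-2>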
 
[screenshots: arping output]

Second time it reports correctly, other than the first response again. Is the first one a broadcast?
 

Second time it reports correctly, other than the first response again. Is the first one a broadcast?
Hi, sorry for the delay. In principle, it would be good to avoid such ARP "inconsistencies" (e.g., getting replies with different MAC addresses). In my test setup, I have seen intermittent connection issues due to these -- while the issues are to some degree mitigated by multipath, it would still be better to avoid them altogether. This should be possible by some variant of source-based routing, or using VRFs [1], but I'm still trying to find the most maintainable option.
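
For reference, a rough sketch of what the source-based routing variant could look like (interface names and addresses are placeholders; arp_filter makes ARP replies follow the routing decision, which is why it is paired with per-source routing tables):
Code:
# sketch only -- ens19/ens20 and 172.16.0.200/.201 are placeholders
sysctl -w net.ipv4.conf.all.arp_filter=1
ip route add 172.16.0.0/24 dev ens19 src 172.16.0.200 table 101
ip route add 172.16.0.0/24 dev ens20 src 172.16.0.201 table 102
ip rule add from 172.16.0.200 lookup 101
ip rule add from 172.16.0.201 lookup 102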

Regarding your screenshots from posts #21 and #22: Did you change anything in the network configuration between taking those two screenshots? I'm asking because, as you point out, they look different (in #22, "most" replies report the MAC address of the correct interface, whereas in #21 most replies report the MAC address of the first interface). Are they both from the same arping invocation?

[1] https://docs.kernel.org/networking/vrf.html
 
Regarding your screenshots from posts #21 and #22: Did you change anything in the network configuration between taking those two screenshots? I'm asking because, as you point out, they look different (in #22, "most" replies report the MAC address of the correct interface, whereas in #21 most replies report the MAC address of the first interface). Are they both from the same arping invocation?

Yup, this is from the same arping with no changes between them. On Pure arrays only tech support has access to these commands, and no changes were made by them or by me, so this is just how it behaves normally.
 
The best way to remove any ambiguity is to isolate interfaces on their own VLANs/Subnets.
If that is not possible, one should carefully review the vendor's best practices.

For example:
https://support.purestorage.com/bun...opics/concept/c_linux_host_configuration.html
Code:
Note: If multiple interfaces exist on the same subnet in RHEL, your iSCSI initiator may fail to connect to Pure Storage target.
In this case, you need to set sysctl's net.ipv4.conf.all.arp_ignore to 1 to force each interface to only answer ARP requests for its own
addresses. Please see RHEL KB for Issue Detail and Resolution Steps (requires Red Hat login).

The above recommendation is generally applicable to any Linux.
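
As a sketch of how one might apply this on a Proxmox/Debian host (the drop-in file name is arbitrary):
Code:
# persist the setting and apply it immediately
echo 'net.ipv4.conf.all.arp_ignore = 1' > /etc/sysctl.d/90-iscsi-arp.conf
sysctl --system
# verify the running value
sysctl net.ipv4.conf.all.arp_ignore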


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
The best way to remove any ambiguity is to isolate interfaces on their own VLANs/Subnets.
If that is not possible, one should carefully review the vendor's best practices.


The above recommendation is generally applicable to any Linux.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

How does ESXi get around this issue? It seems to be doing something similar, but I know they probably have a lot more magic behind the curtains to do so.

I will try setting the arp_ignore and report back with the results.



We can run them on different VLANs, but this is more to see if there's a technical limitation behind this or simply that it's easier to do and therefore best practice.
 
How does ESXi get around this issue?
They were/are running a proprietary "Linux-like" OS and kernel. The networking is proprietary as well.

I will try setting the arp_ignore and report back with the results.
We do not run Pure Storage, so I cannot make any recommendations or guarantees regarding the applicability of this solution.

That said, most storage vendors mention the "single subnet" use case in their documentation as a special case.
I strongly recommend extensive testing in your environment. It may initially seem like everything is configured correctly, only to find "gremlins causing unexplained behavior" a few months from now.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I strongly recommend extensive testing in your environment. It may initially seem like everything is configured correctly, only to find "gremlins causing unexplained behavior" a few months from now.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

Absolutely, we have a couple of months before migration, so we're spending a lot of time getting ahead of issues.
 
The best way to remove any ambiguity is to isolate interfaces on their own VLANs/Subnets.
If that is not possible, one should carefully review the vendor's best practices.

For example:
https://support.purestorage.com/bun...opics/concept/c_linux_host_configuration.html
Code:
Note: If multiple interfaces exist on the same subnet in RHEL, your iSCSI initiator may fail to connect to Pure Storage target.
In this case, you need to set sysctl's net.ipv4.conf.all.arp_ignore to 1 to force each interface to only answer ARP requests for its own
addresses. Please see RHEL KB for Issue Detail and Resolution Steps (requires Red Hat login).

The above recommendation is generally applicable to any Linux.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
Thanks for the suggestion! I personally have not tested arp_ignore=1 yet, but it sounds like it may be a possibility.

An alternative (that allows keeping arp_ignore at its default) might be to define one VRF [1] for each path in /etc/network/interfaces and assign each iSCSI interface to its respective VRF, for example in my setup with the two iSCSI interfaces ens19 and ens20:
Code:
auto ens19
iface ens19 inet static
    address 172.16.0.200/24
    vrf path1

auto ens20
iface ens20 inet static
    address 172.16.0.201/24
    vrf path2

auto path1
iface path1
    vrf-table auto

auto path2
iface path2
    vrf-table auto
... and then use Open-iSCSI's ifaces feature with the iSCSI interfaces ens19 and ens20 as described in the guide that was posted here.
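
For completeness, binding Open-iSCSI to specific interfaces looks roughly like this (a sketch; the iface record names and the portal IP are placeholders):
Code:
# create one iface record per NIC and bind it to the network device
iscsiadm -m iface -I iscsi-ens19 --op=new
iscsiadm -m iface -I iscsi-ens19 --op=update -n iface.net_ifacename -v ens19
iscsiadm -m iface -I iscsi-ens20 --op=new
iscsiadm -m iface -I iscsi-ens20 --op=update -n iface.net_ifacename -v ens20
# discover and log in through both ifaces
iscsiadm -m discovery -t sendtargets -p <portal-ip> -I iscsi-ens19 -I iscsi-ens20
iscsiadm -m node --loginall=all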

However, please note: I've run a few tests in the setup with VRFs and so far it looks good, but these were not very realistic workloads. So @PwrBank, if you are in a position to test setups, it would be interesting to hear your experience with both the arp_ignore=1 and the VRF setup.

[1] https://docs.kernel.org/networking/vrf.html
 
Here is the baseline and the arp_ignore results; I will try to get the VRF and VRF+arp_ignore results today.

Pure got back with the baseline results
Code:
= CT1 (primary) =
 
root@der-pure-ct1:~# date ; arping -c 8 10.10.254.60 | nl
Fri Feb 21 01:01:23 PM CST 2025
1 ARPING 10.10.254.60 from 10.10.254.52 eth18
2 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:69] 0.581ms
3 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.589ms
4 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.598ms
5 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.601ms
6 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.560ms
7 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.592ms
8 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.578ms
9 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.590ms
10 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.590ms
11 Sent 8 probes (1 broadcast(s))
12 Received 9 response(s)
root@der-pure-ct1:~# date ; arping -c 8 10.10.254.61 | nl
Fri Feb 21 01:01:34 PM CST 2025
1 ARPING 10.10.254.61 from 10.10.254.52 eth18
2 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.589ms
3 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.605ms
4 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.574ms
5 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.563ms
6 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.587ms
7 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.622ms
8 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.584ms
9 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.574ms
10 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.566ms
11 Sent 8 probes (1 broadcast(s))
12 Received 9 response(s)
root@der-pure-ct1:~#
 
= CT0 (secondary) =
 
root@der-pure-ct0:~# date ; arping -c 8 10.10.254.60 | nl
Fri Feb 21 01:02:34 PM CST 2025
1 ARPING 10.10.254.60 from 10.10.254.50 eth18
2 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:69] 0.578ms
3 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.584ms
4 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.546ms
5 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.573ms
6 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.572ms
7 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.552ms
8 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.548ms
9 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.559ms
10 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.556ms
11 Sent 8 probes (1 broadcast(s))
12 Received 9 response(s)
root@der-pure-ct0:~# date ; arping -c 8 10.10.254.61 | nl
Fri Feb 21 01:02:53 PM CST 2025
1 ARPING 10.10.254.61 from 10.10.254.50 eth18
2 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.578ms
3 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.588ms
4 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.569ms
5 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.569ms
6 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.566ms
7 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.667ms
8 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.596ms
9 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.575ms
10 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.570ms
11 Sent 8 probes (1 broadcast(s))
12 Received 9 response(s)
root@der-pure-ct0:~#

Edit the sysctl config file

Bash:
micro /etc/sysctl.conf

Added
Code:
net.ipv4.conf.all.arp_ignore = 1

Added the change to the sysctl config and applied it using the following command:
Code:
sysctl -p /etc/sysctl.conf
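
A quick sanity check before re-running the arping test (sketch; replace <iface> with one of the iSCSI NICs) -- the kernel uses the higher of the "all" and per-interface values:
Bash:
sysctl net.ipv4.conf.all.arp_ignore net.ipv4.conf.<iface>.arp_ignore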

After making the changes:
Code:
= CT1 =
 
root@der-pure-ct1:~# date ; arping -c 8 10.10.254.60 | nl
Fri Feb 21 02:27:53 PM CST 2025
1 ARPING 10.10.254.60 from 10.10.254.52 eth18
2 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.611ms
3 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.601ms
4 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.582ms
5 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.579ms
6 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.583ms
7 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.564ms
8 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.567ms
9 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.571ms
10 Sent 8 probes (1 broadcast(s))
11 Received 8 response(s)
root@der-pure-ct1:~# date ; arping -c 8 10.10.254.61 | nl
Fri Feb 21 02:28:15 PM CST 2025
1 ARPING 10.10.254.61 from 10.10.254.52 eth18
2 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.582ms
3 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.587ms
4 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.569ms
5 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.586ms
6 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.598ms
7 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.582ms
8 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.595ms
9 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.585ms
10 Sent 8 probes (1 broadcast(s))
11 Received 8 response(s)
root@der-pure-ct1:~#
 
= CT0 =
 
root@der-pure-ct0:~# date ; arping -c 8 10.10.254.60 | nl
Fri Feb 21 02:28:46 PM CST 2025
1 ARPING 10.10.254.60 from 10.10.254.50 eth18
2 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.585ms
3 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.560ms
4 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.584ms
5 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.585ms
6 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.582ms
7 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.596ms
8 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.570ms
9 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.568ms
10 Sent 8 probes (1 broadcast(s))
11 Received 8 response(s)
root@der-pure-ct0:~# date ; arping -c 8 10.10.254.61 | nl
Fri Feb 21 02:29:04 PM CST 2025
1 ARPING 10.10.254.61 from 10.10.254.50 eth18
2 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.575ms
3 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.568ms
4 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.580ms
5 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.576ms
6 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.585ms
7 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.580ms
8 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.565ms
9 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.596ms
10 Sent 8 probes (1 broadcast(s))
11 Received 8 response(s)
root@der-pure-ct0:~#


EDIT:

So far haven't had any issues with VMs or networking with that setting enabled. Still maxing out the dual 25GbE connection too.
 
Okay, more results. I'm not sure I'm doing the VRF correctly, so I may need help getting this tested.

But here are my notes:

Created vrf

Bash:
ip link add vrf-blue type vrf table 10
ip link set dev vrf-blue up
ip route add table 10 unreachable default metric 4278198272
ip link set dev scsi0 master vrf-blue
ip link set dev scsi1 master vrf-blue
sysctl -w net.ipv4.tcp_l3mdev_accept=1
sysctl -w net.ipv4.udp_l3mdev_accept=1

Listed the vrf
Bash:
# list devices bound to the vrf
ip link show vrf vrf-blue

[screenshot: output of ip link show vrf vrf-blue]

The interfaces went down for some reason after enabling the VRF.
Bash:
ip -br link
[screenshot: output of ip -br link]

Re-enabled them using ip
Bash:
ip link set ens2f0np0 up
ip link set ens2f1np1 up
ip link set scsi0 up
ip link set scsi1 up
ip link set vrf-blue up

Able to ping from the vrf
[screenshot: ping output]
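
(For reference, "pinging from the VRF" here means running the command in the VRF's context; the portal address below is a placeholder.)
Bash:
# run ping inside vrf-blue so it uses the VRF's routing table
ip vrf exec vrf-blue ping -c 3 <portal-ip>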

Pure responded with:
Code:
= CT1 = 
root@der-pure-ct1:~# date ; arping -c 8 10.10.254.60 | nl 
Mon Mar 3 12:50:22 PM CST 2025 
1 ARPING 10.10.254.60 from 10.10.254.52 eth18 
2 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.750ms 
3 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.744ms 
4 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.770ms 
5 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.596ms 
6 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.576ms 
7 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.612ms 
8 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.602ms 
9 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.585ms 
10 Sent 8 probes (1 broadcast(s)) 
11 Received 8 response(s) 
root@der-pure-ct1:~# date ; arping -c 8 10.10.254.61 | nl 
Mon Mar 3 12:50:33 PM CST 2025 
1 ARPING 10.10.254.61 from 10.10.254.52 eth18 
2 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.591ms 
3 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.737ms 
4 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.590ms 
5 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.571ms 
6 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.588ms 
7 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.744ms 
8 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.582ms 
9 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.726ms 
10 Sent 8 probes (1 broadcast(s)) 
11 Received 8 response(s) 
root@der-pure-ct1:~# 
 
= CT0 = 
 
root@der-pure-ct0:~# date ; arping -c 8 10.10.254.60 | nl 
Mon Mar 3 12:51:37 PM CST 2025 
1 ARPING 10.10.254.60 from 10.10.254.50 eth18 
2 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.592ms 
3 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.747ms 
4 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.737ms 
5 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.728ms 
6 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.569ms 
7 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.739ms 
8 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.565ms 
9 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.731ms 
10 Sent 8 probes (1 broadcast(s)) 
11 Received 8 response(s) 
root@der-pure-ct0:~# date ; arping -c 8 10.10.254.61 | nl 
Mon Mar 3 12:51:47 PM CST 2025 
1 ARPING 10.10.254.61 from 10.10.254.50 eth18 
2 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.752ms 
3 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.733ms 
4 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.739ms 
5 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.729ms 
6 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.728ms 
7 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.732ms 
8 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.579ms 
9 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.730ms 
10 Sent 8 probes (1 broadcast(s)) 
11 Received 8 response(s) 
root@der-pure-ct0:~#

Now disabling arp_ignore and leaving the current vrf settings
Looks like the interfaces are back to responding with whichever MAC, probably due to rebooting the server and it losing the VRF.
Re-added the vrf as above

Looking at the route list, it looks like it should be good to go
[screenshot: route list output]

Code:
= CT1 = 
 
root@der-pure-ct1:~# date ; arping -c 8 10.10.254.60 | nl 
Tue Mar 4 10:09:18 AM CST 2025 
1 ARPING 10.10.254.60 from 10.10.254.52 eth18 
2 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:69] 0.668ms 
3 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.676ms 
4 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.583ms 
5 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.745ms 
6 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.580ms 
7 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.580ms 
8 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.604ms 
9 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.585ms 
10 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.582ms 
11 Sent 8 probes (1 broadcast(s)) 
12 Received 9 response(s) 
root@der-pure-ct1:~# date ; arping -c 8 10.10.254.61 | nl 
Tue Mar 4 10:09:31 AM CST 2025 
1 ARPING 10.10.254.61 from 10.10.254.52 eth18 
2 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.583ms 
3 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.618ms 
4 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.725ms 
5 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.585ms 
6 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.749ms 
7 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.758ms 
8 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.752ms 
9 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.750ms 
10 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.590ms 
11 Sent 8 probes (1 broadcast(s)) 
12 Received 9 response(s) 
root@der-pure-ct1:~# 
 
= CT0 = 
 
root@der-pure-ct0:~# date ; arping -c 8 10.10.254.60 | nl 
Tue Mar 4 10:10:15 AM CST 2025 
1 ARPING 10.10.254.60 from 10.10.254.50 eth18 
2 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:69] 0.743ms 
3 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.752ms 
4 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.737ms 
5 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.737ms 
6 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.741ms 
7 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.731ms 
8 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.572ms 
9 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.733ms 
10 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:62] 0.704ms 
11 Sent 8 probes (1 broadcast(s)) 
12 Received 9 response(s) 
root@der-pure-ct0:~# date ; arping -c 8 10.10.254.61 | nl 
Tue Mar 4 10:10:28 AM CST 2025 
1 ARPING 10.10.254.61 from 10.10.254.50 eth18 
2 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:69] 0.731ms 
3 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.752ms 
4 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.578ms 
5 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.580ms 
6 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.728ms 
7 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.732ms 
8 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.578ms 
9 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.573ms 
10 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:62] 0.719ms 
11 Sent 8 probes (1 broadcast(s)) 
12 Received 9 response(s) 
root@der-pure-ct0:~#

I don't think I have the VRF set up correctly.
 
Okay, more results. I'm not sure I'm doing the VRF correctly, so I may need help getting this tested.
Thanks for trying this and the arp_ignore=1! Manually setting up the VRF like you did may be a bit tricky; it's probably easier to let ifupdown2 handle this. My previous answer [1] on this was a bit brief, so here are some more details:

If I understand correctly, your /etc/network/interfaces, after following the guide by the original poster, defines two IP addresses in the same subnet on two different physical interfaces. Similarly, I have these two stanzas for ens19/ens20 on my test system:
Code:
auto ens19
iface ens19 inet static
    address 172.16.0.200/24

auto ens20
iface ens20 inet static
    address 172.16.0.201/24

It should be enough to add two VRF stanzas (the names don't really matter too much) ...
Code:
auto path1
iface path1
    vrf-table auto

auto path2
iface path2
    vrf-table auto

... and modify the ens19/ens20 stanzas to attach to these VRF by adding two vrf options:
Code:
auto ens19
iface ens19 inet static
    address 172.16.0.200/24
    vrf path1

auto ens20
iface ens20 inet static
    address 172.16.0.201/24
    vrf path2

Then reload the network config with ifreload -a. Afterwards, ens19/ens20 should be attached to their respective VRFs, and each VRF should have one route defined (to the subnet where the iSCSI portals are located), e.g.:
Code:
# ip link | egrep 'ens(19|20)'
3: ens19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master path1 state UP mode DEFAULT group default qlen 1000
4: ens20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master path2 state UP mode DEFAULT group default qlen 1000
# ip route show vrf path1
172.16.0.0/24 dev ens19 proto kernel scope link src 172.16.0.200
# ip route show vrf path2
172.16.0.0/24 dev ens20 proto kernel scope link src 172.16.0.201
I assume you have also set up Open-iSCSI to bind directly to the ens19/ens20 interfaces using iscsiadm -m iface, as described in the guide by the original poster.

On my test system, this setup makes the host respond to ARP requests for 172.16.0.200/172.16.0.201 only on the corresponding interface, and with the correct MAC address -- no further tweaks are necessary (in particular, no need to set the tcp_l3mdev_accept/udp_l3mdev_accept sysctls).
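
If you want to watch this live on the node, tcpdump per interface shows which ARP requests actually get answered where (sketch):
Code:
# in two shells: only requests for the address that belongs to each interface
# should be answered on that interface
tcpdump -nei ens19 arp
tcpdump -nei ens20 arp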

[1] https://forum.proxmox.com/threads/quick-howto-on-setting-up-iscsi-multipath.157532/post-750326
 
Alrighty, got that setup. Followed what you posted above, here's the results:
⚠️ Note: The MAC addresses are different from those in my last few posts because I was using a bridge for multiple different things on those ports. This time they reflect the actual MAC addresses of the ports.

Code:
= CT1 =

root@der-pure-ct1:~# date ; arping -c 8 10.10.254.60 | nl
Fri Mar 7 10:42:04 AM CST 2025
1 ARPING 10.10.254.60 from 10.10.254.52 eth18
2 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.774ms
3 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.583ms
4 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.581ms
5 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.724ms
6 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.742ms
7 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.592ms
8 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.740ms
9 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.584ms
10 Sent 8 probes (1 broadcast(s))
11 Received 8 response(s)
root@der-pure-ct1:~# date ; arping -c 8 10.10.254.61 | nl
Fri Mar 7 10:42:16 AM CST 2025
1 ARPING 10.10.254.61 from 10.10.254.52 eth18
2 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 9.547ms
3 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.732ms
4 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.730ms
5 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.580ms
6 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.744ms
7 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.743ms
8 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.753ms
9 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.735ms
10 Sent 8 probes (1 broadcast(s))
11 Received 8 response(s)
root@der-pure-ct1:~#

= CT0 =

root@der-pure-ct0:~# date ; arping -c 8 10.10.254.60 | nl
Fri Mar 7 10:43:03 AM CST 2025
1 ARPING 10.10.254.60 from 10.10.254.50 eth18
2 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.584ms
3 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.726ms
4 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.739ms
5 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.693ms
6 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.583ms
7 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.736ms
8 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.731ms
9 Unicast reply from 10.10.254.60 [BC:97:E1:78:47:60] 0.726ms
10 Sent 8 probes (1 broadcast(s))
11 Received 8 response(s)
root@der-pure-ct0:~# date ; arping -c 8 10.10.254.61 | nl
Fri Mar 7 10:43:19 AM CST 2025
1 ARPING 10.10.254.61 from 10.10.254.50 eth18
2 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.580ms
3 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.577ms
4 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.741ms
5 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.727ms
6 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.741ms
7 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.569ms
8 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.733ms
9 Unicast reply from 10.10.254.61 [BC:97:E1:78:47:61] 0.730ms
10 Sent 8 probes (1 broadcast(s))
11 Received 8 response(s)
root@der-pure-ct0:~#

So it looks like that does work as intended as well.
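
As a final cross-check on the Proxmox side (sketch), the session list shows which iface record each path is bound to:
Bash:
# "Iface Name" / "Iface Netdev" should match the intended NIC for every session
iscsiadm -m session -P 1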
 
Alrighty, got that setup. Followed what you posted above, here's the results:
⚠️ Note: The MAC addresses are different from those in my last few posts because I was using a bridge for multiple different things on those ports. This time they reflect the actual MAC addresses of the ports.

[...]

So it looks like that does work as intended as well.
Thanks for testing and reporting back! The results look good.

Is there any reason to do the VRF vs arp_ignore?
In my opinion, the VRF solution is much cleaner than the arp_ignore change: for one, the VRF configuration is more visible (if configured in /etc/network/interfaces), whereas it's easier to forget about a sysctl change in a config file under /etc/sysctl.d. Also, regarding sysctls, and especially network-related sysctls, my general recommendation would be to stay with the defaults if possible.

So I'd say, the general recommendation for iSCSI multipath would still be to have multiple disjoint subnets for the different paths. If this is not desired and assigning multiple IPs in the same subnet to the Proxmox VE nodes is necessary, the VRF solution looks cleaner than the arp_ignore change.
 
Hello. My apologies for the delay in responding to you. Reflecting on my configuration, I realized the wiki article didn't work for me due to how I chose to architect the storage LAN. Coming to Proxmox with ~20 years of VMware vCenter experience, I opted to create the same storage architecture that I was comfortable with under VMware: that is, using a single storage VLAN for iSCSI traffic rather than 2 distinct storage networks. Had I used 2 different storage networks, the wiki article guidance would have identified both distinct interfaces rather than just 1. Since I have just 1 storage network, I needed to manually define the interfaces/IP and MAC addresses with iscsiadm in order for it to properly build out MPIO, then simply manually add the datastore/LUN into the cluster using pvesm. Using this architecture, I've not yet had a storage failure in any VMware clusters.

The architecture for my cluster has each Proxmox node with (2) 10-gbit links for storage (in the same VLAN). Each NIC connects to a different Cisco storage switch. The storage switches have trunks between them and trunks connecting them to the network core. Rapid per-VLAN spanning tree keeps the network loop free. If my storage network were built with SOHO network switches that don't offer STP, then yes, building out 2 separate networks would be the better option.

My risk analysis deemed that splitting storage NICs between separate LANs is riskier, as there is slightly less failure tolerance in the event of multiple link failures. This is most evident in my lower-end SAN units that are used mostly for testing. These only have 1 storage NIC each. With separate networks, I have to pick 1 of the 2 storage networks for it to live on, which will only allow 1 path to it from each node. In the event of a node storage NIC failure, the SAN is unreachable. However, with a single storage VLAN, if 1 storage NIC on a node goes down, there is still a viable path via the node's second storage NIC.

I think I will revise my guide to note that it uses a single storage VLAN and was designed to mimic VMware ESXi best practices. This may help others who are migrating to Proxmox from VMware in the enterprise.
Hi uptonguy75. Running Proxmox 9. I am trying to follow your guide, but got stuck at page 2 "Add a New iSCSI LUN (Per Node)" step 3. I do not have nodes under /etc/iscsi.

I believe we have the same configuration. Mine is:
Two Proliant servers with two NIC interfaces with addresses in the same subnet.
HPE Alletra Storage MP B10000 with four NIC ports in the same subnet.

Per Node: The interfaces appear in iscsiadm -m iface and discovery on each interface responds with the four Alletra addresses.

Any help would be greatly appreciated.
 
Hi uptonguy75. Running Proxmox 9. I am trying to follow your guide, but got stuck at page 2 "Add a New iSCSI LUN (Per Node)" step 3. I do not have nodes under /etc/iscsi.

I believe we have the same configuration. Mine is:
Two Proliant servers with two NIC interfaces with addresses in the same subnet.
HPE Alletra Storage MP B10000 with four NIC ports in the same subnet.

Per Node: The interfaces appear in iscsiadm -m iface and discovery on each interface responds with the four Alletra addresses.

Any help would be greatly appreciated.
Found the issue: I'm using Proxmox 9.0.3. For Step 3, the files are no longer in /etc/iscsi/nodes but in /var/lib/iscsi/nodes/<Target IQN>.
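
(If the on-disk location ever moves again, iscsiadm can list the recorded node entries directly, without needing the path:)
Code:
# lists all recorded node records (target IQN + portal)
iscsiadm -m node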

The document saved my life. Thanks!
 
Hey everyone, I wanted to thank uptonguy75 for his efforts on documenting and figuring this all out. Without it I would never have figured out what I needed to get our environment to work.

We have an existing VMware compute cluster that uses multiple SAN solutions from different vendor/generations.
In order to test the viability of using Proxmox as a replacement for VMware, a few conditions had to be met.

1) It had to support iSCSI as our existing VMware cluster did.
2) It had to be highly available with redundant paths for storage.
3) It had to utilize VLANs on the iSCSI ports as our existing infrastructure is designed that way.
4) It had to integrate well with multiple SAN solutions, as our existing environment does.

Figuring out all of the intricacies required to get there was no small task, and since there seemed to be no good documentation on what a config should look like, I decided to post mine here to help others.

My network config ended up like this:
Code:
auto lo
iface lo inet loopback

iface enp129s0f0np0 inet manual # Trunk for Virtual Machines
iface enp129s0f1np1 inet manual # iSCSI A Trunk VLANS 102,112,122
        offload-rxvlan off
        offload-txvlan off
        offload-tso off
        offload-rx-vlan-filter off
iface enp129s0f2np2 inet manual # VLAN 91 Virtual Machine Migration
iface enp129s0f3np3 inet manual # iSCSI B Trunk VLANS 101,111,121
        offload-rxvlan off
        offload-txvlan off
        offload-tso off
        offload-rx-vlan-filter off
iface enp131s0f0np0 inet manual # iSCSI B Trunk VLANS 101,111,121
        offload-rxvlan off
        offload-txvlan off
        offload-tso off
        offload-rx-vlan-filter off
iface enp131s0f1np1 inet manual # Trunk for Virtual Machines
iface enp132s0f0np0 inet manual # iSCSI A Trunk VLANS 102,112,122
        offload-rxvlan off
        offload-txvlan off
        offload-tso off
        offload-rx-vlan-filter off
iface enp132s0f1np1 inet manual # VLAN 91 Virtual Machine Migration

iface eno1 inet manual
iface eno2 inet manual
iface eno3 inet manual
iface eno4 inet manual
iface idrac inet manual

auto bond0
iface bond0 inet manual
        bond-slaves enp129s0f0np0 enp131s0f1np1
        bond-mode 1
        bond-miimon 100
        bond-primary enp131s0f1np1
        bond-updelay 100
        bond-downdelay 100

auto VMbrdg0
iface VMbrdg0 inet static
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 150 251 261 840 842 843 844 845 891 855 870 874 880 895
        
auto bond1
iface bond1 inet manual
        bond-slaves enp131s0f0np0 enp129s0f1np1
        bond-mode 1
        bond-miimon 100
        bond-primary enp131s0f0np0
        bond-updelay 100
        bond-downdelay 100

auto iSCSIA
iface iSCSIA inet manual
        bridge-ports bond1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 101 111 121
        offload-rxvlan off
        offload-txvlan off
        offload-tso off
        offload-rx-vlan-filter off
        post-up ifconfig bond1 mtu 9000 && ifconfig iSCSIA101 mtu 9000 && ifconfig iSCSIA111 mtu 9000 && ifconfig iSCSIA121 mtu 9000

auto iSCSIA101
iface iSCSIA101 inet static
        address 172.18.101.41/24
        vlan-id 101
        vlan-raw-device iSCSIA
        hwaddress ether 3c:fd:fe:33:44:01


auto iSCSIA111
iface iSCSIA111 inet static
        address 172.16.111.41/24
        vlan-id 111
        vlan-raw-device iSCSIA
        hwaddress ether 3c:fd:fe:33:44:02


auto iSCSIA121
iface iSCSIA121 inet static
        address 172.14.121.41/24
        vlan-id 121
        vlan-raw-device iSCSIA
        hwaddress ether 3c:fd:fe:33:44:03
        
auto bond2
iface bond2 inet manual
        bond-slaves enp132s0f0np0 enp129s0f3np3
        bond-mode 1
        bond-miimon 100
        bond-primary enp132s0f0np0
        bond-updelay 100
        bond-downdelay 100

auto iSCSIB
iface iSCSIB inet manual
        bridge-ports bond2
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 102 112 122
        offload-rxvlan off
        offload-txvlan off
        offload-tso off
        offload-rx-vlan-filter off
        post-up ifconfig bond2 mtu 9000 && ifconfig iSCSIB102 mtu 9000 && ifconfig iSCSIB112 mtu 9000 && ifconfig iSCSIB122 mtu 9000

auto iSCSIB102
iface iSCSIB102 inet static
        address 172.18.102.41/24
        vlan-id 102
        vlan-raw-device iSCSIB
        hwaddress ether f8:f2:1e:33:44:01
        #mtu 9000

auto iSCSIB112
iface iSCSIB112 inet static
        address 172.16.112.41/24
        vlan-id 112
        vlan-raw-device iSCSIB
        hwaddress ether f8:f2:1e:33:44:02
        #mtu 9000

auto iSCSIB122
iface iSCSIB122 inet static
        address 172.14.122.41/24
        vlan-id 122
        vlan-raw-device iSCSIB
        hwaddress ether f8:f2:1e:33:44:03
        #mtu 9000

auto Management150
iface Management795 inet static
        address 10.65.102.41/24
        gateway 10.65.102.1
        vlan-id 795
        vlan-raw-device VMbrdg0

auto bond3
iface bond3 inet manual
        bond-slaves enp129s0f2np2 enp132s0f1np1
        bond-mode 1
        bond-miimon 100
        bond-primary enp132s0f1np1
        bond-updelay 100
        bond-downdelay 100

auto vMotion
iface vMotion inet static
        bridge-ports bond3
        address 192.168.91.41/24
        post-up ifconfig bond3 mtu 9000 && ifconfig vMotion mtu 9000


source /etc/network/interfaces.d/*

And my multipath.conf looks like this:

Code:
defaults {
        find_multipaths strict
        find_multipaths_timeout -10
        polling_interval 5
        path_selector "round-robin 0"
        path_grouping_policy multibus
        user_friendly_names yes
        fast_io_fail_tmo 5
        dev_loss_tmo 10
}

blacklist {
        # Deny all devices by WWID to start.
        wwid .*

        # Blacklist system devices.
        devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
}

blacklist_exceptions {
        # WWIDs from HPE 3PAR LUNs
        # Run `multipath -ll` or `udevadm info` to find the WWIDs of your storage devices.

        # EMC Unity (SAN 1) LUNs
        wwid "360060000000000000000000000000000"
}

multipaths {
multipath {
        wwid "360060000000000000000000000000000"
        alias mpath_lun
   }
}

# --- Device-Specific Configuration ---
# Use the `devices` section to apply settings based on vendor and product IDs.
devices {
    # Dell PowerStore Array-Specific Settings
    device {
        # Match Dell vendor ID
        vendor "Dell"
        # Match PowerStore product ID
        product "PowerStore"
        # For PowerStore with ALUA (Active/Active), use priority-based path grouping.
        path_grouping_policy group_by_prio
        # Use ALUA path checker for PowerStore
        path_checker alua
        # Use the ALUA prioritizer
        prio alua
        # Set a round-robin selector for the active paths.
        path_selector "round-robin 0"
        # Immediate failback to a higher-priority path if it becomes available.
        failback immediate
    }

    # Dell EMC Unity Array-Specific Settings
    device {
        # Match Dell vendor ID
        vendor "DGC"
        # Match Unity product ID (e.g., Unity, VVOL)
        product "VRAID"
        product_blacklist "LUNZ"
        # For Unity (active/passive), group paths by priority, preferring the active/optimized paths.
        path_grouping_policy group_by_prio
        prio_args "prefer_active"
        prio "emc"
        # Use the EMC hardware handler for Unity's active/passive failover.
        hardware_handler "1 emc"
        # Set a round-robin selector.
        path_selector "round-robin 0"
        rr_min_io 100
        rr_weight uniform
        # Set the path checker
        path_checker "emc_clariion"
        # Failback to the primary path.
        failback immediate
        no_path_retry fail
    }
}


After all of this I have a Proxmox cluster with highly available iSCSI (8 paths per LUN) that performs very well (so far it looks like it rivals, if not exceeds, VMware).
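
In case it helps someone verify a similar setup, the per-LUN path count and the sessions behind it can be checked with (commands only; the alias comes from the multipath.conf above):
Code:
multipath -ll mpath_lun   # should show 8 active paths for the aliased LUN
iscsiadm -m session       # lists the iSCSI sessions backing those paths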

One caveat is that I have had the built-in network tools totally trash this config.

Any which way, I hope this helps someone else and saves them the days of hair pulling I went through.

John