iSCSI multipath - no reconnect after switch reboot

Jun 9, 2025
Hi,

OK, I have two PVE v9.0.11 nodes in my cluster where I'm trying to get iSCSI and multipath working. As part of the overall hardware setup, I've got the two nodes (call them node 1 and node 2), a TrueNAS SAN (Enterprise, dual controller), and two UniFi fiber aggregation switches (not the ECS model, so no MLAG unfortunately). Each node has a connection to each switch (two different subnets), and the SAN is cross-connected to the switches across the two subnets (controller A-1 to switch 1, controller A-2 to switch 2, controller B-1 to switch 2, controller B-2 to switch 1).

I've followed the multipath wiki and it works - I have iSCSI connections and a connected LVM for guests. The problem is that I'm running into reconnect issues when the switches are rebooted. If I reboot switch 1, node 2 loses one of its iSCSI paths and does not reconnect that path until I reboot the node; node 1 does reconnect once the switch has fully booted. If I reboot switch 2, both nodes lose one of their iSCSI paths and neither reconnects until I reboot each node. I'm also unable to ping the SAN from the nodes on the affected interface.
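For context, my multipath config follows the wiki pretty closely; it's roughly along these lines (the WWID and alias below are placeholders, not my actual values - check the wiki for the exact options for your version):

    defaults {
        user_friendly_names yes
    }

    blacklist {
        wwid .*
    }

    blacklist_exceptions {
        wwid "3600a0b80deadbeef0000000000000000"   # placeholder: WWID of the TrueNAS LUN
    }

    multipaths {
        multipath {
            wwid "3600a0b80deadbeef0000000000000000"
            alias truenas-lun0                      # placeholder alias
        }
    }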

To be clear, multipath does appear to be working insofar as I don't lose connectivity entirely... I just can't seem to get full connectivity back after a switch reboot other than by also rebooting the nodes. Note that I DO NOT see this behavior if I simply unplug either interface: for an unplug, the corresponding target goes down but is restored/reconnected once the cable is plugged back in.
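For anyone wanting to see the same thing on their side, I'm just using the standard tools to watch the path state (nothing node-specific here):

    # list the iSCSI sessions per portal
    iscsiadm -m session -P 1

    # show the multipath topology and per-path state (active/failed)
    multipath -ll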

Does anyone have any thoughts as to why this happens, and is there any way to mitigate it? Thanks...
 
OK, after simplifying the SAN connections a bit (one interface per controller), I was able to identify the issue as a problem with flow control on one of my multiport NICs... flow control wasn't being re-enabled after a disconnect (e.g., a switch reboot). Both nodes have the same model of card, which seems to explain the behavior. Also, the state of the SAN (controller A as primary vs. controller B, etc.) was a contributing factor in whether both connections dropped or only one.
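For anyone hitting something similar: the pause (flow control) state can be checked and forced per interface with ethtool (the interface name below is just an example):

    # query the current pause / flow control settings
    ethtool -a enp65s0f0

    # re-enable RX/TX pause if it dropped off after the link bounced
    ethtool -A enp65s0f0 rx on tx on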

I've added some ethtool commands to /etc/network/interfaces, which seems to work most of the time... flow control is re-enabled once the interface comes up and the iSCSI connection is reestablished. However, the "pre-up" command (for ethtool) doesn't always run as expected... I'm guessing it's a timing issue with the switch reboot or something similar. Doing some research, I found references to using systemd-networkd (/etc/systemd/network) as a way to more reliably enable flow control, but on my PVE host there are no references to any of my interfaces there... does (or can) Proxmox use systemd-networkd?
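What I added to /etc/network/interfaces is roughly this (interface name and address are examples, not my real values):

    auto enp65s0f0
    iface enp65s0f0 inet static
        address 10.10.10.11/24
        pre-up /usr/sbin/ethtool -A enp65s0f0 rx on tx on || true
        post-up /usr/sbin/ethtool -A enp65s0f0 rx on tx on || true

On the systemd side, my understanding (not yet verified on my hosts) is that a .link file is applied by udev rather than by systemd-networkd itself, so it should take effect even though PVE manages interfaces with ifupdown2. Something like /etc/systemd/network/10-iscsi-a.link, again with example values:

    [Match]
    OriginalName=enp65s0f0

    [Link]
    RxFlowControl=yes
    TxFlowControl=yes

Running "udevadm test-builtin net_setup_link /sys/class/net/enp65s0f0" should show whether the file is being picked up.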

Thanks

UPDATE: Neither pre-up nor post-up is working consistently in this scenario... I'm having to go to the shell and run "ethtool -r [IF-NAME]" to correct the connection issue... any advice would be appreciated, thanks again.
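One stopgap option (only a sketch with example interface names, not something I've validated long-term) would be a small watchdog run from a one-minute cron job or systemd timer that re-runs ethtool -r whenever pause drops off:

    #!/bin/sh
    # pause-watchdog.sh - restart autoneg on any iSCSI NIC whose pause settings dropped
    for IF in enp65s0f0 enp65s0f1; do          # example interface names
        STATE=$(ethtool -a "$IF" 2>/dev/null)
        [ -n "$STATE" ] || continue
        # skip interfaces where both RX and TX pause are still on
        echo "$STATE" | grep -q "RX:.*on" && echo "$STATE" | grep -q "TX:.*on" && continue
        logger -t pause-watchdog "flow control off on $IF, restarting autonegotiation"
        ethtool -r "$IF"
    done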
 