Weirdest thing ever!!!

Nov 26, 2024
Hi all,

I have the weirdest thing ever. I have a two-node cluster with iSCSI multipath. We had a power outage a while ago and the storage on one of the nodes started to have intermittent connection issues. It would connect some of the time and fail with communication failure (0) or connection timed out (596) the rest. The summary page for the storage would show periods of connection with gaps in between.

I narrowed it down to the multipath entries after running iscsiadm -m session -P3 on both nodes and comparing the outputs. The failing node has lost the target info on two of its paths: where each node should have two routes to each of the two storage volumes, this node only has one active route to them.

What I would like to do is reset this connection between this node and the volumes without wiping out the connections with the other node.
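For anyone comparing outputs the same way, here is a small sketch of counting logged-in sessions per target IQN from "iscsiadm -m session" output. The IQNs and IPs below are placeholders standing in for real output, not the poster's actual targets; on a healthy multipath setup each target should appear once per portal/path:

```shell
# Hypothetical sample of "iscsiadm -m session" output (placeholder
# IQNs and TEST-NET IPs); each line is one logged-in session:
sample='tcp: [1] 192.0.2.1:3260,1 iqn.example:vol1 (non-flash)
tcp: [2] 192.0.2.2:3260,1 iqn.example:vol1 (non-flash)
tcp: [3] 192.0.2.1:3260,1 iqn.example:vol2 (non-flash)'

# Count sessions per target IQN (field 4). A target that shows fewer
# sessions than the number of configured paths has lost a path:
printf '%s\n' "$sample" | awk '{print $4}' | sort | uniq -c
```

In this made-up sample, vol1 has both of its paths logged in while vol2 only has one, which is the shape of the symptom described above.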
 
Hi @WacArts ,
> I have the weirdest thing ever. I have a two-node cluster
It is not weird. It is a rather common shortcut that leads to many complaints when both nodes reboot due to quorum loss.
> We had a power outage a while ago and the storage on one of the nodes started to have intermittent connection issues. It would connect some of the time and fail with communication failure (0) or connection timed out (596) the rest.
The easiest solution would be to reboot the node.
> I narrowed it down to the multipath entries after running iscsiadm -m session -P3 on both nodes and comparing the outputs. On the failing node, it has lost the target info on two of the paths. Where there should be two routes between the two storage volumes for each node, there is only one route active between the two storage volumes on the two hosts.
Did you try to figure out why the paths are missing? Are the network paths operational? Can you ping the appropriate IPs? Can you scan with: pvesm scan iscsi
Maybe you have a failed switch/NIC?
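A minimal sketch of those checks, run from the failing node. The portal IPs are placeholders from the reserved TEST-NET range, not real addresses; pvesm scan iscsi takes the portal address to query:

```shell
# Placeholder portal IPs -- substitute the real iSCSI portal addresses:
for ip in 192.0.2.1 192.0.2.2; do
  if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
    echo "$ip reachable"
  else
    echo "$ip UNREACHABLE"
  fi
done

# Proxmox-side discovery against one portal (skipped where pvesm is absent):
if command -v pvesm >/dev/null 2>&1; then
  pvesm scan iscsi 192.0.2.1
fi
```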

> What I would like to do is reset this connection between this node and volumes without wiping out the connections with the other node
Any iscsiadm manipulation done on the "bad" node is local to that node. You should examine the logs on that node to understand why the connection was not established.
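A hedged sketch of that per-node approach (the IQN and portal below are placeholders, and the block skips itself on machines without open-iscsi installed). With multipath still holding the surviving path, logging one session out and back in should not disturb the other sessions, and nothing here touches the other node:

```shell
# Skip gracefully where open-iscsi is not installed:
command -v iscsiadm >/dev/null 2>&1 || exit 0

# First, look at why the session failed (assumes systemd journald):
journalctl -u iscsid -u open-iscsi --since "-1h" --no-pager | tail -n 50

# Then re-login ONE target+portal pair; other sessions are untouched:
iscsiadm -m node -T iqn.example:vol1 -p 192.0.2.1:3260 --logout
iscsiadm -m node -T iqn.example:vol1 -p 192.0.2.1:3260 --login
```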


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
> We had a power outage a while ago and the storage on one of the nodes started to have intermittent connection issues. It would connect some of the time and fail with communication failure (0) or connection timed out (596) the rest.
> The easiest solution would be to reboot the node.
I did that when I first noticed the issue; it didn't resolve it then and probably won't resolve it now.

> I narrowed it down to the multipath entries after running iscsiadm -m session -P3 on both nodes and comparing the outputs. On the failing node, it has lost the target info on two of the paths. Where there should be two routes between the two storage volumes for each node, there is only one route active between the two storage volumes on the two hosts.
> Did you try to figure out why the paths are missing? Are the network paths operational? Can you ping the appropriate IPs? Can you scan with: pvesm scan iscsi
> Maybe you have a failed switch/NIC?
I ran ICMP pings on all network ports implicated in this issue on both hosts, and all responded. pvesm scan iscsi also returned all connected. Yes, I did try to figure out why the paths are missing; I just haven't found a logical explanation as to why they aren't there. Everything is reachable, yet two of the connections to the storage from one node haven't established properly.
If someone could explain how to reset these connections without trashing the remaining connections or the storage devices, I would love to hear it.
> What I would like to do is reset this connection between this node and volumes without wiping out the connections with the other node
> Any iscsiadm manipulation done on the "bad" node is local to that node. You should examine the log on the node to understand why the connection was not established.
I have done some more digging: the open-iscsi service fails to start because iscsiadm reports an initiator error. This is the error that has kicked my backup storage over, and I want to restore the initiator connections to the correct iface targets. The iface entries that have lost their connection point at different IP addresses from the actual target, hence the failure: the initiator is looking for something that doesn't exist.
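For that situation, one hedged approach (IQN and portal values below are placeholders, and node records are per-host, so the other cluster node is unaffected) is to compare what this node has recorded against a fresh discovery, then delete only the stale record that points at the wrong IP, so open-iscsi stops chasing a portal that no longer exists:

```shell
# Skip gracefully where open-iscsi is not installed:
command -v iscsiadm >/dev/null 2>&1 || exit 0

iscsiadm -m node    # target/portal pairs this node has recorded
iscsiadm -m iface   # iface definitions and their bindings

# Ask the array what it offers right now (placeholder portal):
iscsiadm -m discovery -t sendtargets -p 192.0.2.1

# Delete only the one stale record (placeholder stale portal/IQN):
iscsiadm -m node -T iqn.example:vol1 -p 192.0.2.99:3260 -o delete
```

After cleaning up the stale records, restarting open-iscsi (or an explicit --login against the corrected records) should let the service start and rebuild the sessions. Note that a sendtargets discovery can itself rewrite node records, so review them again afterwards.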
 
Hi @WacArts,

It would be helpful if you could provide structured data: CLI outputs, command history, and a clear comparison between the “good” and “bad” system states.

It’s always a bit frustrating for the community when troubleshooting becomes a slow back-and-forth of “try this” followed by “I already tried that.” Detailed, upfront information makes it much easier (and faster) to help.

Thanks!


 