Weirdest thing ever!!!

Nov 26, 2024
Hi all,

I have the weirdest thing ever. I have a two-node cluster with iSCSI multipath. We had a power outage a while ago, and the storage on one of the nodes started to have intermittent connection issues. It would connect some of the time and fail with "communication failure (0)" or "connection timed out (596)" the rest. The summary page for the storage would show periods of connection with gaps in between.

I narrowed it down to the multipath entries after running iscsiadm -m session -P3 on both nodes and comparing the outputs. The failing node has lost the target info on two of the paths. Where there should be two routes to each of the two storage volumes from each node, the failing node has only one active route to each volume.

What I would like to do is reset the connection between this node and the volumes without wiping out the connections on the other node.
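The comparison of session output between the two nodes can be sketched roughly as follows. This is only a minimal sketch: the helper name session_paths, the file names, and the example IQN and addresses are placeholders, and it assumes the usual one-line-per-session format printed by iscsiadm -m session.

```shell
#!/bin/sh
# Reduce `iscsiadm -m session` output (read from stdin) to sorted
# "target portal" pairs so two nodes can be compared with diff.
# Assumed input line format (example values, not from this cluster):
#   tcp: [1] 192.0.2.10:3260,1 iqn.2001-05.com.example:vol1 (non-flash)
session_paths() {
    awk '$1 ~ /:$/ && $2 ~ /^\[[0-9]+\]$/ {
        sub(/,.*/, "", $3)   # drop the ",<tpgt>" suffix from the portal
        print $4, $3         # target IQN first, then portal
    }' | sort
}

# On each node (file names are placeholders):
#   iscsiadm -m session | session_paths > /tmp/paths.$(hostname)
# Then copy one file to the other node and compare:
#   diff /tmp/paths.node1 /tmp/paths.node2
```

Any target/portal pair that shows up on the healthy node but not on the failing one is a path that needs to be re-established.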
 
Hi @WacArts ,
I have the weirdest thing ever. I have a two node cluster
It is not weird. It is a rather common shortcut that leads to many complaints when both nodes reboot due to quorum loss.
We had a power outage a while ago and the storage on one of the nodes started to have intermittent connection issues. It would connect some of the time and fail with communication failure (0) or connection timed out (596) the rest.
The easiest solution would be to reboot the node.
I narrowed it down to the multipath entries after running iscsiadm -m session -P3 on both nodes and comparing the outputs. On the failing node, it has lost the target info on two of the paths. Where there should be two routes between the two storage volumes for each node, there is only one route active between the two storage volumes on the two hosts.
Did you try to figure out why the paths are missing? Are the network paths operational? Can you ping the appropriate IPs? Can you scan the portal with: pvesm scan iscsi <portal>
Maybe you have a failed switch or NIC?
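A quick way to run these reachability checks from the failing node might look like the sketch below. The function name check_portal and the 192.0.2.x addresses are placeholders, and the ping options assume the Linux iputils version of ping.

```shell
#!/bin/sh
# Report whether a storage portal IP answers a single ping.
# $1 = portal IP address (placeholder; use your real storage IPs).
check_portal() {
    # -c 1: send one echo request, -W 2: wait at most 2 seconds for a reply
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
        echo "$1 reachable"
    else
        echo "$1 UNREACHABLE"
    fi
}

# Example (hypothetical addresses, one per storage path):
#   for ip in 192.0.2.10 192.0.2.11; do check_portal "$ip"; done
# For each reachable portal, confirm that the targets are still offered:
#   pvesm scan iscsi 192.0.2.10
```

If a portal does not answer on one path only, the NIC, cable, or switch port for that path is the first thing to look at.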

What I would like to do is reset this connection between this node and volumes without wiping out the connections with the other node
Any iscsiadm manipulation done on the "bad" node is local to that node and does not affect the sessions on the other node. You should examine the logs on the bad node to understand why the connection was not established.
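As a rough illustration of a node-local reset, the sketch below logs one path out and back in on the node where it is run. The function name relogin_path and the example IQN and portal are placeholders, and the iscsid unit name in the journalctl hint is an assumption about the installed open-iscsi packaging. Make sure the remaining path is healthy before logging a path out.

```shell
#!/bin/sh
# Re-establish one iSCSI path on the local node only.
# $1 = target IQN, $2 = portal as IP:port (both supplied by you).
relogin_path() {
    target=$1
    portal=$2
    # Logout can fail if the session is already gone; try the login anyway.
    iscsiadm -m node -T "$target" -p "$portal" --logout || true
    iscsiadm -m node -T "$target" -p "$portal" --login
}

# Example (hypothetical values):
#   relogin_path iqn.2001-05.com.example:vol1 192.0.2.10:3260
#   multipath -ll              # check whether both paths are listed again
#   journalctl -u iscsid -b    # look for why the earlier logins failed
```

If the login fails again, the log output is the place to find the actual reason before trying anything more invasive.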

