iSCSI storage lost and HA issue

Ch@rlus

Feb 14, 2013
Hi guys,

I recently built a Proxmox 5.1 cluster, with 5 identical HP servers.

All these servers are connected through iSCSI to a storage array, with a multipath configuration.

Everything is running great, but I wanted to test several scenarios and ran into an issue with the following one: what happens if a server suddenly loses all its iSCSI links but remains in the quorum?

I simply removed the iSCSI VLANs from this server while running an I/O benchmark on a few VMs. Once the iSCSI links were down, the VMs started to produce I/O errors (which seems legitimate), but after waiting 5 minutes, I noticed that they had not been migrated to another node with working iSCSI. All these VMs were in an HA group.

Is this behaviour normal? Is there something I can do to change this and make sure that a node whose storage is down will "migrate" / restart its VMs on another node?

Thanks in advance,
Regards
 
Is this behaviour normal? Is there something I can do to change this and make sure that a node whose storage is down will "migrate" / restart its VMs on another node?

Yes, this is expected. You should make your iSCSI connection redundant, so that this cannot happen.
 
Thanks for your reply. That's what I wanted to be sure of.

I'll take the necessary measures to make sure that the iSCSI links are redundant, but something can always happen and break them.

I'll probably see if I can implement an auto-fence on my node if it detects that the iSCSI links have been down for X seconds. That would be a good feature to include in the Proxmox fencing mechanism too (I do not see any case where this could cause trouble: if the storage is down, then the node is unusable).
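
For what it's worth, the "down for X seconds" part of that idea could be sketched roughly as below. This is only a minimal Python sketch, not anything Proxmox ships: `StorageDownTimer`, `watch`, `check` and `fence` are hypothetical names, and both the storage check and the fencing action are left as callables you would have to supply yourself (deciding what "storage is down" really means is the hard part). The timer only fires after the check has failed continuously for the whole grace period, so a short blip that recovers does not fence the node.

```python
import time
from typing import Callable, Optional


class StorageDownTimer:
    """Tracks how long a storage check has been failing without interruption."""

    def __init__(self, grace_seconds: float) -> None:
        self.grace_seconds = grace_seconds
        # Timestamp of the first failure in the current failure streak, if any.
        self.down_since: Optional[float] = None

    def observe(self, healthy: bool, now: float) -> bool:
        """Record one check result; return True once the outage has lasted
        at least grace_seconds."""
        if healthy:
            self.down_since = None  # any successful check resets the streak
            return False
        if self.down_since is None:
            self.down_since = now
        return (now - self.down_since) >= self.grace_seconds


def watch(check: Callable[[], bool], fence: Callable[[], None],
          grace_seconds: float = 60.0, interval: float = 5.0) -> None:
    """Poll check() every interval seconds and call fence() once after the
    storage has been down for grace_seconds. Both callables are hypothetical
    placeholders supplied by the caller."""
    timer = StorageDownTimer(grace_seconds)
    while True:
        if timer.observe(check(), time.monotonic()):
            fence()
            return
        time.sleep(interval)
```

The decision logic is kept separate from the polling loop so it can be tried out with fake timestamps before being wired to a real check and a real fencing action.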
 
I'll probably see if I can implement an auto-fence on my node if it detects that the iSCSI links have been down for X seconds. That would be a good feature to include in the Proxmox fencing mechanism too (I do not see any case where this could cause trouble: if the storage is down, then the node is unusable).

It is just difficult to detect/judge whether the storage is down. But yes, the plan was to add such tests, maybe with a generic way to run tests.