HA and Fencing ... how reliable is it?

pashadee

Jan 11, 2014

Hi guys,

I am not sure exactly how High Availability and Fencing work in Proxmox, so I am hoping someone with more experience than me can shed some light on this for me.

I have 5 Dell servers with iDRAC 6 Express, which also support IPMI, so I can set up fencing using either of those 2 methods. But I still have an uneasy feeling about running an HA cluster, because I don't know the answers to the following scenarios:

1. I suffer a power outage and all 5 nodes go down and get rebooted at the same time... what happens once the nodes are back on? What happens if some take longer to come back because a file system check is triggered, or for other reasons?

2. What happens if my switch fails?

3. What happens if a port on the switch fails?

Thanks for your guys' time, much appreciated!

Paul
1. I suffer a power outage and all 5 nodes go down and get rebooted at the same time... what happens once the nodes are back on? What happens if some take longer to come back because a file system check is triggered, or for other reasons?

I'm not sure, but you need to have quorum before HA works again.
So once 3 nodes are up, HA will work.
If the other 2 nodes are not up yet, I'm not sure whether they'll be fenced by the other 3.
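To make the "3 nodes up" figure concrete: quorum is a strict majority of the expected votes, so with 5 nodes at one vote each the threshold is 5 // 2 + 1 = 3. A minimal sketch, assuming one vote per node and no extra quorum votes; the function names are illustrative only:

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True once enough votes are present to form a majority."""
    return votes_present >= quorum_threshold(expected_votes)


EXPECTED_VOTES = 5  # 5 nodes, one vote each (assumption)

# Nodes rejoining one at a time after a full power outage,
# e.g. because some of them are stuck in a file system check.
for nodes_up in range(1, EXPECTED_VOTES + 1):
    state = "quorate" if is_quorate(nodes_up, EXPECTED_VOTES) else "no quorum"
    print(f"{nodes_up} node(s) up: {state}")
```

With 5 expected votes the first two nodes to finish booting stay without quorum, and the cluster becomes quorate when the third joins. This only covers the vote count; whether the late nodes get fenced depends on the fencing setup.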

2. What happens if my switch fails?

3. What happens if a port on the switch fails?


Nothing, because you can't connect to the fencing devices.
Also, there's no more multicast, so rgmanager won't work either.

For HA, you need redundant switches + bonding!
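The switch-failure case, and why redundant switches plus bonding help, can be modelled as a partition problem: only a group of nodes that can still reach each other and holds a majority may recover HA services. This is a simplified sketch, not how rgmanager is implemented. It assumes one vote per node, that all switches are interconnected, and that a node stays connected as long as at least one of its bonded uplinks goes to a live switch. The node and switch names are hypothetical.

```python
from typing import Dict, List, Set


def quorum_threshold(expected_votes: int) -> int:
    # Strict majority, assuming one vote per node.
    return expected_votes // 2 + 1


def partitions(uplinks: Dict[str, List[str]], dead_switches: Set[str]) -> List[Set[str]]:
    """Split the cluster into groups of nodes that can still talk to each other.

    Simplified model: every node with at least one uplink to a live switch
    lands in one shared group; every other node is isolated on its own.
    """
    connected = {node for node, switches in uplinks.items()
                 if any(sw not in dead_switches for sw in switches)}
    groups = [connected] if connected else []
    groups.extend({node} for node in uplinks if node not in connected)
    return groups


def quorate_groups(uplinks: Dict[str, List[str]], dead_switches: Set[str]) -> List[Set[str]]:
    """Groups large enough to keep quorum (the only ones that may act, in this model)."""
    needed = quorum_threshold(len(uplinks))
    return [g for g in partitions(uplinks, dead_switches) if len(g) >= needed]


nodes = ["pve1", "pve2", "pve3", "pve4", "pve5"]

# One switch, one cluster link per node: losing the switch isolates every node.
single = {n: ["sw1"] for n in nodes}
print(len(quorate_groups(single, {"sw1"})))  # 0: no group may recover services

# Two switches, each node bonded to both: losing one switch keeps the cluster whole.
bonded = {n: ["sw1", "sw2"] for n in nodes}
print(len(quorate_groups(bonded, {"sw1"})))  # 1: all 5 nodes stay together
```

In this model a quorate group would still have to fence missing nodes before restarting their services, and with the only switch down the fencing devices are unreachable as well. Whether VMs already running on an isolated node keep running is up to the HA stack, not something this sketch decides.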

Thanks for your response!


I'm not sure, but you need to have quorum before HA works again.
So once 3 nodes are up, HA will work.
If the other 2 nodes are not up yet, I'm not sure whether they'll be fenced by the other 3.

-- Once a node is fenced, I believe bringing it back in has to be done manually, correct?


Nothing, because you can't connect to the fencing devices.
Also, there's no more multicast, so rgmanager won't work either.

-- So in this scenario it sounds like no harm should be done: if rgmanager isn't working, it shouldn't suddenly launch the VM on another node. In other words, if my switch fails and all nodes are now standalone, I'm assuming that whatever VM is in HA-managed mode (say VM100) will continue to run on the node it was on, and there will be no attempt to start it on another node. Would you agree?


For HA, you need redundant switches + bonding!

-- This isn't an option for me, as each of my servers has only 2 Ethernet ports: eth0 --> Proxmox ... eth1 --> Ceph ... I suppose the only other thing I can try is to set up fencing via both interfaces?
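Fencing over both interfaces boils down to an ordered list of fence methods where the first one that succeeds wins, roughly what listing more than one fence method for a node in the cluster configuration is meant to do. A minimal sketch: `fence_via_eth0` and `fence_via_eth1` are hypothetical stand-ins for whatever actually talks to the iDRAC/IPMI interface, and an unreachable device is assumed to show up as an `OSError`.

```python
from typing import Callable, List, Optional, Tuple

# A fence method is a (label, callable) pair; the callable returns True on success.
FenceMethod = Tuple[str, Callable[[str], bool]]


def fence_node(node: str, methods: List[FenceMethod]) -> Optional[str]:
    """Try each fence method in order; return the label of the first that succeeds.

    Returns None if every method failed; in this sketch the caller should then
    not recover the node's services elsewhere.
    """
    for label, method in methods:
        try:
            if method(node):
                return label
        except OSError:
            # Device unreachable on this path (assumption): try the next method.
            continue
    return None


# Hypothetical stand-ins for real fence calls over each network path.
def fence_via_eth0(node: str) -> bool:
    raise OSError(f"no route to {node}'s iDRAC via eth0")  # e.g. the eth0 switch is down


def fence_via_eth1(node: str) -> bool:
    return True  # pretend the IPMI power-off over eth1 succeeded


print(fence_node("pve3", [("eth0", fence_via_eth0), ("eth1", fence_via_eth1)]))  # eth1
```

A second path only helps if the iDRAC/IPMI interface is actually reachable over it, so how the iDRAC's network port is cabled matters as much as which host interface sends the fence request.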



Thanks for your time!