How do HA + DRBD + cLVM behave on communication failure?

hermelin

Renowned Member
Sep 28, 2012
Hello,
I plan to use Proxmox with HA + DRBD + cLVM on two nodes in Primary/Primary mode. Each node has only one LAN port.
Thanks for the great manual here: http://pve.proxmox.com/wiki/DRBD

But how does the cluster behave in the following situation?

Situation: Both nodes are running correctly, but the communication link between them dies (for example, the switch fails). Both nodes lose their network connection, and they are not connected through a UTP crossover cable. DRBD synchronization is lost.

My question is: what will the cluster do?
1) Start all VMs from the opposite node? Then every VM runs twice! And then I must stop them and follow http://pve.proxmox.com/wiki/DRBD#Recovery_from_split_brain_when_using_two_DRBD_volumes
2) Does the cluster do nothing? Then what happens if one node fails? Can it tell the difference between a failed node and a failed communication link?
3) Is a Proxmox HA + DRBD cluster with only one LAN port per node impossible? Must I add a second, reliable connection between the two nodes (a crossover cable)?

Thanks for any reply.

Zdenek
 
If your single switch fails, nothing happens.
The cluster setup in Proxmox requires quorum to do anything.
If the nodes cannot talk to each other, there is no quorum; without quorum, no decisions on what to do can be made.

You really need three nodes for proper quorum, so two DRBD nodes alone are not enough if you want HA.
Next, for HA you need fencing; without fencing, bad things can happen, such as the "all VMs run twice" problem you asked about.
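
To make the quorum point concrete, here is a minimal sketch of the majority rule, assuming one vote per node and no tie-breaker such as a quorum disk. The function names are made up for illustration and are not part of any Proxmox tool.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """A partition is quorate only if it holds a strict majority of all votes."""
    return reachable_votes > total_votes // 2


def partition_report(partitions, total_votes):
    """For each group of nodes that can still talk to each other,
    report whether that group would keep quorum (one vote per node assumed)."""
    return [(sorted(group), has_quorum(len(group), total_votes))
            for group in partitions]


if __name__ == "__main__":
    # Two nodes, switch dies: each node is alone and sees 1 of 2 votes.
    print(partition_report([{"node1"}, {"node2"}], total_votes=2))
    # Three nodes, one node is cut off: the remaining pair sees 2 of 3 votes.
    print(partition_report([{"node1", "node2"}, {"node3"}], total_votes=3))
```

With two nodes neither side reaches a majority after a link failure, so neither side acts. With three nodes the pair that can still communicate keeps quorum and the isolated node does not.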

If you want HA, you need HA everything: redundant network connections across redundant switches on redundant power, and so on.

You really should have a separate LAN port for DRBD replication, apart from the one for your normal communication.
We use redundant 10G Infiniband for DRBD and Proxmox cluster communication and redundant 1G Ethernet for VM network communications.

Lastly, the cluster is not aware of the DRBD status.
Let's say you were in the middle of syncing a DRBD volume and the cluster decided to fence the node with good data and start your VM on the node with bad data.
That might end up being the beginning of a bad day.
HA with DRBD does work; I've tested it. But I do not use it, because I would rather let a human decide what to do.
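
Since the cluster does not look at DRBD state, a human (or a script run by one) would have to. Below is a rough sketch of such a check, assuming the DRBD 8.x style of /proc/drbd output. The SAMPLE text is an illustrative example, not output from a real system, and the function names are hypothetical.

```python
import re

# Illustrative sample in the DRBD 8.x /proc/drbd layout (not real output).
SAMPLE = """version: 8.3.13 (api:88/proto:86-96)
 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
 1: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r-----
"""

# minor number, connection state, local/peer role, local/peer disk state
STATUS_LINE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+ro:([^/\s]+)/([^/\s]+)\s+ds:([^/\s]+)/([^/\s]+)"
)


def parse_drbd_status(text):
    """Return {minor: {...states...}} for every resource line found."""
    resources = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            minor, cs, ro_local, ro_peer, ds_local, ds_peer = match.groups()
            resources[int(minor)] = {
                "cs": cs,
                "ro_local": ro_local,
                "ro_peer": ro_peer,
                "ds_local": ds_local,
                "ds_peer": ds_peer,
            }
    return resources


def local_data_up_to_date(resource):
    """True if the local disk state is UpToDate.
    This is a necessary check, not a sufficient one: after a split brain
    both sides can report UpToDate while holding different data."""
    return resource["ds_local"] == "UpToDate"


if __name__ == "__main__":
    for minor, res in sorted(parse_drbd_status(SAMPLE).items()):
        print(minor, res["cs"], "local UpToDate:", local_data_up_to_date(res))
```

On a real node you would read /proc/drbd instead of SAMPLE. In the sample, resource 1 is still a sync target with Inconsistent local data, which is exactly the node you would not want a VM started on.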
 
