Hi NStorm
I have used DRBD for many years without problems.
These are my recommendations:
- Dedicate NIC(s) to DRBD, connected directly "NIC-to-NIC". If you have 2 NICs dedicated to DRBD, bond them in balance-rr mode to get up to double the network speed, and set the MTU of these NIC(s) to the maximum (check the hardware manual of your NICs to see how much is supported).
- To make maintenance quick (in case of split-brain or something similar), you should have two DRBD partitions. The VMs of the first PVE node write to the first DRBD partition, and the VMs of the second PVE node write to the second DRBD partition. This way you can resolve many split-brain problems quickly and fully online, i.e. without powering anything off.
- If you want more speed, it will surely be better to put each DRBD partition on different disks (so that the reads and writes of one DRBD partition do not compete with those of the other DRBD partition on the same disk).
- To get more performance from DRBD, see the DRBD tuning documentation on the LINBIT website.
- You can also run an online verify of your replicated DRBD volumes without powering anything off (I have it automated in the crontab with a personal bash script that reports the start and end times, plus a few other things).
- In my mini test lab I have only two PVE nodes, and to get "HA" I use manual fencing, which requires a minimum of human intervention to get the VMs started on the other PVE node, all without losing any VM data.
- With this setup you can have HA for your VMs, but not for your CTs.
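To give an idea of the online verify from cron, here is a minimal sketch (not my actual script). It assumes DRBD 8.x, where /proc/drbd shows a connection state of VerifyS/VerifyT while a verify is running, and that a verify-alg is already set in your DRBD configuration. The function names, the variables and the resource name r0 are only examples.

```shell
#!/bin/bash
# Sketch of a cron wrapper for an online DRBD verify (all names are illustrative).
DRBDADM="${DRBDADM:-drbdadm}"             # override with DRBDADM=echo for a dry run
STATUS_FILE="${STATUS_FILE:-/proc/drbd}"  # DRBD 8.x status file (assumption)
POLL_SECONDS="${POLL_SECONDS:-60}"        # how often to check whether verify is done

# Log the start time and launch the verify (drbdadm returns right away,
# the verify itself keeps running in the background).
start_verify() {
    local res="$1"
    echo "$(date '+%F %T') verify of ${res} started"
    "$DRBDADM" verify "$res"
}

# Wait while any device still reports a Verify connection state,
# then log the end time.
wait_verify_done() {
    while grep -q 'cs:Verify' "$STATUS_FILE" 2>/dev/null; do
        sleep "$POLL_SECONDS"
    done
    echo "$(date '+%F %T') verify finished"
}
```

If you save this with a last line like "start_verify r0 && wait_verify_done" in a file such as /usr/local/bin/drbd-verify.sh (a hypothetical path), a crontab entry like "0 2 * * 0 /usr/local/bin/drbd-verify.sh >> /var/log/drbd-verify.log 2>&1" would run it every Sunday at 02:00. The out-of-sync blocks themselves are reported in the kernel log, not by this script.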
Good luck with your project
Best regards
Cesar
Red-Edited:
I assume DRBD is still the best choice here for 2 nodes. And I'd better set up a pair of master/slave volumes for both nodes to make split-brains easier to resolve
With PVE it is better to have Active/Active, because:
1- You can have live migration of VMs
2- "HA" will also work well, and better if you have one DRBD volume for each PVE node
3- Also, in the global configuration file of DRBD, you can configure it to send an email immediately if the replication of a DRBD volume has any failure
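As a rough illustration of the Active/Active and email points, a DRBD 8.x configuration fragment could look like the sketch below. The resource name r0 and the recipient root are assumptions, the notify scripts are the ones normally shipped with the DRBD utilities (check the path on your system), and the disk, device and address settings are left out on purpose.

```
common {
    handlers {
        # send a mail to root when a split-brain or out-of-sync blocks are detected
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
    }
}

resource r0 {
    protocol C;
    startup {
        # both nodes become Primary at startup (Active/Active)
        become-primary-on both;
    }
    net {
        # needed for Active/Active and for live migration
        allow-two-primaries;
    }
    # per-node "on <hostname> { ... }" sections are omitted here
}
```

With one resource per PVE node you would have a second resource (for example r1) with the same options.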
But will it work for HA or do I have to manually resolve on one node failure?
With only two PVE nodes, the best option for "HA" is the manual one: configure manual fencing in the PVE cluster configuration. Then, when you have a problem with a PVE node, you can trigger "HA" by running a single command: /usr/sbin/fence_ack_manual <the_name_of_PVE_node_with_problems> (but first, to be sure, you should power off that PVE node by disconnecting its electrical power)
Notes:
1- DRBD gives the best performance that you can get with only two PVE nodes
2- With only two PVE nodes, automatic fencing is a bad idea, because in this setup neither PVE node can get a majority of the quorum votes (so the quicker PVE node will fence the other PVE node, and this is not ideal). With manual fencing, you can think, analyze the situation, and once you are sure, apply manual fencing to the PVE node that you want.