Hello guys,
I would love to get some opinions/tips from those of you who know more than me. In particular I would love a response from user "e100", who seems to have been running a setup like the one I describe below.
I am currently planning a Proxmox VE cluster, trying to get HA with as little money spent as feasible (I cannot get a proper fencing device).
This is what my brain believes should work after some reading here (especially after some posts from e100):
Hardware:
2x "powerful" servers for running KVMs (with RAID10, but that is irrelevant to the following thoughts)
1x "small" server that is only there as a "quorum" witness
Goal:
nodeA RUNS KVM_1 & KVM_2 and holds a synchronised copy of nodeB's VMs via DRBD; nodeB RUNS KVM_3 & KVM_4 and likewise holds a synchronised copy of nodeA's VMs via DRBD.
nodeC ("small, only for quorum") is only part of the cluster to provide quorum, so I can have VMs managed via HA failover domains.
Setup:
On the 2 "powerful" servers (nodeA & nodeB) I want to install Proxmox on top of Debian so I can do manual partitioning.
I would create on each of the two "powerfull" nodes:
3x LVM2 volume groups (VGs):
1) VG_host (just containing the Debian/Proxmox host system)
2) VG_vms_local (containing the KVMs that should be RUNNING on this host while the cluster is fully operational, i.e. the opposite node is NOT down)
3) VG_vms_remote (containing the KVMs that should ONLY be running here when the opposite node is down; at that point all 4 KVMs would be running on this one node)
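Since the two VM volume groups are meant to sit on top of the DRBD resources described below, I imagine the LVM side would look roughly like this (this is just my sketch, assuming r0 maps to /dev/drbd0 and r1 to /dev/drbd1):

```
# Hypothetical sketch: /dev/drbd0 and /dev/drbd1 are assumed device names
pvcreate /dev/drbd0
pvcreate /dev/drbd1
vgcreate VG_vms_local  /dev/drbd0   # r0: VMs normally running on this node
vgcreate VG_vms_remote /dev/drbd1   # r1: standby copies of the peer's VMs
```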
VG_vms_local and VG_vms_remote would be replicated via DRBD, where:
VG_vms_local is DRBD resource r0
VG_vms_remote is DRBD resource r1
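Going by the DRBD documentation, I would expect the resource files to look roughly like this (IPs, ports and the backing partition are just placeholders; 10.0.0.x would be the back-to-back replication network):

```
# /etc/drbd.d/r0.res  (hypothetical sketch)
resource r0 {
    protocol C;               # synchronous replication
    on nodeA {
        device    /dev/drbd0;
        disk      /dev/sda3;  # placeholder backing partition
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on nodeB {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
```

r1 would look the same, just with /dev/drbd1, its own backing partition, and a different port (e.g. 7789).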
DRBD replication is done via a bond of two separate network links (eth2 & eth3), connected directly back-to-back.
Cluster / LAN communication is done via a bond of two separate network links (eth0 & eth1), connected to ONE switch.
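The bonding part of /etc/network/interfaces might then look something like this (addresses and bond modes are just examples, not recommendations):

```
# /etc/network/interfaces (sketch)
auto bond0
iface bond0 inet manual
    slaves eth0 eth1
    bond_miimon 100
    bond_mode active-backup        # LAN/cluster bond, to the single switch

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10           # placeholder LAN address
    netmask 255.255.255.0
    gateway 192.168.1.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0

auto bond1
iface bond1 inet static
    address 10.0.0.1               # back-to-back DRBD replication link
    netmask 255.255.255.0
    slaves eth2 eth3
    bond_miimon 100
    bond_mode balance-rr           # round-robin may help DRBD throughput
```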
On the "small" server (nodeC) I would do a standard Proxmox install.
Now all 3 servers would be added to one Proxmox cluster.
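Cluster creation itself should then just be the standard commands (the cluster name is a placeholder):

```
# On nodeA:
pvecm create mycluster

# On nodeB and nodeC (pointing at the first node):
pvecm add <IP-of-nodeA>

# Verify membership and quorum on any node:
pvecm status
```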
How I think it would run:
With all 3 servers being in the cluster, only nodeA and nodeB hosting KVMs (replicated to each other via DRBD), and the third node (nodeC) only being there as a "quorum witness" (having no access to the KVMs' data), I should be able to run HA.
My questions now:
I need to set up proper failover domains in case nodeA fails (no connection to the cluster for whatever reason).
nodeB & nodeC should then both see that nodeA is not responding, and the two KVMs KVM_1 & KVM_2 should be started AUTOMATICALLY on nodeB.
1) Can this be done via failover domains? How? Is there a good example/howto for this failover-domain thing? This is where my knowledge/understanding gets blurry.
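From what I have read so far, failover domains are defined in /etc/pve/cluster.conf, and I imagine something roughly like this (names, priorities and attributes are just my guesses, please correct me):

```
<rm>
  <failoverdomains>
    <!-- KVM_1 & KVM_2 prefer nodeA, fail over to nodeB -->
    <failoverdomain name="prefer_nodeA" ordered="1" restricted="1" nofailback="0">
      <failoverdomainnode name="nodeA" priority="1"/>
      <failoverdomainnode name="nodeB" priority="2"/>
    </failoverdomain>
    <!-- KVM_3 & KVM_4 prefer nodeB, fail over to nodeA -->
    <failoverdomain name="prefer_nodeB" ordered="1" restricted="1" nofailback="0">
      <failoverdomainnode name="nodeB" priority="1"/>
      <failoverdomainnode name="nodeA" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <!-- vmid 101 is a placeholder for one of the HA-managed KVMs -->
  <pvevm autostart="1" vmid="101" domain="prefer_nodeA"/>
</rm>
```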
2) Let's assume everything worked out as described above: what happens when nodeA comes back up? Will KVM_1 & KVM_2 be stopped on nodeB and started on nodeA again, automatically, AFTER the DRBD synchronisation has completed?
3) Can I really do this setup without a fencing device that ENSURES that (in the above example) nodeA really IS down?
What if nodeA only lost its cluster/LAN connection on eth0 & eth1, while the DRBD replication link via eth2 & eth3 is still up? That would leave me with a nasty split-brain mess, right? Can this be taken care of by failover domains?
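Regarding that split-brain scenario: the DRBD docs describe automatic split-brain recovery policies in the net section, something like the following (these particular policies are just an example from my reading, not a recommendation):

```
# Inside each resource definition (sketch)
net {
    after-sb-0pri discard-zero-changes;  # no primaries: keep the side that has changes
    after-sb-1pri discard-secondary;     # one primary: drop the secondary's changes
    after-sb-2pri disconnect;            # two primaries: give up, repair manually
}
handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";  # mail a warning
}
```

But I am aware this only handles the DRBD side and is no substitute for real fencing, hence my question.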
4) Did I forget anything? Is my setup built on any faulty assumption that would break everything I have planned?
Thanks for reading the whole post up to this last line