Hi,
I am one of the many who were pushed out of ESXi and am now starting to learn Proxmox. I am also in the process of replacing my current server with a new one: moving from an R820 running ESXi 6.7.3 to an FX2S with 2 server blades and 2 storage blades that will run Proxmox. The server is hosted in a professional data center, so it has redundant power supplies and the risk of physical hazards is as low as it gets.
HA is to be ensured by Kubernetes and by the applications themselves, like pfSense for the firewalls. The catch is that Kubernetes itself also needs 3 servers to ensure its own HA. I plan to run Controller 1 on the first blade, Controller 2 on the second, and I would need live migration to move Controller 3 from blade 1 to blade 2 when applying updates to Proxmox and rebooting it. I intend to do that manually and do not want to rely on Proxmox HA for it.
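For reference, the manual move I have in mind is just an online migration; a minimal sketch, assuming shared storage, with the VMID (103) and node names (blade1, blade2) as placeholders:

  # run from the node currently hosting the VM: move Controller 3 to blade 2
  qm migrate 103 blade2 --online
  # after rebooting blade 1, move it back (run from blade 2)
  qm migrate 103 blade1 --online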
That means the typical 2-node Proxmox cluster, which is discouraged. So here are a few options I am considering, and I would like to know which one is better according to your experience:
1-Just go with the 2 nodes as they are
Without HA, and with a reliable environment for everything else, will it do the job?
2-Deploy a QDevice in a VM that is itself hosted on blade 1
When I need to reboot blade 1, I live migrate the QDevice VM and my Kubernetes Controller No 3 to blade 2 before the reboot. Once done, I move them back to blade 1. That way, quorum is always available. (A QDevice setup sketch follows this list.)
3-Deploy a QDevice at my home, which is connected to the data center by a site-to-site IPsec VPN
Higher latency, glitches when the firewall failover happens during a reboot (it will recover by itself), lower availability at my home... But since the QDevice is only needed when a node is down, would it be better than option 2?
4-Set up another remote channel between my home and the data center for that QDevice
SSH, an SSL tunnel or whatever; I can put basically anything in place to secure that channel between the 2 environments. (A tunnel sketch also follows the list.)
5-Give 2 votes to blade server No 1 and 1 vote to blade server No 2
When rebooting node No 2, node No 1 still has quorum on its own. Losing quorum while rebooting node No 1 will not let me change anything in the cluster during that time, but the VMs will keep running as they are until node No 1 is back.
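For options 2 and 3, my understanding of the QDevice setup from the Proxmox docs is roughly this (10.0.0.50 is a placeholder for the QDevice host, whether it is the VM on blade 1 or a machine at home):

  # on the QDevice host (a small Debian VM or box)
  apt install corosync-qnetd
  # on both Proxmox nodes
  apt install corosync-qdevice
  # on one Proxmox node, register the QDevice (needs root SSH to the host)
  pvecm qdevice setup 10.0.0.50
  # verify that the third vote shows up
  pvecm status

If I read the docs right, qnetd is not latency-sensitive the way corosync itself is, so option 3 over the VPN should be workable; the nodes only need to reach qnetd on TCP port 5403.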
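For option 4, what I have in mind is something like a persistent SSH tunnel from each node to the qnetd host at home; a rough sketch only, assuming autossh and a reachable home.example.net, and I am not sure pointing corosync-qdevice at 127.0.0.1 plays nicely with the certificate setup:

  # hypothetical, on each Proxmox node: forward the qnetd port over SSH
  apt install autossh
  autossh -M 0 -f -N -L 5403:127.0.0.1:5403 root@home.example.net
  # corosync-qdevice would then talk to 127.0.0.1 instead of the VPN address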
As for me, option 2 is what I think would be the best: everything stays inside the server and the data center. I have to manually live migrate my Kubernetes control plane No 3 in all cases (HA is not fast enough to rely on for restarting control planes after an outage), so doing it once or twice more does not change much.
I am not sure option 5 is even possible. If it is, is my understanding of how Proxmox behaves when it loses quorum right, or is there more to consider?
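From what I have read, option 5 looks doable by editing the nodelist in /etc/pve/corosync.conf (and bumping config_version); a sketch, with node names and addresses as placeholders:

  node {
    name: blade1
    nodeid: 1
    quorum_votes: 2
    ring0_addr: 10.0.0.11
  }
  node {
    name: blade2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.0.12
  }

With 3 votes in total, quorum is 2, so blade 1 alone stays quorate while blade 2 alone does not; during a blade 1 reboot, /etc/pve goes read-only and running VMs are untouched, which matches what I described above. If I ever get stuck, pvecm expected 1 on the surviving node seems to be the escape hatch, at the cost of a split-brain risk.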
Thanks for sharing your experience,