[SOLVED] Asking for your experience

Heracles31

New Member
Aug 4, 2024
Hi,

I am one of the many who were pushed out of ESXi and am now starting to learn Proxmox. I am also in the process of replacing my current server with a new one: moving from an R820 running ESXi 6.7.3 to an FX2S with 2 server blades and 2 storage blades that will be running Proxmox. This server is hosted in a professional data center, so it has redundant power supplies and the risk of physical hazard is as low as it gets.

HA is to be ensured by Kubernetes and by the applications themselves, like pfSense for the firewalls. The thing is that Kubernetes itself also needs 3 servers to ensure its HA. I plan to run Controller 1 on the first blade server, Controller 2 on the second, and I would need live migration to move Controller 3 from blade 1 to blade 2 when applying updates to Proxmox and rebooting it. I intend to do that manually and I do not want to rely on Proxmox HA for it.

So that means the typical 2-node cluster for Proxmox, which is discouraged. Here are a few options I am thinking about, and I would like to know which one would be better according to your experience:

1-Just go with the 2 nodes as they are

Without HA, and with a reliable environment for everything else, will it do the job?

2-Deploy a QDevice in a VM itself hosted in blade 1 ?

When I need to reboot it, I live migrate it and my Kubernetes Controller No 3 to blade 2 before the reboot. Once done, I move them back to blade 1. That way, quorum is always available.

3-Deploy a QDevice at my home, which is connected to the data center by site-to-site IPSec VPN

Higher latency, glitches when the FW failover happens during a reboot (though it will recover by itself), lower availability at my home... But because it is needed only when a node is down, would it be better than option 2?

4-Design another remote channel between my home and the data center for that QDevice

SSH, an SSL tunnel or whatever: I can put in place basically anything to secure that channel between the 2 environments.

5-Give 2 votes to blade server No 1 and 1 vote to blade no 2.

When rebooting node No 2, No 1 is still in command. Loss of quorum while rebooting node No 1 will not let me change anything in the cluster during that time, but the VMs will keep running as they are until node No 1 is back.


As for me, option 2 is what I think would be the best. Everything stays in the server and in the data center. I have to manually live migrate my Kubernetes control plane No 3 in all cases (HA is not fast enough to rely on to restart control planes after an outage), so doing it once or twice does not change much.
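To be concrete, the move itself would be a single command each way; something like this is what I have in mind (the VMID and node names are just placeholders, and as far as I understand --with-local-disks is only needed if the disk lives on node-local storage):

  # before rebooting blade 1: move Kubernetes controller 3 (e.g. VMID 103) to blade 2
  qm migrate 103 blade2 --online --with-local-disks
  # once blade 1 is back up: move it home again
  qm migrate 103 blade1 --online --with-local-disks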

I am not sure option 5 is even possible. If it is, is my understanding of how Proxmox handles losing quorum right, or is there more to consider?

Thanks for sharing your experience,
 
Recently moved from ESXi (8.0 though) to Proxmox (with 5 servers, the same servers even, re-installing and transferring VMs one server at a time).
1. Indeed, 2 nodes is very much discouraged, but it will work as long as both nodes remain in sync and active. If one of them goes down (for updates), things that are running will remain running, but making changes (which includes starting VMs) will be blocked. Restarting a VM FROM WITHIN THE VM (which keeps the KVM process running) will work, though; a reboot from Proxmox will not, since that is basically a stop-and-start.
2. If the host running the QDevice goes down for whatever (unplanned) reason, you're still in the same boat, needing tricks/commands/work-arounds to get things running again.
3. QDevices have lower requirements, so the higher latency isn't that big of an issue
4. The traffic is already SSH-encrypted/-tunneled, so a simple (ip-filtered) port-forward would probably do the job as well
5. No real advantage, and just adds complexity in my opinion.
One other option you might want to look into, though, is the qm remote-migrate function [1]. It lets you migrate between clusters, but also between 2 non-clustered servers, keeping them both as working, stand-alone servers while still giving you the option to move VMs.
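For reference, such a remote migration looks roughly like this (host, fingerprint, API token and the storage/bridge names are placeholders, and the command is still marked experimental, so check qm(1) for the exact syntax of your PVE version):

  qm remote-migrate 103 103 \
    'apitoken=PVEAPIToken=root@pam!migrate=<secret>,host=192.0.2.20,fingerprint=<target-cert-fingerprint>' \
    --target-bridge vmbr0 --target-storage local-zfs --online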

[1] https://pve.proxmox.com/pve-docs/qm.1.html
 
Thanks for your answer. Remote migration is marked as experimental, so I would rather use another option. So I guess it will be between options 2 and 3…
 
1-Just go with the 2 nodes as they are

Without HA, and with a reliable environment for everything else, will it do the job?

It would, but QD is better. Have a look at corosync options such as two_node, wait_for_all, auto_tie_breaker:
https://manpages.debian.org/bookworm/corosync/votequorum.5.en.html
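For a 2-node cluster without a QD, the relevant part is the quorum section of corosync.conf (on PVE you edit /etc/pve/corosync.conf and increment config_version so the change gets distributed). A rough, untested sketch:

  quorum {
    provider: corosync_votequorum
    two_node: 1
    # two_node enables wait_for_all by default; set it to 0 explicitly if you do not want that behaviour
    wait_for_all: 0
  }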

2-Deploy a QDevice in a VM itself hosted in blade 1 ?

Makes no sense, see above.

3-Deploy a QDevice at my home, which is connected to the data center by site-to-site IPSec VPN

Latency is not a problem for QD, IPSec s2s is great for this.

4-Design another remote channel between my home and the data center for that QDevice

After you deploy it (the setup SSHes into your qnetd host and executes a script), it continues to communicate over TLS; you can check with corosync-qnetd-tool -l.
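If you have not deployed one before, the whole procedure is roughly this (package names are the standard Debian ones, the address is a placeholder):

  # on the external machine
  apt install corosync-qnetd
  # on every PVE node
  apt install corosync-qdevice
  # on one PVE node
  pvecm qdevice setup 203.0.113.10

After that, pvecm status on a node should list the Qdevice vote, and corosync-qnetd-tool -l on the external machine shows the connected cluster.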

5-Give 2 votes to blade server No 1 and 1 vote to blade no 2.

See auto_tie_breaker_node instead.
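A sketch of what that would look like (again in the quorum section of corosync.conf, untested):

  quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 1
    # nodeid of the blade that should keep quorum in a 50/50 split; 'lowest' or 'highest' also work
    auto_tie_breaker_node: 1
  }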
 
Thanks for your input.

The corosync options you pointed me to are interesting. I will have to read them another time to be sure to properly understand them and how to operate them.

You said that option 2 makes no sense. I understand that it does not if the QDevice remains on Node 1 all the time. The idea of that option is to move the QDevice from Node 1 to Node 2 whenever I need to do maintenance on Node 1. That way, the regular quorum of 2 out of 3 votes would be maintained the whole time:
--all 3 votes most of the time
--the 2 Proxmox nodes during migration of the QDevice
--the QDevice and 1 node while the other node reboots

The probability of only 1 node going down unexpectedly is very low because the 2 blades are in the same chassis. So if both go down, I will have problems to deal with, but no quorum problems. If one node goes down unexpectedly, I have a 50% chance of keeping quorum. Only in the other 50% would I end up losing quorum.

Do you still consider it nonsense? I thought that it would ensure proper quorum all the time, so it would safeguard the cluster and do better than dropping to 50% of the votes.
 
Thanks for your input.

The corosync options you pointed me to are interesting. I will have to read them another time to be sure to properly understand them and how to operate them.

The options were made specifically for cases like yours. I know you might get some input here that they are "unsupported" by PVE, which means they do not test them, but most importantly that they are not willing to troubleshoot them even if you have e.g. a paid subscription. Given that the other options on your list (and a 2-node cluster itself) all fall within this category, I think they are the best choice (if you do not want to run the external QD).

You said that option 2 makes no sense.

It makes no sense for the simple reason that the same functionality can be achieved by shuffling around the corosync votes, but then again PVE officially relies on all nodes having exactly 1 vote (not that I would know of anything that breaks with it). There's no problem running a QD in a VM; there's just zero sense in running it in a VM within the same cluster. Within the cluster, you can shuffle around corosync votes instead.
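For completeness, shuffling the votes is nothing more than the nodelist section of corosync.conf, roughly like this (names, nodeids and addresses are placeholders; edit /etc/pve/corosync.conf and bump config_version):

  nodelist {
    node {
      name: blade1
      nodeid: 1
      quorum_votes: 2
      ring0_addr: 192.0.2.11
    }
    node {
      name: blade2
      nodeid: 2
      quorum_votes: 1
      ring0_addr: 192.0.2.12
    }
  }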

I understand that it does not if the QDevice remains on Node 1 all the time. The idea of that option is to move the QDevice from Node 1 to Node 2 whenever I need to do maintenance on Node 1. That way, the regular quorum of 2 out of 3 votes would be maintained the whole time:
--all 3 votes most of the time
--the 2 Proxmox nodes during migration of the QDevice
--the QDevice and 1 node while the other node reboots

All of these are true for the uneven vote count as well. So that gives you the same flexibility without being confronted with a situation where you need to launch a VM inside an inquorate cluster to make it quorate, and in order to do that you would first have to artificially set the expected quorum to 1. You might as well have shuffled the votes instead to achieve the same thing.
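(The "artificially set quorum to 1" part is the usual emergency command on the surviving node, nothing more exotic than:

  pvecm expected 1

which lowers the expected vote count at runtime; it does not persist across a corosync restart.)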

The probability of only 1 node going down unexpectedly is very low because the 2 blades are in the same chassis. So if both go down, I will have problems to deal with, but no quorum problems. If one node goes down unexpectedly, I have a 50% chance of keeping quorum. Only in the other 50% would I end up losing quorum.

With all that said, the uneven vote count is also a completely unnecessary solution given the other options available (if you do not want to run a QD properly):

1) two_node sets your quorum to 1 already, so (any) one node going down will not be a problem. The caveat is that it activates wait_for_all, which you might not want, but that is fine because ...

2) If you were to run an HA scenario (I would not recommend that), you would need to be sure that both nodes do not think they are quorate at the same time if split by the network (I wonder how that would happen without both being cut off from the outside world at the same time), and you still might prefer that even without HA. This is what auto_tie_breaker and auto_tie_breaker_node are for: to deterministically define the "master" node.

3) If you were to have more than 2 nodes, the last_man_standing option would be for you, for scenario (2).

So the summary is: all you wanted to achieve with that jerry-rigged QD can be achieved with the 2/1 vote distribution, which in turn can instead be achieved with auto_tie_breaker. If all of that sounds somehow unstable to you, I would recommend running the external QD (note that there are also similarly named options which let you tweak how the QD behaves, and they are hardly well tested by PVE staff anyhow). There are no additional risks associated with a QD outside of the cluster in your situation.
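(Those similarly named options live under a device { } block inside the quorum section; pvecm qdevice setup writes something roughly like the sketch below for you, so you normally never touch it unless you want e.g. a different algorithm, see corosync-qdevice(8):

  quorum {
    provider: corosync_votequorum
    device {
      model: net
      votes: 1
      net {
        host: 203.0.113.10
        algorithm: ffsplit
        tls: on
      }
    }
  }
)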

Do you still consider it nonsense? I thought that it would ensure proper quorum all the time, so it would safeguard the cluster and do better than dropping to 50% of the votes.

Do you still consider it a meaningful alternative to auto_tie_breaker? :)

Just to be clear, my preferences would be:

1) Off-site QD (could be a VM, even in another cluster, even in public cloud);
2) auto_tie_breaker;
3) two_node.

Note that option 2 gives you what you would get with the MacGyver approach without the peanut butter all over the counter. Option 3 gives you what a QD does without having to run one, but with a little more risk of split brain. You might now wonder how option 1 is any better than a jerry-rigged QD. Well, the QD is the referee in case either of your nodes/blades has no idea what's going on in the outside world. Why does that matter? Because say you e.g. have a routing issue that cuts off only one of your nodes from the other node and from the outside. If that is your "master node", your services are down even though they do not have to be. In option 3 they would be up (without any QD). In option 1 they would be up (with a proper QD). In option 2 they would be down, but without the need for a QD.
 
Do you still consider it a meaningful alternative to auto_tie_breaker? :)
For now, I will confess that I do, but I suspect that is more because I do not understand the auto_tie_breaker option properly. As I said, I will have to read up on these options again to better understand them.

But for sure, I understand your answer: option No 2 is a MacGyver approach (very compatible with my personality :) ) that has no real benefit, at the cost of over-complicating things.

Thanks again for your experience. Very useful and appreciated.
 
Ok, I think I get it better now...

As long as the 2 nodes are up, there is no need for anything. They do just fine: quorum is there and all functionalities are there.

For something like a simple reboot, loss of quorum is no big deal. For a moment I will not be able to do much with the remaining node, but it will keep running its load and the cluster will recover in a matter of minutes. Also, there is no risk of damage due to the loss of quorum. No point making something like my MacGyver movable QDevice for this.

With that, well over 99.99% of the time is properly handled without anything special. Adding auto_tie_breaker would leave me with an operational cluster when the node that reboots is No 2, but this gain is small considering that losing quorum for a few minutes is of no concern.

So now, what needs to be managed are the corner cases. A reliable QDevice is the proper answer for these. Unfortunately, I cannot have an on-site QDevice that will be as available and reliable as the Proxmox nodes themselves.

These corner cases are:
A-when a node goes down for a longer period and / or in an unexpected way
B-both nodes remain up but lose connection to each other and potentially to the outside as well (probably a config error here...).

For A, auto_tie_breaker would force Node 1 to be the master. If that is the node that is down, I will be in trouble until I recover it. If it is the node that is up, I will be able to operate it. A functional QDevice would let me use the proper node, no matter which one that is. So yes, a reliable QDevice is important.

As for B, a functional QDevice would also ensure that one node remains the master.

So now I can define my problem better: which QDevice will prove itself most helpful? An external one with lower availability, my MacGyver internal and movable one, or no QDevice at all?

Time for me now to evaluate my risks, considering that all of them are corner cases:

--An internal and movable QDevice will do better than an external one in controlled situations (properly moved before exposing a node to a risk like an update) because its availability will be better than the remote one's (power, network connection, ...).

--An external QDevice can do better than the internal one in some uncontrolled situations, as long as the unexpected problem does not affect communication with at least one node and there are no problems with the less reliable remote QDevice itself.

Also, because human error is a major part of most incidents, any extra ops is a risk by itself.
--An internal and movable QDevice requires extra ops, so it is more error prone and as such a higher risk by definition
--An external QDevice will be more stable in terms of operation, so lower risk

So what will I choose as my residual risk...

Once the internal QDevice has failed me, it may be harder to re-establish everything. If the external QDevice fails me, the cluster should wait for me to restore it and then recover without much damage. Restoring such a remote connection should not be too bad.

The probability is already low because all of these are corner cases.
The less reliable QDevice also has a lower potential impact than the other options because both nodes should keep running their load, freeze to prevent any change, and wait for me to re-establish a network connection.

That is way easier than recovering a corrupted config or database or some other internal problem that may happen inside the cluster during an incident.

So again, what residual risk will I choose...

1-An external QDevice, without the proper power protection, Internet connectivity and with more factors that can disrupt it when it is needed, but with low impact on the cluster?

2-An internal and movable QDevice that requires more operations but will be reliable as long as it is running from the proper node, at the risk of a higher impact when the house of cards falls down?

3-No QDevice at all, because all of these are corner cases and auto_tie_breaker or other options can ensure the cluster is recoverable without too much damage?

Agree that now, my MacGyver solution is less appealing :)

Thanks again for your input. I think I have a better understanding now and will be able to make a better decision. The only proper answer is a reliable QDevice. It is up to me to deploy one with high enough reliability for my needs, or to use the alternate options to safeguard the cluster in case something goes wrong.
 

:D I am glad you gave it thorough thought. I would just reiterate that the external QD (with lower "reliability") is likely preferable to an on-site one. The reason, for me, is that I typically need those services to be available from the outside world. If both nodes are up, they do not really need any QD. Without HA services, you do not run any risk of nodes rebooting themselves even when the QD is down and quorum is lost. But I would not like the case where e.g. an on-site QD, during a split network situation, takes the side of the "wrong" node. If it tells the "wrong" node that it has quorum, the one that just got cut off from the outside world, that is not helpful to me. Yes, you have preserved integrity, only one node has quorum, a deterministic situation, but at times that one node might easily be the one that you cannot access from the outside world, the world it typically should provide services to.

And you are managing all this from the outside, fully aware that you can see the one node that lost quorum but cannot see the other node and the QD, which are happily idle in an isolated part of the network.
 
That last argument for the external QDevice is the one I needed most :)

You are very right that, because I am running this server remotely, it is crucial that the QDevice endorses the node that is reachable from the outside, no matter the rest of the situation. For that, it is essential that the QDevice itself runs outside.

Thanks again,
 
