Alternative to corosync for large clusters

illustris · Feb 17, 2021

We've spent the past 6 months trying to make a 50 node PVE cluster stable. While we managed to identify and report multiple issues with corosync that cause instabilities, at this point it is still far from stable. The issue isn't just keeping the corosync cluster alive. Whenever Corosync has one of its "episodes", all the nodes in the cluster start flooding each other with UDP traffic. In other words, all the nodes in the cluster DDoS each other. If corosync and your VMs/LXCs share a NIC, it will make these guests unreachable. Sometimes it gets bad enough to the point that you can't even SSH in. Given this risk of catastrophic failure at large scales, is anyone at Proxmox looking into the viability of alternatives like using an external Zookeeper cluster for larger PVE clusters? The only prior mention of this I could find was this mailing list thread from 2016:

https://lists.proxmox.com/pipermail/pve-devel/2016-September/022909.html

dietmar · Feb 17, 2021

The plan is to provide some GUI to manage multiple clusters, so there should be no need to setup such large corosync clusters.

illustris · Feb 17, 2021

That would be very useful. Although it wouldn't solve the issue of shared storage. Since each cluster expects guest IDs to be unique, using the same shared storage across multiple clusters would be impossible.

dietmar · Feb 17, 2021

illustris said:
That would be very useful. Although it wouldn't solve the issue of shared storage. Since each cluster expects guest IDs to be unique, using the same shared storage across multiple clusters would be impossible.

Sure. But a cluster with 32 nodes can host many VMs, so IMHO there is no real need to move VMs between clusters - at least not to achieve HA (You still can, but data needs to be copied).

xiaolin0199 · Feb 25, 2021

dietmar said:
The plan is to provide some GUI to manage multiple clusters, so there should be no need to setup such large corosync clusters.

Has this plan already begun? Will version 6.4 be released?

Sascha72036 · Apr 29, 2021

We have exactly the same problem with our 48 node cluster.
Some nodes start udp floods. As a result the 10G nics from other nodes entered in a blocking state.

We tried sctp, but thats not the solution.
Have you found a way to run corosync stably in large clusters?

illustris · Apr 29, 2021

Sascha72036 said:
We have exactly the same problem with our 48 node cluster.
Some nodes start udp floods. As a result the 10G nics from other nodes entered in a blocking state.

We tried sctp, but thats not the solution.
Have you found a way to run corosync stably in large clusters?

We did a lot of tests with various parameters in corosync.conf to stabilize the cluster. We eventually got to a point where a 52 node cluster would run without crashing for a couple of days. A 50 node cluster would run for much longer, as long as we're not actively making a lot of changes to pmxcfs. We're scaling the main cluster down now, splitting it into a federation of clusters grouped by generation. It seems stable enough at 48 nodes. I'm hoping the multi-cluster GUI that dietmar mentioned gets released soon. We're not giving up on larger proxmox clusters though, hopefully we'll eventually identify the bottleneck.

Also, make sure you're on the latest version of proxmox. Some older versions of corosync had an integer overflow bug that would cause issues if you cluster has 48 or more nodes.

esi_y · Sep 14, 2024

illustris said:
We did a lot of tests with various parameters in corosync.conf to stabilize the cluster. We eventually got to a point where a 52 node cluster would run without crashing for a couple of days. A 50 node cluster would run for much longer, as long as we're not actively making a lot of changes to pmxcfs.

This was all with knet only?

Search

Search

Alternative to corosync for large clusters

illustris

Member

dietmar

Proxmox Staff Member

illustris

Member

dietmar

Proxmox Staff Member

xiaolin0199

Well-Known Member

Sascha72036

Member

illustris

Member

esi_y

Renowned Member