Alternative to corosync for large clusters

illustris · Member · Sep 14, 2018
We've spent the past 6 months trying to make a 50 node PVE cluster stable. While we managed to identify and report multiple issues with corosync that cause instabilities, at this point it is still far from stable. The issue isn't just keeping the corosync cluster alive. Whenever corosync has one of its "episodes", all the nodes in the cluster start flooding each other with UDP traffic. In other words, all the nodes in the cluster DDoS each other. If corosync and your VMs/LXCs share a NIC, this will make those guests unreachable. Sometimes it gets bad enough that you can't even SSH in. Given this risk of catastrophic failure at large scales, is anyone at Proxmox looking into the viability of alternatives, like using an external ZooKeeper cluster for larger PVE clusters? The only prior mention of this I could find was this mailing list thread from 2016:

https://lists.proxmox.com/pipermail/pve-devel/2016-September/022909.html
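As an aside for anyone hitting the shared-NIC problem described above: a common mitigation is to keep the corosync links on a dedicated network. A minimal corosync.conf sketch (node names and addresses are made-up examples; knet supports multiple links per node):

```
# Hypothetical fragment: put corosync link 0 on a dedicated NIC,
# with link 1 on the shared network only as a fallback.
totem {
  version: 2
  cluster_name: pve-cluster
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
}

nodelist {
  node {
    name: node01
    nodeid: 1
    ring0_addr: 10.10.10.1   # dedicated corosync network (example address)
    ring1_addr: 192.168.1.1  # fallback on the shared network (example address)
  }
}
```

This doesn't fix corosync's scaling problems, but it does keep a corosync "episode" from flooding the same NIC your guests use.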
 
The plan is to provide some GUI to manage multiple clusters, so there should be no need to set up such large corosync clusters.
 
That would be very useful, although it wouldn't solve the issue of shared storage: since each cluster expects guest IDs to be unique, using the same shared storage across multiple clusters would be impossible.
 
That would be very useful, although it wouldn't solve the issue of shared storage: since each cluster expects guest IDs to be unique, using the same shared storage across multiple clusters would be impossible.
Sure. But a cluster with 32 nodes can host many VMs, so IMHO there is no real need to move VMs between clusters - at least not to achieve HA (you still can, but the data needs to be copied).
 
We have exactly the same problem with our 48 node cluster.
Some nodes start UDP floods. As a result, the 10G NICs on other nodes enter a blocking state.

We tried SCTP, but that's not the solution.
Have you found a way to run corosync stably in large clusters?
 
We have exactly the same problem with our 48 node cluster.
Some nodes start UDP floods. As a result, the 10G NICs on other nodes enter a blocking state.

We tried SCTP, but that's not the solution.
Have you found a way to run corosync stably in large clusters?
We did a lot of tests with various parameters in corosync.conf to stabilize the cluster. We eventually got to a point where a 52 node cluster would run without crashing for a couple of days. A 50 node cluster would run for much longer, as long as we're not actively making a lot of changes to pmxcfs.

We're scaling the main cluster down now, splitting it into a federation of clusters grouped by generation. It seems stable enough at 48 nodes. I'm hoping the multi-cluster GUI that dietmar mentioned gets released soon. We're not giving up on larger Proxmox clusters though; hopefully we'll eventually identify the bottleneck.

Also, make sure you're on the latest version of Proxmox. Some older versions of corosync had an integer overflow bug that would cause issues if your cluster has 48 or more nodes.
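For reference, the knobs we experimented with were the totem timing parameters. A sketch of the kind of totem section we tried (the values below are illustrative only, not a recommendation; all of these are documented corosync options, and the defaults are usually fine for small clusters):

```
# Hypothetical corosync.conf totem fragment - values are examples we tested,
# not tuned recommendations.
totem {
  token: 10000                # base token timeout in ms; larger clusters
                              # generally need more headroom here
  token_coefficient: 650      # extra ms added to the token timeout per node
                              # beyond the first two
  window_size: 50             # max messages in flight per token rotation
  max_messages: 17            # max messages a node may send per token
}
```

Note that corosync derives the effective token timeout from token plus token_coefficient times the node count, so on a ~50 node cluster small changes to token_coefficient have a large effect.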
 
We did a lot of tests with various parameters in corosync.conf to stabilize the cluster. We eventually got to a point where a 52 node cluster would run without crashing for a couple of days. A 50 node cluster would run for much longer, as long as we're not actively making a lot of changes to pmxcfs.

This was all with knet only?
 
