Is a 3-node Full Mesh Setup For Ceph and Corosync Good or Bad

x509 · New Member · Jan 27, 2026

Hello Proxmox forum!

I am working on an infrastructure setup for a small-to-medium business. We are unsure about the networking topology and my research hasn't given me a great answer. We are looking at a 3-node setup with mesh networking for Ceph (internal and public networks) and Corosync. We mostly want to go the mesh route so that we don't have to buy 25Gb switches. I have read the Proxmox docs on mesh networking, hyperconverged setups, etc., but the implementation details and real-world data seem to be lacking.

Our current hardware plan:
  • Dell R7xx hosts.
  • Each with a single ~20-core CPU.
  • Each with 128GB RAM.
  • Each with 5x 1.92TB SSDs.
  • Each with 4x 25Gb ports.
  • Each with 4x 10Gb ports (Base-T, ethernet).
  • Each with 2x 1Gb ports.
The plan was to use:
  • 2x 25Gb ports for Ceph internal traffic (DAC Cables, Mesh: node A->B, B->C, C->A).
  • 2x 25Gb ports for Ceph public traffic (cabled as above). Possibly also VM migration traffic.
  • 2x 10Gb for VM traffic to LAN.
  • 2x 10Gb for management traffic (also for corosync backup).
  • 2x 1Gb for corosync (Mesh: node A->B, B->C, C->A).
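
For illustration, here is a minimal sketch of what the Ceph-internal part of this plan could look like using the "routed (simple)" variant from the Proxmox full-mesh wiki page. The interface names and the 10.14.14.0/24 subnet are placeholders I made up; this is node A, nodes B and C mirror it with .2 and .3, and the public mesh would get its own pair of interfaces and subnet:

auto enp129s0f0
iface enp129s0f0 inet static
    # 25Gb DAC directly to node B
    address 10.14.14.1/24
    up   ip route add 10.14.14.2/32 dev enp129s0f0
    down ip route del 10.14.14.2/32

auto enp129s0f1
iface enp129s0f1 inet static
    # 25Gb DAC directly to node C
    address 10.14.14.1/24
    up   ip route add 10.14.14.3/32 dev enp129s0f1
    down ip route del 10.14.14.3/32
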
We have talked to a few Proxmox partners about what we want to do. One is advocating strictly against a mesh network topology. Another says mesh networking is fine. When I asked why the one partner is against mesh, they said it is because the Proxmox node has to handle networking duties and issues can occur when a node goes down. They basically said mesh is one of the worst things you can do and that it is only for homelab situations.

Can anyone help me with some real-world implementation details/results and thoughts?

Further info: We do not expect to outgrow a 3-node cluster, plus we can always upgrade these nodes since we are buying fairly low-specced hardware. Our current infrastructure is a 2-node vSphere cluster running StarWind VSAN with 2x 25Gb ports direct-connected between the nodes. CPU, RAM, and storage are nearly the same as noted above, just jammed into 2 nodes instead of 3. About 25 VMs: file servers, domain controllers, database servers, application servers, web servers. CPU is typically <10%, RAM usage is typically ~50%, storage is ~50% used, and network utilization is low.
 
@x509 , welcome to the forum.

We don't use Ceph or mesh networking in our deployments. That said, if you plan to engage a Proxmox Partner, I would recommend following their guidance; they will ultimately be responsible for supporting your infrastructure.

If this is a production environment for your business and you don’t already have hands-on experience with mesh networking, it would be easier to invest in a simple FS switch (ideally two). This allows you to focus on running the business rather than troubleshooting a complex network design.

And, if you are providing consultancy services to a customer, also use dedicated switches. It simplifies the architecture and puts you in a much stronger position to support and troubleshoot the environment effectively.

Cheers


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
One challenge with 3 nodes and 3/2 replication is that if you reboot or shut down one node you’re already at the minimum.
This has nothing to do with the pros and cons of a mesh network, though. And there are still use cases where three nodes are ok, because they can tolerate the loss of one node and consider the failure of a second node a tolerable risk.
 
Hi x509,
we used it in my former company and it worked flawlessly, but there are some points in sizing and handling you should be aware of.

In this setup, I would definitely emphasize including monitoring and backup, because you run with the minimum level of redundancy. (As you always should anyway.)

If you are already in contact with a Proxmox Partner, ask them to explain which potential issues they see in which meshed network setup.

We used the broadcast setup. It does not provide the best bandwidth, but it is easy to configure and has a low entry barrier for Linux networking.

BR, Lucas
 
which potential issues they see in which meshed network setup.
The partner I spoke with kept reverting to the same argument against a mesh. Basically, they said that "using a mesh network is bad because then your proxmox node has to handle networking too instead of just focusing on serving VMs". I think their concern was centered around a node-down situation and networking data not routing properly and thus storage (Ceph) being affected. I get what they are concerned about, but I feel like if this was such a huge issue then mesh networking wouldn't really exist as a solution.
 
it would be easier to invest in a simple FS switch (ideally two)
Yes, that is our other option. We would have each node connect to two switches for redundancy, and separate some traffic with VLANs. I.e.: one port on each switch, for each node, for both Ceph internal and Ceph public networks. We really just don't want more hardware (25Gb switches) if we don't have to!
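
Roughly, what I have in mind there is something like the following sketch. The NIC names, VLAN IDs, and addresses are made up; one port of each pair goes to switch A and the other to switch B, with active-backup bonding as a simple assumption if the switches are not stacked/MLAG-capable:

auto bond1
iface bond1 inet manual
    # one port to each switch
    bond-slaves enp65s0f0 enp65s0f1
    bond-mode active-backup
    bond-miimon 100

auto bond1.10
iface bond1.10 inet static
    # VLAN 10: Ceph public
    address 10.10.10.1/24

auto bond1.20
iface bond1.20 inet static
    # VLAN 20: Ceph internal (cluster)
    address 10.10.20.1/24
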
 
where three nodes are ok, because they can tolerate the loss of one node and consider the failure of a second node a tolerable risk.
One challenge with 3 nodes and 3/2 replication is that if you reboot or shut down one node you’re already at the minimum.

We understand and can accept the risk of a 3-node cluster and losing a single node. Our current cluster is 2 nodes and can function on a single node, so the risk profile is about the same. I would like to run 5 nodes for more fault tolerance, but I don't think the budget is there, even if we were to lower the specs on the nodes a bit to keep roughly the same total capacity.
 
The partner I spoke with kept reverting to the same argument against a mesh. Basically, they said that "using a mesh network is bad because then your proxmox node has to handle networking too instead of just focusing on serving VMs". I think their concern was centered around a node-down situation and networking data not routing properly and thus storage (Ceph) being affected. I get what they are concerned about, but I feel like if this was such a huge issue then mesh networking wouldn't really exist as a solution.

Yes, but that might only happen with a routed setup, not in a broadcast setup, and it should be tested before using it in production.

Anyway, FS or MikroTik switches (for a low budget), or any other switches you have configuration experience with, might work.

In any case, with 3 nodes you still run on the minimum baseline of redundancy, so keep monitoring and backup in mind.
 
The partner I spoke with kept reverting to the same argument against a mesh. Basically, they said that "using a mesh network is bad because then your proxmox node has to handle networking too instead of just focusing on serving VMs". I think their concern was centered around a node-down situation and networking data not routing properly and thus storage (Ceph) being affected
In a Node Down scenario, you are left with two nodes that can still communicate directly with each other, since this is a mesh network. Additional load only becomes a factor in a Link Down scenario. In that case, depending on the topology and forwarding method, one node may need to relay traffic to an isolated peer.

In a stable environment with quality hardware, a Node Down event is generally more likely than a Link Down event, simply because there are more components involved that can fail at the node level.

Ultimately, this may come down to operational considerations for the Proxmox Partner: a switched topology is more common and familiar to most teams, and a partner may simply be unwilling to take on the one-off support or operational complexity associated with a less conventional design.
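
If you want to verify that relay behaviour in a lab, for a routed mesh (for example the FRR/OpenFabric variant from the Proxmox wiki; the commands below are a hedged sketch assuming fabricd is in use), pull one DAC cable and check whether the route to the now-isolated peer points via the third node:

# does the peer's /32 point at the direct link or via the relay node?
ip route
vtysh -c "show openfabric topology"
vtysh -c "show openfabric route"
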



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
While it's true that 3 nodes is the bare minimum for Ceph, losing a node and depending on the other 2 to pick up the slack would make me nervous. For best practices, start with 5 nodes. With Ceph, more nodes/OSDs = more IOPS.

As has been said, better to have good backup and restore procedures.

I do set up 3-node Ceph clusters using the full-mesh broadcast setup per https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Broadcast_Setup

This eliminates the need for a switch. Mind you, you can never expand this cluster.

All Ceph public, private, and Corosync traffic uses this network. To make sure this traffic never gets routed, I use the IPv4 link-local network 169.254.1.0/24.

In addition, I set the Datacenter migration option to use this network and set migration type=insecure for faster migrations.
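
For reference, the mesh part of that broadcast setup boils down to a single broadcast-mode bond over the two mesh ports on each node. The interface names below are placeholders; node A gets 169.254.1.1, B gets .2, C gets .3:

auto enp129s0f0
iface enp129s0f0 inet manual

auto enp129s0f1
iface enp129s0f1 inet manual

auto bond0
iface bond0 inet static
    address 169.254.1.1/24
    bond-slaves enp129s0f0 enp129s0f1
    bond-miimon 100
    bond-mode broadcast

and the migration setting goes into /etc/pve/datacenter.cfg:

migration: type=insecure,network=169.254.1.0/24
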
 
Yes, that is our other option. We would have each node connect to two switches for redundancy, and separate some traffic with VLANs. I.e.: one port on each switch, for each node, for both Ceph internal and Ceph public networks. We really just don't want more hardware (25Gb switches) if we don't have to!
I personally believe that with a 25Gb mesh network it is just a matter of time before you have some problems. I can't imagine you are going to build a production cluster with no LAN switches for all mentioned traffic. I am also not sure whether you just don't want any more hardware in your installation or you have a tight budget. You have mentioned 25Gb interfaces - 25Gb NICs and SFP28/DAC cables can cost quite a lot if you don't find a way to use some relatively cheap components (non-Dell). I would rather go for a pair of reliable 10Gb switches (Cisco Nexus) and trunks of multiple 10Gb links. I know that, say, 4x10Gb is not the same as a single 40Gb link in some cases (e.g. a single point-to-point data stream), but you need a reliable network, especially for any disk traffic. By the way, I don't use any vSAN or other distributed RAID solutions for running VMs and I am familiar with SAN/enterprise storage solutions - I am simply trying to map all of our VMware-related experience (100+ hosts) to the Proxmox world (not necessarily the same scale, but similar hardware design).
 
For best practices, start with 5 nodes.
I would love to get 5 nodes, but the budget is not there. Furthermore, the powers-that-be already see us running off of 2 nodes, so telling them "we need 5 nodes now even though demand is flat" is going to be a tough conversation. The underlying budgeting problem is that there is still a minimum amount of cores/memory/SSDs required per host, since Ceph gobbles up a chunk on each one.
 
I can't imagine you are going to build a production cluster with no LAN switches for all mentioned traffic.
We do have 10Gb LAN switches (Aruba); we are just trying to keep (or at least thinking through keeping) our storage traffic off of these switches, hence a mesh for Ceph traffic. We would use the LAN switches for VM traffic, management traffic, possibly corosync, and possibly backup links.

I personally believe that with a 25Gb mesh network it is just a matter of time before you have some problems.
This is what I am trying to understand, why would a mesh network be a problem eventually?
 
Ceph clusters using a full-mesh setup are supported; please take a look at [1] for more information.

Note, however, that we highly discourage using the same network for Corosync and Ceph; the recommendation is to have at least one dedicated network for Corosync (it can be 1G, which should be more than enough). See [2] for more information about cluster requirements, and the sketch below the links for one way to configure the links.

[1] https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
[2] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_cluster_requirements
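
As a rough sketch (the addresses below are examples only), the dedicated Corosync network can be configured as link0 when the cluster is created, with a second network added as a fallback link:

# on the first node (link0 = dedicated Corosync network, link1 = management network as fallback)
pvecm create my-cluster --link0 10.30.30.1 --link1 192.168.10.11

# on each joining node, pointing at the first node and passing its own link addresses
pvecm add 10.30.30.1 --link0 10.30.30.2 --link1 192.168.10.12
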
 
This is what I am trying to understand, why would a mesh network be a problem eventually?
I have never seen such an Ethernet mesh network in practice, so I guess it may just be a workaround for when you don't have switches or you have spent all of your budget on servers and disks. The mesh is supported, and maybe it even works, but why have we been using switches for years if a mesh is enough and so good?
I am sure that if you ever run into serious problems with VMs on production systems, it is much better to be able to pull statistics from particular switch ports (a pair of switches, for redundancy), and any rogue network traffic will not spread across all of the interfaces. A server failure or a cluster expansion is also much easier to handle. Fewer cables, less chance to mess something up. I have 200+ physical servers, 90% of them blades - can you imagine how much less cable mess we have that way? And how many more points we have to monitor, control and shape network traffic, and how helpful that can be?

Anyway, your question was like a mental quest for me - I like it! :-)

We do have 10Gb LAN switches (Aruba)
I don't know which model you have, but I would go for some more switches, dedicated to disk traffic. Maybe the same model you already have, and not necessarily new devices (we buy a lot of these refurbished), as Aruba seems to be very reliable and there are firmware updates available from the Aruba support website.
 
The partner I spoke with kept reverting to the same argument against a mesh. Basically, they said that "using a mesh network is bad because then your proxmox node has to handle networking too instead of just focusing on serving VMs". I think their concern was centered around a node-down situation and networking data not routing properly and thus storage (Ceph) being affected. I get what they are concerned about, but I feel like if this was such a huge issue then mesh networking wouldn't really exist as a solution.
I don't want to start an argument here, but whoever told you that has little idea of what a PVE Ceph mesh cluster is. Linux kernel routing may use something like 0.1% of CPU, and FRR may use around 3% CPU for a few seconds while converging or during node boot. If we followed the same reasoning, hyper-converged clusters would never have been invented, nor should we ever use Ceph on our PVE nodes ;) (and Ceph can eat quite a chunk of your CPU and memory), not to mention implementing SDN on PVE.

We have a few production PVE+Ceph clusters using a custom FRR-with-fallback setup, with Corosync using both the mesh and the outside NICs. Zero issues at all for years. We usually deploy them when the customer has a tight budget and requires Ceph on 25G+ (or plans to add more nodes in the foreseeable future). If a 10G network is enough, we just buy switches, as they are affordable enough and simplify adding/replacing nodes.

In short: mesh is a perfectly valid production level solution if properly implemented.

I personally believe that with a 25Gb mesh network it is just a matter of time before you have some problems.
Simply speaking, the only thing FRR does is monitor each node's neighbors and send/receive routes that get inserted into the kernel routing table. Network traffic is handled by the kernel itself.
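
Not our exact config, but as a rough sketch along the lines of the Proxmox wiki's "routed setup with fallback", the FRR/OpenFabric part looks roughly like this for one node, after enabling fabricd=yes in /etc/frr/daemons. The interface names, the loopback address, and the NET are placeholders:

interface lo
 ip address 10.15.15.1/32
 ip router openfabric 1
 openfabric passive
!
interface enp129s0f0
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
!
interface enp129s0f1
 ip router openfabric 1
 openfabric csnp-interval 2
 openfabric hello-interval 1
 openfabric hello-multiplier 2
!
router openfabric 1
 net 49.0001.1111.1111.1111.00
 lsp-gen-interval 1
 max-lsp-lifetime 600
 lsp-refresh-interval 180
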

I am sure in case you experience some serious problems on production systems in VMs, it is gonna be much better when you can get some statistics from particular ports of switch (pair of switches, because of redundancy) and any rough network traffic will not spread through all interfaces around.
VM traffic does not flow through the mesh; only Ceph and Corosync (if configured) do. VM traffic flows via another pair of NIC ports connected to hardware switches that provide external connectivity.