redundancy for CEPH and SDN

wickeren

Renowned Member
Sep 16, 2010
Have a Proxmox cluster with Enterprise subscription. Dedicated interfaces for corosync (1GB), SDN (10GB) and CEPH (25GB) on separate switches.

Have encountered a connection issue on one of the nodes to the CEPH switch, causing issues for the VMs running on that node. As I want to prevent this from happening again, I want to have some sort of redundancy. For corosync this was easy and is already done, by setting up multiple interfaces and priorities. But how do I do this for CEPH and the SDN-defined VXLANs?

What I want to achieve:

CEPH using 25GB NIC as primary, 10GB as backup/failover
SDN using 10GB NIC as primary, 25GB as backup/failover

But I'm not sure how to do this...

A standard Linux bond looks impossible to me? A fabric maybe, but what kind? Do I need EVPN? Do I need to interconnect the switches? Do I need VLANs on it? The SDN part is still quite new, and not many examples or much documentation on this specific subject are available, at least not that I'm aware of.
Has anyone done something similar? Any best practices?
 
Hi @wickeren

First and foremost, please avoid double posting [1] and close the other thread.

To your question:
The easiest way to get redundancy in these scenarios would be to use MC-LAG capable switches and bond the interfaces across switches. This of course comes at the cost of money.
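As a rough illustration only, a bond across an MC-LAG switch pair is usually configured as an LACP (802.3ad) bond on the node. The fragment below is a sketch, not a tested setup: the interface names, the bond name and the address are placeholders I made up, and the switch side needs a matching MC-LAG/LACP configuration. Note that 802.3ad aggregates ports of the same speed and duplex, so this assumes two equal-speed ports rather than a 10 G / 25 G mix.

```
# /etc/network/interfaces (fragment) - hypothetical example
# enp65s0f0 and enp65s0f1 are placeholder names for two ports on the node,
# each cabled to a different member of the MC-LAG switch pair.
auto bond0
iface bond0 inet static
    # placeholder address for the storage network
    address 10.10.10.11/24
    bond-slaves enp65s0f0 enp65s0f1
    # LACP; requires matching configuration on the switches
    bond-mode 802.3ad
    # check link state every 100 ms
    bond-miimon 100
    bond-xmit-hash-policy layer3+4
```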

Another option that comes to mind would be to use adaptive load balancing (balance-alb in PVE) [2] and bond one 10 G and one 25 G interface together for each network.
This would change the network topology significantly and might come with additional side effects.
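To make the idea a bit more concrete, a balance-alb bond could look roughly like the fragment below. This is only a sketch under assumptions: the interface names and the address are placeholders, and it assumes both ports end up in the same layer 2 network (for example because the switches are interconnected), which is one of the topology changes mentioned above. balance-alb does not need any special switch support.

```
# /etc/network/interfaces (fragment) - hypothetical example
# enp65s0f0 (25 G) and enp1s0f0 (10 G) are placeholder interface names.
auto bond1
iface bond1 inet static
    # placeholder address for the Ceph network
    address 10.10.20.11/24
    bond-slaves enp65s0f0 enp1s0f0
    # adaptive load balancing, no LACP needed on the switches
    bond-mode balance-alb
    # check link state every 100 ms
    bond-miimon 100
```

A second bond of the same kind, with its own pair of ports, would then carry the SDN/VXLAN traffic.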

Disclaimer: I am not a network engineer and don't know the best solution in this case; this is just my personal best guess.

Best regards
Jonas

[1] https://forum.proxmox.com/threads/ceph-and-sdn-redunancy-in-proxmox-cluster.184743/
[2] https://www.kernel.org/doc/Documentation/networking/bonding.txt#:~:text=balance-alb or 6