Anyone using a 10GbE mesh/ring network for Ceph?

gkovacs

So there is a howto on the wiki that details the setup of a 10 Gbit/s Ethernet network without using a network switch:
http://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server

If I understand correctly, you need a dual-port 10 GbE NIC (or two NICs) in each node, and you connect each port to a different adjacent node, thereby connecting all nodes in a circle. The wiki article recommends this for a 3-node cluster, but states that it should work the same for a 5-node cluster.

Then, according to Method 1, you simply set up a broadcast-mode bond across the two ports on each node, which would mean all traffic to and from your network eventually propagates to all nodes. (Or, according to Method 2, you set up routes for all destinations.)

So in a 5-node cluster, a packet would reach its destination in one hop if it is lucky, or in two hops if it is unlucky.
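
To make the hop count concrete, here is a minimal Python sketch. It assumes a bidirectional ring where traffic always takes the shorter way around and intermediate nodes actually forward it; the function name `ring_hops` and the 0-based node numbering are only illustrative and do not come from the wiki article.

```python
def ring_hops(src: int, dst: int, n: int) -> int:
    """Number of links crossed between two nodes on a bidirectional ring of n nodes.

    Assumption: traffic takes the shorter direction around the ring.
    """
    d = (dst - src) % n        # distance going one way around
    return min(d, n - d)       # the other way around may be shorter


n = 5
# Hop counts from node 0 to every other node in a 5-node ring.
hops_from_node0 = [ring_hops(0, dst, n) for dst in range(1, n)]
print(hops_from_node0)  # [1, 2, 2, 1]
```

So from any node, two of the four peers are direct neighbours (one hop) and the other two sit two hops away.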

My questions:
- Has anyone built this successfully?
- If yes, how many nodes, and which method?
- What kind of NICs and cabling have you used?
- Is the performance good for Ceph?
- Is there a performance difference between Method 1 (broadcast bonds) and Method 2 (up/down routing)?
 
You misunderstood the article. A full mesh means one link to every other node; the article explicitly states that you need "n-1" ports per node, where "n" is the number of nodes. So a 5-node cluster that is "fully" connected needs 4 ports on each node, one for each of the other nodes. What you are describing is a "ring" network topology (a dual ring, to be exact). A ring is vastly inferior to a (full) mesh, except for the number of ports and cables involved.
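
As a rough illustration of the port and cable counts, the following Python sketch compares the two topologies. The helper names `full_mesh` and `ring` are made up for this example, and it assumes one point-to-point cable per link and at least 3 nodes.

```python
def full_mesh(n: int) -> dict:
    """Every node links directly to every other node."""
    return {"ports_per_node": n - 1, "cables": n * (n - 1) // 2}


def ring(n: int) -> dict:
    """Every node links only to its two neighbours (assumes n >= 3)."""
    return {"ports_per_node": 2, "cables": n}


for n in (3, 5):
    print(n, "nodes: full mesh", full_mesh(n), "| ring", ring(n))
```

With 3 nodes the two layouts coincide (2 ports per node, 3 cables), which is probably why the 3-node wiki setup looks like a ring. With 5 nodes a full mesh needs 4 ports per node and 10 cables, while a ring needs 2 ports per node and 5 cables.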
 

OK, thanks for clearing that up. So let's say I want to build a dual ring topology, because my test cluster consists of 5 nodes, and connecting every node to every other node would be impractical (cabling and available PCIe slots) and not much cheaper than a switched setup. Also, with 5 nodes a dual ring should perform similarly to a switched network.

- Will the dual ring topology work with the same broadcast bond method?

- Do I need to manually prevent loops in this setup somehow?
- Can I build this with Linux bonds, or would I need OpenVSwitch for this?
 