Getting confused with the different networks for Ceph and Proxmox

topquark

I'm reading up on using Proxmox and Ceph in an HA cluster. One thing I'm getting confused about is the different networks, their bandwidth requirements and the traffic that runs through them, as well as which are nice to have separated and which are usually combined on one NIC. Most manuals and threads I find talk about the networks of Proxmox and Ceph separately, but I'm having a tough time understanding what their purposes are in an integrated environment and where they overlap. Thinking of setting up a 3-node Proxmox+Ceph HA cluster.

As far as I understand:
Ceph needs:
Heartbeat - Is this the same as corosync? Or is this different? Should it be separated?
Private - Used for Ceph OSD syncing. Proxmox VMs don't have access to this? Preferably 10 GbE
Public - Where Proxmox VMs access their local storage. Is this used at all if the VMs are on the same servers as the OSDs? Is this also the LAN? Is this also where Proxmox's UI is? What bandwidth does this need?

Proxmox Cluster:
Corosync: Low latency, low bandwidth, preferably redundant.
LAN with VLANs: "normal" LAN with VLANs for separation of certain VMs. Is this the same place where Proxmox VMs talk to each other? High bandwidth?
Proxmox UI: Probably on the LAN with VLANs, on the management VLAN?

So would this mean 5-6 NICs in the ideal scenario? And doubled for redundancy? Seems like a bit much.
Sorry if it sounds a bit confusing, but I'm getting lost in all the similar but different terms. If I'm reading the Ceph documentation I'm left wondering how it integrates with Proxmox, and vice versa.
 
Heartbeat - Is this the same as corosync? Or is this different? Should it be separated?

The same as in similar, yes, but they are different services, not connected to each other in any way.

Private - Used for Ceph OSD syncing. Proxmox VMs don't have access to this? Preferably 10 GbE
Exactly, OSD traffic (syncing, re-balancing, ...) goes over this network.
Public - Where Proxmox VMs access their local storage. Is this used at all if the VMs are on the same servers as the OSDs? Is this also the LAN? Is this also where Proxmox's UI is? What bandwidth does this need?
* Used for local VMs: yes, as their objects can be anywhere in the Ceph cluster (you can migrate VMs), but if objects are local then Ceph tries to fetch them from the local OSDs directly.

* It can be LAN, it can be WAN; that depends on what you configure it to be.

* Is this also where Proxmox's UI is: the PVE UI normally listens on all networks, so it's reachable over all of them. The default network for VM or storage migration traffic, however, depends on the IP you used during the installation as the "main IP"; you can check cat /etc/pve/.members to see the IPs of all nodes. It's the IP your node name resolves to.
Note that you can change the migration network too, i.e., the one the live-migration data of VMs is sent over :)
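
For illustration, a minimal sketch of the two bits mentioned above; the 10.10.10.0/24 subnet is just an assumed example, adjust to your own setup:

Code:
# show which IP each node name resolves to (the cluster "main IP")
cat /etc/pve/.members

# /etc/pve/datacenter.cfg -- route live-migration traffic over a
# dedicated network (example subnet, not from this thread)
migration: secure,network=10.10.10.0/24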


So would this mean 5-6 NICs in the ideal scenario? And doubled for redundancy? Seems like a bit much.

So, personally I'd say the following is a really good base (network) setup:

* one NIC connected to the WAN, which lets VMs talk to the internet (if desired) and/or their users; 1 GbE or 10 GbE, depending on what you want to send over it
* one NIC with 10 GbE or faster for Ceph private traffic (and maybe migration too, depending on how much bandwidth is available) - here you could also use one NIC with two ports to allow a full mesh in a three-node cluster, so you do not need a 10 GbE or 40 GbE switch
* a 100 Mbit to 1 GbE NIC for corosync and corosync only; the other networks can be added as additional links (preferably with lower priority) as fallback, but the one where storage traffic happens should not be used.

So three NICs are normally really OK. Whether fewer are OK or more should be used depends entirely on your setup and what you want to do with it.
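
Purely as an illustration of that three-NIC split, something like the following in /etc/network/interfaces on one node; the interface names, subnets, and addresses are example assumptions, not part of the recommendation:

Code:
# /etc/network/interfaces (sketch, ifupdown2 syntax as used by PVE)

auto lo
iface lo inet loopback

# NIC 1: bridge for VM/LAN/WAN traffic and the PVE UI
auto eno1
iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.11/24
    gateway 192.168.1.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

# NIC 2: 10 GbE for the Ceph private network (and maybe migration)
auto enp1s0
iface enp1s0 inet static
    address 10.10.10.11/24

# NIC 3: 1 GbE dedicated to corosync
auto eno2
iface eno2 inet static
    address 10.20.20.11/24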
 
Whoa, thanks for the reply, it clears up a lot. I'm still left with a few questions though, largely on the difference between the Ceph public network and the Proxmox or normal LAN. So let me rephrase what you said in my own words to make sure I'm not misunderstanding things:

*Heartbeat is purely a Ceph thing, and by default is combined with the Ceph cluster (aka private) network. [Q: If heartbeat is similar to corosync, why not use the same interface for low-latency purposes? Is sticking such a thing on the busy 10 GbE private network a bad idea? Why (not)?]

*Public is used by VMs to access the Ceph storage, but the VMs first look at their own physical Proxmox machine for the data before going over the network. [Q: Does this mean that for a "default" setup of 3 nodes, 3 replicas, and 2 min replicas, it should not be used at all? While for a more conservative 3 nodes, 2 replicas and 1 min replica, it will need to be high bandwidth for fast-ish access to the VMs' virtual drives?]

*Corosync: its own network, preferably 2 rings, low bandwidth.

*LAN + Proxmox UI: This is not really one network, as it can be set per VM, but I'm putting it here so I can describe it: basically it does everything that isn't corosync or related to Ceph, with all the VLANs you'd normally have: a management VLAN for PVE; VMs serving the normal LAN PCs; VMs serving each other. I'm guessing I'd like this to have 10 GbE too. [Q: Is this network traffic usually shared with the public Ceph network NIC, or is the public network shared with the cluster/private Ceph network? If this is shared with the public network, do they share the same IP/VLAN?] [Q: With the main IP probably being on the management VLAN within this traffic, live migration will be on this network too. Is that the right location for this traffic?]

Thanks in advance for the help.
 
I know this goes against the "recommended" setup from Ceph, but I did all mine over a single 10G link.

I just put corosync, LAN networking for VMs, and the Ceph private network on different VLANs. Ceph public is in the same network as my LAN so that VMs can connect to it easily without a second virtual network adapter.
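
For reference, a sketch of what that could look like with a VLAN-aware bridge on a single trunked 10G port; the VLAN IDs, subnets, and interface name below are just illustrative assumptions, not my actual values:

Code:
# /etc/network/interfaces (sketch) -- one 10G trunk, traffic split by VLAN

auto enp65s0f0
iface enp65s0f0 inet manual

auto vmbr0
iface vmbr0 inet manual
    bridge-ports enp65s0f0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

# VLAN 10: LAN + Ceph public (also the node's management IP)
auto vmbr0.10
iface vmbr0.10 inet static
    address 192.168.10.11/24
    gateway 192.168.10.1

# VLAN 20: Ceph private (cluster) network
auto vmbr0.20
iface vmbr0.20 inet static
    address 10.0.20.11/24

# VLAN 30: corosync
auto vmbr0.30
iface vmbr0.30 inet static
    address 10.0.30.11/24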

The recommendations are kind of an ideal situation; however, I've had no issues running it on a single 10G trunk link with just VLANs separating the traffic. I'm currently running this professionally as well, in a small business environment.

Obviously this may change based on your throughput needs for the VMs and the Ceph synchronization; however, we have many small nodes, so this was more than adequate for our performance needs.
 
I just put corosync, LAN networking for VMs, and the Ceph private network on different VLANs. Ceph public is in the same network as my LAN so that VMs can connect to it easily without a second virtual network adapter.

Yeah, that this is not "recommended" for a normal setup is one thing, but topquark explicitly mentioned an "HA cluster", which such a setup is for sure not.
That's a single point of failure: one too-long hiccup, a failed switch upgrade, ... and the whole cluster gets fenced, or Ceph rebalancing kills off the VM/corosync traffic - maybe all of that happens during backup time (as such bad things always like to do) and even more load lands on the single link carrying all those components.

however we have many small nodes, so this was more than adequate for our performance needs.

Great if a single 10G link is enough for all of this; I hope you have bandwidth limits on all Ceph, backup, ... operations to ensure some leftover capacity is there for the cluster network under high(er) load.

And that smaller clusters can be set up more loosely is IMO a bit of a misconception. For big clusters you can take more trade-offs, as there are lots of other nodes which can take up the load and provide redundancy. If you run, for example, a three-node cluster, then you need some network redundancy and a design where the two possibly remaining nodes can take the load of a failed one on top of their own base load.

I'd really suggest going with at least two networks, one for Ceph and maybe other storage-IO-like load and one for the rest, if three are not an option, which sometimes just is the case.
 
We're definitely working on expanding the network to be more redundant, but the big step was getting into this configuration and then having the ability to expand out.

Right now we're running 40 nodes across 6 switches, with 2 OSDs per host. We plan to upgrade the network to a full dual-ring configuration with 2x 10G links per host to separate rings, and to run corosync and management interfaces etc. on the remaining 2x 1G Ethernet. We accepted the tradeoffs of potential failure domains in order to get this up and running, so we had the headroom (operationally, in our environment) to "roll over" rather than "cut over". For now, our risks are low enough to work with, and the performance is excellent.

My point to OP was not that this is the best way to do it, but rather that it's just one way to get into Ceph hyperconvergence. Also, the requirements in the documentation are not explicitly hard rules, but rather a best practice for unmatched resiliency.

If one is aware and accepting of the potential risks and increased failure domain sizes, the barrier to entry can be lowered. My reasoning for sharing is that it sounds like OP is just getting into this, maybe at home or in a test cluster at work, and pointing out that you CAN do it with less would be a first step toward getting a workable cluster stood up, and then later working it into a proper, fully redundant setup.
 
It's indeed true that I'm just getting into HA, and that it's just for a home setup. But I want to do it properly. Including redundant switches and interfaces. And I also want to really understand what I'm doing, instead of just following a guide and having it "just work". On my hardware the current vision is to have 3 redundant interfaces:
2x 10 GbE with 1 GbE fallback
1x 1 GbE with 1 GbE fallback

The 1 GbE with 1 GbE fallback will be set as corosync only (actually two corosync rings, not active/backup).
One 10 GbE with 1 GbE fallback will be Ceph private.
One 10 GbE with 1 GbE fallback will be LAN access.
Not sure which one I would want the Ceph public network on. I think the LAN would be the most logical choice, though I wouldn't want VMs' access to their hard drives to throttle when other VMs are hitting/being hit over the LAN hard. The same story would hold, though, when rebalancing/restore is going on on the private network. That said, I think the LAN interface would be most logical.
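
Concretely, I'm thinking of something like the following for each "10 GbE with 1 GbE fallback" pair, as an active-backup bond; the interface names (enp1s0f0 as the 10 GbE port, eno2 as the 1 GbE port) and the subnet are placeholders:

Code:
# /etc/network/interfaces (sketch) -- 10 GbE primary with 1 GbE fallback

auto bond1
iface bond1 inet static
    address 10.10.10.11/24
    bond-slaves enp1s0f0 eno2
    bond-mode active-backup
    bond-primary enp1s0f0
    bond-miimon 100
# bond1 would then carry, e.g., the Ceph private network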

That's basically the setup I'm trying to tackle for now. That said, I'm still looking for answers on the behind-the-scenes workings/design:
*If heartbeat is similar to corosync, why not use the same interface for low-latency purposes? Is sticking such a thing on the busy 10 GbE private network a bad idea? Why (not)?
*Does the design of the Ceph public network mean that for a "default" setup of 3 nodes, 3 replicas, and 2 min replicas, it should not be used at all? While for a more conservative 3 nodes, 2 replicas and 1 min replica, it will need to be high bandwidth for fast-ish access to the VMs' virtual drives?
 
So in a perfect setup one would need:

1 network for Ceph public
1 network for Ceph private (Ceph cluster)
1 or 2 networks for the Proxmox cluster (corosync)
1 network for VM access and management (could use this as a 2nd ring for the Proxmox cluster)

Then, for each network, a set of two switches for redundancy, or one set VLANed out to accommodate the above.

I plan on building a Ceph setup of 5 nodes with Proxmox corosync in a mesh over a 1Gb quad NIC, Ceph pub/priv in a mesh over a 25Gb quad NIC, and VM access (also used as a 2nd ring for Proxmox corosync) over a 25Gb NIC connected to redundant 25Gb switches. We don't see ourselves expanding past 5 nodes anytime soon and opted for a mesh setup. Is it reasonable to have Ceph pub and priv on one network (mesh)? I'm hoping that having direct connections over 25Gb shouldn't cause any issues during replication/rebalancing.
 
1 or 2 networks for the Proxmox cluster (corosync)

You can now use up to eight links with different priorities. I'd suggest one dedicated cluster network - I mean, some traffic can happen on it, but no IO traffic (Ceph, NFS, ...). Better to have a 100 Mbps network dedicated to cluster traffic than 40 Gbps shared with Ceph private traffic; latency matters for cluster traffic, not bandwidth. Then I'd add all other networks as fallback links - the more IO traffic there is on one, the lower its priority.
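
As a rough sketch of what that could look like in corosync.conf with two links, where the dedicated network is preferred; node names, addresses, and priority values are just assumed examples (see corosync.conf(5) for the exact semantics):

Code:
# /etc/pve/corosync.conf (excerpt, sketch only)
totem {
  interface {
    linknumber: 0
    knet_link_priority: 20   # dedicated corosync network, preferred
  }
  interface {
    linknumber: 1
    knet_link_priority: 10   # busier LAN, fallback only
  }
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.20.20.11    # link 0: dedicated corosync NIC
    ring1_addr: 192.168.10.11  # link 1: LAN fallback
  }
}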

I plan on building a Ceph setup of 5 nodes with Proxmox corosync in a mesh over a 1Gb quad NIC, Ceph pub/priv in a mesh over a 25Gb quad NIC, and VM access (also used as a 2nd ring for Proxmox corosync) over a 25Gb NIC connected to redundant 25Gb switches. We don't see ourselves expanding past 5 nodes anytime soon and opted for a mesh setup. Is it reasonable to have Ceph pub and priv on one network (mesh)? I'm hoping that having direct connections over 25Gb shouldn't cause any issues during replication/rebalancing.

Sounds OK, but it depends a lot on the rest of the HW, e.g., how much data the OSD disks can move around. If you have a lot of SSDs or even fast NVMes per node you could still have issues; keep in mind that each object needs to get written three times for redundancy.

But IMO that network setup is on the higher end of the performance spectrum.
 
* one NIC with 10 GbE or faster for Ceph private traffic (and maybe migration too, depending on how much bandwidth is available) - here you could also use one NIC with two ports to allow a full mesh in a three-node cluster, so you do not need a 10 GbE or 40 GbE switch
I know this is an old topic, but do you mean having one NIC for both the private and public traffic of Ceph? Do you keep each in its own subnet?
 
I know this is an old topic, but do you mean having one NIC for both the private and public traffic of Ceph? Do you keep each in its own subnet?
You can do this for flexibility. With enough bandwidth, Ceph's public & cluster network can share the same NIC port and just be separated by VLAN. Changing Ceph's cluster network later is usually no issue, but the public network is a different story.
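
For reference, that split is just the two network options in the Ceph config; a minimal sketch with assumed example subnets (e.g., two VLANs on the same port):

Code:
# /etc/pve/ceph.conf (excerpt, sketch only)
[global]
    public_network  = 192.168.10.0/24   # clients/VMs and MONs live here
    cluster_network = 10.0.20.0/24      # OSD replication/rebalancing only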
 
Hi all,
I am trying to install a Proxmox cluster with a network switch and 3 nodes, each of which has 4 Ethernet cards / 4 interfaces / 8 ports. I created 3 separate networks: one with an internet connection (1 port), one for corosync (a 3-port bond) and one for Ceph (a 4-port bond). I am stuck at the Ceph management phase and couldn't find any documentation on how to organise Ceph.
Can anyone show me the way? A screenshot of the network implementation is attached. Is there anything wrong with the network configuration?

Best regards,
Bahadır
 

Attachments

  • Screenshot from 2021-06-25 01-04-44.png
