Proxmox + Ceph = 4 NICs?

gasmanc

New Member
Jul 29, 2019
Hi,

I'm in the planning stages of setting up a three-node cluster. Two of the nodes will have dual 10 GbE and dual 1 GbE NICs, and the third has a single 10 GbE NIC and a dual 1 GbE NIC.

Reading through the documentation and searching around, it seems that Proxmox wants 2 NICs - one for Corosync and one for everything else - and Ceph wants a public and a private network. So 4 NICs??

If I only want to use three NICs, could I use a 1 GbE NIC for Corosync, a 10 GbE NIC for the Proxmox and Ceph public networks, and the last 10 GbE NIC (1 GbE on one of the nodes) for the Ceph OSD private network?

Are there any additional steps for sharing the Proxmox/Ceph public network?

Thanks
 
Ceph can be split into a public and a private (cluster) network, but it doesn't have to be. You can run Ceph on one NIC only quite fine. Do not try to run Ceph on a 1 GbE NIC though, not even on one server. That won't make you happy.
 
What about separating the Corosync network and the Ceph OSD network for rebalancing? How can I share the Ceph public network and the Proxmox public network?
 

Don't confuse the two networks for Ceph (cluster, public) with the corosync cluster network.

You can have both Ceph networks configured on the same subnet so they both use the one 10G NIC. In the config you would use the same network for both (see the ceph.conf sketch below).
Corosync, on the other hand, does not need a lot of bandwidth but wants low latency. Therefore, don't configure your Corosync network on the same physical network on which you run anything that can put heavy load on it, like storage traffic.
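For example (purely illustrative, the subnet is made up), the relevant part of /etc/pve/ceph.conf could look like this:

    [global]
            # both Ceph networks point at the same 10G subnet
            public_network = 10.10.0.0/24
            cluster_network = 10.10.0.0/24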

Run Corosync on one of the 1 GbE NICs. To have a fallback, you can configure a second link (ring in the config file) on the other 1 GbE NIC (see the corresponding section in the manual).
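As a rough sketch (node name and addresses are made up), each node entry in the nodelist of /etc/pve/corosync.conf then carries one address per link:

    node {
        name: pve1
        nodeid: 1
        quorum_votes: 1
        # primary Corosync link on the first 1G NIC
        ring0_addr: 192.168.10.11
        # fallback link on the second 1G NIC
        ring1_addr: 192.168.20.11
    }

When creating the cluster you can also pass both links on the command line, e.g. pvecm create mycluster --link0 192.168.10.11 --link1 192.168.20.11 (again, example addresses).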

TL;DR:
Ceph Cluster and Public networks are both only related to Ceph, need lots of bandwidth and can be run on the same physical network -> 10G NIC
Corosync cluster network is not related to any Ceph networks, likes low latency -> 1G NIC with possible fallback on the second 1G NIC
 

We have only two 10 GbE NICs in each node. Is it a good idea to bond these and use them for all network traffic (Ceph + Corosync + WAN) with VLANs?
There's no option to add more NICs :(
 
Absolutely not :/
VLANs don't guarantee a certain bandwidth to the VLAN, and traffic in one VLAN can still congest the whole physical NIC. You might be okay-ish if you can configure a QoS rule in your network infrastructure that always prioritizes Corosync traffic, but even then I don't want to guarantee anything.
And if Ceph is sharing the bandwidth with the production traffic, the performance will not be great.

If at all possible, add more NICs.
 

Nope, it's not possible because of the type of the blades (the whole system has internal networking).. :(
So I would have to use one 10Gb NIC for Corosync (connected to switch #1) and the other NIC for Ceph (connected to switch #2)
(single point of failure..).

And what about the "manager" network (for Proxmox) and the "public internet"?
I have to use VLANs to separate these nets.. but on which physical network?
 
Well, it's simply the wrong hardware for a hyper-converged setup. I don't recommend blades for this kind of setup.

But with that said, run VLANs on the single interfaces and bond the VLANs together. This way you can use active-backup with a bond-primary to separate the traffic onto the different switches: Ceph traffic -> switch 1, Corosync -> switch 2. If one link fails, the split traffic will land on the same link. While this is far from perfect, it might keep things going. Ah, and don't use HA.
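A rough sketch of what that could look like in /etc/network/interfaces (interface names, VLAN IDs and addresses are all just examples, not a tested configuration):

    auto eno1
    iface eno1 inet manual

    auto eno2
    iface eno2 inet manual

    # VLAN 10 = Ceph, VLAN 20 = Corosync; each bond enslaves the same VLAN on both NICs
    auto bond10
    iface bond10 inet static
            address 10.10.10.11/24
            bond-slaves eno1.10 eno2.10
            bond-mode active-backup
            # Ceph prefers the NIC that goes to switch 1
            bond-primary eno1.10

    auto bond20
    iface bond20 inet static
            address 10.10.20.11/24
            bond-slaves eno1.20 eno2.20
            bond-mode active-backup
            # Corosync prefers the NIC that goes to switch 2
            bond-primary eno2.20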
 

Thank you.
Another approach: if I don't use Ceph, is there any storage type/solution that works with HA?
We have 8 physical nodes.
 
The point is not HA or shared storage as such; it is combining them on the same network interface. Whatever you do, especially for an HA cluster, you will need to guarantee low and stable latency on the Corosync links. You will not achieve this (or only with great difficulty) with shared NIC ports.
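If you want to see how the links behave on a running cluster, two quick checks (the IP is just an example for another node's Corosync address):

    ping -c 100 -i 0.2 192.168.10.12
    # watch the round-trip times for spikes while storage/VM traffic is busy

    corosync-cfgtool -s
    # shows the status of the Corosync links/rings on the local node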
 
Okay, now I got it.
Thank you for your patience and your accurate answers. ;)
cheers
 
