Mixed Proxmox + Ceph networking model

MMartinez

Renowned Member
Dec 11, 2014
Hi,

We've got a Proxmox cluster that uses FreeNAS for shared storage.

Most of the nodes have 2x1Gb NICs, and one has 2x1Gb + 2x10Gb NICs. We use a primary NAS that shares iSCSI resources (attached directly to some VMs from Proxmox) and NFS as KVM storage (disk images), and a secondary one for backups and replicas. Both FreeNAS servers have 2x10Gb NICs.

We use an LACP bond (2x1Gb or 2x10Gb) on all the servers, connected to a stack of two Netgear 3300 switches. We mainly use two VLANs on the cluster: one (untagged) for storage and corosync, and another one (tagged) for virtual machines.
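For reference, a minimal /etc/network/interfaces sketch of that layout on one node might look like this (NIC names, VLAN ID and addresses are made-up examples, not our real ones):

    # LACP bond over the two NICs
    auto bond0
    iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-mode 802.3ad
        bond-miimon 100

    # Untagged network: storage + corosync
    auto vmbr0
    iface vmbr0 inet static
        address 192.168.10.11
        netmask 255.255.255.0
        gateway 192.168.10.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0

    # Tagged VLAN (20 here) for virtual machine traffic
    auto vmbr1
    iface vmbr1 inet manual
        bridge-ports bond0.20
        bridge-stp off
        bridge-fd 0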

Now we want to improve our cluster by adding 4 Proxmox+Ceph nodes, and the question is: which network model would be recommended? The new nodes have 2x1Gb NICs + 2x10Gb NICs.

I've already done a test, creating a new cluster with 3 nodes that uses one 1Gb NIC for corosync and VMs and one 10Gb NIC for Ceph.

And now I want to add the new Proxmox+Ceph nodes to our production cluster. So we will have:
  • 8 Proxmox nodes
  • 4 Proxmox+Ceph nodes
  • FreeNAS storage
I'm considering different schemes:

1) Keep the network model simple: define a separate network on one 10Gb NIC to handle the Ceph private traffic, and use our current storage+corosync network as the public Ceph network. This way I would keep the LACP bond on the current servers and would be able to access the Ceph storage from all the nodes without defining more VLANs.

2) Dedicate both 10Gb NICs to the public/private Ceph networks (a new Ceph VLAN) on the Proxmox+Ceph nodes, and define a subinterface on that VLAN on the other Proxmox nodes to allow direct access to the Ceph storage.

3) In fact, I am wondering whether, in a small Ceph environment (4 nodes and 1TB of Ceph storage), the separate Ceph private/public networks are really necessary. Could I use a single 10Gb network for Ceph + corosync + the current storage?
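To make scheme 1 concrete, I imagine the ceph.conf networks would be declared roughly like this (the subnets are invented examples):

    [global]
        # public network: the existing storage+corosync LAN, reachable from all nodes
        public network  = 192.168.10.0/24
        # cluster network: dedicated 10GbE subnet for OSD replication
        cluster network = 10.10.30.0/24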

I would like to hear your suggestions. Before writing to the forum, I searched the wiki and this forum for the recommended network model and found similar questions (like https://forums.servethehome.com/ind...network-model-with-multiple-interfaces.11573/ or https://forum.proxmox.com/threads/proxmox-ceph-network-configuration.37595/), but I still have some doubts.

Regards,

Manuel Martínez
 
Very important in all of this: you need to separate your corosync network onto a dedicated link and/or use two rings. As the cluster grows, your network traffic will grow too and interfere with the corosync traffic, and then your cluster will become unstable.

For a network setup, think roughly along these lines:
  • Ceph should use 10GbE; the public and cluster networks don't need to be split.
  • Corosync on a separate network, better with two rings (on different interfaces); see the sketch after the links below.
  • FreeNAS storage access on its own network (optionally 10GbE).
  • Client traffic on its own network (better manageability and security).
Proxmox Ceph docs: https://pve.proxmox.com/pve-docs/chapter-pveceph.html
Cluster network: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_cluster_network
General networking: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_network_configuration
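To illustrate the two-ring idea, a minimal /etc/corosync/corosync.conf sketch (corosync 2.x syntax; the cluster name, node names and subnets are hypothetical):

    totem {
        version: 2
        cluster_name: prod
        rrp_mode: passive            # redundant ring protocol for two rings
        interface {
            ringnumber: 0
            bindnetaddr: 10.10.1.0   # ring 0: dedicated corosync subnet
        }
        interface {
            ringnumber: 1
            bindnetaddr: 10.10.2.0   # ring 1: second, independent subnet
        }
    }

    nodelist {
        node {
            name: pve1
            nodeid: 1
            ring0_addr: 10.10.1.11
            ring1_addr: 10.10.2.11
        }
        # ...one node { } block per cluster member
    }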
 
Thanks Alwin,

To avoid other traffic interfering with corosync, while still using just a 2x1Gb bond on most of my nodes, would it help to define a dedicated VLAN for corosync? I would also define other VLANs for Ceph and FreeNAS (and of course for client traffic).

Without changing the 2x1Gb NICs, what would you recommend: one bond with VLANs on it, or one NIC for corosync and the other one for everything else?

I believe you are suggesting the latter option (a dedicated NIC for corosync), but please confirm. A rough sketch of what I understand by that is below.
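Something like this in /etc/network/interfaces (NIC names, VLAN IDs and addresses are again invented):

    # eno1: dedicated to corosync, nothing else on it
    auto eno1
    iface eno1 inet static
        address 10.10.1.11
        netmask 255.255.255.0

    # eno2: everything else, one VLAN per traffic type
    auto eno2.10
    iface eno2.10 inet static
        address 192.168.10.11      # storage / FreeNAS VLAN
        netmask 255.255.255.0

    auto vmbr0
    iface vmbr0 inet manual        # VM traffic on tagged VLAN 20
        bridge-ports eno2.20
        bridge-stp off
        bridge-fd 0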

Best regards,

Manuel Martínez
 
Hi again,

I've been thinking for a while, reading, and doing some tests.

I understand that it is better to have an independent NIC for every kind of traffic, and we will probably end up adding more NICs to our servers, but I would like to share one idea and get your opinion.

I've read that it is possible to set the QoS VLAN priority directly on Linux (for example https://www.kuncar.net/blog/2018/us...rface-with-802-1p-cos-priority-bits-set/2014/), so I believe that setting a higher priority on a corosync-specific VLAN on every node would make it possible to keep the corosync timing right when the network is saturated, since the nodes should prioritize the outgoing traffic of the corosync VLAN.

This would make it possible to keep the fault-tolerant scheme with a two-NIC bond carrying VLANs for corosync, storage, Ceph, VMs... rather than one port for corosync and one for the rest of the traffic.

I've tried installing the Debian "vlan" package on one test Proxmox node and setting the egress QoS priority with vconfig, and it looks good.
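In case it helps, the kind of commands I mean (the VLAN ID and priority values are just examples):

    # legacy tooling from the "vlan" package:
    vconfig add eno1 50                  # create VLAN 50 on the NIC
    vconfig set_egress_map eno1.50 0 6   # map skb priority 0 to 802.1p priority 6

    # equivalent with modern iproute2, no vlan package needed:
    ip link add link eno1 name eno1.50 type vlan id 50 egress-qos-map 0:6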

What do you think?

Kind regards,

Manuel Martínez
 
I do not recommend QoS for cluster setups, as it tends to influence latency for the worse. A dedicated NIC port for corosync traffic and two rings, to be safe. In a degraded state, resources are squeezed together and reliability matters even more then.
 
