Ceph network (public and cluster): some questions

When designing cluster architecture, the key considerations are not how many links, nor how much bandwidth. The key is to identify the disparate network traffic that is produced and consumed, define minimum requirements for each, and define what constitutes a critical fault.

Faults can be described as physical-layer failures (a switch going out, a cable pulled, etc.) or logical ones (network contention leading to an application timeout).

Once you have defined the minimum criteria for acceptance, you can begin designing the solution. For example:
1. Corosync traffic is among the most sensitive, and consequential, in a Proxmox cluster. To avoid a physical fault taking out your nodes, you want more than one interface; to make sure no failure occurs due to contention, you want at least ONE of those interfaces to carry no other traffic. The same two considerations (physical redundancy and contention) apply to any type of traffic.
2. Ceph traffic can be very bursty. If you run Ceph over an interface shared with another type of traffic, all will be well until a rebalance commences. To accommodate this, only share the link with traffic you know can tolerate being constrained at times. With only three nodes (and the usual three replicas, one per host) this isn't much of an issue, since a failed node leaves nowhere to rebalance to, but with more nodes it will be. Moreover, depending on the type and number of disks, Ceph can consume all the network bandwidth you give it.
3. "NAS" traffic isn't relevant in and of itself, unless you have minimum bandwidth requirements (e.g., you need to complete a full backup in an allotted timeframe). A "slow" filer isn't going to break anything.
4. Traffic separation and ACLs are important in any environment that follows proper security policies. This can be accomplished using VLANs or physically separate links.
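
To make points 1 and 3 concrete, here is a rough Python sketch, not anything Proxmox or Ceph ships. It checks a hand-written link plan against the "more than one interface, at least one of them dedicated" rule, and estimates how long a full backup takes at a given sustained rate. The interface names (eno1, bond0), the sizes, and every function and class name are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrafficClass:
    name: str
    links: Tuple[str, ...]          # interfaces this traffic may use
    needs_dedicated: bool = False   # must own at least one link outright


def check_plan(classes: List[TrafficClass]) -> List[str]:
    """Return a list of human-readable problems found in the plan."""
    # Record which traffic classes use each interface.
    users = {}
    for tc in classes:
        for link in tc.links:
            users.setdefault(link, set()).add(tc.name)

    problems = []
    for tc in classes:
        if not tc.needs_dedicated:
            continue
        # Physical fault tolerance: more than one distinct interface.
        if len(set(tc.links)) < 2:
            problems.append(f"{tc.name}: needs more than one interface")
        # Contention tolerance: at least one interface with no other traffic.
        if not any(users[link] == {tc.name} for link in tc.links):
            problems.append(f"{tc.name}: no interface free of other traffic")
    return problems


def backup_hours(size_tb: float, usable_gbit_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained usable rate."""
    bits = size_tb * 8e12  # 1 TB = 8e12 bits (decimal units)
    return bits / (usable_gbit_per_s * 1e9) / 3600


# Hypothetical plan: corosync has its own NIC plus a fallback on the shared bond.
plan = [
    TrafficClass("corosync", ("eno1", "bond0"), needs_dedicated=True),
    TrafficClass("ceph", ("bond0",)),
    TrafficClass("nas", ("bond0",)),
]

print(check_plan(plan))               # [] means no problems found
print(round(backup_hours(10, 5), 2))  # 10 TB at a sustained 5 Gbit/s -> 4.44
```

If the backup figure does not fit your window while Ceph is rebalancing on the same link, that is the cue to give one of them its own link or to accept a longer window.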

food for thought.
 
Thank you @alexskysilk