Traffic on cluster network

Alessandro 123

May 22, 2016
Hi,
what kind of traffic goes over the cluster network? Only the corosync/pacemaker management traffic?

Could I use a VLAN on the public (redundant) network, maybe with a high-priority QoS set on it?

The official docs suggest using two isolated network connections, which means at least 6 network cards for each server (two for the public network, two for the cluster network, two for the storage network).

Some questions:

1) What do you mean by isolated? Can I set up a bonded interface for the cluster network pointing to two different switches that have an inter-switch link (see the sketch after question 3)? These are not isolated from each other (they are only isolated from the other networks), but still highly available, with automatic failover and no SPOF.

2) You wrote gigabit network. What kind of traffic is routed there?

3) When migrating a VM between hosts, the RAM dirty pages are sent through which interface? The cluster one? If so, this makes it mandatory to use at least a gigabit connection, but it could disrupt corosync packets under heavy load.
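
This is roughly what I have in mind for question 1. Just a sketch, assuming Debian-style ifupdown as used by Proxmox; the NIC names and addresses are invented. An active-backup bond needs no special switch support, so the two switches only need the inter-switch link between them:

Code:
# /etc/network/interfaces (sketch only; NIC names and addresses are invented)
auto bond1
iface bond1 inet static
    address 10.10.50.11
    netmask 255.255.255.0
    bond-slaves eth2 eth3      # eth2 -> switch A, eth3 -> switch B
    bond-mode active-backup    # no LACP/MLAG needed across two independent switches
    bond-miimon 100            # link monitoring interval in ms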
 
Is it possible to get some advice here? I'm planning a new cluster and I have to design the network gear.
Currently, if I understood properly, I need at least 4 redundant networks (plus IPMI):

1 for public (gigabit)
1 for private (used by customers to reach their own servers without routing through the public network) (gigabit)
1 for corosync cluster (gigabit)
1 for gluster storage (10 gigabit)
1 for IPMI/out-of-band (100mbit)

All must be redundant, so 2 ports for each network except IPMI (9 ports in each server) and 9 top-of-rack switches in each rack!

A huge network and a mess of cabling. Having the ability to create VLANs (with priority QoS) over the private network to use for corosync would be great.
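
For example, this is roughly what I'm thinking of (just a sketch: the bond name, VLAN id and address are invented, and the switches would also have to be configured to honour 802.1p priorities):

Code:
# Create a corosync VLAN on top of the private bond and tag its frames with a
# high 802.1p priority (sketch; bond0, VLAN 50 and the address are invented)
ip link add link bond0 name bond0.50 type vlan id 50 egress-qos-map 0:6
ip addr add 10.10.50.11/24 dev bond0.50
ip link set bond0.50 up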

I can put the IPMI network in a VLAN on one of the corosync switches; in that case I reduce the number of needed switches by 1, as long as I use a gigabit switch that also serves IPMI: 12 ports for corosync, 12 ports for IPMI. On the other switch, only the first 12 ports are used by corosync (there is no redundant IPMI):

2 switches for public
2 switches for private
2 switches for corosync + IPMI
2 switches for gluster

8 switches.

But 8 switches are still too many. The best way would be to share the private network with corosync:

2 switches for public
2 switches for private+corosync
2 switches for gluster
1 switch for ipmi

7 switches.
 
Hi,

We can only tell you what works, not what might work or what works in 8 out of 10 cases.

The point is: if your cluster network (by default used for live-migration RAM data and for corosync) does not have enough capacity (latency and speed), it will fail.
The result is node reboots, corosync retransmits, failed live migrations, storage problems, frozen VMs, etc.

Isolated means the networks should not interfere with each other.

2 switches are enough for failover.

And yes, 8 NICs are required if you want to have a private network.
 
8 nics..... :-(
This is a huge mess. A big "spaghettata"
 
Really, I don't know how to fit 9 networks (and 9 cables) on each server.
32 servers supported by Proxmox in a single cluster means 32*9 = 288 cables just for the Proxmox servers, plus the cables needed by the Gluster servers.

Any hint?
 
Why 9?

I said 8 NICs, and the 2 NICs for the private network you can do with a VLAN or skip entirely, because there is no real benefit besides the extra performance.

So you are now down to only 6.
 
9 because there is also the dedicated ipmi port

The private network is needed because it's isolated from the public network and we don't charge our customers for traffic over it. I can't use a VLAN because I'm using different switches, and also to avoid a DoS on the public interface bringing down the private network as well.
 
The absolute bare minimum for a working cluster is two interfaces, physical or virtual. Obviously, a cluster with only one physical connection (even 10GbE) is not suitable for production use.

For production use, you need:
1. redundant, LOW LATENCY interconnect for cluster traffic. 2x 10GbE should do it. Bonded 1Gb connections don't work for this, as bonding does not reduce latency.
2. redundant, LOW LATENCY interconnect for storage cluster traffic. 2x 10GbE should do it. Bonded 1Gb connections don't work for this, as bonding does not reduce latency.
3. redundant service network traffic, preferably via multiple switches. 2x 10GbE will do nicely, but it's possible to have multiple 1Gb links.

notes:
- It is possible to mix the cluster networks using VLANs: corosync traffic on one VLAN with the bond active on eth0, the other on eth1 (rough corosync sketch after these notes). But this should only be considered if there is a physical limitation on network ports, as the loss of a link will force both networks to share a single link and potentially interfere with each other.
- Private networks can be handled with VLANs. There is almost no compelling reason to do it with physical links.
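
To illustrate the first note (rough sketch only): with corosync 2.x the two VLANs can be declared as redundant rings, roughly like this. The addresses are invented, and each node entry in the nodelist also needs a matching ring1_addr:

Code:
# /etc/pve/corosync.conf fragment (sketch; addresses are invented)
totem {
  version: 2
  cluster_name: mycluster
  rrp_mode: passive          # ring1 is only used if ring0 fails
  interface {
    ringnumber: 0
    bindnetaddr: 10.10.50.0  # corosync VLAN, bond active on eth0
  }
  interface {
    ringnumber: 1
    bindnetaddr: 10.10.60.0  # second VLAN, bond active on eth1
  }
}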

So, yes, 7 links including IPMI.
 
Wait, are you saying that for the cluster network I need redundant 10GbE? This is absolute overkill just for routing a "ping".
Corosync is mostly a heartbeat packet, like a "ping"; there is no need for redundant 10GbE.
Latency between 10GbE and 1GbE, both on copper, should be mostly the same. You can reduce latency by using fiber, but that's absolute overkill for a cluster network.

The cluster is obviously redundant and on a bonded 10GbE network.

The public network is on a redundant 1GbE network.

I can't use VLANs for the private network over the public network; I've already explained why: in case of a DDoS on the public network, the private one would go down too. Additionally, our datacenter has a dedicated private network.
 
Wait, are you saying that for the cluster network I need redundant 10GbE? This is absolute overkill just for routing a "ping".

Shrug. This all depends on what your risk tolerance is. In production, it's NEVER a good idea to depend on a single link for cluster traffic. The consequences of a link disruption range from benign (a single node getting fenced off) to catastrophic (loss of quorum). Having multiple links is cheap insurance. As for speed: the point of a 10GbE connection for the cluster heartbeat has nothing to do with its ability to carry 10Gbit of data per second; it's that the latency of 10GbE is roughly 10 times lower than 1GbE. A high-latency heartbeat network can be just as deadly as signal loss; again, it's cheap insurance.

My time at 2am on a Sunday is more valuable than a couple of network ports. Yours may not be.
 
It's not a couple of ports. It's at least 2 ports for EACH server. With 15 servers in a cluster, that means 30 ports, and 30 ports means two 10GbE switches, which are far more expensive than 1GbE ones.

Honestly, I've never ever seen issues with heartbeat/corosync running over 100Mbit; in my case I'll run it on a bonded gigabit connection.
It's the first time I hear that corosync needs 10GbE for latency reasons. Corosync works in terms of seconds, not microseconds.
5 microseconds or 50 microseconds makes no difference for corosync, which waits (if I remember properly) about 30 seconds before declaring a node down.
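
For reference, the timing knobs live in the totem section of corosync.conf; as far as I can tell from the man page, the corosync 2.x defaults are on the order of a second rather than 30, which is still a world away from microseconds (a sketch, values to be verified against man corosync.conf):

Code:
# corosync.conf totem timers (sketch; the values shown are, as far as I know,
# the corosync 2.x defaults - verify against `man corosync.conf`)
totem {
  version: 2
  token: 1000                              # ms without the token before a loss is declared
  token_retransmits_before_loss_const: 4   # retransmit attempts before declaring loss
  consensus: 1200                          # ms to reach consensus on new membership (1.2 * token)
}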

Just a question: on average, what speed should I expect for dirty RAM migration?
 
Cost is relative.

You have an investment in 15 servers, all of which have a cost associated. The data they house has a relative value, and the services they provide (or the partial or total failure to provide that service) have a cost associated as well. In light of that, what is the relative value of the additional switches and switch ports vs. the time, cost, direct lost revenue, loss of trust, and your own (or your NOC team's) availability? Not for me to say. In my case it's a no-brainer.

Honestly, I've never ever seen issues with heartbeat/corosync running over 100Mbit,

I honestly don't know what to say to that. We're weighing your anecdotal evidence (as rich or limited as it may be) against the opinion of a stranger on the internet. I have no advice for you.

Just a question: on average, what speed should I expect for dirty RAM migration?

It's not very fast. Have a look here: http://moiseevigor.github.io/virtualization/2014/11/26/speedup-your-kvm-migration-proxmox/
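
If I remember right, the main speedup discussed there is taking the migration traffic out of the SSH tunnel; on PVE 3.x/4.x that's a single datacenter.cfg flag, roughly like this (only sensible on a trusted, isolated cluster network, since the RAM stream then travels unencrypted):

Code:
# /etc/pve/datacenter.cfg (sketch; use only on a trusted, isolated cluster network,
# because migration traffic is then sent unencrypted)
migration_unsecure: 1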
 
Obviously it would be possible to use the same (redundant) switches for both the cluster and the storage network, right?
Assuming no more than 20 nodes (between PVE and storage), I can put the storage network on the upper ports in vlan1 and the cluster network on the lower ports in vlan2, on the same switch. The same on the other switch, to get HA and link aggregation. 2 ports per VLAN would be used, aggregated, as the inter-switch link for HA traffic and load balancing. Another 2 ports (if needed) would be used for connecting these switches to other racks (but that's not my case).

Doing it this way I'll be able to run a full 10Gb network for both cluster and storage. With a 10Gb network used for the cluster, VM migration will be very fast, as the RAM will be moved over a 10Gb link. Additionally, I'll only need 2 switches instead of 4.
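
On each node that would look roughly like this (a sketch; NIC names and addresses are invented, and the VLAN separation lives on the switch ports, so the hosts just see plain bonds):

Code:
# /etc/network/interfaces (sketch; NIC names and addresses are invented)
auto bond1
iface bond1 inet static          # storage/gluster - switch ports in vlan1
    address 10.10.1.11
    netmask 255.255.255.0
    bond-slaves eth4 eth5        # one 10Gb port to each switch
    bond-mode active-backup
    bond-miimon 100

auto bond2
iface bond2 inet static          # cluster (corosync + migration) - switch ports in vlan2
    address 10.10.2.11
    netmask 255.255.255.0
    bond-slaves eth6 eth7
    bond-mode active-backup
    bond-miimon 100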

So:

2 switches for the public gigabit networks
2 switches for the private gigabit networks
2 switches for the cluster+storage networks (10Gb)
1 switch for the KVM/OOB/iLO network.

There is also the possibility of using the same switches for both the public and the private network, but in case of issues like a DDoS, both networks would be affected (if the switch goes down due to high PPS, both networks would be down).
 
