Weird Cluster bandwidth behavior

andy77

Well-Known Member
Jul 6, 2016
Hi @ all,

yesterday we had some pretty weird behaviour on our 25-node cluster. I did two things that seem to have broken the whole cluster.

1) Started a live ZFS migration from a node running version 6.1-x to a node running 6.2-x.
2) In the meantime, installed a new node and added it to the cluster (because I forgot that the live migration was still running).

After adding the new node I noticed that the cluster seemed to be unhealthy. Checking the availability of all the nodes, I saw that their latency was pretty bad and that sometimes even timeouts happened. This made me think I had a network problem and led me in a different direction of analysis (collisions). After checking the switch status I noticed that on every switch port with a node from this cluster attached we had almost 1GB/s of traffic, which adds up to around 25GB/s through the whole switch. To be honest, in my hurry, and still not sure what the real cause was (maybe a broken network card, or really the cluster filesystem), I shut down every node and started it again, which fixed the problem.

Now I would like to understand what happened. My assumption is that the cluster filesystem somehow got into a loop where every node changed something and then changed it back again. That led to very high traffic on the switch.

Any other ideas or explanations on that?

Regards
Andy
 
Hi,

do you have a dedicated, redundant network for corosync? And which network do you use for migration?
I can imagine that you migrated over the corosync network, so the latency rose, the corosync queue filled up, and packets were resent.
With large clusters, there can be a point where it can no longer keep up and the traffic stays at this level.
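
If you want to check whether the corosync link really gets saturated while a migration is running, something like the following could help. It is only a rough sketch, not a Proxmox tool: it samples the Linux interface counters in /proc/net/dev twice and prints the throughput per interface. The function names and the 5 second interval are my own choices for illustration.

```python
import time


def parse_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} from the contents of /proc/net/dev."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates(before, after, seconds):
    """Bytes per second as (rx, tx) for every interface present in both samples."""
    return {
        iface: (
            (after[iface][0] - before[iface][0]) / seconds,
            (after[iface][1] - before[iface][1]) / seconds,
        )
        for iface in after
        if iface in before
    }


def sample(path="/proc/net/dev"):
    """Read the current counters from the kernel."""
    with open(path) as f:
        return parse_net_dev(f.read())


def main(interval=5.0):
    """Take two samples `interval` seconds apart and print per-interface rates."""
    first = sample()
    time.sleep(interval)
    second = sample()
    for iface, (rx, tx) in sorted(rates(first, second, interval).items()):
        print(f"{iface:>12}  rx {rx / 1e6:8.2f} MB/s  tx {tx / 1e6:8.2f} MB/s")
```

Calling main() on a node during a migration shows which interface carries the traffic. If the interface used by corosync is the busy one, the migration and the cluster communication are sharing a link.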
 
