Adding new node causes cluster instability for 15 minutes

harvie

Hello,
When I add a new node to the cluster, it usually puts the whole cluster into a very messy state: all nodes flip between online, offline, and unknown at random, and the whole cluster loses quorum. This scary mess lasts for 10 to 20 minutes, then suddenly everything converges and the whole cluster becomes usable again. I don't understand what is happening.

I saw this with Proxmox 4 and 5, and it still happens on Proxmox 6 with knet.

Is this normal? I didn't notice any mention of it in the documentation or any warning in the UI. When I first installed a Proxmox cluster as a test, I thought I had completely screwed something up and reinstalled all nodes to give it another try. Then it happened again, so I gave up and forgot about the running servers. When I came back a few weeks later, the cluster was up and running. From that I realized it needs time to converge to a stable state.

But come on, 15 minutes? Really? Why is that? Am I doing something wrong?
 
This is not normal. It seems your cluster network is not reliable? Dig deeper.
 
How do I test it? It's just a gigabit network running on some refurbished switches (they used to be expensive high-end hardware) with VLANs.
If the network switch is causing this, how come it gets stable after 20 minutes and then I never have problems again (unless I am adding another node)? Also, I don't have any other networking issues. Average ping RTT is 0.083 ms with 0% loss.
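
One caveat about that ping figure: an average can hide short latency spikes or brief bursts of loss, so it may be worth looking at the worst case too, ideally while a node is being added. Below is a minimal Python sketch of that idea. It is not a Proxmox or corosync tool; the function name `summarize_rtts`, the sample values, and the 5 ms `spike_ms` threshold are all assumptions made up for illustration, so adjust them to your setup.

```python
import statistics

def summarize_rtts(rtts_ms, sent, spike_ms=5.0):
    """Summarize ping samples: loss, average, worst case, and spike count.

    rtts_ms  -- round-trip times (ms) of the replies that came back
    sent     -- number of echo requests sent
    spike_ms -- assumed threshold above which a reply counts as a spike
    """
    received = len(rtts_ms)
    loss_pct = 100.0 * (sent - received) / sent
    spikes = [r for r in rtts_ms if r > spike_ms]
    return {
        "loss_pct": loss_pct,
        "avg_ms": statistics.mean(rtts_ms) if rtts_ms else None,
        "max_ms": max(rtts_ms) if rtts_ms else None,
        "spikes": len(spikes),
    }

# Made-up samples: nine fast replies and one slow one.
samples = [0.08, 0.09, 0.07, 12.5, 0.08, 0.09, 0.08, 0.07, 0.08, 0.08]
report = summarize_rtts(samples, sent=10)
print(report)
```

With these made-up samples the loss is zero and the average still looks harmless, yet one reply took 12.5 ms. That kind of outlier is what an average-only check would miss.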