Cluster out of sync

Leo David

Well-Known Member
Apr 25, 2017
Hi,
I've recently added 4 Proxmox nodes to the cluster (previously formed by 3 nodes), but the new nodes are connected to a separate switch.
The problem is that, at random, the cluster goes offline and all nodes turn red, and on the new nodes, journalctl -f shows this:

corosync[1332]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Oct 29 14:21:26 pmx-dt4 pmxcfs[1235]: [status] notice: cpg_send_message retry 60
Oct 29 14:21:27 pmx-dt4 pmxcfs[1235]: [status] notice: cpg_send_message retry 70
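
To see which nodes are stalling and how far the retry counter climbs, the pmxcfs lines above can be summarised with a small script. This is only a rough sketch: it assumes the journal has been saved as plain text in the same format as the excerpt, and the function name `max_retries_per_host` is made up for illustration.

```python
import re

# Matches pmxcfs journal lines in the format shown in the excerpt, e.g.
# "Oct 29 14:21:26 pmx-dt4 pmxcfs[1235]: [status] notice: cpg_send_message retry 60"
RETRY_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+pmxcfs\[\d+\]:.*"
    r"cpg_send_message retry (?P<count>\d+)"
)


def max_retries_per_host(lines):
    """Return {hostname: highest retry counter seen} for the given journal lines."""
    worst = {}
    for line in lines:
        match = RETRY_RE.match(line.strip())
        if match:
            host = match.group("host")
            count = int(match.group("count"))
            worst[host] = max(worst.get(host, 0), count)
    return worst


# The two lines quoted in the post, used here as sample input.
sample = [
    "Oct 29 14:21:26 pmx-dt4 pmxcfs[1235]: [status] notice: cpg_send_message retry 60",
    "Oct 29 14:21:27 pmx-dt4 pmxcfs[1235]: [status] notice: cpg_send_message retry 70",
]
print(max_retries_per_host(sample))
```

Running it over a saved journal from every node would show whether only the new nodes on the second switch are affected or the original three as well.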

If I restart all nodes, they all go green, but at some point they go back to red... :(
Do you have any idea what the problem could be?
Thank you very much!
 
Hi,
Thank you very much.
I've disabled snooping on the new switch where the new nodes are connected, though I'm not sure that solved it... They are green after a reboot anyway. Another problem, I think, is that I'm using the same subnet for network storage (Ceph & Gluster) and management.
 
Hi,
I shall try to create a different VLAN for the cluster network, but I have some concerns.
At the moment, each PVE node is connected to the switch over 10 Gb SFP+. Each node also has an unused 1 Gb connection. Would it be good practice to keep using the 10 Gb card for the cluster network, separating it from the storage network with VLAN tagging and so keeping 10 Gb for migrations? Or should I use the dedicated 1 Gb port for the cluster network in a different VLAN and so limit VM migrations to 1 Gb/s?
Or,
is there a way to use the 10 Gb connection for storage & migration but the 1 Gb connection for the cluster?

Thanks a lot!
 
corosync network != migration network.
use 1G as primary corosync, 10G as secondary corosync.
 
Thank you, but I don't think I've understood. I basically need one network (the 10 Gb one) for storage & migration, and the 1 Gb network for corosync.
How could this be configured?
 
VLANs.
10G: VLAN for storage, VLAN for migration/mgmt, etc., VLAN for secondary corosync
1G: vlan for primary corosync
etc.
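
As a rough illustration of that layout, the sketch below writes the plan down, checks that no two subnets overlap, and renders a corosync `nodelist` entry with one address per link. All VLAN IDs, subnets and the node ID are placeholders and not values from this thread. Only the `node { ... }` block is shown, because the matching `totem` settings depend on the corosync version in use.

```python
import ipaddress

# Hypothetical plan: VLAN IDs and subnets are placeholders for illustration only.
PLAN = {
    "corosync-primary":   {"nic": "1G",  "vlan": 50, "net": "10.10.50.0/24"},
    "corosync-secondary": {"nic": "10G", "vlan": 51, "net": "10.10.51.0/24"},
    "storage":            {"nic": "10G", "vlan": 60, "net": "10.10.60.0/24"},
    "migration-mgmt":     {"nic": "10G", "vlan": 70, "net": "10.10.70.0/24"},
}


def overlapping(plan):
    """Return pairs of role names whose subnets overlap (ideally an empty list)."""
    nets = [(name, ipaddress.ip_network(p["net"])) for name, p in plan.items()]
    return [
        (a, b)
        for i, (a, net_a) in enumerate(nets)
        for b, net_b in nets[i + 1:]
        if net_a.overlaps(net_b)
    ]


def nodelist_entry(name, nodeid, plan):
    """Render one corosync nodelist entry; the host part of each address is the node ID."""
    ring0 = ipaddress.ip_network(plan["corosync-primary"]["net"])[nodeid]
    ring1 = ipaddress.ip_network(plan["corosync-secondary"]["net"])[nodeid]
    return (
        "node {\n"
        f"    name: {name}\n"
        f"    nodeid: {nodeid}\n"
        "    quorum_votes: 1\n"
        f"    ring0_addr: {ring0}\n"
        f"    ring1_addr: {ring1}\n"
        "}"
    )


print(overlapping(PLAN))
print(nodelist_entry("pmx-dt4", 4, PLAN))  # node ID 4 is an assumption
```

Here `ring0_addr` sits on the 1G VLAN and `ring1_addr` on the 10G VLAN, so corosync has its own primary path and only falls back to the shared 10G link.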
 
Ok, thank you.
I'm trying to locate datacenter.cfg so I can edit it to create a separate migration network and disable SSL on it, but I just can't find this file. Should I just create it, or are those settings now in a different file?
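
For reference, a sketch of what such an entry usually looks like. It assumes the file lives at /etc/pve/datacenter.cfg and that the installed Proxmox VE version supports the `migration` property; the subnet is a placeholder. Note that `insecure` turns off the encrypted SSH tunnel for migration traffic, so it only makes sense on a trusted, isolated network.

```
# /etc/pve/datacenter.cfg (path assumed; subnet below is a placeholder)
migration: type=insecure,network=10.10.70.0/24
```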