I installed a test cluster with three nodes and wanted to test the redundant ring for corosync. On the first node I executed:
Code:
pvecm create testcluster -bindnet0_addr 172.26.1.1 -ring0_addr 172.26.1.1 -bindnet1_addr 172.28.1.1 -ring1_addr 172.28.1.1
On the second node I did this:
Code:
pvecm add 172.26.1.1 -ring0_addr 172.26.1.2 -ring1_addr 172.28.1.2
And on the third:
Code:
pvecm add 172.26.1.1 -ring0_addr 172.26.1.3 -ring1_addr 172.28.1.3
Looking into the logs (journalctl -b -u corosync) gives me this:
Jan 17 10:48:56 t-pve2 corosync[4304]: [TOTEM ] Retransmit List: 23e30 23e32
Jan 17 10:48:56 t-pve2 corosync[4304]: [TOTEM ] Retransmit List: 23e32
Jan 17 10:48:56 t-pve2 corosync[4304]: notice [TOTEM ] Retransmit List: 23e32
Jan 17 10:48:57 t-pve2 corosync[4304]: error [TOTEM ] Marking ringid 1 interface 172.28.1.2 FAULTY
Jan 17 10:48:57 t-pve2 corosync[4304]: [TOTEM ] Marking ringid 1 interface 172.28.1.2 FAULTY
Jan 17 10:48:58 t-pve2 corosync[4304]: notice [TOTEM ] Automatically recovered ring 1
Jan 17 10:48:58 t-pve2 corosync[4304]: [TOTEM ] Automatically recovered ring 1
Jan 17 10:49:02 t-pve2 corosync[4304]: notice [TOTEM ] Retransmit List: 23e44
Jan 17 10:49:02 t-pve2 corosync[4304]: [TOTEM ] Retransmit List: 23e44
Jan 17 10:49:02 t-pve2 corosync[4304]: notice [TOTEM ] Retransmit List: 23e46
Jan 17 10:49:02 t-pve2 corosync[4304]: [TOTEM ] Retransmit List: 23e46
Jan 17 10:49:02 t-pve2 corosync[4304]: [TOTEM ] Retransmit List: 23e46
Jan 17 10:49:02 t-pve2 corosync[4304]: notice [TOTEM ] Retransmit List: 23e46
Jan 17 10:49:02 t-pve2 corosync[4304]: notice [TOTEM ] Retransmit List: 23e47
[...]
Jan 17 10:49:22 t-pve2 corosync[4304]: notice [TOTEM ] Retransmit List: 23e87
Jan 17 10:49:25 t-pve2 corosync[4304]: notice [TOTEM ] Retransmit List: 23e89 23e8b 23e8d 23e8f
Jan 17 10:49:25 t-pve2 corosync[4304]: [TOTEM ] Retransmit List: 23e89 23e8b 23e8d 23e8f
Jan 17 10:49:25 t-pve2 corosync[4304]: [TOTEM ] Retransmit List: 23e8b 23e8f
Jan 17 10:49:25 t-pve2 corosync[4304]: notice [TOTEM ] Retransmit List: 23e8b 23e8f
Jan 17 10:49:27 t-pve2 corosync[4304]: error [TOTEM ] Marking ringid 1 interface 172.28.1.2 FAULTY
Jan 17 10:49:27 t-pve2 corosync[4304]: [TOTEM ] Marking ringid 1 interface 172.28.1.2 FAULTY
Jan 17 10:49:28 t-pve2 corosync[4304]: notice [TOTEM ] Automatically recovered ring 1
Jan 17 10:49:28 t-pve2 corosync[4304]: [TOTEM ] Automatically recovered ring 1
Every 30 seconds corosync marks ring1 as "faulty", and one second later it recovers the ring automatically.
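To double-check the 30-second pattern rather than eyeballing it, a small script along these lines could be run over the journal output. This is only a sketch: the function name `faulty_intervals`, the regular expression, and the placeholder `year` argument (journal timestamps in this format carry no year) are my own assumptions, not part of corosync or Proxmox tooling.

```python
import re
from datetime import datetime

# Matches lines such as:
# Jan 17 10:48:57 t-pve2 corosync[4304]: error [TOTEM ] Marking ringid 1 interface 172.28.1.2 FAULTY
FAULTY_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) .*Marking ringid (\d+) interface \S+ FAULTY"
)

def faulty_intervals(lines, ring=1, year=2000):
    """Return the seconds between consecutive FAULTY events for one ring.

    `year` is a placeholder (assumption) because the log timestamps have none.
    """
    stamps = []
    for line in lines:
        m = FAULTY_RE.match(line)
        if not m or int(m.group(2)) != ring:
            continue
        ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        # Each event shows up twice in the journal (with and without the
        # priority prefix), so skip repeats that carry the same timestamp.
        if not stamps or ts != stamps[-1]:
            stamps.append(ts)
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# The four FAULTY lines from the log excerpt in this post.
sample = [
    "Jan 17 10:48:57 t-pve2 corosync[4304]: error [TOTEM ] Marking ringid 1 interface 172.28.1.2 FAULTY",
    "Jan 17 10:48:57 t-pve2 corosync[4304]: [TOTEM ] Marking ringid 1 interface 172.28.1.2 FAULTY",
    "Jan 17 10:49:27 t-pve2 corosync[4304]: error [TOTEM ] Marking ringid 1 interface 172.28.1.2 FAULTY",
    "Jan 17 10:49:27 t-pve2 corosync[4304]: [TOTEM ] Marking ringid 1 interface 172.28.1.2 FAULTY",
]

print(faulty_intervals(sample))  # [30.0]
```

For a longer capture, the output of `journalctl -b -u corosync` could be saved to a file and its lines passed to the same function.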
Did I do anything wrong?
The servers are very old (about 10 years), but this is just equipment for testing purposes. Both network cards are Gigabit cards: the one for ring0 is an onboard Intel card connected to a Gigabit switch, while the one for ring1 is a cheap Realtek card connected to a 100 Mbit switch.