Cluster broken during upgrade 3 - 3.1

donty

We have an odd problem with a cluster, and I wonder how much latency corosync can work effectively with. We are seeing 60-120 ms on multicast tests, which is odd on a LAN!

All was working well in a cluster before the upgrade and none of the standard cluster fixes seem to work.

Before I go playing with vlans and switches I thought I would check on how much latency is too much.

Thanks

D

We also had a host node throw a raid disk during the process but that node is now out of service and powered down.
 
Hi,
why do you have so much latency? Are the nodes in different datacenters?

A short test between two nodes shows this:
Code:
# with 10 Gbit Ethernet
proxmox1 :   unicast, seq=1, size=69 bytes, dist=0, time=0.132ms
proxmox1 : multicast, seq=1, size=69 bytes, dist=0, time=0.143ms
proxmox1 :   unicast, seq=2, size=69 bytes, dist=0, time=0.079ms
proxmox1 : multicast, seq=2, size=69 bytes, dist=0, time=0.088ms
proxmox1 :   unicast, seq=3, size=69 bytes, dist=0, time=0.084ms
proxmox1 : multicast, seq=3, size=69 bytes, dist=0, time=0.095ms
proxmox1 :   unicast, seq=4, size=69 bytes, dist=0, time=0.056ms
proxmox1 : multicast, seq=4, size=69 bytes, dist=0, time=0.062ms
proxmox1 :   unicast, seq=5, size=69 bytes, dist=0, time=0.064ms
proxmox1 : multicast, seq=5, size=69 bytes, dist=0, time=0.072ms
proxmox1 :   unicast, seq=6, size=69 bytes, dist=0, time=0.071ms
proxmox1 : multicast, seq=6, size=69 bytes, dist=0, time=0.100ms
proxmox1 :   unicast, seq=7, size=69 bytes, dist=0, time=0.083ms
proxmox1 : multicast, seq=7, size=69 bytes, dist=0, time=0.098ms
proxmox1 :   unicast, seq=8, size=69 bytes, dist=0, time=0.061ms
proxmox1 : multicast, seq=8, size=69 bytes, dist=0, time=0.068ms

# with 1 Gbit Ethernet
proxmox-a : joined (S,G) = (*, 232.43.211.234), pinging
proxmox-a :   unicast, seq=1, size=69 bytes, dist=0, time=0.216ms
proxmox-a :   unicast, seq=2, size=69 bytes, dist=0, time=0.374ms
proxmox-a : multicast, seq=2, size=69 bytes, dist=0, time=0.391ms
proxmox-a :   unicast, seq=3, size=69 bytes, dist=0, time=0.291ms
proxmox-a : multicast, seq=3, size=69 bytes, dist=0, time=0.308ms
proxmox-a :   unicast, seq=4, size=69 bytes, dist=0, time=0.331ms
proxmox-a : multicast, seq=4, size=69 bytes, dist=0, time=0.348ms
proxmox-a :   unicast, seq=5, size=69 bytes, dist=0, time=0.303ms
proxmox-a : multicast, seq=5, size=69 bytes, dist=0, time=0.319ms
proxmox-a :   unicast, seq=6, size=69 bytes, dist=0, time=0.245ms
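To compare a longer run against these numbers, the per-packet lines can be summarised with a small script. This is only a rough sketch: it assumes the test output has the same line format as shown above, the names `summarize` and `slow_paths` are made up for illustration, and the 1.0 ms threshold is an arbitrary example value, not a documented corosync limit.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches per-packet lines such as:
#   proxmox1 : multicast, seq=1, size=69 bytes, dist=0, time=0.143ms
# Status lines like "joined (S,G) = ..." do not match and are skipped.
LINE_RE = re.compile(
    r"^(?P<host>\S+)\s*:\s*(?P<kind>unicast|multicast),\s*seq=\d+,"
    r".*?time=(?P<ms>[0-9.]+)ms"
)


def summarize(text):
    """Return {(host, kind): {count, min, avg, max}} with times in ms."""
    samples = defaultdict(list)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            samples[(m["host"], m["kind"])].append(float(m["ms"]))
    return {
        key: {"count": len(v), "min": min(v), "avg": mean(v), "max": max(v)}
        for key, v in samples.items()
    }


def slow_paths(summary, threshold_ms=1.0):
    """List (host, kind) pairs whose average exceeds threshold_ms.

    The default threshold is an illustrative assumption only.
    """
    return sorted(k for k, s in summary.items() if s["avg"] > threshold_ms)


# A few lines copied from the 10 Gbit test above.
SAMPLE = """\
proxmox1 :   unicast, seq=1, size=69 bytes, dist=0, time=0.132ms
proxmox1 : multicast, seq=1, size=69 bytes, dist=0, time=0.143ms
proxmox1 :   unicast, seq=2, size=69 bytes, dist=0, time=0.079ms
proxmox1 : multicast, seq=2, size=69 bytes, dist=0, time=0.088ms
"""

if __name__ == "__main__":
    summary = summarize(SAMPLE)
    for (host, kind), s in sorted(summary.items()):
        print(f"{host} {kind}: n={s['count']} avg={s['avg']:.3f}ms max={s['max']:.3f}ms")
    print("above threshold:", slow_paths(summary))
```

Replace `SAMPLE` with the captured output of your own test run; a healthy LAN should leave the "above threshold" list empty, while 60-120 ms multicast times would show up there immediately.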
Udo
 
Thanks for responding Udo.

We found that the local switch had responded badly to the sudden, noisy loss of devices on the LAN. After a reboot it worked again and went back to sub-0.1 ms responses, and the cluster has started pulling together again. We had just never checked the multicast response times before, so I wondered whether they were way out of line or not.

Now we just need to work through the options to get it back together cleanly, probably starting with stopping the cluster services and restarting them to see if they can sort it out themselves. I'm not sure whether 3 and 3.1 can coexist in a cluster. Do you have any experience with partial upgrades?

