[SOLVED] corosync lock and TOTEM Retransmit after upgrade 4.4 to 5.4

bash99

New Member
Nov 2, 2016
We have a mixed 4.4/5.3 cluster. It is our production system, so the upgrade is going very slowly.

Recently we have had a lot of problems when upgrading a box to 5.4 or adding a freshly installed 5.4 box to the cluster. Sometimes the whole cluster locks up and corosync uses 100% CPU. Only separating the new box from the cluster restores it.

The logs show entries like:
"pvesr failed"

"corosync[4216]: notice [TOTEM ] Retransmit List 59 5a 101 102 ad ae 83 84 d7 d8"

After testing with omping, we found that multicast fails on the newly upgraded box.
We checked the setup against https://pve.proxmox.com/wiki/Multicast_notes
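
For anyone who wants to repeat the multicast test, here is a minimal sketch of an omping run. It is not the exact command we used. The hostnames node1, node2 and node3 are placeholders, and omping_cmd is a helper name made up for this post. omping has to be started on all listed nodes at about the same time, so the helper only prints the command line, which can then be pasted on each node.

```shell
#!/bin/sh
# Sketch only: build an omping command line for a roughly 10 minute test.
# NODES is a placeholder list; replace it with the cluster's real hostnames.
NODES="node1 node2 node3"

# omping_cmd is a hypothetical helper, not part of omping or Proxmox.
# It prints the command instead of running it, because the same command
# must be started on every node at about the same time.
omping_cmd() {
    # -c 600: send 600 probes, -i 1: one probe per second,
    # -q: quiet, print only the summary at the end
    echo "omping -c 600 -i 1 -q $*"
}

# Intentionally unquoted so the list is split into one argument per node.
omping_cmd $NODES
```

A long run is useful here because, as far as I understand it, a missing IGMP querier tends to show up as multicast loss only after a few minutes, while unicast keeps working.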

IGMP snooping is on in the switch, but it is also on in the newly upgraded box.
The IGMP querier is off on the newly upgraded box, although according to the Multicast_notes document it should be on.
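
The two flags can be read from sysfs on each node. The following is a rough sketch, assuming the bridge is called vmbr0 (use your own bridge name). bridge_mcast_state and SYSFS_ROOT are names made up for this example; SYSFS_ROOT exists only so the function can be tried against a fake directory tree and defaults to /sys.

```shell
#!/bin/sh
# Sketch only: print the IGMP snooping and querier flags of a Linux bridge.
# bridge_mcast_state and SYSFS_ROOT are hypothetical names for this example.
bridge_mcast_state() {
    bridge="$1"
    dir="${SYSFS_ROOT:-/sys}/class/net/$bridge/bridge"

    # Only bridge interfaces have a "bridge" directory in sysfs.
    if [ ! -d "$dir" ]; then
        echo "$bridge: not a bridge (no $dir)" >&2
        return 1
    fi

    # Each file holds 0 (off) or 1 (on).
    snooping=$(cat "$dir/multicast_snooping")
    querier=$(cat "$dir/multicast_querier")
    echo "$bridge snooping=$snooping querier=$querier"
}
```

Calling bridge_mcast_state vmbr0 on an old and a new node makes it easy to compare the two. On our upgraded box this kind of check corresponds to snooping=1 querier=0.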

So did something change between 4.4 and 5.4?

We added
post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )
post-up ( echo 0 > /sys/class/net/$IFACE/bridge/multicast_snooping )
to the bridge's network settings and restarted the network.

Now everything works normally.
 
You should enable the multicast querier on your physical switch instead.
I have seen problems in the past when rebooting the node that was the multicast querier: multicast broke on the switch because the queriers on the other nodes did not take over.
 
