We have a mixed 4.4/5.3 cluster. It is our production system, so the upgrade is going very slowly.
Recently we have had a lot of problems when upgrading a box to 5.4 or adding a freshly installed 5.4 box to the cluster. Sometimes the whole cluster locks up and corosync uses 100% CPU. Only separating the new box from the cluster restores it.
The logs look like:
"pvesr failed"
"corosync[4216]: notice [TOTEM ] Retransmit List 59 5a 101 102 ad ae 83 84 d7 d8"
After testing with omping, we found that multicast failed on the newly upgraded box.
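For anyone who wants to repeat the test, here is a rough sketch of the omping invocation, following the pattern shown on the Multicast_notes wiki page. The node names node1, node2 and node3 are placeholders, not our real hosts, and the helper omping_cmd is only for illustration. The script just prints the command; the command itself has to be started on every node at roughly the same time.

```shell
#!/bin/sh
# Build the omping command line for a list of cluster nodes.
# NODES is a placeholder list; replace it with your own host names or IPs.
NODES="${NODES:-node1 node2 node3}"

omping_cmd() {
    # -c packet count, -i interval in seconds, -q quiet output;
    # -F is copied from the wiki example.
    echo "omping -c 10000 -i 0.001 -F -q $*"
}

# Print the command so it can be pasted on each node.
omping_cmd $NODES
```

If multicast is broken on one box, the summary for that box should show heavy multicast loss while unicast still looks fine.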
We checked against https://pve.proxmox.com/wiki/Multicast_notes:
IGMP snooping is on in the switch, but it is also on in the newly upgraded box.
IGMP querier is off in the newly upgraded box, and according to the Multicast_notes document it should be on.
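The two bridge settings can be read straight from sysfs. Below is a small sketch of how to check them. The bridge name vmbr0 is an assumption (the usual Proxmox default), and the SYSFS_NET variable and the show_bridge_mcast helper are made up for this example so it can be tried against a test directory.

```shell
#!/bin/sh
# Print the multicast snooping and querier flags of a Linux bridge.
# SYSFS_NET can be overridden to point at a test directory.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

show_bridge_mcast() {
    br="$1"
    for knob in multicast_snooping multicast_querier; do
        f="$SYSFS_NET/$br/bridge/$knob"
        if [ -r "$f" ]; then
            # 1 means enabled, 0 means disabled
            echo "$br $knob=$(cat "$f")"
        else
            echo "$br $knob=unavailable"
        fi
    done
}

# vmbr0 is an assumed bridge name; pass another one as the first argument.
show_bridge_mcast "${1:-vmbr0}"
```

On our upgraded box this kind of check showed multicast_snooping=1 and multicast_querier=0.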
So did something change between 4.4 and 5.4?
We added
post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )
post-up ( echo 0 > /sys/class/net/$IFACE/bridge/multicast_snooping )
to the network settings and restarted the network.
Everything works normally now.
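If restarting the network on a production node is not convenient, the same two values can also be written to sysfs at runtime. This is only a sketch: it must run as root, vmbr0 is an assumed bridge name, the set_bridge_mcast helper and SYSFS_NET variable are made up for this example, and the change is lost on reboot, so the post-up lines are still needed.

```shell
#!/bin/sh
# Enable the IGMP querier and disable IGMP snooping on a bridge at runtime.
# SYSFS_NET can be overridden to point at a test directory.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

set_bridge_mcast() {
    br="$1"
    dir="$SYSFS_NET/$br/bridge"
    if [ ! -d "$dir" ]; then
        echo "$br is not a bridge (no $dir)" >&2
        return 1
    fi
    echo 1 > "$dir/multicast_querier"   # turn the querier on
    echo 0 > "$dir/multicast_snooping"  # turn snooping off
    # Read the values back to confirm what the kernel now reports.
    echo "$br querier=$(cat "$dir/multicast_querier") snooping=$(cat "$dir/multicast_snooping")"
}

# Usage (as root): set_bridge_mcast vmbr0
```

The function is not called automatically, so sourcing the script changes nothing until it is invoked with a bridge name.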