When upgrading my cluster from 3.4 to 4.0 I got heavy trouble with multicast problems. It took a lot of effort to find a solution. I decided to share it here...
Upgrading from 3.4 to 4.0 also changed cluster communication from unicast (UDPU) to multicast. Since our network was supposed to be multicast-ready, I expected no problems. But it didn't work.
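For context, with corosync 2 (the version used by Proxmox VE 4.x) the transport is selected in the totem section of corosync.conf. A minimal sketch of the relevant lines (not a complete configuration):

```
totem {
  version: 2
  # transport: udpu   # would force unicast, like the old 3.x setup
  # when the transport line is absent, corosync defaults to multicast UDP
}
```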
Symptoms:
The cluster did not come up; there was no quorum. Testing with "omping", as described in the troubleshooting guide, showed complete packet loss on multicast.
The network people told me our network is multicast-ready, so the cluster should work. Some other clusters were already working.
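The omping test from the troubleshooting guide looks roughly like this (hypothetical node names; run the same command on every node at the same time):

```shell
# Multicast connectivity test with omping (node1..node3 are placeholder
# hostnames - substitute your own cluster nodes). omping reports unicast
# and multicast loss per peer; 100% multicast loss on a node suggests its
# IGMP reports are not reaching the switch.
NODES="node1 node2 node3"                 # assumption: your cluster hostnames
OMPING_CMD="omping -c 600 -i 1 -q $NODES" # 600 probes, 1s interval, quiet
echo "$OMPING_CMD"                        # run this on each node in parallel
```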
Tests:
Using a separate switch with IGMP snooping off showed that multicast on Proxmox itself was working. Just for testing I did a complete reinstall of all cluster nodes and connected them back to the normal switch. Now multicast was partially working.
Tests with "omping" showed that now 2 of 3 servers had a multicast connection.
For deeper analysis I ran "tcpdump igmp". This showed "IGMP query" packets from the multicast router on every node, but only the nodes that passed the "omping" test also sent "IGMP report" packets.
Reason:
The cluster interface was configured as a Linux bridge (vmbrx interface). The Linux bridge has IGMP snooping enabled by default but does not reliably detect a multicast router.
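A quick way to see whether snooping is active on a bridge is to read the sysfs flag. A small sketch (vmbr0 is a placeholder name; the SYSROOT variable only exists so the function can be exercised against a scratch directory instead of the real /sys):

```shell
# Report the IGMP snooping state of a bridge from sysfs.
# SYSROOT defaults to /sys; it is overridable purely for testing.
snooping_state() {
    sysroot="${SYSROOT:-/sys}"
    if val=$(cat "$sysroot/class/net/$1/bridge/multicast_snooping" 2>/dev/null); then
        # 1 = snooping on (kernel default), 0 = snooping off
        echo "$1: multicast_snooping=$val"
    else
        echo "$1: not a bridge (or no such interface)"
    fi
}

snooping_state vmbr0
```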
Quick fix:
Disable IGMP snooping on the Linux bridge.
Code:
echo 0 >/sys/class/net/vmbrx/bridge/multicast_snooping
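Note that this sysfs write is lost on reboot. One way to make it persistent (my addition, assuming the bridge is called vmbrx) is a post-up hook on the bridge stanza in /etc/network/interfaces:

```
iface vmbrx inet static
        # ... address and bridge settings as before ...
        post-up echo 0 > /sys/class/net/vmbrx/bridge/multicast_snooping
```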
With this fix cluster communication was running and I had quorum. But why didn't it work as expected? And how could I fix it without side effects?
Correct fix:
The real problem is that the Linux bridge sometimes fails to detect a multicast router on its own. And because "IGMP query" packets are themselves multicast packets, they get suppressed by IGMP snooping. I had to tell my bridge which port the multicast router sits behind. This is done by switching that port from "autodetect multicast router" (multicast_router=1, the default) to "always forward all multicast traffic" (multicast_router=2).
(See docs on Linux bridge and IGMP snooping.)
In my case the multicast router is outside my Proxmox server, so it sits behind the physical interface ethy. Setting this per-port option is a bit involved, so I put it in a small script in /etc/network/if-up.d/multicastrouter
Code:
#!/bin/sh
# /etc/network/if-up.d/multicastrouter
# Mark the bridge port leading to the external multicast router as
# "always forward multicast" (multicast_router=2). At ifup time the
# bridge only has its physical port (ethy), so this hits the right port.
# ifupdown sets $MODE and $IFACE for scripts in if-up.d.
if [ "$MODE" = "start" ] && [ -d "/sys/class/net/$IFACE/brif" ]; then
    for port in "/sys/class/net/$IFACE/brif"/*; do
        [ -f "$port/multicast_router" ] && echo 2 > "$port/multicast_router"
    done
fi
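To check that the setting took effect, you can read the per-port flag back. A small sketch (vmbrx as placeholder; the SYSROOT variable only exists so the function can be tried against a scratch directory instead of the real /sys):

```shell
# Print the multicast_router flag of every port on a bridge.
# 0 = never forward multicast to this port, 1 = autodetect (default),
# 2 = always forward multicast to this port.
list_mrouter_ports() {
    sysroot="${SYSROOT:-/sys}"
    for port in "$sysroot/class/net/$1/brif"/*; do
        [ -f "$port/multicast_router" ] || continue
        echo "$(basename "$port")=$(cat "$port/multicast_router")"
    done
}

list_mrouter_ports vmbrx
```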
My cluster interface now receives "IGMP query" packets and responds with "IGMP report" packets, so both the bridge and the outside switch learn about the multicast traffic.
Hope this helps others having similar problems.
Birger