Open vSwitch and multicast issues (cluster keeps losing quorum)

brad_mssw

I'm having issues where multicast doesn't appear to work when I use Open vSwitch to configure my bonds and bridges. At boot everything comes up and the cluster has quorum, but then totem retransmit messages start appearing in the logs on all nodes and everything gets out of whack. This is obviously multicast related, but I've seen no guidance on workarounds when using Open vSwitch.

What I'm trying to do is bond 2 NICs with LACP for redundancy and bandwidth aggregation across 2 Juniper EX switches in a stack (a Virtual Chassis). On top of that bond sits my bridge, with a couple of OVS internal ports for the local machine to use: one for Proxmox cluster communication and one for Ceph communication.

This worked fine when I was using standard linux bonding and bridging, as long as I used a post-up script to turn on the multicast querier on any bridges like:
Code:
post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier && sleep 5 )
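
For context, here is a rough sketch of how that looked in a plain Linux bridge stanza (the bond name and addressing are illustrative, not my exact old config):
Code:
auto vmbr0
iface vmbr0 inet static
  address 10.50.10.44
  netmask 255.255.255.0
  bridge_ports bond0
  bridge_stp off
  bridge_fd 0
  # turn the bridge into the IGMP querier shortly after it comes up
  post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier && sleep 5 )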


However, that doesn't appear to be an option with Open vSwitch: there are no bridge settings under /sys/devices/virtual/net/$IFACE for OVS bridges. Is there another way to make this work? Google didn't turn up anything. For now I've switched to cman transport="udpu" as a workaround, which seems to have worked, but I know it isn't considered a good idea.
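
For reference, the udpu workaround is just an attribute on the cman element in /etc/pve/cluster.conf; roughly like this, based on the stock Proxmox 3.x file (config_version needs to be bumped whenever the file changes):
Code:
<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/>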

I should also mention that I am currently running the latest release from the pve-no-subscription repository, with the 3.10.0-4-pve kernel.

Here's my /etc/network/interfaces:
Code:
auto lo
iface lo inet loopback

allow-vmbr0 ovsbond
iface ovsbond inet manual
  ovs_bridge vmbr0
  ovs_type OVSBond
  ovs_bonds eth0 eth1
  pre-up ( ifconfig eth0 mtu 9000 && ifconfig eth1 mtu 9000 )
  ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
  mtu 9000

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  ovs_ports ovsbond vlan50 vlan55
  mtu 9000

# Proxmox cluster communication vlan
allow-vmbr0 vlan50
iface vlan50 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=50
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.50.10.44
  netmask 255.255.255.0
  gateway 10.50.10.1
  mtu 1500

# Ceph cluster communication vlan (jumbo frames)
allow-vmbr0 vlan55
iface vlan55 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=55
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.55.10.44
  netmask 255.255.255.0
  mtu 9000
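
For completeness, the switch side of this bond on the EX Virtual Chassis is a matching aggregated-ethernet interface with LACP. On legacy (pre-ELS) Junos it would look roughly like this; the member ports, ae index, and VLAN names are assumptions, not my actual switch config:
Code:
set chassis aggregated-devices ethernet device-count 1
set interfaces ge-0/0/10 ether-options 802.3ad ae0
set interfaces ge-1/0/10 ether-options 802.3ad ae0
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp periodic fast
set interfaces ae0 mtu 9216
set interfaces ae0 unit 0 family ethernet-switching port-mode trunk
set interfaces ae0 unit 0 family ethernet-switching vlan members [ vlan50 vlan55 ]
set vlans vlan50 vlan-id 50
set vlans vlan55 vlan-id 55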

Because of a bug, as I found elsewhere in this forum, I also had to append the following to /etc/default/openvswitch-switch to get the interfaces to come up at boot:
Code:
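# Ensure ifupdown's run directory and state file exist early, so the
# interfaces brought up by the openvswitch-switch init script come up
# cleanly at boot.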
RUN_DIR="/run/network"
IFSTATE="$RUN_DIR/ifstate"

check_ifstate() {
    if [ ! -d "$RUN_DIR" ] ; then
        if ! mkdir -p "$RUN_DIR" ; then
            log_failure_msg "can't create $RUN_DIR"
            exit 1
        fi
    fi
    if [ ! -r "$IFSTATE" ] ; then
        if ! :> "$IFSTATE" ; then
            log_failure_msg "can't initialise $IFSTATE"
            exit 1
        fi
    fi
}

check_ifstate

Thanks!
-Brad
 
brad_mssw said:
This worked fine when I was using standard linux bonding and bridging, as long as I used a post-up script to turn on the multicast querier on any bridges like:

Isn't it possible to enable multicast querier on the switch?
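
On an EX that needs an RVI in the VLAN to source the queries from. Something roughly like this, assuming the cluster VLAN is named vlan50 on the switch and 10.50.10.2 is a free address (both assumptions on my part):
Code:
set interfaces vlan unit 50 family inet address 10.50.10.2/24
set vlans vlan50 l3-interface vlan.50
set protocols igmp interface vlan.50 version 2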
 
I probably can enable it on the Juniper switch, but I haven't touched the switch config since the setup was working with standard Linux bonding and bridging, because back then I could just enable the querier on the Linux bridge. I've never messed with the multicast settings on the Juniper side, so if anyone familiar with Juniper has suggestions on what needs to be done, I'd love to hear them.

Also, I should note the multicast wiki seems to suggest disabling IGMP snooping altogether: https://pve.proxmox.com/wiki/Multicast_notes
Though that is definitely easier to accomplish, I don't think it is the proper solution even if it would work. Right?
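
For reference, on the EX side that would apparently just mean removing the default snooping stanza, something like this (untested on my end):
Code:
delete protocols igmp-snooping vlan all
commit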

Thanks.
-Brad
 
Any reason not to put the querier on the router instead of the switch? We typically don't use RVIs (Routed VLAN Interfaces) on our switches, and you can't run an IGMP querier there without one. I enabled it on our Juniper SRX router via:
Code:
set protocols igmp $iface version 2

and it appears that may have solved it, but time will tell.
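
To keep an eye on it I'll check show igmp interface on the SRX (it should list itself as the querier) and re-run the omping test from the Multicast notes wiki linked above on all nodes at the same time, with node1 node2 node3 standing in for the real hostnames:
Code:
omping -c 600 -i 1 -q node1 node2 node3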
 
