Open vSwitch and incorrect RSTP (+ crash on topology change involving Mellanox 10GbE adapter)

636f6e6e6f72

Hi all,

I have been trying to set up 4 nodes into the following topology using Open vSwitch.

[Topology diagram: storage-1 has direct-attach 10GbE links to compute-1 and to compute-2 (drawn as the fatter pipes); all nodes also uplink to the 1GbE core switch.]


I want to configure RSTP so that traffic between storage-1, compute-1 and compute-2 traverses the direct-attach 10GbE links (shown in the diagram as fatter pipes) whenever they are available.

My current problem is twofold:

1. Having implemented a configuration similar to the RSTP example on the Open vSwitch wiki page (2.2.4 Example 4: Rapid Spanning Tree (RSTP) - 1Gbps uplink, 10Gbps interconnect), I can't get compute-1 <-> storage-1 or compute-2 <-> storage-1 traffic to traverse the 10GbE links; it keeps going via the 1GbE switch links instead.

2. If I disconnect the 1GbE switch links from compute-1 and compute-2, leaving the 10GbE links as the only path to storage-1, storage-1 suffers a kernel panic and spews out a trace which I haven't figured out how to retrieve after a reboot. The panic is definitely related to the Mellanox (ConnectX-2?) dual-port adapter: if I fail the 1GbE switch links while the 10GbE links are disconnected, there is no panic until I reconnect them. I have installed the latest MLNX_EN driver (3.x) from Mellanox, with no change.

I need some assistance troubleshooting please! I'm not sure where to start.
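
For problem #1, presumably the first thing to look at is what RSTP actually computed on each node - which ports ended up Root/Designated/Alternate - and whether the per-port path costs really landed in the OVS database (a sketch, assuming OVS 2.4+ where the rstp/show command is available; output format varies by version):
Code:
# Show the RSTP port roles/states OVS computed on this bridge
ovs-appctl rstp/show vmbr0

# Confirm RSTP is enabled and the bridge-level knobs were applied
ovs-vsctl get Bridge vmbr0 rstp_enable other_config

# Confirm the per-port path costs from the interfaces file reached the DB
ovs-vsctl get Port eth0 other_config
ovs-vsctl get Port eth1 other_config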
 
Here are my interfaces files for problem #1.

storage-1:
Code:
#
# Physical interfaces
#

allow-vmbr0 eth0
# 1Gbps link to core switch
iface eth0 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_options other_config:rstp-enable=true other_config:rstp-path-cost=20000
   mtu 1496

allow-vmbr0 eth1
# 10Gbps link to another proxmox/ceph node
iface eth1 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_options other_config:rstp-enable=true other_config:rstp-path-cost=2000
   mtu 1496

allow-vmbr0 eth2
# 10Gbps link to another proxmox/ceph node
iface eth2 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_options other_config:rstp-enable=true other_config:rstp-path-cost=2000
   mtu 1496

#
# Bridges
#

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  ovs_ports eth0 eth1 eth2 vlan100 vlan200 vlan210 vlan220

  # Lower settings for shorter convergence times, we're on a fast network.
  # Set the priority high so that it won't be promoted to the STP root
  # NOTE: ovs_options and ovs_extra do *not* work for some reason to set the STP
  #       options.
  up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=32768 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
  mtu 1496
  # Wait for spanning-tree convergence
  post-up sleep 10

#
# VLANS
#

# LAN (Temporary)
allow-vmbr0 vlan100
iface vlan100 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=100
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 192.168.0.120
  netmask 255.255.255.0
  gateway 192.168.0.1
  mtu 1496

# Proxmox VE Management
allow-vmbr0 vlan200
iface vlan200 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=200
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.0.20
  netmask 255.255.255.0
  gateway 10.0.0.254
  mtu 1496

# Proxmox VE Tunnel
allow-vmbr0 vlan210
iface vlan210 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=210
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.1.20
  netmask 255.255.255.0
  #gateway 10.0.1.254
  mtu 1496

# Proxmox VE Storage
allow-vmbr0 vlan220
iface vlan220 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=220
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.2.20
  netmask 255.255.255.0
  #gateway 10.0.2.254
  mtu 1496
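
(An aside on the arithmetic these costs imply: every bridge above keeps the default priority 32768, so if the 1GbE core switch happens to win the root election, the root path costs work out as below, which would explain the traffic staying on the switch links. The numbers are an illustration of that case, not measured output.)
Code:
# Hypothetical root path costs if the 1GbE core switch is the RSTP root:
#   storage-1 via eth0 (1GbE uplink)                     20000           <- root port
#   storage-1 via eth1 -> compute-1 -> its 1GbE uplink   2000 + 20000 = 22000
#   compute-1 via eth0 (1GbE uplink)                     20000           <- root port
#   compute-1 via eth1 -> storage-1 -> its 1GbE uplink   2000 + 20000 = 22000
# Both ends prefer their own 1GbE uplink, so one end of each 10GbE link is put
# into the Alternate (blocking) role and compute <-> storage traffic crosses
# the switch until a 1GbE link fails.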

compute-1:
Code:
#
# Physical interfaces
#

allow-vmbr0 eth0
# 1Gbps link to core switch
iface eth0 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_options other_config:rstp-enable=true other_config:rstp-path-cost=20000
   mtu 1496

allow-vmbr0 eth1
# 10Gbps link to another proxmox/ceph node
iface eth1 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_options other_config:rstp-enable=true other_config:rstp-path-cost=2000
   mtu 1496

#
# Bridges
#

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  ovs_ports eth0 eth1 eth3 vlan100 vlan200 vlan210 vlan220

  # Lower settings for shorter convergence times, we're on a fast network.
  # Set the priority high so that it won't be promoted to the STP root
  # NOTE: ovs_options and ovs_extra do *not* work for some reason to set the STP
  #       options.
  up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=32768 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
  mtu 1496
  # Wait for spanning-tree convergence
  post-up sleep 10

#
# VLANS
#

# LAN (Temporary)
allow-vmbr0 vlan100
iface vlan100 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=100
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 192.168.0.130
  netmask 255.255.255.0
  gateway 192.168.0.1
  mtu 1496

# Proxmox VE Management
allow-vmbr0 vlan200
iface vlan200 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=200
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.0.30
  netmask 255.255.255.0
  gateway 10.0.0.254
  mtu 1496

# Proxmox VE Tunnel
allow-vmbr0 vlan210
iface vlan210 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=210
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.1.30
  netmask 255.255.255.0
  #gateway 10.0.1.254
  mtu 1496

# Proxmox VE Storage
allow-vmbr0 vlan220
iface vlan220 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=220
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.2.30
  netmask 255.255.255.0
  #gateway 10.0.2.254
  mtu 1496

compute-2:
Code:
#
# Physical interfaces
#

allow-vmbr0 eth0
# 1Gbps link to core switch
iface eth0 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_options other_config:rstp-enable=true other_config:rstp-path-cost=20000
   mtu 1496

allow-vmbr0 eth1
# 10Gbps link to another proxmox/ceph node
iface eth1 inet manual
   ovs_bridge vmbr0
   ovs_type OVSPort
   ovs_options other_config:rstp-enable=true other_config:rstp-path-cost=2000
   mtu 1496

#
# Bridges
#

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  ovs_ports eth0 eth1 eth3 vlan100 vlan200 vlan210 vlan220

  # Lower settings for shorter convergence times, we're on a fast network.
  # Set the priority high so that it won't be promoted to the STP root
  # NOTE: ovs_options and ovs_extra do *not* work for some reason to set the STP
  #       options.
  up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=32768 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
  mtu 1496
  # Wait for spanning-tree convergence
  post-up sleep 10

#
# VLANS
#

# LAN (Temporary)
allow-vmbr0 vlan100
iface vlan100 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=100
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 192.168.0.131
  netmask 255.255.255.0
  gateway 192.168.0.1
  mtu 1496

# Proxmox VE Management
allow-vmbr0 vlan200
iface vlan200 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=200
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.0.31
  netmask 255.255.255.0
  gateway 10.0.0.254
  mtu 1496

# Proxmox VE Tunnel
allow-vmbr0 vlan210
iface vlan210 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=210
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.1.31
  netmask 255.255.255.0
  #gateway 10.0.1.254
  mtu 1496

# Proxmox VE Storage
allow-vmbr0 vlan220
iface vlan220 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=220
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.2.31
  netmask 255.255.255.0
  #gateway 10.0.2.254
  mtu 1496

storage-2:
Code:
#
# Physical interfaces
#

# Bond eth0 and eth1 together
allow-vmbr0 bond0
iface bond0 inet manual
  ovs_bridge vmbr0
  ovs_type OVSBond
  ovs_bonds eth0 eth1
  # Force the MTU of the physical interfaces to be jumbo-frame capable.
  # This doesn't mean that any OVSIntPorts must be jumbo-capable.  
  # We cannot, however set up definitions for eth0 and eth1 directly due
  # to what appear to be bugs in the initialization process.
  #pre-up ifconfig eth0 mtu 9000 # ifup fails with these stanzas
  #pre-up ifconfig eth1 mtu 9000
  ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
  mtu 1496

#
# Bridges
#

auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
  ovs_type OVSBridge
  ovs_ports bond0 vlan100 vlan200 vlan210 vlan220

#
# VLANS
#

# LAN (Temporary)
allow-vmbr0 vlan100
iface vlan100 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=100
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 192.168.0.110
  netmask 255.255.255.0
  gateway 192.168.0.1
  mtu 1496

# Proxmox VE Management
allow-vmbr0 vlan200
iface vlan200 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=200
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.0.10
  netmask 255.255.255.0
  gateway 10.0.0.254
  mtu 1496

# Proxmox VE Tunnel
allow-vmbr0 vlan210
iface vlan210 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=210
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.1.10
  netmask 255.255.255.0
  #gateway 10.0.1.254
  mtu 1496

# Proxmox VE Storage
allow-vmbr0 vlan220
iface vlan220 inet static
  ovs_type OVSIntPort
  ovs_bridge vmbr0
  ovs_options tag=220
  ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
  address 10.0.2.10
  netmask 255.255.255.0
  #gateway 10.0.2.254
  mtu 1496
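
(storage-2 has no 10GbE links, only the bonded uplinks; for completeness, this is how the LACP bond there can be sanity-checked once it's up - a sketch, and the output format varies by OVS version:)
Code:
# Check bond member state and hashing
ovs-appctl bond/show bond0

# Check that LACP actually negotiated with the switch
ovs-appctl lacp/show bond0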
 
Hi,
Have you found a solution here?
For me it only works if I lower other_config:rstp-priority on the OVS bridges and set higher numbers on the switch.
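
A minimal sketch of what that looks like in OVS terms, assuming vmbr0 as the bridge name from the configs above (lower numbers win the root election, 32768 is the default, and the value has to be a multiple of 4096):
Code:
# Make this OVS node the preferred RSTP root
ovs-vsctl set Bridge vmbr0 other_config:rstp-priority=4096

# ...and raise the (R)STP priority number on the physical switch so it loses
# the root election; how to do that is vendor-specific.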

On an older Supermicro server with 10GbE Intel cards I see kernel crashes, too.
Booting a Debian or Ubuntu kernel still works.

Markus
 
I think I've reproduced the same behavior with topology changes, using Intel 10GBASE-T NICs here. I didn't have a console due to an unrelated IPMI issue, so I couldn't see whether it was a kernel panic, but connecting or disconnecting ports can sometimes cause all networking to cease on at least one node, and the only thing that fixes it is power-cycling the affected node.

I'm thinking this is an issue that needs to be escalated to Open vSwitch themselves; however, I did notice that 2.5.1 is out.

Markus, are you saying using a different kernel than the proxmox one resolves your issue?

UPDATE: it is definitely a kernel panic; the console is locked up.
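
(If this does get escalated upstream, capturing both the userspace and kernel-side versions would probably help - a sketch; the modinfo version field may be empty for the in-tree module:)
Code:
# Userspace OVS version (what the 2.5.1 packages would change)
ovs-vsctl --version

# The datapath is the kernel's openvswitch module, so note the kernel too
uname -r
modinfo openvswitch | grep -iE '^(filename|version|vermagic)'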
 
Hi,

Yes, I'm using the Ubuntu 4.7.0 kernel on the affected server and it runs stably.
BTW, OVS 2.6 is already in pvetest - maybe we should test that.

Markus

ovs 2.6 (user space) is already in all repositories ;)

If you encounter a bug with OVS that is not reproducible with a newer non-PVE kernel, the issue might be in the OVS kernel module. We don't roll our own there (so we have the 4.4 one plus potential patches by Ubuntu), so if you give me a concrete kernel (package) version where it works, I can check the diff and maybe spin up a test kernel for you.
 
Could you try with http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4.30/ ? Just to narrow down potential culprits: the diff between Ubuntu's 4.4 and mainline 4.7 did not show any obvious OVS-related commits, but the problem might also be somewhere else (e.g., NIC drivers).

a stack trace / kernel panic from when the problem occurs would also be great!
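
One way to get that trace off the box when the panic never reaches the disk is netconsole - a sketch with made-up addresses (local eth0/10.0.0.20 sending to a log host at 10.0.0.254, whose MAC you substitute); ideally use a NIC that is not enslaved to the bridge being reconfigured:
Code:
# On the crashing node: stream kernel messages to a remote host over UDP.
# Addresses/MAC are placeholders - substitute your own.
modprobe netconsole netconsole=6666@10.0.0.20/eth0,6666@10.0.0.254/aa:bb:cc:dd:ee:ff

# On the log host: capture whatever arrives, including the panic trace
# (flag syntax differs between netcat variants, e.g. "nc -u -l -p 6666")
nc -u -l 6666 | tee panic.log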
 
OK, I will try 4.4.30 late this evening first.
(The server is in production and people will be angry with me when the whole cluster hangs...)

...and then I have to work out how to generate this stack trace...

Markus
 
I've been experiencing similar issues to those described in this thread.

I have 4 server nodes with Mellanox ConnectX-2 cards connected in a ring, with no external switch. I'm using the RSTP-enabled config from the wiki.

What I've been experiencing is the following:
  • Sometimes when booting, my nodes hang while bringing up the network devices and become totally unresponsive. Disconnecting the cards doesn't help. I don't get any logs, and no debug output is printed on the console. Booting with the cards unplugged doesn't cause this issue. Rebooting the node a few times eventually brings it back up.
  • Rebooting a node sometimes knocks down other nodes as well; they just suddenly get cold-rebooted. This happens while the other node is either shutting down or booting up, and never happens when it is shut off or going through the BIOS.

I just tried both the 4.4.3 and 4.7.0 lowlatency kernels linked in this thread on the nodes, and it seems that my issues are not present with these kernels. :)


@brad_mssw , did you try the 4.4.3 lowlatency kernel or the generic one?
 
@gardar , I tried 4.4.30 (not .3) lowlatency ... it died hard on me; in fact, it took out one of my NICs completely, which then failed hardware initialization on reboot. It finally came back up on the 4.7 kernel after unloading and reloading the ixgbe driver, and it survived a reboot after that.
 

Apologies, 4.4.30 is what I meant.

4.4.30 was running well for me for a few hours (it survived reboots, etc.), but then it suddenly began having problems.

I'm switching over to 4.7 now to see if it really resolved the issue or if I just got lucky earlier.

@markusd How did you manage to get that stack trace?
 
If mainline 4.4.30 does not work but mainline 4.7 does, the next step would be to check mainline 4.5 and 4.6 and continue narrowing down the range where the issue stops occurring. I assume you are using the included ixgbe module, and not compiling your own for each of those kernels?
 
Also, maybe check out the 4.4.35 kernel? There are some Mellanox and net fixes in the diff between 4.4.30 and 4.4.35.
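
(When hopping between kernels like this, it may be worth recording exactly which driver and firmware were in play for each test, so crashes can be tied to a specific combination - a sketch, assuming eth1 is one of the affected 10GbE ports:)
Code:
uname -r                 # kernel under test
ethtool -i eth1          # driver, driver version, firmware-version of the NIC
dmesg | grep -iE 'mlx4|ixgbe' | head -n 20   # driver init messages for the Mellanox/Intel cards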
 
