VM networks are temporarily unavailable during hypervisor reboot

stefws

We have all VM networks virtualized by VLAN tagging and connected through a single OVS switch, vmbr1. This switch is connected to a pair of bonded 2x10 Gbps NICs cabled to a virtual chassis made up of two Cisco Nexus 5672UP switches. Sometimes during a reboot of a PVE 4.4 hypervisor node (probably during startup of the OVS bridge vmbr1, but we are not sure) we see temporary issues reaching/connecting from outside to various VMs on other PVE nodes, for up to a few minutes. It has been like this for as long as we can remember on PVE 4.x with OVS 2.4-2.6.

We cannot put our finger on this annoying issue, so any hints on how to nail it down are much appreciated.
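In case it helps with the diagnosis, here is a rough sketch of what we plan to capture on an affected node while the outage is ongoing (our OVS uplink bond is named bond1, so adjust the name if yours differs):

# overall OVS topology as seen by ovs-vswitchd
ovs-vsctl show
# LACP/bond state of the uplink bond towards the Nexus pair
ovs-appctl bond/show bond1
ovs-appctl lacp/show bond1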

TIA!
 
We are also wondering why every PVE node seems to be flapping one of our two corosync rings, the one running across the bonded 2x10 Gbps NICs + Cisco 5672UP switches, as in the log below (a command sketch for checking the rings follows after the log). The other ring (ring 1) runs across a different virtual chassis of Juniper switches, bonded over 2x 1 Gbps NICs, and shows no signs of flapping at all:

Mar 19 00:32:58 n2 corosync[5261]: [TOTEM ] Marking ringid 0 interface 10.45.71.2 FAULTY
Mar 19 00:32:59 n2 corosync[5261]: [TOTEM ] Automatically recovered ring 0
Mar 19 00:33:06 n2 corosync[5261]: [TOTEM ] Marking ringid 0 interface 10.45.71.2 FAULTY
Mar 19 00:33:07 n2 corosync[5261]: [TOTEM ] Automatically recovered ring 0
Mar 19 00:33:15 n2 corosync[5261]: [TOTEM ] Marking ringid 0 interface 10.45.71.2 FAULTY
Mar 19 00:33:16 n2 corosync[5261]: [TOTEM ] Automatically recovered ring 0
Mar 19 00:33:24 n2 corosync[5261]: [TOTEM ] Marking ringid 0 interface 10.45.71.2 FAULTY
Mar 19 00:33:25 n2 corosync[5261]: [TOTEM ] Automatically recovered ring 0
Mar 19 00:33:34 n2 corosync[5261]: [TOTEM ] Marking ringid 0 interface 10.45.71.2 FAULTY
Mar 19 00:33:35 n2 corosync[5261]: [TOTEM ] Automatically recovered ring 0
Mar 19 00:33:43 n2 corosync[5261]: [TOTEM ] Marking ringid 0 interface 10.45.71.2 FAULTY
Mar 19 00:33:44 n2 corosync[5261]: [TOTEM ] Automatically recovered ring 0
Mar 19 00:33:51 n2 corosync[5261]: [TOTEM ] Marking ringid 0 interface 10.45.71.2 FAULTY
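When the flapping occurs we check the ring state with the standard corosync tooling, e.g.:

# show the local node's view of both totem rings (status per ring id)
corosync-cfgtool -s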
 
Hi

For your first question, I guess the critical point here would be to pinpoint whether the Open vSwitch startup is causing the problem or not.
I am not an Open vSwitch specialist; it was added to the PVE codebase by external contributors.
However, two things come to my mind:
are you using STP or RSTP on the bridge? Could the learning stage of (R)STP be related to that?
how long is the open-vswitch systemd unit taking to finish? You can get some info on that with

systemd-analyze blame
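For example, to check whether STP or RSTP is currently enabled on the bridge, something like this should work (assuming the bridge is named vmbr1, as in your setup):

# query the OVS Bridge table for the spanning-tree flags
ovs-vsctl get bridge vmbr1 stp_enable
ovs-vsctl get bridge vmbr1 rstp_enable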
 
Hi

For your first question, I guess the critical point here would be to pinpoint whether the Open vSwitch startup is causing the problem or not.
It's hard to determine the exact root cause; any hints on how to pinpoint it?

I am not an Open vSwitch specialist; it was added to the PVE codebase by external contributors.
However, two things come to my mind:
are you using STP or RSTP on the bridge? Could the learning stage of (R)STP be related to that?
STP, I assume, as it's the default. Also, https://pve.proxmox.com/wiki/Open_vSwitch says about RSTP:

'WARNING: The stock PVE 4.4 kernel panics, must use a 4.5 or higher kernel for stability.'

and we're currently on pve-kernel 4.4.44-84.
Our /etc/network/interfaces is below (a quick cross-check of the running state follows after it):

auto lo
iface lo inet loopback

auto eth8
iface eth8 inet manual

auto eth9
iface eth9 inet manual

auto bond0
iface bond0 inet manual
up ifconfig bond0 0.0.0.0 up
slaves eth4 eth0
bond-mode 4
bond-miimon 100
bond-downdelay 200
bond-updelay 200
bond-lacp-rate 0
bond-xmit-hash-policy layer2+3
post-up ifconfig bond0 mtu 9000

auto vmbr0
iface vmbr0 inet static
address x.y.z.k
netmask 255.255.255.240
gateway x.y.z.241
bridge_ports bond0
bridge_stp off
bridge_fd 0

# Open vSwitch configuration below:

# Bond eth8 and eth9 together
allow-vmbr1 bond1
auto bond1
iface bond1 inet manual
ovs_bridge vmbr1
ovs_type OVSBond
ovs_bonds eth8 eth9
# Force the MTU of the physical interfaces to be jumbo-frame capable.
# This doesn't mean that any OVSIntPorts must be jumbo-capable.
# We cannot, however set up definitions for eth8 and eth9 directly due
# to what appear to be bugs in the initialization process.
pre-up ( ifconfig eth8 mtu 9000 && ifconfig eth9 mtu 9000 && test -x /root/bin/setup10Gifaces && /root/bin/setup10Gifaces )
ovs_options bond_mode=balance-tcp lacp=active
mtu 9000

# Bridge for our bond and vlan virtual interfaces (our VMs will
# also attach to this bridge)
auto vmbr1
allow-ovs vmbr1
iface vmbr1 inet manual
ovs_type OVSBridge
# NOTE: we MUST mention bond1, vlanX, and vlanY even though each
# of them lists ovs_bridge vmbr1! Not sure why it needs this
# kind of cross-referencing but it won't work without it!
ovs_ports bond1 vlan11 vlan12 vlan13 vlan20 vlan21
mtu 9000

allow-vmbr1 vlan11
iface vlan11 inet static
ovs_type OVSIntPort
ovs_bridge vmbr1
ovs_options tag=11
ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
address 10.45.64.17
netmask 255.255.255.0
mtu 9000

allow-vmbr1 vlan12
iface vlan12 inet static
ovs_type OVSIntPort
ovs_bridge vmbr1
ovs_options tag=12
ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
address 10.45.65.17
netmask 255.255.255.0
mtu 9000

allow-vmbr1 vlan13
iface vlan13 inet static
ovs_type OVSIntPort
ovs_bridge vmbr1
ovs_options tag=13
ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
address 10.45.71.7
netmask 255.255.255.0
mtu 9000

allow-vmbr1 vlan20
iface vlan20 inet static
ovs_type OVSIntPort
ovs_bridge vmbr1
ovs_options tag=20
ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
address 10.45.66.17
netmask 255.255.255.0
mtu 9000

allow-vmbr1 vlan21
iface vlan21 inet static
ovs_type OVSIntPort
ovs_bridge vmbr1
ovs_options tag=21
ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
address 10.45.67.17
netmask 255.255.255.0
mtu 9000
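To cross-check that the running state actually matches this file, something like the following should do (port/interface names as above):

# list the ports actually attached to the OVS bridge
ovs-vsctl list-ports vmbr1
# verify the physical NICs really came up with MTU 9000
ip -d link show eth8
ip -d link show eth9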

how long is the open-vswitch systemd unit taking to finish? You can get some info on that with

systemd-analyze blame
root@n7:~# systemd-analyze blame | grep openvswitch
713ms openvswitch-nonetwork.service
230ms openvswitch.service
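So the unit itself starts quickly. To see what it waits on during boot, we could presumably also look at the ordering chain, roughly:

# show which units delay the openvswitch units during boot
systemd-analyze critical-chain openvswitch-nonetwork.service
systemd-analyze critical-chain openvswitch.service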
 
I don't see a quick-fix solution here.

You need to look with tcpdump at why the VMs are not reachable.
First you could inspect the ARP traffic with
tcpdump arp
to see if ARP who-has queries are properly answered inside the VLAN when the problem happens.
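For example, something along these lines (interface names taken from your config; adjust the VLAN to the one the affected VM lives in):

# on the node hosting the VM: watch ARP on that VLAN's internal port
tcpdump -e -n -i vlan11 arp
# on the physical uplink: watch tagged ARP for that VLAN
tcpdump -e -n -i eth8 vlan 11 and arp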
 
It also turns out to be an issue when we reboot other physical servers that are only attached to the Nexus. Our network admins have briefly seen MAC addresses go 'missing' from various ports on the Nexus when this happens, but they do not know why :/ The Nexus switches are running MSTP and our Open vSwitches are using STP; we don't know if this creates any conflicts when a VLAN crosses both the Nexus ports and the OVS ports...
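Next time it happens we will also try to dump the MAC learning table on the OVS side, roughly like this (bridge name as in our config):

# show which MACs the OVS bridge has learned, and on which ports/VLANs
ovs-appctl fdb/show vmbr1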
 
