Open vSwitch - VLAN traffic on bridge which doesn't belong to any VM

udo · Apr 19, 2017

Hi,
I see a strange effect on some nodes of a cluster.
Every node located in one particular room shows traffic on the Open vSwitch bridge that is being transferred between two servers (neither is virtualized) outside the Proxmox cluster (it's backup traffic between an NFS server and the backup server).

The bridge vmbr0 sits on a bond of two NICs, which are connected to different switches (both HP).

The traffic also occurs on an empty node…

Here is the relevant part of the interfaces file:

Code:

#
allow-vmbr0 bond1
iface bond1 inet manual
        ovs_bonds eth4 eth5
        ovs_type OVSBond
        ovs_bridge vmbr0
        ovs_options bond_mode=active-backup

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond1

The traffic is huge (the node has no VMs and an uptime of 4 days):

Code:

eth4      Link encap:Ethernet  HWaddr 68:05:ca:1a:15:40 
          inet6 addr: fe80::6a05:caff:fe1d:1440/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1409134767 errors:0 dropped:520 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2055747032286 (1.8 TiB)  TX bytes:680 (680.0 B)
          Interrupt:35 Memory:df180000-df1a0000

eth5      Link encap:Ethernet  HWaddr 68:05:ca:1a:15:41 
          inet6 addr: fe80::6a05:caff:fe1d:1441/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1411735000 errors:0 dropped:523 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2059704122752 (1.8 TiB)  TX bytes:680 (680.0 B)
          Interrupt:37 Memory:df1c0000-df1e0000
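
For a sense of scale, the RX byte counters above can be turned into an average rate. This is only a rough sketch: it assumes the counters started at zero at boot and that the uptime is exactly 4 days, and the helper name avg_mbit_per_s is made up for illustration.

```shell
# Hypothetical helper: average receive rate in Mbit/s.
# $1 = RX byte counter, $2 = uptime in days (assumed to cover the whole counter)
avg_mbit_per_s() {
    awk -v bytes="$1" -v days="$2" \
        'BEGIN { printf "%.1f\n", bytes * 8 / (days * 86400) / 1000000 }'
}

avg_mbit_per_s 2055747032286 4   # eth4 RX bytes from the output above
```

Under those assumptions this works out to roughly 48 Mbit/s sustained on each NIC of a node that runs no VMs.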

openvswitch is up-to-date:

Code:

 dpkg -l | grep openvswitch
ii  openvswitch-common                   2.6.0-2                            amd64        Open vSwitch common components
ii  openvswitch-switch                   2.6.0-2                            amd64        Open vSwitch switch implementations

Any hints as to which settings are missing on the switch (or on the Open vSwitch side)?

Udo

Regis · Apr 19, 2017

Hi,

What is the port configuration on the switch side for this node? My first guess is that the ports might be configured in promiscuous mode.

Also, check the MAC address table on your switch to make sure that the MAC addresses of the physical adapters of the two servers generating the traffic are present. It might be that a MAC address isn't registered on any port, so the traffic is flooded to all member ports of the VLAN, including trunks (if you created a trunk for your node on your HP switch).

Thanks,

udo · Apr 19, 2017

Regis said:
Hi,

What is the port configuration on the switch side for this node? My first guess is that the ports might be configured in promiscuous mode.

Hi,
the port definition on the switch side is quite simple:

Code:

#switch-a:
interface 9
   name "pve01 eth4"
exit
interface 10
   name "pve03 eth4"
exit
interface 11
   name "pve05 eth4"
exit
interface 12
   name "pve07 eth4"
exit

On the other switch it's the same (with eth5 instead of eth4 in the name).

Regis said:
Also, check the MAC address table on your switch to make sure that the MAC addresses of the physical adapters of the two servers generating the traffic are present. It might be that a MAC address isn't registered on any port, so the traffic is flooded to all member ports of the VLAN, including trunks (if you created a trunk for your node on your HP switch).

Thanks,

Yes, both MAC addresses are known on the trunk the connection comes from.

I found a hint that RSTP problems can cause such an issue, because the learned MAC addresses get flushed. But I see a lot of MAC addresses on the switch (I don't know whether I can display the age of the MAC entries), and "show spanning-tree" shows that the last topology change was 48 days ago.

As a workaround I removed the VLANs of the NFS server and the backup server from the Proxmox ports, because no VM needs them.

Udo

Regis · Apr 20, 2017

I would suggest checking your RSTP configuration against the following document ftp://ftp.hp.com/pub/networking/software/59903016e7_ch13.pdf and seeing if it helps. There are a few things you can do to optimise STP/RSTP depending on how the ports are used.

In any case, for your bonds, you could configure the switch ports for both node interfaces as a static LACP trunk group (trunk e x-x trkX lacp) and use balance-tcp instead of active-backup. That way you get a backup link and pretty much double the bandwidth for your node(s). I have a similar configuration (bonds with an HP switch) and it has been working really smoothly. Of course, your Ethernet adapters need to be able to handle it; I've seen some low-end adapters fail to form a bond correctly and behave strangely.
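
A minimal sketch of what the OVS side of such a setup could look like in the interfaces file. It is not a tested configuration: it reuses the interface and bridge names from the first post, assumes both switch ports belong to one LACP trunk group on the same switch (or a stacked pair), and lacp=active is an assumed choice.

```
# Sketch only: bond1 switched from active-backup to LACP with balance-tcp
allow-vmbr0 bond1
iface bond1 inet manual
        ovs_bonds eth4 eth5
        ovs_type OVSBond
        ovs_bridge vmbr0
        ovs_options bond_mode=balance-tcp lacp=active
```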

udo · Apr 20, 2017

Regis said:
I would suggest checking your RSTP configuration against the following document ftp://ftp.hp.com/pub/networking/software/59903016e7_ch13.pdf and seeing if it helps. There are a few things you can do to optimise STP/RSTP depending on how the ports are used.

Hi,
thanks for the info - I will look at this today and see if it changes anything.

Regis said:
In any case, for your bonds, you could configure the switch ports for both node interfaces as a static LACP trunk group (trunk e x-x trkX lacp) and use balance-tcp instead of active-backup. That way you get a backup link and pretty much double the bandwidth for your node(s). I have a similar configuration (bonds with an HP switch) and it has been working really smoothly. Of course, your Ethernet adapters need to be able to handle it; I've seen some low-end adapters fail to form a bond correctly and behave strangely.

The NICs should not be a problem (Intel dual-port gigabit NICs), but as far as I know LACP does not work across different (unstacked) switches, does it? The bond is there for HA reasons, and each bond member is on a different switch.

Udo

manu · Apr 20, 2017

You could also run tcpdump -i bond1 to see where the traffic is coming from.
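
To narrow down who is sending, tcpdump's -e option prints the link-level header, so the source MAC addresses can be counted. A minimal sketch, assuming tcpdump's default one-line text output where the second field is the source MAC; the helper name top_src_macs is hypothetical.

```shell
# Hypothetical helper: count captured frames per source MAC, busiest first.
# Reads `tcpdump -e -n` text output on stdin; field 2 is the source MAC.
top_src_macs() {
    awk '{ count[$2]++ } END { for (mac in count) print count[mac], mac }' |
        sort -rn
}
```

Usage would be something like tcpdump -e -n -c 10000 -i eth4 | top_src_macs, run as root on the node; the interface name and packet count are only examples.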

udo · Apr 20, 2017

manu said:
You could also run tcpdump -i bond1 to see where the traffic is coming from.

Hi Manu,
that is what I did to see that the traffic is not related to this host or its VMs…

But bond1 doesn't show any traffic, while eth4 and eth5 do:

Code:

time tcpdump -i bond1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond1, link-type EN10MB (Ethernet), capture size 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

real    0m21.586s
user    0m0.004s
sys     0m0.000s


time tcpdump -i eth4 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth4, link-type EN10MB (Ethernet), capture size 262144 bytes
10:26:15.539314 IP 10.1.1.232 > 224.0.0.18: VRRPv2, Advertisement, vrid 21, prio 98, authtype simple, intvl 1s, length 20
10:26:15.539905 IP 10.1.1.242 > 224.0.0.18: VRRPv2, Advertisement, vrid 11, prio 98, authtype simple, intvl 1s, length 20
10:26:15.540980 IP 10.1.1.242 > 224.0.0.18: VRRPv2, Advertisement, vrid 12, prio 101, authtype simple, intvl 1s, length 20
10:26:15.542002 IP 10.1.1.242 > 224.0.0.18: VRRPv2, Advertisement, vrid 13, prio 101, authtype simple, intvl 1s, length 20
10:26:15.543195 IP 10.1.1.232 > 224.0.0.18: VRRPv2, Advertisement, vrid 22, prio 101, authtype simple, intvl 1s, length 20
…

Yes, this snippet shows related traffic…

Udo

Regis · Apr 20, 2017

It seems like your end devices 10.1.1.232 and 10.1.1.242 send VRRP (Virtual Router Redundancy Protocol) advertisements. As it is multicast, your node gets the packets but doesn't reply to them. These do look like typical VRRP packets; I'm just not sure why they are being sent at all, unless you are actually using VRRP. Are your networks isolated (VLANs)?

udo · Apr 20, 2017

Regis said:
It seems like your end devices 10.1.1.232 and 10.1.1.242 send VRRP (Virtual Router Redundancy Protocol) advertisements. As it is multicast, your node gets the packets but doesn't reply to them. These do look like typical VRRP packets; I'm just not sure why they are being sent at all, unless you are actually using VRRP. Are your networks isolated (VLANs)?

Hi,
yes, that was a bad example; it was only meant to show that tcpdump doesn't show traffic on bond1 but does on eth4/5 (and yes, the networks are isolated VLANs).
The VRRP packets come from keepalived (used by HA pairs).

Unfortunately I didn't find the time today to look at your link… hopefully tomorrow.

Udo
