[SOLVED] Openvswitch LACP failure

aTan

Hi, I have been using Proxmox 5.2 for a couple of months, and since yesterday the Open vSwitch LACP bond has started to disconnect (first one interface, then the other about 10 minutes later). After a reboot it works fine for several hours. After the failure, ethtool shows that both interfaces still have link up.

ovs-vswitchd.log:
Code:
2018-10-29T17:51:03.584Z|06690|bond|INFO|interface eno1: link state down
2018-10-29T17:51:03.584Z|06691|bond|INFO|interface eno1: disabled
2018-10-29T17:51:03.584Z|06692|bond|INFO|bond bond0: active interface is now eno2
2018-10-29T18:40:43.483Z|00018|ofproto_dpif_xlate(handler57)|WARN|received packet on unknown port 3 while processing icmp6,in_port=3,vlan_tci=0x0000,dl_src=b2:1c:f7:eb:2d:0e,dl_dst=33:33:00:00:00:02,ipv6_src=fe80::b01c:f7ff:feeb:2d0e,ipv6_dst=ff02::2,ipv6_label=0x00000,nw_tos=0,nw_ecn=0,nw_ttl=255,icmp_type=133,icmp_code=0,nd_target=::,nd_sll=00:00:00:00:00:00,nd_tll=00:00:00:00:00:00 on bridge vmbr0
2018-10-29T18:52:08.679Z|06693|bond|INFO|interface eno2: link state down
2018-10-29T18:52:08.679Z|06694|bond|INFO|interface eno2: disabled
2018-10-29T18:52:08.679Z|06695|bond|INFO|bond bond0: all interfaces disabled
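To make the timeline in the log above easier to read, here is a minimal Python sketch that pulls the "link state down" bond events out of ovs-vswitchd.log text and reports how far apart they are. The `LOG` sample is copied from the excerpt above; the function name `link_down_events` and the regular expression are illustrative assumptions, not part of any Open vSwitch tooling.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the ovs-vswitchd.log excerpt in this post.
LOG = """\
2018-10-29T17:51:03.584Z|06690|bond|INFO|interface eno1: link state down
2018-10-29T17:51:03.584Z|06691|bond|INFO|interface eno1: disabled
2018-10-29T17:51:03.584Z|06692|bond|INFO|bond bond0: active interface is now eno2
2018-10-29T18:52:08.679Z|06693|bond|INFO|interface eno2: link state down
2018-10-29T18:52:08.679Z|06694|bond|INFO|interface eno2: disabled
2018-10-29T18:52:08.679Z|06695|bond|INFO|bond bond0: all interfaces disabled
"""

# Hypothetical pattern for this post's log format:
# timestamp|sequence|bond|INFO|interface <name>: link state down
LINK_DOWN = re.compile(
    r"^(?P<ts>\S+?)\|\d+\|bond\|INFO\|interface (?P<iface>\S+): link state down$"
)


def link_down_events(text):
    """Return (interface, UTC datetime) for each 'link state down' bond line."""
    events = []
    for line in text.splitlines():
        m = LINK_DOWN.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
            events.append((m["iface"], ts.replace(tzinfo=timezone.utc)))
    return events


events = link_down_events(LOG)
for iface, ts in events:
    print(iface, ts.isoformat())

# Time between the first and the last interface dropping out of the bond.
gap = events[-1][1] - events[0][1]
print("gap between failures:", gap)
```

Run against the full log file instead of the sample, the same function would list every bond member that OVS marked as down, which helps to see whether the failures follow a pattern.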
Switch LACP log:
Code:
3          2018-10-29 19:52:12+01:00  LACP negotiation failed because the remote
                                       interface was not selected. Please check
                                      the remote interface's status and configur
                                      ations. (Interface=10GE2/0/25, Eth-Trunk7)
4          2018-10-29 18:51:08+01:00  LACP negotiation failed because the remote
                                       interface was not selected. Please check
                                      the remote interface's status and configur
                                      ations. (Interface=10GE1/0/25, Eth-Trunk7)
Switch log:
Code:
Oct 29 2018 19:52:11+01:00 srvsw5 %%01LACP/2/hwLacpTotalLinkLoss_active(l):CID=0x807a0405-alarmID=0x09360001;Link bandwidth lost totally. (TrunkIndex=7, TrunkIfIndex=127, TrunkId=7, TrunkName=Eth-Trunk7, Reason=No link is selected.)
Oct 29 2018 19:52:11+01:00 srvsw5 %%01LACP/4/LACP_STATE_DOWN(l):CID=0x80480488;The LACP state is down. (PortName=10GE2/0/25, TrunkName=Eth-Trunk7, LastReceivePacketTime=[2018-10-29 19:52:11:361+01:00], Reason=The remote interface was not selected. Please check the remote interface's status and configurations.)
Oct 29 2018 19:52:11+01:00 srvsw5 %%01IFNET/2/linkDown_active(l):CID=0x807a0405-alarmID=0x08520003;The interface status changes. (ifName=10GE2/0/25, AdminStatus=UP, OperStatus=DOWN, Reason=LACP negotiation failed, mainIfname=Eth-Trunk7)
Oct 29 2018 19:52:11+01:00 srvsw5 %%01IFNET/2/linkDown_active(l):CID=0x807a0405-alarmID=0x08520003;The interface status changes. (ifName=Eth-Trunk7, AdminStatus=UP, OperStatus=DOWN, Reason=The conditions for the activation of the interface are not met, mainIfname=Eth-Trunk7)
Oct 29 2018 18:51:07+01:00 srvsw5 %%01LACP/4/LACP_STATE_DOWN(l):CID=0x80480489;The LACP state is down. (PortName=10GE1/0/25, TrunkName=Eth-Trunk7, LastReceivePacketTime=[2018-10-29 18:51:06:465+01:00], Reason=The remote interface was not selected. Please check the remote interface's status and configurations.)
Oct 29 2018 18:51:06+01:00 srvsw5 %%01IFNET/2/linkDown_active(l):CID=0x807a0405-alarmID=0x08520003;The interface status changes. (ifName=10GE1/0/25, AdminStatus=UP, OperStatus=DOWN, Reason=LACP negotiation failed, mainIfname=Eth-Trunk7)
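The switch logs use local time (+01:00) while ovs-vswitchd.log uses UTC, so the two sides are easy to misread when compared by eye. The following Python sketch converts both to UTC and prints how long after the OVS "link state down" event the switch logged its linkDown message. The timestamps are copied from the logs in this post; pairing each OVS event with the nearest switch event is my assumption, and the helper names are made up for the example.

```python
from datetime import datetime, timezone


def ovs_time(stamp):
    """Parse an ovs-vswitchd.log timestamp (always UTC, 'Z' suffix)."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def switch_time(stamp):
    """Parse a switch timestamp such as 'Oct 29 2018 19:52:11+01:00' into UTC.

    Note: %b expects English month abbreviations (C/English locale).
    """
    return datetime.strptime(stamp, "%b %d %Y %H:%M:%S%z").astimezone(timezone.utc)


# (OVS 'link state down', switch IFNET linkDown), paired by time proximity.
# The pairing is an assumption, not something the logs state explicitly.
PAIRS = [
    ("2018-10-29T17:51:03.584Z", "Oct 29 2018 18:51:06+01:00"),
    ("2018-10-29T18:52:08.679Z", "Oct 29 2018 19:52:11+01:00"),
]


def lags(pairs):
    """Seconds between the OVS event and the matching switch event."""
    return [(switch_time(s) - ovs_time(o)).total_seconds() for o, s in pairs]


for lag in lags(PAIRS):
    print(f"switch logged the port down {lag:.3f} s after OVS")
```

In both cases the switch entry follows the OVS entry by a little over two seconds, so the two logs describe the same two events rather than four separate ones.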
Network config:
Code:
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

allow-vmbr0 bond0
iface bond0 inet manual
        ovs_bonds eno1 eno2
        ovs_type OVSBond
        ovs_bridge vmbr0
        ovs_options lacp=active bond_mode=balance-tcp other_config:lacp-time=fast
        pre-up ( ifconfig eno1 mtu 9000 && ifconfig eno2 mtu 9000 )
        mtu 9000

allow-vmbr0 vlan101_vm
iface vlan101_vm inet static
        address  100.64.254.137
        netmask  255.255.255.192
        gateway  100.64.254.129
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=101
        ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif

allow-vmbr0 vlan4_srv_back
iface vlan4_srv_back inet static
        address  192.168.0.137
        netmask  255.255.255.0
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=4
        ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif

allow-vmbr0 vlan99_data
iface vlan99_data inet static
        address  100.64.254.71
        netmask  255.255.255.192
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=99
        ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0 vlan101_vm vlan4_srv_back vlan99_data
        mtu 9000
pveversion:
Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-10
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-5-pve: 4.15.18-24
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1
 
Just FYI, the problem was caused by an unprivileged container (running a custom video stream caching service) that I had moved to this server that day. After I moved this CT away, the LACP problems vanished.
 
