[SOLVED] High packet loss after update

ca_maer

Well-Known Member
Dec 5, 2017
We just updated to the latest version from:
Code:
proxmox-ve: 5.1-42 (running kernel: 4.13.16-2-pve)
pve-manager: 5.1-51 (running version: 5.1-51/96be5354)
pve-kernel-4.13: 5.1-44
pve-kernel-4.13.16-2-pve: 4.13.16-47
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-2-pve: 4.13.13-33
corosync: 2.4.2-pve4
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-30
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-18
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-2
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-15
pve-cluster: 5.0-25
pve-container: 2.0-22
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-2
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.7-pve1~bpo9

to:

Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.17-1-pve)
pve-manager: 5.2-1 (running version: 5.2-1/0fcd7879)
pve-kernel-4.15: 5.2-1
pve-kernel-4.13: 5.1-44
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-2-pve: 4.13.13-33
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-16
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-18
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-4
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-5
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-26
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.8-pve1~bpo9

But now we have around 20% packet loss. I'm pretty sure this is not hardware related, since everything was working fine before the update. Here's the error we get in dmesg:
Code:
[  511.148536] vmbr0: received packet on bond0 with own address as source address (addr:a0:36:9f:a2:6b:14, vlan:0)
[  511.148539] vmbr0: received packet on bond0 with own address as source address (addr:a0:36:9f:a2:6b:14, vlan:0)
[  511.148545] vmbr0: received packet on bond0 with own address as source address (addr:a0:36:9f:a2:6b:14, vlan:0)
[  513.086951] vmbr0: received packet on bond0 with own address as source address (addr:a0:36:9f:a2:6b:14, vlan:0)

and here's our network config:
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual
#Broken

iface eno3 inet manual

iface eno4 inet manual

iface enp5s0f0 inet manual

iface enp5s0f1 inet manual

iface enp5s0f2 inet manual

iface enp5s0f3 inet manual

auto bond0
iface bond0 inet manual
    slaves enp5s0f0 enp5s0f1
    bond_miimon 100
    bond_mode balance-alb
#Proxmox

auto bond1
iface bond1 inet manual
    slaves enp5s0f2 enp5s0f3
    bond_miimon 100
    bond_mode balance-alb
#DMZ

auto vmbr0
iface vmbr0 inet static
    address  192.168.10.11
    netmask  255.255.255.0
    gateway  192.168.10.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
#Proxmox

auto vmbr1
iface vmbr1 inet manual
    bridge_ports bond1
    bridge_stp off
    bridge_fd 0
#DMZ


Any idea what might cause this?
 
I can't recommend

bond_mode balance-alb

It relies on ARP tricks and can't be stable.

The only really stable bonding modes are active-backup or LACP.
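
As a rough sketch, assuming the same slave interfaces as in the config posted above, an active-backup version of bond0 would only change the bond_mode line. This is an illustration, not a tested config for this host. Note that in active-backup only one slave carries traffic at a time, so it gives failover but no load balancing.

```
auto bond0
iface bond0 inet manual
    slaves enp5s0f0 enp5s0f1
    bond_miimon 100
    # changed from balance-alb; one active slave, the other on standby
    bond_mode active-backup
#Proxmox
```

The vmbr0 stanza would stay as it is, since it only refers to bond0 as its bridge port.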

It's been stable for multiple years so far. We only have the issue described in this thread on a single machine out of a bunch that use balance-alb. Unfortunately, our switches do not support LACP, which is why we went with alb.