Hi, we are seeing some weird RX discards on a few of our Proxmox nodes after we recently switched from single to bonded interfaces for the VM bridges, and we can't figure out why. Since we use Ceph, the Ceph cluster also has to be reachable over the same bond, both for VMs/CTs and for the hypervisor itself, so we have put the Ceph public network on a separate VLAN and vmbr. We use Mellanox ConnectX-3 and Arista 7050QX. Packets are rejected at a constant rate of ~110 pps on the vmbr on all nodes. From time to time, when certain VMs are heavily accessed, the rejects/errors peak at 25k pps.
Can anyone help to pinpoint the problem?
Proxmox node network configuration
Code:
auto enp1s0
iface enp1s0 inet static
address 10.40.24.107/22
gateway 10.40.24.1
# 1GbE proxmox cluster/corosync
auto enp4s0
iface enp4s0 inet manual
# 40GbE lag member A
auto enp4s0d1
iface enp4s0d1 inet manual
# 40GbE lag member B
auto bond0
iface bond0 inet manual
bond-slaves enp4s0 enp4s0d1
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer3+4
mtu 9000
#80GbE bond for VMs/CTs and CEPH
auto bond0.4028
iface bond0.4028 inet manual
mtu 8244
vlan-id 4028
#vlan for CEPH public net
auto vmbr0
iface vmbr0 inet static
address 10.40.20.107/22
bridge-ports bond0
bridge-stp off
bridge-fd 0
mtu 9000
#bridge for VMs/CTs
auto vmbr1
iface vmbr1 inet static
address 10.40.28.107/22
bridge-ports bond0.4028
bridge-stp off
bridge-fd 0
mtu 8244
#bridge for CEPH (both for HV and VMs/CTs)
Arista 7050QX switch config:
Code:
interface Ethernet3/1
description hk-proxnode-07-LAG-member
mtu 9214
flowcontrol send on
flowcontrol receive on
speed forced 40gfull
channel-group 15 mode active
!
interface Ethernet15/1
description hk-proxnode-07-LAG-member
mtu 9214
flowcontrol send on
flowcontrol receive on
speed forced 40gfull
channel-group 15 mode active
!
interface Port-Channel15
description hk-proxnode-07-bond
mtu 9214
switchport trunk native vlan 4020
switchport trunk allowed vlan 2-4080
switchport mode trunk
!
See attached screenshot of discards/errors on ports.
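In case it helps anyone compare numbers, here is a minimal sketch of how a per-interface discard rate could be sampled on a node. It assumes the Linux sysfs counter `rx_dropped` is the relevant one (the figures in the screenshot may come from a different counter), and the function names are only illustrative.

```python
import time
from pathlib import Path


def read_counter(iface, counter="rx_dropped", root="/sys/class/net"):
    """Read one kernel statistics counter for a network interface."""
    return int(Path(root, iface, "statistics", counter).read_text())


def rate_pps(before, after, seconds):
    """Packets per second between two samples of a monotonically increasing counter."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after - before) / seconds


def sample_rates(ifaces, interval=10.0, counter="rx_dropped", root="/sys/class/net"):
    """Sample the counter twice, `interval` seconds apart, for each interface."""
    first = {i: read_counter(i, counter, root) for i in ifaces}
    time.sleep(interval)
    return {
        i: rate_pps(first[i], read_counter(i, counter, root), interval)
        for i in ifaces
    }
```

Calling `sample_rates(["enp4s0", "enp4s0d1", "bond0", "bond0.4028", "vmbr0", "vmbr1"])` on a node would show at which layer (physical port, bond, VLAN or bridge) the drops are being counted.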