Hi friends, I hope you're doing well!
I'm encountering a specific issue in my network and could use some advice.
I experience random bursts of high packet loss in the network, particularly with my internet connection. Here’s the sequence of events:
- Initially, I noticed these issues with my first WAN connection.
- I then added a second WAN connection, known to be rock-stable.
- Unfortunately, the same issues occurred with the new connection.
Network Setup
My network consists of the following:
- 2 Proxmox Hosts: Each connected with:
- 2x 1Gbps LACP links to their respective access switches (no VPC/MLAG).
- 2x 10Gbps LACP links to the core switch.
- 2 Access Switches:
- sin01-edge-psw01:
- Connected to the core switch (Nexus 3000) via a 4x 1Gbps LACP bond.
- WAN edge routers are connected here.
- VLAN 3 to Proxmox millenium-fbe49
- sin01-edge-psw02:
- Connected to the core switch via a 2x 1Gbps LACP bond.
- Fedora host is connected here.
- VLAN 3 to Proxmox millenium-fbe50
- 1 Core Switch (Nexus 3000):
- Central point of connection for access switches and Proxmox Hosts.
- VLAN 7 to Proxmox millenium-fbe49 and millenium-fbe50
- 2 VLANs
- 1 WAN connection
Observed Behavior
- When I ping the internet from the Fedora host (connected to sin01-edge-psw02), without using the OPNsense VM, there’s no packet loss. This suggests the switching fabric is functioning well.
- With OPNsense VM:
- Sending traffic through the OPNsense VM introduces excessive packet loss.
- An MTR trace shows ~20% packet loss between the 192.168.3.0/24 network (VLAN 3) and the OPNsense VM interface, and also between OPNsense and the WAN, for inbound traffic.
- People can hear me fine in programs like Discord, but I can't hear them at all, which points to inbound traffic loss (almost certainly these drops).
- Key observation: Excessive packet drops are shown on the Proxmox virtual bridges.
Bridge Statistics
vmbr0 Interface:
RX: 2572036239 packets (637,300,259 dropped)
TX: 78666453 packets (0 dropped)
vmbr1 Interface:
RX: 284869426 packets (10,593 dropped)
TX: 118726145 packets (0 dropped)
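To put those counters in perspective, here is a quick sketch that turns them into drop ratios. It assumes the RX packet count is the right denominator for the drop count, which may not hold exactly for every driver; the helper name `drop_ratio` is just illustrative.

```python
def drop_ratio(received, dropped):
    """Return the fraction of received packets that were dropped."""
    return dropped / received if received else 0.0


# Counters copied from the bridge statistics above.
bridges = {
    "vmbr0": {"rx": 2572036239, "rx_dropped": 637300259},
    "vmbr1": {"rx": 284869426, "rx_dropped": 10593},
}

ratios = {
    name: drop_ratio(counters["rx"], counters["rx_dropped"])
    for name, counters in bridges.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.4%} of RX packets dropped")
```

On vmbr0 this works out to roughly a quarter of all received packets, which is in the same ballpark as the ~20% loss MTR reports, while vmbr1 stays well under 0.01%.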
Testing Traffic
- Low WAN Traffic:
- Running a speed test over the WAN causes significant drops (~25,000 drops/sec on vmbr0).
- High LAN Traffic:
- Running iperf3 within the 192.168.3.0/24 subnet shows only ~20 drops/sec—no significant issues.
- Changing Topology:
- Moving the Proxmox-Fedora link entirely to the core switch (10Gbps fiber) reduced packet loss:
- Less overall loss (~1%), but WAN-related traffic still caused heavy drops on the virtual bridge.
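In case anyone wants to reproduce the drops/sec figures, here is a minimal sketch of how such a rate can be sampled. It assumes the standard Linux sysfs counters under /sys/class/net/<iface>/statistics/; the function names are illustrative, and this is not necessarily how the numbers above were collected.

```python
import time
from pathlib import Path


def read_counter(iface, counter="rx_dropped", root="/sys/class/net"):
    """Read one per-interface statistics counter from sysfs."""
    return int(Path(root, iface, "statistics", counter).read_text())


def drop_rate(iface, interval=1.0, root="/sys/class/net"):
    """Sample rx_dropped twice and return the average drops per second."""
    before = read_counter(iface, root=root)
    time.sleep(interval)
    after = read_counter(iface, root=root)
    return (after - before) / interval
```

Calling `drop_rate("vmbr0")` on the Proxmox host while a speed test is running, and again while idle, should make the difference between WAN and LAN traffic easy to see.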
Key Findings
- WAN Traffic Issue: Even low-rate WAN traffic causes massive drops on vmbr0.
- LAN Traffic Stable: High LAN traffic does not produce excessive drops.
- Virtualization Dependency: Drops occur only when traffic passes through a VM (e.g., OPNsense, OpenWrt).
- Host Consistency: Moving VMs between Proxmox hosts didn’t solve the issue (both hosts are identical hardware).
- Topology Changes: Eliminating copper connections between Proxmox and access switches reduces packet loss but doesn’t fully solve the problem.
Suspected Causes
- Virtual bridge performance or misconfiguration on Proxmox.
- Possible driver, hardware offloading, or interrupt handling problems.
- Any other potential causes I should look at?
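For the offloading angle, a rough sketch of how the offload settings of two interfaces could be compared, assuming `ethtool` is installed on the host. It parses the output of `ethtool -k <iface>`; the function names are hypothetical, and a difference in settings is only a lead to investigate, not proof of the cause.

```python
import subprocess


def parse_offload_features(ethtool_output):
    """Parse `ethtool -k` output into a {feature_name: enabled} dict."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip the "Features for <iface>:" header and blank lines
        state = value.split()[0]  # drops a trailing "[fixed]" marker
        if state in ("on", "off"):
            features[name] = state == "on"
    return features


def read_offload_features(iface):
    """Run `ethtool -k` for one interface and parse the result."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return parse_offload_features(result.stdout)


def diff_features(a, b):
    """Return only the features whose state differs between two interfaces."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

For example, `diff_features(read_offload_features("vmbr0"), read_offload_features("eno1"))` would list the settings that differ between the bridge and a physical NIC, where `eno1` is a placeholder for whatever the uplink is actually called.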