Trying here as my report in the kernel thread is not getting much attention. I know of at least 5 users reporting this issue here and on Reddit - but it definitely also doesn't affect everyone.
The issue is easily tested and consistently reproducible for me on kernel 7, using iperf3 across Tailscale. I have reproduced it both within my local network and across interstate connections.
The specific tests below are run with exactly the same hardware, same network, and same Tailscale version - i.e. the _only_ change is the kernel version.
Kernel 7.0.2-2-pve - clear, massive regression:
Code:
10:20 user@samba:~ > iperf3 -c 100.93.240.XX -t 30
Connecting to host 100.93.240.XX, port 5201
[ 5] local 100.125.133.YY port 50050 connected to 100.93.240.XX port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 384 KBytes 3.14 Mbits/sec 52 2.40 KBytes
[ 5] 1.00-2.00 sec 640 KBytes 5.24 Mbits/sec 40 2.40 KBytes
[ 5] 2.00-3.00 sec 640 KBytes 5.24 Mbits/sec 46 2.40 KBytes
[ 5] 3.00-4.00 sec 256 KBytes 2.10 Mbits/sec 29 1.20 KBytes
[ 5] 4.00-5.00 sec 128 KBytes 1.05 Mbits/sec 29 2.40 KBytes
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.00 sec 16.0 MBytes 4.47 Mbits/sec 1200 sender
[ 5] 0.00-30.00 sec 16.0 MBytes 4.47 Mbits/sec receiver
iperf Done.
10:20 user@samba:~ > iperf3 -c 100.93.240.XX -t 30 -R
Connecting to host 100.93.240.XX, port 5201
Reverse mode, remote host 100.93.240.XX is sending
[ 5] local 100.125.133.YY port 45092 connected to 100.93.240.XX port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 150 MBytes 1.26 Gbits/sec
[ 5] 1.00-2.00 sec 148 MBytes 1.24 Gbits/sec
[ 5] 2.00-3.00 sec 153 MBytes 1.28 Gbits/sec
[ 5] 3.00-4.00 sec 146 MBytes 1.22 Gbits/sec
[ 5] 4.00-5.00 sec 155 MBytes 1.30 Gbits/sec
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.00 sec 4.27 GBytes 1.22 Gbits/sec 16 sender
[ 5] 0.00-30.00 sec 4.26 GBytes 1.22 Gbits/sec receiver
iperf Done.
Vs. Kernel 6.17.13-7-pve results:
Code:
10:49 user@samba:~ > iperf3 -c 100.93.240.XX -t 5
Connecting to host 100.93.240.XX, port 5201
[ 5] local 100.125.133.YY port 57468 connected to 100.93.240.XX port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 64.2 MBytes 539 Mbits/sec 152 562 KBytes
[ 5] 1.00-2.00 sec 64.0 MBytes 537 Mbits/sec 1 631 KBytes
[ 5] 2.00-3.00 sec 58.9 MBytes 494 Mbits/sec 0 690 KBytes
[ 5] 3.00-4.00 sec 61.4 MBytes 515 Mbits/sec 61 534 KBytes
[ 5] 4.00-5.00 sec 51.6 MBytes 432 Mbits/sec 2 597 KBytes
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.00 sec 300 MBytes 503 Mbits/sec 216 sender
[ 5] 0.00-5.01 sec 298 MBytes 498 Mbits/sec receiver
iperf Done.
10:49 user@samba:~ > iperf3 -c 100.93.240.XX -t 5 -R
Connecting to host 100.93.240.XX, port 5201
Reverse mode, remote host 100.93.240.XX is sending
[ 5] local 100.125.133.YY port 54136 connected to 100.93.240.XX port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 150 MBytes 1.26 Gbits/sec
[ 5] 1.00-2.00 sec 150 MBytes 1.26 Gbits/sec
[ 5] 2.00-3.00 sec 138 MBytes 1.16 Gbits/sec
[ 5] 3.00-4.00 sec 104 MBytes 870 Mbits/sec
[ 5] 4.00-5.00 sec 153 MBytes 1.28 Gbits/sec
..
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.00 sec 698 MBytes 1.17 Gbits/sec 50 sender
[ 5] 0.00-5.00 sec 695 MBytes 1.17 Gbits/sec receiver
iperf Done.
However tests between two Tailscale LXCs on the same machine show great performance:
Code:
9:42 user@samba:~ > iperf3 -c ts.ip.same.machine1 -t 5
Connecting to host ts.ip.same.machine1, port 5201
[ 5] local ts.ip.same.machine2 port 59294 connected to ts.ip.same.machine1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.38 GBytes 11.9 Gbits/sec 0 4.16 MBytes
[ 5] 1.00-2.00 sec 1.40 GBytes 12.0 Gbits/sec 0 4.16 MBytes
[ 5] 2.00-3.00 sec 1.39 GBytes 11.9 Gbits/sec 0 4.16 MBytes
[ 5] 3.00-4.00 sec 1.39 GBytes 12.0 Gbits/sec 0 4.16 MBytes
[ 5] 4.00-5.00 sec 1.38 GBytes 11.8 Gbits/sec 0 4.16 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.00 sec 6.94 GBytes 11.9 Gbits/sec 0 sender
[ 5] 0.00-5.00 sec 6.94 GBytes 11.9 Gbits/sec receiver
iperf Done.
09:43 user@samba:~ > iperf3 -c ts.ip.same.machine1 -t 5 -R
Connecting to host ts.ip.same.machine1, port 5201
Reverse mode, remote host ts.ip.same.machine1 is sending
[ 5] local ts.ip.same.machine2 port 59310 connected to ts.ip.same.machine1 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 1.39 GBytes 11.9 Gbits/sec
[ 5] 1.00-2.00 sec 1.40 GBytes 12.0 Gbits/sec
[ 5] 2.00-3.00 sec 1.41 GBytes 12.2 Gbits/sec
[ 5] 3.00-4.00 sec 1.42 GBytes 12.2 Gbits/sec
[ 5] 4.00-5.00 sec 1.41 GBytes 12.1 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.00 sec 7.03 GBytes 12.1 Gbits/sec 0 sender
[ 5] 0.00-5.00 sec 7.03 GBytes 12.1 Gbits/sec receiver
iperf Done.
Issue affects multiple NIC types with at least these reported so far:
- AQtion AQC113CS
- Aquantia Corp. AQC113C NBase-T/IEEE 802.3an Ethernet Controller [Marvell Scalable mGig] (rev 03)
- & mine:
Code:
12:22 root@pve-homeserver25:~/manually_installed/Mellanox $ ./mlxup --query
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX4LX
Part Number: MCX4121A-ACA_Ax
Description: ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
PSID: MT_2420110034
PCI Device Name: /dev/mst/mt4117_pciconf0
Base MAC: 0c42a12d0cd2
Versions: Current Available
FW 14.32.1912 14.32.1010
PXE 3.6.0502 3.6.0502
UEFI 14.25.0017 14.25.0017
Status: Up to date
(this is the latest Mellanox FW from: https://network.nvidia.com/products/adapter-software/firmware-tools/ and https://network.nvidia.com/support/firmware/connectx4lxen/)
Network configuration:
- Proxmox host uses a standard Linux bridge (vmbr0) — no SR-IOV, no VLAN-aware bridge, etc
- Physical NIC (nic2, Mellanox ConnectX-4 Lx, mlx5_core) is a bridge member of vmbr0
- LXC containers connect via veth pairs through the bridge, with Proxmox firewall enabled (fwbr/fwln/fwpr chain)
- Tailscale runs inside the LXC (not on the Proxmox host), so WireGuard UDP packets egress via: tailscale0 (LXC) → veth → fwbr → vmbr0 → nic2 (mlx5_core) → physical
- lxc.mount.entry: /dev/net/tun passthrough (required for Tailscale in LXC)
- Proxmox firewall enabled on the LXC interface (firewall=1 in LXC config) - issue still occurs with firewall disabled, though
- LXCs are unprivileged, Debian 13, running basically nothing but Tailscale and Samba (it also affects my Jellyfin container, and iperf3 performance as above!).
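For anyone wanting to reproduce the setup, the /dev/net/tun passthrough in my LXC config is the standard snippet (from /etc/pve/lxc/&lt;ID&gt;.conf; the cgroup line may differ on other setups):

```
lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file
```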
The issue is present and reproducible with at least these kernels:
7.0.0-3-pve
7.0.2-2-pve
The issue is NOT present with all prior kernels including (i.e. the workaround is to pin to an older kernel):
6.17.13-6-pve
6.17.13-7-pve
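For reference, this is roughly how I pin the last known-good kernel on Proxmox using proxmox-boot-tool (the version string is from my setup - substitute one you actually have installed):

```shell
# Pin the last known-good kernel so it stays the boot default
# (kernel version is from my setup - adjust to an installed one):
proxmox-boot-tool kernel list               # show installed kernels
proxmox-boot-tool kernel pin 6.17.13-7-pve  # make it the default
proxmox-boot-tool refresh                   # sync the boot entries
```

Unpinning later with `proxmox-boot-tool kernel unpin` restores the normal newest-kernel behaviour.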
I have done extensive testing (mostly Claude-guided, for what that is worth) and have pretty much ruled out:
ECN, congestion control, TSO/GSO/GRO, tunnel offloads, conntrack, router and NIC firmware, ISP issues
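For context, the kinds of toggles tested while ruling those out looked like the following (illustrative, not exhaustive; `nic2` is my interface name - adjust names and values for your setup):

```shell
# Illustrative ruling-out toggles (nic2 is my NIC - substitute your own):
ethtool -K nic2 tso off gso off gro off            # segmentation offloads
ethtool -K nic2 rx-udp-gro-forwarding off          # UDP GRO forwarding (tunnel path)
sysctl -w net.ipv4.tcp_ecn=0                       # disable ECN negotiation
sysctl -w net.ipv4.tcp_congestion_control=cubic    # pin congestion control
```

None of these changed the result - the regression tracks the kernel version alone.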
It would be great to get some eyes on this, and I am happy to run tests/supply logs etc.
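If it helps triage, a quick info dump along these lines is easy for me (or other affected users) to run on both good and bad kernels (interface names are from my setup):

```shell
# Quick comparison dump for affected vs. unaffected boots
# (vmbr0/nic2 are my interface names - substitute your own):
uname -r                 # running kernel
ethtool -i nic2          # NIC driver + firmware versions
ip -d link show vmbr0    # bridge configuration details
tailscale status         # peer list and connection paths (direct vs. DERP)
```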