[SOLVED] lan network traffic degraded, but only over wireguard, and only in one direction (no vms)

beigebox
Mar 11, 2025
[short version]
new server, receiving traffic, over wireguard = errors

[longer version]
new server was installed a month ago
client computers open a wireguard connection to the server, then mount an nfs export inside the wireguard tunnel
I have used this setup on multiple vanilla debian systems without issue
this is the first time I have tried this setup with proxmox; it worked fine for weeks, then broke after a reboot
network traffic over wireguard to the proxmox server now results in nfs dropouts and dropped packets
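
for reference, the client side of this setup is roughly the following (interface name, server wireguard address, and export path are placeholders, assuming wg-quick and nfsv4):
wg-quick up wg0
mount -t nfs4 10.0.0.1:/tank/share /mnt/share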

[background]
all of the systems involved are on the same lan, connected via a 10gb unmanaged switch
all of the network adapters are intel 10gb base-t variants
the proxmox server is not running any vms, it's just running wireguard+nfs, sharing a zfs filesystem
all of the systems are running vanilla debian, except for the server which is running proxmox

[symptoms]
client opens wireguard connection to server
client opens nfs connection to server via wireguard, then transfers a 10GB file to the server
client transmits roughly 500MB of data, then the connection hangs for tens of seconds, then another 500MB, repeat
client runs an iperf3 test to the server over wireguard and sees over 1000 "retr" (retransmissions) per second
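
for reference, the iperf3 tests were along these lines (10.0.0.1 stands in for the server's wireguard address):
on the server:
iperf3 -s
on the client, sending to the server (the failing direction):
iperf3 -c 10.0.0.1 -t 30
same test with -R to pull from the server instead (this direction is fine):
iperf3 -c 10.0.0.1 -t 30 -R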

[notes]
this happens with any traffic I have tried over wireguard; non-wireguard traffic is normal
this only happens when traffic is being sent to the server; traffic from the server to other systems via wireguard is normal
I have tried multiple different network adapters on the server, no change
server is proxmox 8.3.4 (kernel 6.8.12-8-pve), one client is debian13 (kernel 6.12.17), another client is debian11 (kernel 5.10.0-30)
the only firewall in use is the proxmox firewall itself; the rules are very basic, allowing all traffic from the client ips and the client wireguard ips
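
for anyone curious what "very basic" means, the node-level rules are along these lines (the ips are placeholders, the file is /etc/pve/nodes/<nodename>/host.fw):
[RULES]
IN ACCEPT -source 192.168.1.0/24 -log nolog
IN ACCEPT -source 10.0.0.0/24 -log nolog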

[possibilities eliminated]
network hardware - non-wireguard traffic is fine (iperf, rsync+ssh, etc)
server storage - sending files over rsync, or generating random data files locally on the new server works normally
server memory - system passes stressapptest and edac-util shows no errors
client os - the two clients are on very different kernels and only have this problem when sending to the server
mtu settings - I was using a wireguard mtu of 8920, but setting it to the default mtu doesn't change anything
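
for reference, the wireguard mtu is set in the wg-quick config on each peer, roughly like this (keys and addresses are placeholders; removing the MTU line falls back to the default):
[Interface]
PrivateKey = <client private key>
Address = 10.0.0.2/24
MTU = 8920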

[things that were done during the reboot]
installed additional memory - ran stressapptest afterwards with no errors, so I don't think the memory increase is related
renamed network devices - used /etc/systemd/network/example.link to rename devices (ie: enp193s0f0np0 becomes ens1; see the example .link file after this list)
switched from ufw to proxmox firewall - deleted ufw rules and disabled service, recreated rules in proxmox firewall at node level
installed updates - the system was already pretty up to date (ie: pve-manager 8.3.3 to 8.3.4, proxmox-kernel-helper 8.1.0 to 8.1.1)
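
for reference, the .link file mentioned above looks roughly like this (the mac address is a placeholder):
[Match]
MACAddress=aa:bb:cc:dd:ee:ff
[Link]
Name=ens1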

[current situation]
I've spent the last couple days doing trial and error but haven't been able to find the cause yet
I still find it weird that the issue only happens over wireguard, and only in one direction
the setup is very basic, systems connected via unmanaged switch, firewall rules are "allow all", no custom wireguard settings
is there a bug in the proxmox kernel or wireguard version? did I break something when renaming network devices? ghosts?
 
Nevermind, I figured it out.


resolution to wireguard asymmetrical slowdown issue
I wasted some more time messing with mtu settings, proxmox network settings, nic offload settings, kernel network settings like net.ipv4.tcp_congestion_control.
No change.
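
for illustration, the kind of commands involved (interface name and values are examples, not necessarily the exact ones used):
sysctl -w net.ipv4.tcp_congestion_control=bbr
ethtool -K ens1 tso off gso off gro off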

Ran across one thread that may or may not be related (slow wireguard traffic on certain systems), with no resolution.
Another thread described a similar issue, also with no resolution.
Ultimately I found the answer in an unrelated thread discussing ways to get slightly better 10gb performance.


I am using a board that has this network setup:
2x 1gb (intel i210) integrated
2x 10gb (intel x710) integrated
4x 10gb (intel x710) discrete card

There is a setting for these cards, and many others, called "ethernet pause and pfc", aka "cos flow control".
The Juniper Networks description is:
"Flow control supports lossless transmission by regulating traffic flows to avoid dropping frames during periods of congestion."

For reasons I don't understand, when using the proxmox (ubuntu) kernel, this is set to "off" for the x710 adapters, which results in problems.
With the i210 adapters this is set to "on" (good), and on my vanilla debian systems with x710 adapters it is set to "on".

For those running into this issue who want to test the setting (change "eth0" to whatever your adapter name is):
check the settings currently in use:
ethtool --show-pause eth0
change the setting (in this case turning both rx and tx to "on"):
ethtool --pause eth0 rx on tx on
if you want to apply these settings at boot, add a pre-up line to the adapter's stanza in /etc/network/interfaces:
iface eth0 inet static
        pre-up ethtool --pause $IFACE rx on tx on

As soon as I ran "ethtool --pause eth0 rx on tx on", the problems immediately went away; iperf3 over wireguard now shows zero retransmissions and performance is normal.


unrelated nfs issue and slight rant about ubuntu kernel
Unrelated to the wireguard network adapter settings issue above, I ran into another problem in recent weeks on a different system that caused real disruption.
That system also runs proxmox while providing an nfs export over a 100gb connection.
The nfs connection would completely drop out every 1-2 days, sometimes after only a few hours, with nothing useful in the error logs.
I ran across multiple different threads going back to 2024-04 mentioning a bug in the ubuntu 6.8 kernel that causes nfs to fail on 10gb+ network connections.

The only way I could find to work around the issue was to use an older kernel.
list available kernels:
apt search proxmox-kernel
install a specific old kernel (in this example we are installing "6.5.13-6"):
apt install proxmox-kernel-6.5.13-6-pve-signed
apt install proxmox-headers-6.5.13-6-pve
if you want to keep booting that kernel by default, even after "apt upgrade" installs newer ones:
proxmox-boot-tool kernel pin 6.5.13-6-pve
if you only want to boot that kernel one time as a test:
proxmox-boot-tool kernel pin 6.5.13-6-pve --next-boot
if you later want to un-pin that kernel:
proxmox-boot-tool kernel unpin
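
to list the kernels proxmox-boot-tool knows about (including the pinned one, if set):
proxmox-boot-tool kernel list
after rebooting, confirm the running kernel:
uname -r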

This is not meant as a criticism of the proxmox devs, as they are downstream of this kernel.
I mention it because it illustrates that there is a cost to using the ubuntu kernel. The ubuntu kernel having zfs built in (no dkms) and somewhat newer drivers is probably useful to some users; however, the ubuntu kernel has been consistently less stable for me than the debian kernel, as neither of these issues exists on my vanilla debian systems.
 