PVE8 "NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out" fix (to some extent)

tomtom13

Dec 28, 2016
Hi,
If, after an upgrade from 7 to 8 or after a fresh install, your network interface seems to die randomly (after minutes to hours), and the only indication you see in your syslog is:
Code:
NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out
you've possibly got a Realtek network interface that is affected by this bug.

Background:
At least for me, it seems that something changed in 6.2.16-12-pve compared to previous versions, and the r8169 driver seems to be flaky. Before the upgrade I had never even bothered to check which driver the interfaces were using, because everything was rock solid. However, I've spent two days trying to get to the bottom of this and found an article here that explains how to get your interfaces working, at least for now.

Since I don't know what might happen to the Medium article, I will write the steps down here for people searching this forum.

The fix, at least for the time being, seems to be to use the r8168 driver, which can be built from the "non-free" repositories. To do so:

1. Check which controller you've got:
Code:
# lspci -nnk | grep -A2 Ethernet
01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
        Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [1028:080c]
        Kernel driver in use: r8169
        Kernel modules: r8169
If the chip is not physically an RTL8169 (here it is an RTL8111/8168, PCI ID 10ec:8168) but it is using the r8169 driver, this seems to be the culprit.

2. Add the non-free components to your Debian apt repositories:
Code:
# cat /etc/apt/sources.list
deb http://ftp.debian.org/debian bookworm main contrib non-free non-free-firmware
deb http://ftp.debian.org/debian bookworm-updates main contrib non-free non-free-firmware

3. Update the list of available packages, install the kernel headers (needed to build the r8168 driver, which is compiled during installation), and then install the r8168 driver:
Code:
apt update
apt install pve-headers
apt install r8168-dkms

4. Reboot.

5. Check whether the machine is using the new driver:
Code:
# ethtool -i enp1s0
driver: r8168
version: 8.051.02-NAPI
firmware-version:
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

6. If the driver in use is NOT r8168, you may need to blacklist r8169, but make sure you have physical access to the machine, as you may lose networking altogether!!! I didn't have to blacklist it; it just worked for me.
Code:
echo blacklist r8169 >> /etc/modprobe.d/blacklist-r8169.conf
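You can script the driver checks from steps 1 and 5, for example so they run after every kernel update. This is a minimal sketch and not part of the original guide. `ethernet_drivers` and `ethtool_driver` are hypothetical helper names. The parsing assumes the `lspci -nnk` and `ethtool -i` output layouts shown above.

```shell
# Hypothetical helper: read `lspci -nnk` output on stdin and print one line per
# Ethernet controller: "<slot> <vendor:device> <driver in use>".
# Assumes device lines start in column 0 and detail lines are indented.
ethernet_drivers() {
  awk '
    # A line starting in column 0 opens a new device block.
    /^[^ \t]/ { slot = ""; id = "" }
    # For Ethernet controllers, remember the slot and the [vendor:device] ID.
    /^[^ \t].*Ethernet controller/ {
      slot = $1
      if (match($0, /\[[0-9a-f]+:[0-9a-f]+\]/))
        id = substr($0, RSTART + 1, RLENGTH - 2)
    }
    # Report the bound driver, but only inside an Ethernet controller block.
    /Kernel driver in use:/ && slot != "" { print slot, id, $NF }
  '
}

# Hypothetical helper: read `ethtool -i <iface>` output on stdin and print
# the value of its "driver:" field.
ethtool_driver() {
  awk -F': ' '$1 == "driver" { print $2 }'
}
```

For the machine in step 1, `lspci -nnk | ethernet_drivers` would print `01:00.0 10ec:8168 r8169`, meaning an RTL8111/8168 chip bound to r8169. After the fix, `ethtool -i enp1s0 | ethtool_driver` should print `r8168`.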

Troubleshooting:
The article mentions that for somebody the installation failed at the DKMS step, and they had to do the following:
Code:
dkms build r8168/8.051.02
dkms install r8168/8.051.02
modprobe r8168
systemctl restart networking
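The article doesn't say why the DKMS build failed in that case. One cause worth ruling out is missing headers for the kernel you are building against; this is an assumption, not something the article states. By default, DKMS builds against `/lib/modules/<kernel>/build`. The sketch below checks that this directory exists. `headers_present` is a hypothetical helper name, and the optional second argument exists only so the check can be tested against a scratch directory.

```shell
# Hypothetical pre-flight check before `dkms build` / `dkms install`.
# $1 = kernel release (default: the running kernel, from `uname -r`)
# $2 = modules root (default: /lib/modules; override only for testing)
headers_present() {
  hp_kver=${1:-$(uname -r)}
  hp_root=${2:-/lib/modules}
  # DKMS's default kernel source dir is <modules root>/<kernel>/build;
  # -d follows the usual symlink and fails if it is missing or dangling.
  if [ -d "$hp_root/$hp_kver/build" ]; then
    echo "headers found for $hp_kver"
  else
    echo "no headers for $hp_kver: install them first (see step 3)" >&2
    return 1
  fi
}
```

Usage: `headers_present && dkms build r8168/8.051.02`. The build then only starts when headers for the running kernel are present.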


To the people in charge of the 7-to-8 upgrade guide:
Can you please add a warning that people with Realtek network cards using the r8169 driver might experience connectivity problems after the upgrade?
 
Just wanted to thank you for this guide. I would never have found a solution to this error.
No probs. I myself always skip Medium articles in search results, as they are full of buzzwords trying to rank higher and rarely contain a solution to real OS problems... then I saw somebody pointing to it somewhere, and I realised I'm an idiot... again :D
 
Thank you! I had the problem that my Proxmox just randomly became inaccessible due to this network error. Your guide worked for me*. Hopefully I won't lose the network connection again.

* after I unsubscribed the enterprise repo
 
FYI, I'm not encouraging anyone to unsubscribe from the enterprise repo (just in case the admins misconstrue it). All the machines I've had contact with that have an enterprise subscription use Intel cards, and only small test (fun) clusters have Realtek cards, so at this point I can't confirm whether the free/enterprise repos are relevant to the problem.


Edit:
I can also confirm that since the week before I made this post, everything seems to be running OK with no network issues, AND when the kernel is updated, the driver automatically gets rebuilt as part of the update.

Edit 2:
@apt-flo - can you please provide a bit more detail on the free/enterprise repo issue you faced? Somebody might find it helpful, or it may lead to fixing something in the enterprise repo if that was the problem.
 
I reinstalled my Proxmox with the latest v8, and after only a couple of months I got the same error. I have an Intel card using e1000e rather than the Realtek mentioned here. This isn't mentioned in your post directly, but the blog post also links to a thread about e1000e cards.

https://forum.proxmox.com/threads/e1000-driver-hang.58284/
Code:
root@pve:~# lspci -nnk | grep -A2 Ethernet
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection I217-LM [8086:153a] (rev 04)
        DeviceName:  Onboard LAN
        Subsystem: Hewlett-Packard Company EliteDesk 800 G1 [103c:1998]

Code:
root@pve:~# ethtool -i enp0s25
driver: e1000e
version: 6.8.12-2-pve
firmware-version: 0.13-4
expansion-rom-version:
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
 
Thanks for the heads-up, mate!
My story now is that after many failures of the DKMS module with newer kernels, I had a quick word with myself: "What the hell are you doing?!", and followed up with some simple math:
my_hourly_rate * hours_I've_been_messing_with_this_realtek > price_of_replacing_this_test_cluster_with_units_with_modern_intel_NIC + any_extra_electricity

Life is slightly happier now :)
 
I might do the same. I'm just baffled why it started all of a sudden when nothing had changed on the PVE host for half a year.
 
TL;DR: kernel upgrade.
Longer version: I noticed that DKMS was failing to compile against the newer kernel version (linking problems due to missing functions). Before that, I had to hand-hold kernel upgrades myself; later, some machines simply got stuck on an older kernel. And while I was digging through the source tree and diffing, I had this epiphany about the cost of time spent vs. the cost of new hardware.
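The sketch below shows which installed kernels lack a working r8168 module before you reboot into one of them. It assumes the dkms 3.x `dkms status` line format (`r8168/8.051.02, 6.2.16-12-pve, x86_64: installed`); older dkms versions lay the fields out differently. `r8168_missing_kernels` is a hypothetical helper. It prints every kernel passed as an argument for which no installed r8168 build is listed.

```shell
# Hypothetical helper: read `dkms status` on stdin and print each kernel
# passed as an argument that has no *installed* r8168 module.
# Assumes dkms 3.x lines such as:
#   r8168/8.051.02, 6.2.16-12-pve, x86_64: installed
r8168_missing_kernels() {
  awk -F', ' -v kernels="$*" '
    # Remember kernels with an r8168 build whose status is "installed".
    $1 ~ /^r8168\// && $NF ~ /: installed/ { ok[$2] = 1 }
    END {
      # Print every requested kernel that was never marked installed.
      n = split(kernels, want, " ")
      for (i = 1; i <= n; i++)
        if (!(want[i] in ok)) print want[i]
    }
  '
}
```

Usage: `dkms status | r8168_missing_kernels $(ls /lib/modules)`. Empty output means every listed kernel has the module installed. A kernel that only reaches "built", or does not appear at all (for example because compilation failed), is printed.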
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!