Networking issues PVE8

Abide6017

New Member
Jun 28, 2023
I have been encountering strange networking behaviour and was hoping for help. I'm not a command-line wizard and have hit a brick wall.

I have run Proxmox 7 on a 3-node cluster for the last year or so. 2x Optiplex 3080M's & 1x Optiplex 3070M.

I have never encountered any issues, but I recently upgraded from 7 to 8, where my network issues began. I was finding that after an hour or so of being online, pve3 would suddenly go offline (Optiplex 3070m). A hard restart would usually bring it back online, but an hour later, the same would happen again. It always seemed to be this single node.

I figured that perhaps I'd made a mess during the upgrade process, even though it seemed to go smoothly, so I decided to start fresh. I reflashed all 3 nodes with Proxmox 8 and clustered them without issues. I thought that was the end; however, I woke up this morning, and both pve2 & pve3 were offline. Once again, a hard restart fixed the connection issue but again, 5-6 hours later, nodes are going offline again.

I never encountered any issues before attempting the upgrade and have absolutely no idea what is causing this. I'd prefer not to reflash them all back to 7 if possible, but I don't have the ability to troubleshoot this issue myself.

Any help or advice would be hugely appreciated. Thank you
 
Hi,
please provide the journal from each node that fails by running journalctl --since <DATETIME> --until <DATETIME> > journal.txt, setting a time range around when the failures happen.
 
Thank you for the reply. Sorry to ask, but how do I correctly format the date and time for this command?
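On the timestamp format: journalctl accepts absolute times written as "YYYY-MM-DD HH:MM:SS", quoted because of the space. A minimal sketch follows; the helper name dump_journal and the example timestamps are placeholders, not anything taken from this thread.

```shell
# Hypothetical helper: write the journal between two timestamps to a file.
# $1 = start, $2 = end (both "YYYY-MM-DD HH:MM:SS"), $3 = output file
dump_journal() {
    journalctl --since "$1" --until "$2" > "$3"
}

# Example call (placeholder times; pick a window around the failure):
# dump_journal "2023-06-28 06:00:00" "2023-06-28 08:00:00" journal.txt
```

Run it on each node that went offline and attach the resulting file.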
 
I am having the same problem, funnily enough on an Optiplex 3070 SFF. It seems to go offline a lot sooner than an hour for me: I had just restarted to grab the journal, and midway through paging through it the node went offline again, within a few minutes. I'm not sure when it first went offline; I didn't think much of it and just rebooted. Here is my journal from the last few minutes before and after I restarted again.
https://pastebin.com/zyYMmvmq

Sorry, I'm not sure what I'm looking for... I do see some weird stuff before reboot:
Code:
Jul 01 16:26:41 pve kernel: net_ratelimit: 9 callbacks suppressed
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:42 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:42 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 01 16:26:42 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 01 16:26:42 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 01 16:28:33 pve kernel: net_ratelimit: 9 callbacks suppressed
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
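
To see how often these r8169 messages occur in a saved journal, something like the following could be used. It is only a sketch: count_r8169_timeouts is a made-up helper name, and the pattern matches nothing beyond the message shape shown in the log above.

```shell
# Hypothetical helper: count r8169 "rtl_*_cond == 1" lines in a journal dump.
# $1 = path to a file produced by journalctl (e.g. journal.txt)
count_r8169_timeouts() {
    grep -c 'r8169 .*rtl_[a-z]*_cond == 1' "$1"
}

# Example: count_r8169_timeouts journal.txt
```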

EDIT: Seems this might be related to https://forum.proxmox.com/threads/system-hanging-after-upgrade-nic-driver.129366
 
I managed to locate the cause of this issue after much frustration.
Debian bookworm has some compatibility issues with the r8168/9 NIC.
My Optiplex 3080Ms and 3070M all have the r8168 NIC.
After spending hours trying to pinpoint the problem, I eventually found that installing the r8168-dkms package from the non-free repo immediately resolved my issue.

I added the non free repo to my /etc/apt/sources.list

deb http://ftp.de.debian.org/debian bookworm main non-free non-free-firmware

Then I ran

apt update

apt install r8168-dkms -y && reboot now

Your logs seem to show the r8169, so I'm not sure if there's a different package; however, hopefully this will put you on the right track.
 
I have exactly the same problem. Thank you for the suggestion to install the r8168-dkms package. The only thing I had found before was to disable some power management features (LINK). However, using the r8168 driver seems to be the better alternative. I then found THIS site, which confirms your suggestion.

Your logs seem to show the r8169, so I'm not sure if there's a different package; however, hopefully this will put you on the right track.
r8169 is the driver, which is compatible with a bunch of Realtek chips. The r8168-dkms package is an alternative driver for several Realtek chips. Installing the package disables the r8169 module. More details can be found HERE
 
Hi, I have the same issue with my Optiplex 3080M. I've already installed r8168, but when I run lspci -v, r8168 is not shown as the driver in use:


Code:
root@pve1:~# lspci -v
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
        Flags: bus master, fast devsel, latency 0, IRQ 16
        I/O ports at 4000 [size=256]
        Memory at d1304000 (64-bit, non-prefetchable) [size=4K]
        Memory at d1300000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170] Latency Tolerance Reporting
        Capabilities: [178] L1 PM Substates
        Kernel driver in use: r8169
        Kernel modules: r8168


03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
        Subsystem: Device 1d1a:0000
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at d1100000 (32-bit, non-prefetchable) [size=1M]
        I/O ports at 3000 [disabled] [size=32]
        Memory at d1200000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at d1000000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 1c-fd-08-ff-ff-74-ad-0b
        Capabilities: [1a0] Transaction Processing Hints
        Kernel driver in use: igb
        Kernel modules: igb


root@pve1:~# lsmod | grep r8
r8169                 114688  0

Please help me. I installed Proxmox 8.0.3 from the ISO.
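
One way to double-check which driver is actually bound to an interface, independently of lspci, is to read the driver symlink in sysfs. A sketch follows; bound_driver is a hypothetical helper, and the interface name in the example has to be replaced with your own (see ip -br link).

```shell
# Hypothetical helper: print the kernel driver bound to a network interface.
# $1 = interface name, $2 = sysfs root (defaults to /sys)
bound_driver() {
    basename "$(readlink "${2:-/sys}/class/net/$1/device/driver")"
}

# Example: bound_driver enp1s0
```

If this still reports r8169 after installing r8168-dkms, checking dkms status is a reasonable next step, since the package only helps once the module has been built for the running kernel.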
 
Hi guys, I am having the same issue with my Dell Optiplex 3070

before driver installation:
Code:
lspci -v
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
        Flags: bus master, fast devsel, latency 0, IRQ 16
        I/O ports at 3000 [size=256]
        Memory at bf404000 (64-bit, non-prefetchable) [size=4K]
        Memory at bf404000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170] Latency Tolerance Reporting
        Capabilities: [178] L1 PM Substates
        Kernel driver in use: r8169
        Kernel modules: r8169

Installed the r8168 driver and now it does not work after reboot. The only access I have is directly when I connect a screen and keyboard to the Optiplex. SSH does not work, web interface does not work :-(

There is no "Kernel driver in use" line after the update. It only shows:
Code:
Kernel modules: r8168

dkms status shows:
Code:
r8168/8.051.02: added

.. added, not installed.
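
The state dkms reports is the text after the last colon on each line, so a quick check for whether r8168 was actually built and installed could look like this. It is a sketch that assumes the line shape shown above; dkms_state is a hypothetical helper that reads dkms status output on stdin.

```shell
# Hypothetical helper: print the DKMS state ("added", "built", "installed")
# for a module. Reads `dkms status` output on stdin; $1 = module name.
dkms_state() {
    grep "^$1[/,]" | sed 's/.*: *//'
}

# Example: dkms status | dkms_state r8168
```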

When I try to build or install:
Code:
dkms build -m r8168 -v 8.051.02
Sign command: /lib/modules/6.2.16-6-pve/build/scripts/sign-file
Binary /lib/modules/6.2.16-6-pve/build/scripts/sign-file not found, modules won't be signed
Error! Your kernel headers for kernel 6.2.16-6-pve cannot be found at /lib/modules/6.2.16-6-pve/build or /lib/modules/6.2.16-6-pve/source.
Please install the linux-headers-6.2.16-6-pve package or use the --kernelsourcedir option to tell DKMS where it's located.
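
DKMS builds against the headers under /lib/modules/<kernel>/build, so before retrying the build it may help to confirm that they exist for the running kernel. A sketch follows; have_headers is a hypothetical helper, and the package in the comment is the one used elsewhere in this thread.

```shell
# Hypothetical helper: succeed if kernel headers are present for a kernel.
# $1 = kernel release (defaults to the running kernel), $2 = modules root
have_headers() {
    [ -e "${2:-/lib/modules}/${1:-$(uname -r)}/build" ]
}

# Example:
# have_headers || apt install pve-headers
```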

Further adapter info:
Code:
lspci -nnk | grep 0200 -A3
01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [1028:0930]
Kernel modules: r8168
 
Thanks. I had to revert to r8169 as I had no internet connection. I then installed the headers, and r8168 is now installed and in use. Hopefully it will work fine.

Edit: uptime is now more than four days (previously a few hours at most), so this is solved! Thanks!

Removing r8168:
Code:
sudo dkms remove r8168/8.051.02 --all
sudo apt-get purge r8168-dkms
sudo apt autoremove

Checking that r8169 is in use again. lspci -v returns:
Code:
Kernel driver in use: r8169
Kernel modules: r8169

Updating the headers:
Code:
sudo apt install pve-headers

Finally, installing r8168:
Code:
sudo apt install r8168-dkms
 
Same problem here with the NIC on my old HP desktop, so it seems this is a common problem. I am a bit of a newbie, but I wonder if this should be flagged to those maintaining the Proxmox codebase.
 
Just to report that my Dell Optiplex Micro 3060, which had exactly the same problem, has been running fine with r8168-dkms for two days.

My steps were basically the same, although there is no need to remove r8169 since it is included in the kernel (at least this is my understanding):

1) Add non-free repository
2) Install pve-headers
3) reboot (maybe not necessary?)
4) Install r8168-dkms, r8169 automatically disabled (https://packages.debian.org/bookworm/r8168-dkms)
5) reboot
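
The steps above can be sketched as a script. This is only an outline: the run wrapper and the DRY_RUN variable are made up for illustration, the script prints the commands by default instead of executing them, and it leaves out the repository edit and the reboots so that those stay manual.

```shell
# Hypothetical wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 2 and 4 from the list; repository setup and reboots are left manual.
install_r8168() {
    run apt update
    run apt install -y pve-headers
    run apt install -y r8168-dkms
}

install_r8168
```

Setting DRY_RUN=0 and running as root would execute the commands for real.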

Thanks for the insights and the proposed solution.
R
 
This fixed my problems too, god bless
 
Following up on my earlier post about the Dell Optiplex Micro 3060: unfortunately, after 7 days the NIC stopped working again.

[Attached photo: 20230901_184637.jpg]

After a reboot it started working again. Any suggestion is very welcome.
Thanks
 
Unfortunately the same here, after several days (about 10, I think). Looking into the syslog, there was a problem after some update.
I found this thread, which says it was fixed by adding the following to the kernel command line:
r8168.aspm=0 r8168.eee_enable=0 pcie_aspm=off loglevel=3

Rock solid since.
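
To confirm that parameters like these are active after a reboot, the running kernel's command line can be read from /proc/cmdline. A sketch follows; has_kparam is a hypothetical helper. Where the parameters are set depends on the bootloader: on GRUB systems that is normally GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub followed by update-grub, while Proxmox installs that boot via systemd-boot use /etc/kernel/cmdline followed by proxmox-boot-tool refresh.

```shell
# Hypothetical helper: succeed if a parameter is on the kernel command line.
# $1 = parameter (e.g. pcie_aspm=off), $2 = cmdline file (default /proc/cmdline)
has_kparam() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example:
# for p in r8168.aspm=0 r8168.eee_enable=0 pcie_aspm=off; do
#     has_kparam "$p" && echo "$p: set" || echo "$p: missing"
# done
```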
 
Thank you man.

The solution presented worked for me.

Kind regards.
 
That resolved my issue. Thanks very much.
 
