Networking issues PVE8

Abide6017

Jun 28, 2023
I have been encountering strange networking behaviour and was hoping for help. I'm not a command-line wizard and have hit a brick wall.

I have run Proxmox 7 on a 3-node cluster for the last year or so. 2x Optiplex 3080M's & 1x Optiplex 3070M.

I had never encountered any issues, but I recently upgraded from 7 to 8, which is when my network issues began. I found that after an hour or so of being online, pve3 (the Optiplex 3070M) would suddenly go offline. A hard restart would usually bring it back online, but an hour later the same would happen again. It always seemed to be this single node.

I figured that perhaps I'd made a mess during the upgrade process, even though it seemed to go smoothly, so I decided to start fresh. I reflashed all 3 nodes with Proxmox 8 and clustered them without issues. I thought that was the end; however, I woke up this morning, and both pve2 & pve3 were offline. Once again, a hard restart fixed the connection issue but again, 5-6 hours later, nodes are going offline again.

I never encountered any issues before attempting the upgrade and have absolutely no idea what is causing this. I'd prefer not to have to reflash them all back to 7 if possible, but I don't have the ability to troubleshoot this issue myself.

Any help or advice would be hugely appreciated. Thank you
 
Hi,
please provide the journal from all nodes which fail by running journalctl --since <DATETIME> --until <DATETIME> > journal.txt, setting a time range around when the failures happen.
 
Thank you for the reply. Sorry to ask, but how do I correctly format the date and time for this command?
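For what it's worth, journalctl takes its time range via --since and --until, and accepts timestamps written as "YYYY-MM-DD HH:MM:SS" (quoted, because of the space). A minimal sketch, assuming an outage around 16:30 on Jul 01; the timestamps and the journal_window name are made up for illustration:

```shell
# Hypothetical example window: replace with times that bracket the outage on your node.
SINCE="2023-07-01 16:00:00"   # format: YYYY-MM-DD HH:MM:SS (local time)
UNTIL="2023-07-01 17:00:00"

# Small wrapper around journalctl; takes the two timestamps as arguments.
journal_window() {
    journalctl --since "$1" --until "$2"
}

# Run on each failing node:
#   journal_window "$SINCE" "$UNTIL" > journal.txt
```

Relative values such as "yesterday" or "-2h" should also be accepted by --since, which saves typing a full timestamp.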
 
I am having the same problem, funnily enough on an Optiplex 3070 SFF. It seems to be going offline a lot quicker than an hour for me, I just restarted to be able to grab the journal and midway through paging through it, it went offline again, within a few minutes. I'm not sure when the first offline happened, I didn't think much of it and just rebooted. Here is my journal from the last few minutes before/after I restarted again.
https://pastebin.com/zyYMmvmq

Sorry, I'm not sure what I'm looking for... I do see some weird stuff before reboot:
Code:
Jul 01 16:26:41 pve kernel: net_ratelimit: 9 callbacks suppressed
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:41 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:42 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:26:42 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 01 16:26:42 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 01 16:26:42 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 01 16:28:33 pve kernel: net_ratelimit: 9 callbacks suppressed
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 01 16:28:33 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).

EDIT: Seems this might be related to https://forum.proxmox.com/threads/system-hanging-after-upgrade-nic-driver.129366
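For anyone wanting to check whether their node shows the same symptom, here is a rough sketch that counts the rtl_*_cond messages from the r8169 driver in kernel log text. The function name is invented for this example, and the pattern only covers the three message types seen in the log above:

```shell
# Hypothetical helper: count r8169 rtl_*_cond messages in kernel log text read from stdin.
count_r8169_timeouts() {
    grep -cE 'r8169 .*rtl_(chipcmd|ephyar|eriar)_cond == 1'
}

# On an affected node, feed it the kernel messages of the current boot:
#   journalctl -k -b | count_r8169_timeouts
# or of the previous boot (needs a persistent journal):
#   journalctl -k -b -1 | count_r8169_timeouts
```

A count of 0 does not rule the driver out; it only means these particular messages were not logged in that boot.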
 
I managed to locate the cause of this issue after much frustration.
Debian bookworm has some compatibility issues with the r8168/r8169 NIC.
My Optiplex 3080M and 3070M machines all have the r8168 NIC.
After spending hours trying to pinpoint the problem, I eventually found that installing the r8168-dkms package from the non-free repo immediately resolved my issue.

I added the non-free repo to my /etc/apt/sources.list:

deb http://ftp.de.debian.org/debian bookworm main non-free non-free-firmware

Then I ran

apt update

apt install r8168-dkms -y && reboot now

Your logs seem to show the r8169, so I'm not sure if there's a different package for it, but hopefully this will put you on the right track.
 
I have exactly the same problem. Thank you for suggesting the r8168-dkms package. The only thing I had found before was to disable some power management features (LINK). However, using the r8168 driver seems to be the better alternative. I then found THIS site, which confirms your suggestion.

Your logs seem to show the r8169 so I'm not sure if there's a different package however hopefully this will put you on the right track.
r8169 is the driver that is compatible with a range of Realtek chips. The r8168-dkms package is an alternative driver for several Realtek chips. Installing the package disables the r8169 module. More details can be found HERE
 
Hi, I have the same issue with my Optiplex 3080M. I've already installed r8168, but when I run lspci -v, r8168 is not shown as the driver in use:


Code:
root@pve1:~# lspci -v
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
        Flags: bus master, fast devsel, latency 0, IRQ 16
        I/O ports at 4000 [size=256]
        Memory at d1304000 (64-bit, non-prefetchable) [size=4K]
        Memory at d1300000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170] Latency Tolerance Reporting
        Capabilities: [178] L1 PM Substates
        Kernel driver in use: r8169
        Kernel modules: r8168


03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
        Subsystem: Device 1d1a:0000
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at d1100000 (32-bit, non-prefetchable) [size=1M]
        I/O ports at 3000 [disabled] [size=32]
        Memory at d1200000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at d1000000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 1c-fd-08-ff-ff-74-ad-0b
        Capabilities: [1a0] Transaction Processing Hints
        Kernel driver in use: igb
        Kernel modules: igb


root@pve1:~# lsmod | grep r8
r8169                 114688  0

Please help. I installed Proxmox 8.0.3 from the ISO.
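When comparing nodes, it can help to pull just the "Kernel driver in use" value out of the lspci output instead of reading the whole listing. A small sketch; driver_in_use is a made-up helper name, and the commands in the comments assume pciutils and dkms are installed:

```shell
# Hypothetical helper: print the value of every "Kernel driver in use" line read from stdin.
driver_in_use() {
    awk -F': ' '/Kernel driver in use/ { print $2 }'
}

# Typical use on the node:
#   lspci -k | grep -A3 -i ethernet | driver_in_use
#   dkms status    # "installed" means the module was built; "added" means it was not
```

If this prints r8169 while dkms status reports r8168 only as "added", the r8168 module was most likely never built for the running kernel.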
 
Hi guys, I am having the same issue with my Dell Optiplex 3070

before driver installation:
Code:
lspci -v
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
        Flags: bus master, fast devsel, latency 0, IRQ 16
        I/O ports at 3000 [size=256]
        Memory at bf404000 (64-bit, non-prefetchable) [size=4K]
        Memory at bf404000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 01
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
        Capabilities: [170] Latency Tolerance Reporting
        Capabilities: [178] L1 PM Substates
        Kernel driver in use: r8169
        Kernel modules: r8169

I installed the r8168 driver and now the network does not work after a reboot. The only access I have is by connecting a screen and keyboard directly to the Optiplex. SSH does not work, and the web interface does not work :-(

There is no "Kernel driver in use" line after the update. It only shows:
Code:
Kernel modules: r8168

dkms status shows:
Code:
r8168/8.051.02: added

So the module is only added, not installed.

When I try to build or install it:
Code:
dkms build -m r8168 -v 8.051.02
Sign command: /lib/modules/6.2.16-6-pve/build/scripts/sign-file
Binary /lib/modules/6.2.16-6-pve/build/scripts/sign-file not found, modules won't be signed
Error! Your kernel headers for kernel 6.2.16-6-pve cannot be found at /lib/modules/6.2.16-6-pve/build or /lib/modules/6.2.16-6-pve/source.
Please install the linux-headers-6.2.16-6-pve package or use the --kernelsourcedir option to tell DKMS where it's located.

Further adapter info:
Code:
lspci -nnk | grep 0200 -A3
01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [1028:0930]
Kernel modules: r8168
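The DKMS error above says the headers for the running kernel are missing, so the module cannot be built. A minimal sketch of one way to address that, assuming the header package follows a pve-headers-<kernel release> naming scheme (an assumption; please check with apt search pve-headers before relying on it). The headers_pkg name is invented for this example, and the commands are only printed, not executed:

```shell
# Build the name of the header package for a given kernel release.
# Assumption: Proxmox names it pve-headers-<release>; verify with `apt search pve-headers`.
headers_pkg() {
    echo "pve-headers-$1"
}

# Print the commands to run as root on the node (nothing is executed here).
echo "apt install $(headers_pkg "$(uname -r)")"
echo "dkms install r8168/8.051.02"   # version string taken from the dkms status output above
```

Because the NIC is down at this point, the node needs some other way to reach the package mirror first, for example a second NIC or temporarily going back to r8169.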
 
Thanks. I had to revert to r8169 first because I had no internet connection. I then installed the headers, and r8168 is now installed and in use. Hopefully it will work fine.

Edit: uptime is now more than four days (previously a few hours at most), so this is solved! Thanks!

removing r8168:
sudo dkms remove r8168/8.051.02 --all
sudo apt-get purge r8168-dkms
sudo apt autoremove

checking that r8169 is in use again:
lspci -v returns:
Kernel driver in use: r8169
Kernel modules: r8169

updating the headers:
sudo apt install pve-headers

finally installing r8168:
sudo apt install r8168-dkms
 
Same problem here with the NIC on my old HP desktop, so this seems to be a common problem. I am a bit of a newbie, but I wonder whether this should be flagged to those maintaining the Proxmox codebase.
 
Just to report that my Dell Optiplex Micro 3060 had the exact same problem and has been running fine with r8168-dkms for two days.

My steps were basically the same, although there is no need to remove r8169 since it is included in the kernel (at least this is my understanding):

1) Add non-free repository
2) Install pve-headers
3) reboot (maybe not necessary?)
4) Install r8168-dkms, r8169 automatically disabled (https://packages.debian.org/bookworm/r8168-dkms)
5) reboot
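
The steps above could be sketched as a small script. This is only an illustration under a few assumptions: it must be run as root, the non-free line has already been added to /etc/apt/sources.list by hand (step 1), and the run wrapper is a made-up helper that only prints each command unless APPLY=1 is set:

```shell
# Dry run by default: each command is only printed. Set APPLY=1 to actually execute.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Step 1 (manual): add "non-free" to the bookworm line in /etc/apt/sources.list.
run apt update
run apt install -y pve-headers    # step 2: headers so DKMS can build the module
run apt install -y r8168-dkms     # step 4: builds r8168 via DKMS
run dkms status                   # r8168 should be reported as "installed", not just "added"
echo "Now reboot and check the driver with: lspci -k | grep -A3 -i ethernet"
```

Installing the headers before r8168-dkms matters; without them the module is only added and not built, and the NIC is left without a driver after the reboot.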

Thanks for the insights and the proposed solution.
R
 
This fixed my problems too, god bless
 
Following up on my earlier post about the Dell Optiplex Micro 3060: unfortunately, after 7 days the NIC stopped working again.

20230901_184637.jpg

After a reboot it started working again. Any suggestion is very welcome.
Thanks
 
Unfortunately the same happened here after several days (about 10, I think). Looking at the syslog, there was a problem after some update.
I found this thread, which says it was fixed by adding the following to the kernel command line:
r8168.aspm=0 r8168.eee_enable=0 pcie_aspm=off loglevel=3

Rock solid since.
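
In case it helps, here is a sketch of how such parameters could be added on a host that boots via GRUB (an assumption; the poster did not say how they did it). The add_kernel_params helper is made up for this example, relies on GNU sed, and appends the parameters inside the quotes of the GRUB_CMDLINE_LINUX_DEFAULT line of the file it is given:

```shell
# Parameters exactly as quoted in the post above.
PARAMS="r8168.aspm=0 r8168.eee_enable=0 pcie_aspm=off loglevel=3"

# Hypothetical helper: append PARAMS to GRUB_CMDLINE_LINUX_DEFAULT in the given file.
# Not idempotent: running it twice adds the parameters twice.
add_kernel_params() {
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $PARAMS\"/" "$1"
}

# On a GRUB-booted node one would then run, as root:
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_params /etc/default/grub
#   update-grub
#   reboot
```

Nodes that boot through systemd-boot (for example ZFS-on-root installs) keep the command line in /etc/kernel/cmdline instead and need proxmox-boot-tool refresh afterwards. After the reboot, cat /proc/cmdline shows whether the parameters were picked up.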
 
Thank you, man.

The solution presented worked for me.

Kind regards.
 
Installing the r8168-dkms package resolved my issue. Thanks very much.