[SOLVED] Network dies constantly

Eraser

Member
Sep 17, 2019
I have a newly built server with a fresh installation of Proxmox VE 6.0. I am totally new at this, by the way.
The installation went fine, but the host's network keeps dying randomly. I don't even have any VMs yet; it's just a fresh installation. I can force the problem by uploading a large ISO template: the upload never completes. When the problem happens, the server seems to go offline and pings time out. On the host itself I can't even ping the gateway any more; I can only ping the host's own IP address. To temporarily work around the problem I run the following commands:

Bash:
ifdown vmbr0
ifup vmbr0

After that I am online for some time until the problem hits again. I have searched the logs but there seem to be no messages related to the problem. The interface says its state is UP. It's a wired connection by the way. No wireless connections are involved.
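The manual ifdown/ifup workaround above could be wrapped in a small watchdog while the root cause is being investigated. This is only a sketch: the gateway address 192.168.10.1 is an assumption (the post only shows the host at 192.168.10.12), and the function names are made up for illustration. It hides the symptom and does not fix anything.

```shell
# Sketch of a watchdog around the ifdown/ifup workaround.
# GATEWAY is a hypothetical address; adjust it to your network.
GATEWAY="${GATEWAY:-192.168.10.1}"
IFACE="${IFACE:-vmbr0}"

# Returns 0 if the gateway answers a single ping within 2 seconds.
gateway_reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Restart the interface only when the gateway is unreachable.
bounce_if_down() {
    local gw="$1"
    local iface="$2"
    if gateway_reachable "$gw"; then
        echo "ok: $gw reachable"
        return 0
    fi
    echo "down: $gw unreachable, restarting $iface"
    ifdown "$iface"
    ifup "$iface"
}

# Example usage (as root, e.g. from cron once a minute):
#   bounce_if_down "$GATEWAY" "$IFACE" | logger -t net-watchdog
```

Logging each restart with a timestamp also gives a rough record of how often the link dies, which can be compared against upload activity.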

lspci shows me that this is my NIC:
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)

The only related problem is that sometimes I see a message at the console telling me about a PCIe Bus Error. For example:
Code:
[2397.331074] nvme 0000:01:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[2397.331387] nvme 0000:01:00.0:   device [144d:a808] error status/mask=00004000/00400000
[2397.331691] nvme 0000:01:00.0:    [14] CmpltTO                (First)
That one says something about "nvme" but I have seen others like "pcieport".
I have tried kernel options like pci=nomsi, pci=noaer and pcie_aspm=off but they didn't change much. nomsi broke Proxmox so that definitely wasn't an option.
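To see whether the PCIe Bus Error messages cluster on one device or are spread over several, the kernel log can be tallied per device. The following is a rough sketch that assumes `dmesg`-style lines (a bracketed timestamp followed by the driver name and PCI address); the function name is hypothetical.

```shell
# Reads kernel log text on stdin and prints "<count> <driver> <pci-address>"
# for every device that reported a PCIe Bus Error, most frequent first.
count_pcie_bus_errors() {
    awk '
        /PCIe Bus Error/ {
            sub(/^\[[^]]*\][ ]*/, "")      # drop the leading "[ 2397.331074]" timestamp
            dev = $2; sub(/:$/, "", dev)   # PCI address without a trailing colon
            count[$1 " " dev]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}

# Example usage:
#   dmesg | count_pcie_bus_errors
```

If errors show up for several unrelated devices (here nvme and pcieport), that points more towards the platform (firmware, chipset, root port) than towards a single card.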

What can I do to find the problem? Am I overlooking something that kills my LAN connection?

[update]
I have installed net-tools and ifconfig shows me this:
root@pve:~# ifconfig
enp5s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether 70:85:c2:d4:d6:06 txqueuelen 1000 (Ethernet)
RX packets 2738215 bytes 3847571335 (3.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1312013 bytes 72783990 (69.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 169 bytes 35396 (34.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 169 bytes 35396 (34.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vmbr0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.10.12 netmask 255.255.255.0 broadcast 192.168.10.255
inet6 fe80::7285:c2ff:fed4:d606 prefixlen 64 scopeid 0x20<link>
ether 70:85:c2:d4:d6:06 txqueuelen 1000 (Ethernet)
RX packets 730084 bytes 1119083341 (1.0 GiB)
RX errors 0 dropped 65 overruns 0 frame 0
TX packets 381440 bytes 20705175 (19.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
So vmbr0 is dropping packets. That's not normal either, is it?
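A single `dropped 65` reading says little on its own; what matters is whether the counters grow while an upload stalls. The same counters that ifconfig prints are exposed under `/sys/class/net/<iface>/statistics/`, so they can be sampled without net-tools. A minimal sketch follows; the function name and the `SYSFS_NET` override are assumptions added for illustration.

```shell
# Print rx/tx error and drop counters for each interface named on the command line.
# SYSFS_NET is a hypothetical override so the function can be pointed at test data.
print_drop_counters() {
    local root="${SYSFS_NET:-/sys/class/net}"
    local iface counter
    for iface in "$@"; do
        for counter in rx_dropped rx_errors tx_dropped tx_errors; do
            printf '%s %s %s\n' "$iface" "$counter" \
                "$(cat "$root/$iface/statistics/$counter")"
        done
    done
}

# Example usage: sample before and after a large upload and compare.
#   print_drop_counters enp5s0 vmbr0
```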
 
Hello,
isn't NVMe related to SSDs?
That is correct. NVMe seems to be my SSD. I have an NVMe SSD as a second drive.
The same PCIe Bus Error is also reported for other "devices", which makes me think that there may be a problem with the PCIe controller. Maybe my motherboard hardware is not supported by the Proxmox kernel? But how can I check this? And can I do something about it with drivers or something?
 
Hi,

You can try to update/re-flash your motherboard BIOS!
Thank you for your input. This is on my list for testing. I'm not eager to flash the BIOS since I can't find any good changelogs for it.
I also want to try an Ubuntu live CD to check whether the network keeps working on that. If it's a hardware problem, then I guess it should happen on any OS.
 
After testing an Ubuntu live boot (without installing) it was obvious that the same problems occurred. My network went down very soon after booting and the logs didn't show any information about it. Rebooting the NIC in Ubuntu didn't even work.

So I flashed my BIOS with a somewhat newer version, without knowing whether it would break more than it would fix. Flashing back to a previous version is impossible, so I had no fallback. But... the problem is gone now! My new BIOS is somewhat messy. During boot it doesn't always respond to the F2 key, and it even hangs sometimes. I accept that, since my network seems stable now. Setting up the BIOS is not something you do every day, so as long as a normal boot doesn't hang, it's OK.
 
I came across this thread because I ran into the same problem with proxmox v6 on a soyoustart (ovh) dedi with a realtek card.

every time the server uploaded data and maxed out the connection for a short while, the network connection died. I first noticed it while running backups to a glusterfs server, and also confirmed it with sshfs and extensive iperf uploads.

running a bios update is not really an option, as it would require ordering an extra KVM console with the provider which in this case adds costs and there are usually other risks to any bios update anyway...

while in rescue mode or on a plain debian (buster) install the issue was not reproducible - but it showed up directly after installing proxmox. no fancy stuff added, I had not even set up the bridge yet.

it seems like pve-firmware comes with a driver for realtek r8111/8169 which gets used (check ethtool) but the card is more likely a r8111/8168 - so essentially it's using the wrong driver. you'll get a network connection but this is known to be an unstable combination, as one can read from different sources...
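The "check ethtool" step refers to `ethtool -i <iface>`, which reports the kernel module bound to the interface in its `driver:` field. A small sketch of that check follows; the helper names are made up for illustration, and the interface name enp5s0 in the usage line is taken from the first post and will differ on other machines.

```shell
# Extract the "driver:" field from `ethtool -i <iface>` output on stdin.
driver_from_ethtool() {
    awk -F': *' '$1 == "driver" { print $2; exit }'
}

# Describe the result for an RTL8111/8168 card.
describe_realtek_driver() {
    case "$1" in
        r8169) echo "in-kernel r8169 driver in use" ;;
        r8168) echo "vendor r8168 driver in use" ;;
        *)     echo "other driver: $1" ;;
    esac
}

# Example usage:
#   describe_realtek_driver "$(ethtool -i enp5s0 | driver_from_ethtool)"
```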

I followed this idea: https://unixblogger.com/how-to-get-your-realtek-rtl8111rtl8168-working-updated-guide/ to install the correct r8168 driver via dkms and also used the newest available package from sid-repo: http://ftp.de.debian.org/debian/pool/non-free/r/r8168/r8168-dkms_8.047.04-1_all.deb on top... you also need to install pve-headers first to enable dkms to compile the module for the proxmox-kernel.
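The steps described above could look roughly like the following. This is a sketch, not the exact procedure from the linked guide: the package names pve-headers and dkms and the .deb URL are copied from the post (the URL may no longer resolve), while the `run` helper, the `/tmp` download path, and the `DRY_RUN` switch are assumptions added so the commands can be reviewed before anything is changed.

```shell
# Sketch of the r8168-dkms install steps.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute them as root.
DEB_URL="http://ftp.de.debian.org/debian/pool/non-free/r/r8168/r8168-dkms_8.047.04-1_all.deb"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_r8168_dkms() {
    local deb="/tmp/${DEB_URL##*/}"            # hypothetical download location
    run apt-get update
    run apt-get install -y pve-headers dkms    # headers first, so dkms can build the module
    run wget -O "$deb" "$DEB_URL"
    run dpkg -i "$deb"
    run dkms status                            # review whether r8168 was built
}

# Example usage:
#   install_r8168_dkms              # print the plan
#   DRY_RUN=0 install_r8168_dkms    # actually run it
```

After a reboot, `ethtool -i` on the interface shows which driver is actually bound.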

this seems to have solved the problem for now. maybe the proxmox team can look into that and ship the correct and newest driver with their firmware package to avoid that problem in the future and have better compatibility with this old/wacky realtek stuff? ;-)

will run more tests and report back, if I run into further issues...
 
