Slow Networking until PVE reboot

booleanxor

Hi all, first time posting here, thanks for reading.

I'm having a strange issue that I can't seem to fix. If I'm downloading a large amount of data from the internet, my network throughput throttles from the expected 500 Mbps (max) down to about 2 Mbps in both directions (I haven't run an iperf test yet, though). I can pull down a few hundred gigabytes before this happens, however. If I shut down all VMs and reboot PVE, everything goes back to normal. Today, after the reboot, it went back to normal but the problem returned within 5 minutes or so. At any other time, during normal use, there's no problem. I almost wanted to blame my switch, an old ProCurve 48-port L2 managed gigabit switch (J4904A), but it's not that; it runs like a champ.

Hardware is an HP Z600 with 2 sockets (Intel 6-core/12-thread CPUs) and 48 GB of memory.

I'm hosting pfSense on PVE as the router/firewall (following all the networking best practices: virtio NICs, hardware checksum offload and hardware TCP segmentation offload disabled) and gave it 3 vCPUs and 4 GB of memory. Resources do not seem to be a problem while the issue is happening.
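(For reference, this is roughly how the offload state can be checked, and toggled if needed, from the host side with ethtool; the pfSense-side options are in its GUI, and ens5 here is just my WAN-facing NIC as an example:)
Code:
# show current offload settings on the NIC backing the WAN bridge
ethtool -k ens5 | grep -E 'checksum|segmentation'
# disable checksum offload and TCP/generic segmentation offload
ethtool -K ens5 rx off tx off tso off gso off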

There was an issue after upgrading that renamed all of my network devices. One of them is stuck as "rename3", which I haven't been able to fix yet; it shows up as an unknown device. That interface is unplugged and unconfigured.

I've turned all the Datacenter/PVE/VM firewalls off.

My /etc/network/interfaces:
Code:
auto lo
iface lo inet loopback

iface ens1 inet manual

iface rename3 inet manual

iface ens5 inet manual

iface ens6 inet manual

iface enp1s0 inet manual

auto vmbr1
iface vmbr1 inet manual
        bridge-ports ens1
        bridge-stp off
        bridge-fd 0
#pfsenseOPT

auto vmbr2
iface vmbr2 inet manual
        bridge-ports ens5
        bridge-stp off
        bridge-fd 0
#pfsenseWAN

auto vmbr3
iface vmbr3 inet static
        address 192.168.1.3/24
        gateway 192.168.1.1
        bridge-ports ens6
        bridge-stp off
        bridge-fd 0

 
Are you sure it is a network problem and not a storage problem? Maybe your drives' caches or your RAM are running out of space, so it can't cache anymore and the storage is bottlenecking?
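(A quick way to check whether memory or storage is the bottleneck while the slowdown is happening, for example:)
Code:
free -h          # RAM and cache headroom
iostat -x 1 5    # per-disk utilization and wait times (from the sysstat package)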
 
I have 3 drives and I've separated things out pretty well. The server isn't particularly busy; it's just a home server, so not a lot of disk activity, really. I am shipping the pfSense firewall syslog to Splunk, but the I/O doesn't seem off the charts or anything while downloading. Downloads are typically done by other clients on the network, not the VMs.

Just for clarification, I have to reboot PVE before things get better. Stopping the download or rebooting pfSense doesn't do anything to help.

Also, if it were the ISP throttling, the reboot wouldn't do anything; I am not getting a new IP from my ISP after the reboot.
 
I have noticed the same issue a few times now during the last week or so, after I installed the latest Proxmox updates.

The first time it happened, I noticed that my backup was running a lot slower; it took more than 24 hours instead of the usual couple of hours. I did some iperf testing and narrowed it down to the Proxmox node. I thought it was a one-off, so I just restarted the node, which fixed the issue temporarily.
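(For anyone who wants to run the same test, this is roughly what I mean, using iperf3 as an example and the node's address as a placeholder:)
Code:
# on the Proxmox node
iperf3 -s
# from another machine on the LAN, test both directions
iperf3 -c <pve-ip> -t 30
iperf3 -c <pve-ip> -t 30 -R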

The network slowed down again since then, so I looked into it further:

These are the only logs I could find that seem somewhat related:
Code:
pve systemd-udevd: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable
pve systemd-udevd: Could not generate persistent MAC address for fwbr100i0: No such file or directory
...
pve kernel: igb 0000:08:00.0 eno1: igb: eno1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX
pve kernel: igb 0000:08:00.0 eno1: Link Speed was downgraded by SmartSpeed

Running ethtool eno1 did indeed confirm that the NIC was running at 100 Mb/s.

Setting the speed manually to 1000 Mb/s, as suggested in this post, did not change anything.
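(The manual setting I tried was along these lines, assuming eno1 is the affected NIC; the linked post may use slightly different flags:)
Code:
ethtool -s eno1 speed 1000 duplex full autoneg off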

Restarting the network with systemctl restart networking does seem to bring the NIC back up to full speed, though it causes all VMs and containers to drop their connections.

Code:
proxmox-ve: 6.4-1 (running kernel: 5.4.119-1-pve)
pve-manager: 6.4-8 (running version: 6.4-8/185e14db)
pve-kernel-5.4: 6.4-3
pve-kernel-helper: 6.4-3
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph: 14.2.20-pve1
ceph-fuse: 14.2.20-pve1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-6
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
 
Thanks, I did look at all the interfaces with ethtool and they're all running at 1 Gbit, but I'm not currently having the issue. Next time it happens I'll see whether that changes. Some Google searches suggest it might be a bad cable? It's possible; I made all these cables myself, and I'm not immune to the occasional mistake. If it happens again I'll replace the cable.
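(One quick way to sanity-check the cable theory is to look at the NIC error counters, for example:)
Code:
ip -s link show ens5            # look for RX/TX errors and drops
ethtool -S ens5 | grep -i err   # driver-specific error counters, if supported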
 
It's happening right now, but all my interfaces are running at 1 Gbit (full ethtool output below). I'd also like to add that restarting networking, running ifup -a, and restarting the VMs fixes the issue... still a pain, though. So it's definitely network related!
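(Concretely, the workaround that gets things back to full speed without rebooting the host; restarting the guests can be done from the GUI or, for example, with qm:)
Code:
systemctl restart networking
ifup -a
# then restart the affected VMs, e.g.
qm shutdown <vmid> && qm start <vmid>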

Code:
root@pve:~# ethtool ens1
Settings for ens1:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: d
        Wake-on: d
        Link detected: yes
root@pve:~# ethtool ens5
Settings for ens5:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Full
        Link partner advertised pause frame use: Transmit-only
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: d
        Current message level: 0x00000033 (51)
                               drv probe ifdown ifup
        Link detected: yes
root@pve:~# ethtool ens6
Settings for ens6:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Full
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: d
        Current message level: 0x00000033 (51)
                               drv probe ifdown ifup
        Link detected: yes
root@pve:~# ethtool enp1s0
Settings for enp1s0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Full
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: off
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x000000ff (255)
                               drv probe link timer ifdown ifup rx_err tx_err
        Link detected: yes
 
Just tried:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on irqpoll"
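(For completeness, the kernel command line change was applied the usual way, assuming a GRUB-based boot:)
Code:
nano /etc/default/grub   # set GRUB_CMDLINE_LINUX_DEFAULT as above
update-grub              # regenerate the GRUB config
reboot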
Didn't help, unfortunately...

I was able to recreate the issue, and I get the error at EXACTLY the time my network performance goes down the tubes.
Code:
Aug 26 21:25:25 pve kernel: [<000000005c5f2180>] usb_hcd_irq
Aug 26 21:25:25 pve kernel: [<000000005c5f2180>] usb_hcd_irq
Aug 26 21:25:25 pve kernel: [<000000005c5f2180>] usb_hcd_irq
Aug 26 21:25:25 pve kernel: [<00000000ca5d25c6>] rtl8169_interrupt [r8169]
Aug 26 21:25:25 pve kernel: Disabling IRQ #20
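(That log is the kernel disabling a shared interrupt that both the USB controller and the Realtek NIC are registered on; which devices share IRQ 20 can be checked with something like:)
Code:
awk '$1 == "20:"' /proc/interrupts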

Found another post about the same thing:
https://forum.proxmox.com/threads/network-performance-issue-appears-periodically-on-pve-host.13457/

Just found this... I think this is the problem. I have a crappy Realtek driver!? Could somebody help me out with compiling?
https://forum.proxmox.com/threads/p...l8111-rtl8168-rtl8169-on-board-ethernet.4932/
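(To confirm which kernel driver a NIC is actually bound to, something like this works:)
Code:
ethtool -i ens5                     # reports the driver (e.g. r8169) and firmware version
lspci -nnk | grep -A 3 -i ethernet  # shows each Ethernet device and the driver in use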

Found the latest driver, but 'make' errors out.

Code:
cd /tmp
wget --no-check-certificate https://rtitwww.realtek.com/rtdrivers/cn/nic1/r8169-6.029.00.tar.bz2
tar xjf r8169-6.029.00.tar.bz2


cd r8169-6.029.00
make

root@pve:/tmp/r8169-6.029.00# make
make -C src/ clean
make[1]: Entering directory '/tmp/r8169-6.029.00/src'
make -C /lib/modules/5.4.106-1-pve/build M=/tmp/r8169-6.029.00/src clean
make[2]: Entering directory '/tmp/r8169-6.029.00/src'
make[2]: *** /lib/modules/5.4.106-1-pve/build: No such file or directory.  Stop.
make[2]: Leaving directory '/tmp/r8169-6.029.00/src'
make[1]: *** [Makefile:106: clean] Error 2
make[1]: Leaving directory '/tmp/r8169-6.029.00/src'
make: *** [Makefile:49: clean] Error 2
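(That "No such file or directory" for /lib/modules/.../build usually just means the headers for the running kernel aren't installed; on Proxmox the matching header package is, as far as I know, named after the kernel, so something like this should make the build directory appear before re-running make:)
Code:
apt update
apt install pve-headers-$(uname -r)     # provides /lib/modules/$(uname -r)/build
ls -ld /lib/modules/$(uname -r)/build   # verify it exists now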
 
Just ordered an Intel EXPI9404PT from eBay. Definitely tired of this issue. Time to throw money at it, lol.
 
