[SOLVED] Intel NUC11 igc network up/down (exceed max 2 second)

FrankvdAa

Hi,

I've been running a Proxmox 7 cluster on my Intel NUC11s with dual 2.5Gbit NICs (Intel i225-LM) without issues since August 2022.

However, today I updated one of the two nodes to the latest (no-subscription) packages, and ever since I've been having problems with my network interface.

Code:
Start-Date: 2023-05-05  16:12:26
Commandline: apt-get dist-upgrade
Install: pve-kernel-5.15.107-1-pve:amd64 (5.15.107-1, automatic)
Upgrade: udev:amd64 (247.3-7+deb11u1, 247.3-7+1-pmx11u1), proxmox-widget-toolkit:amd64 (3.6.3, 3.6.5), pve-firmware:amd64 (3.6-4, 3.6-5), libtinfo6:amd64 (6.2+20201114-2, 6.2+20201114-2+deb11u1), tzdata:amd64 (2021a-1+deb11u9, 2021a-1+deb11u10), zfs-zed:amd64 (2.1.9-pve1, 2.1.11-pve1), libpam-systemd:amd64 (247.3-7+deb11u1, 247.3-7+1-pmx11u1), zfs-initramfs:amd64 (2.1.9-pve1, 2.1.11-pve1), usb.ids:amd64 (2022.05.20-0+deb11u1, 2023.01.16-0+deb11u1), spl:amd64 (2.1.9-pve1, 2.1.11-pve1), libnvpair3linux:amd64 (2.1.9-pve1, 2.1.11-pve1), libavahi-common-data:amd64 (0.8-5+deb11u1, 0.8-5+deb11u2), pve-ha-manager:amd64 (3.6.0, 3.6.1), libuutil3linux:amd64 (2.1.9-pve1, 2.1.11-pve1), libsystemd0:amd64 (247.3-7+deb11u1, 247.3-7+1-pmx11u1), libzpool5linux:amd64 (2.1.9-pve1, 2.1.11-pve1), libnss-systemd:amd64 (247.3-7+deb11u1, 247.3-7+1-pmx11u1), libxml2:amd64 (2.9.10+dfsg-6.7+deb11u3, 2.9.10+dfsg-6.7+deb11u4), systemd:amd64 (247.3-7+deb11u1, 247.3-7+1-pmx11u1), libudev1:amd64 (247.3-7+deb11u1, 247.3-7+1-pmx11u1), isc-dhcp-common:amd64 (4.4.1-2.3+deb11u1, 4.4.1-2.3+deb11u2), debian-archive-keyring:amd64 (2021.1.1, 2021.1.1+deb11u1), openvswitch-common:amd64 (2.15.0+ds1-2+deb11u2.1, 2.15.0+ds1-2+deb11u4), proxmox-backup-file-restore:amd64 (2.3.3-1, 2.4.1-1), libc6:amd64 (2.31-13+deb11u5, 2.31-13+deb11u6), locales:amd64 (2.31-13+deb11u5, 2.31-13+deb11u6), qemu-server:amd64 (7.4-2, 7.4-3), traceroute:amd64 (1:2.1.0-2+b1, 1:2.1.0-2+deb11u1), isc-dhcp-client:amd64 (4.4.1-2.3+deb11u1, 4.4.1-2.3+deb11u2), grep:amd64 (3.6-1, 3.6-1+deb11u1), pve-i18n:amd64 (2.11-1, 2.12-1), base-files:amd64 (11.1+deb11u6, 11.1+deb11u7), ncurses-base:amd64 (6.2+20201114-2, 6.2+20201114-2+deb11u1), libunbound8:amd64 (1.13.1-1, 1.13.1-1+deb11u1), proxmox-backup-client:amd64 (2.3.3-1, 2.4.1-1), python3-openvswitch:amd64 (2.15.0+ds1-2+deb11u2.1, 2.15.0+ds1-2+deb11u4), libavahi-common3:amd64 (0.8-5+deb11u1, 0.8-5+deb11u2), libpve-http-server-perl:amd64 (4.2-1, 4.2-3), libpve-common-perl:amd64 (7.3-3, 7.3-4), libc-l10n:amd64 (2.31-13+deb11u5, 2.31-13+deb11u6), openvswitch-switch:amd64 (2.15.0+ds1-2+deb11u2.1, 2.15.0+ds1-2+deb11u4), libc-bin:amd64 (2.31-13+deb11u5, 2.31-13+deb11u6), pve-kernel-5.15:amd64 (7.3-3, 7.4-2), libzfs4linux:amd64 (2.1.9-pve1, 2.1.11-pve1), systemd-sysv:amd64 (247.3-7+deb11u1, 247.3-7+1-pmx11u1), libncursesw6:amd64 (6.2+20201114-2, 6.2+20201114-2+deb11u1), ncurses-bin:amd64 (6.2+20201114-2, 6.2+20201114-2+deb11u1), ncurses-term:amd64 (6.2+20201114-2, 6.2+20201114-2+deb11u1), libavahi-client3:amd64 (0.8-5+deb11u1, 0.8-5+deb11u2), pve-edk2-firmware:amd64 (3.20221111-2, 3.20230228-2), libncurses6:amd64 (6.2+20201114-2, 6.2+20201114-2+deb11u1), zfsutils-linux:amd64 (2.1.9-pve1, 2.1.11-pve1), postfix:amd64 (3.5.17-0+deb11u1, 3.5.18-0+deb11u1)
End-Date: 2023-05-05  16:14:17

I have two identical NUCs connected to each other via a direct connection (enp89s0) at 2.5Gbit. The other interface (enp88s0) is connected to my switch at 1Gbit.

The direct connection is working fine, but as of today, the LAN interface is going UP and DOWN continuously, making that node unusable.

Code:
[    1.421725] igc 0000:58:00.0: PTM enabled, 4ns granularity
[    1.470736] igc 0000:58:00.0 (unnamed net_device) (uninitialized): PHC added
[    1.498460] igc 0000:58:00.0: 4.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x1 link)
[    1.498466] igc 0000:58:00.0 eth0: MAC: 48:21:0b:33:47:98
[    1.498987] igc 0000:59:00.0: PTM enabled, 4ns granularity
[    1.546844] igc 0000:59:00.0 (unnamed net_device) (uninitialized): PHC added
[    1.574541] igc 0000:59:00.0: 4.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x1 link)
[    1.574548] igc 0000:59:00.0 eth1: MAC: 54:b2:03:fd:63:fb
[    2.144664] igc 0000:59:00.0 enp89s0: renamed from eth1
[    2.162313] igc 0000:58:00.0 enp88s0: renamed from eth0
[   11.355171] igc 0000:59:00.0 enp89s0: NIC Link is Up 2500 Mbps Full Duplex, Flow Control: RX/TX
[   29.491548] igc 0000:58:00.0 enp88s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   29.950887] igc 0000:58:00.0 enp88s0: NIC Link is Down
[   42.923652] igc 0000:58:00.0 enp88s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   44.518873] igc 0000:58:00.0 enp88s0: NIC Link is Down
[   47.707693] igc 0000:58:00.0 enp88s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   48.211037] igc 0000:58:00.0 enp88s0: NIC Link is Down
[  143.540452] igc 0000:58:00.0 enp88s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  144.643598] igc 0000:58:00.0 enp88s0: NIC Link is Down
[  232.673170] igc 0000:58:00.0 enp88s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  234.836167] igc 0000:58:00.0 enp88s0: exceed max 2 second
[  234.836576] igc 0000:58:00.0 enp88s0: NIC Link is Down
[  280.093529] igc 0000:58:00.0 enp88s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  280.648823] igc 0000:58:00.0 enp88s0: NIC Link is Down
[  283.657562] igc 0000:58:00.0 enp88s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  284.280893] igc 0000:58:00.0 enp88s0: NIC Link is Down

Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp88s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master ovs-system state DOWN group default qlen 1000
    link/ether 48:21:0b:33:47:98 brd ff:ff:ff:ff:ff:ff
3: enp89s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    link/ether 54:b2:03:fd:63:fb brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/30 scope global enp89s0
       valid_lft forever preferred_lft forever
    inet6 fe80::56b2:3ff:fefd:63fb/64 scope link
       valid_lft forever preferred_lft forever
4: wlo1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 20:c1:9b:65:79:df brd ff:ff:ff:ff:ff:ff
    altname wlp0s20f3
5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 92:13:18:55:f2:65 brd ff:ff:ff:ff:ff:ff
6: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 48:21:0b:33:47:98 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::cc40:c7ff:fe3d:6540/64 scope link
       valid_lft forever preferred_lft forever
7: management: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 22:a6:61:0f:ef:f3 brd ff:ff:ff:ff:ff:ff
    inet 172.16.2.251/24 scope global management
       valid_lft forever preferred_lft forever
    inet6 fe80::20a6:61ff:fe0f:eff3/64 scope link
       valid_lft forever preferred_lft forever

Code:
auto lo
iface lo inet loopback

auto enp89s0
iface enp89s0 inet static
        address 10.0.0.1/30
        mtu 9000

iface wlo1 inet manual

auto enp88s0
iface enp88s0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0

auto management
iface management inet static
        address 172.16.2.251/24
        gateway 172.16.2.254
        ovs_type OVSIntPort
        ovs_bridge vmbr0

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports enp88s0 management
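
For reference, the resulting OVS layout can be checked with the standard openvswitch-switch commands (output omitted here; bridge and port names taken from the config above):

Code:
# list bridges, ports and interfaces as Open vSwitch sees them
ovs-vsctl show
# list the ports attached to vmbr0 (should show enp88s0 and management)
ovs-vsctl list-ports vmbr0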

I've already tried booting the previous kernel (5.15.102-1-pve) and downgrading pve-kernel-5.15 (to 7.3-3), openvswitch and the firmware packages, but that doesn't make a difference.
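
For anyone wanting to try the same: an older kernel can be selected from the GRUB boot menu, or pinned so it is used on every boot. A minimal sketch, assuming a recent proxmox-boot-tool manages the boot entries (version string taken from the package name above):

Code:
# list the kernels proxmox-boot-tool knows about
proxmox-boot-tool kernel list
# pin the previous kernel so it is booted by default (undo with "kernel unpin")
proxmox-boot-tool kernel pin 5.15.102-1-pve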

Anything else I could try?

Thanks!
 
Nevermind... I ended up re-installing the node as I messed things up trying to fix the issue.

After finishing the installation, the issue still persisted, so I rebooted the switch and the issue was solved.
 
Everything has been running smoothly since Friday, so I decided to update the node in question again to the latest version. After the reboot, all seemed fine at first, but after a few minutes the network issue started again. I rebooted the switch again, and since then all has been fine so far.

It seems like some update is triggering something on the network switch, causing the link for the Proxmox node to fail. I'll keep everything as-is for at least a week before proceeding with the updates.
 
Today the updated node has again shown several connectivity issues. Previously, the issues kept occurring continuously until I restarted the switch; today it happened around 25 times, but the problem disappeared without intervention.

I have some monitoring in place to check connectivity, and it was triggered 4 times today.
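
(For illustration only, not the actual monitoring used here: a simple gateway ping check along these lines would catch the same kind of outage; the gateway address is taken from the config above.)

Code:
#!/bin/bash
# Illustrative connectivity check: pings the gateway and logs each failure with a timestamp.
TARGET=172.16.2.254
while true; do
    if ! ping -c1 -W2 "$TARGET" >/dev/null 2>&1; then
        echo "$(date '+%F %T') connectivity to $TARGET lost" >> /var/log/link-monitor.log
    fi
    sleep 10
done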

It looks like some update is causing this issue, as this didn't happen before. The node that wasn't updated is still running without any issues.

Good to know: the enp89s0 interface uses the same driver but is directly connected to the other NUC, and it hasn't shown any issues so far. Only the interface that is connected to the switch (HP ProCurve 1800-24G) is affected.
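
(Both ports can be confirmed to use the igc driver with ethtool, for example; exact driver/firmware versions will differ:)

Code:
# driver and firmware details for the switch-facing and the direct-connect port
ethtool -i enp88s0
ethtool -i enp89s0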

Any help on how to find the root cause is appreciated.
 
Check the cables and try replacing the cable connected to the problem NIC.
Also, try upgrading the switch firmware.
 
The switch firmware is up to date and the cable has been replaced, but why would the issue only occur after the update?
 
Well, maybe the 5.15 kernel has a buggy NIC module.
I heard Proxmox has a 6.2 kernel; can you try it?
Also, try disabling auto-negotiation on the switch side and forcing the connection speed to 2.5 Gbps or 1 Gbps.
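
Roughly, on Proxmox VE 7.x that would be the opt-in 6.2 kernel package, and the host-side counterpart of forcing the link would be an ethtool call (interface name taken from the logs above; the ethtool setting is not persistent across reboots):

Code:
# opt-in 6.2 kernel, followed by a reboot
apt update
apt install pve-kernel-6.2

# force 1 Gbps full duplex with auto-negotiation off on the flapping port
# (revert with "ethtool -s enp88s0 autoneg on" if it doesn't help)
ethtool -s enp88s0 speed 1000 duplex full autoneg off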
 
Well, problem solved!

I started by updating pve-kernel to 6.2, but that didn't improve things.

However, I noticed things were getting worse: even though I hadn't changed anything on node 2, it began to show the same connectivity problems, up to the point where the entire network was unusable.

I managed to borrow a VLAN switch from a colleague, and since replacing the switch, all issues have vanished.

Even after updating all packages to the latest version, the system has remained stable over the past week.

So it turned out to be a faulty switch.
 