PVE8 Intel I350-T4 NIC driver issues

bluemax

Apr 19, 2024

Hello,

I have two thin clients (Dell Wyse 5070) with the same hardware configuration, each with a 4-port Intel I350 NIC.

One host (named osiris) is still on PVE 7.4 and works as expected.

The other one (named isis) was reinstalled with PVE 8 (first 8.0, then 8.1, now 8.1.10).

Since installing PVE 8, the Intel I350 has stopped working.

When the igb module is loaded, the kernel spews errors and the driver refuses to use the NIC:

Apr 19 14:52:18 isis kernel: igb: Intel(R) Gigabit Ethernet Network Driver
Apr 19 14:52:18 isis kernel: igb: Copyright (c) 2007-2014 Intel Corporation.
Apr 19 14:52:18 isis kernel: ahci 0000:00:12.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
Apr 19 14:52:18 isis kernel: ahci 0000:00:12.0: flags: 64bit ncq sntf pm clo only pmp pio slum part deso sadm sds apst
Apr 19 14:52:18 isis kernel: idma64 idma64.0: Found Intel integrated DMA 64-bit
Apr 19 14:52:18 isis kernel: igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost
Apr 19 14:52:18 isis kernel: ------------[ cut here ]------------
Apr 19 14:52:18 isis kernel: igb: Failed to read reg 0x18!
Apr 19 14:52:18 isis kernel: WARNING: CPU: 3 PID: 131 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x93/0xb0 [igb]
Apr 19 14:52:18 isis kernel: Modules linked in: intel_lpss_pci(+) cqhci igb(+) i2c_i801 intel_lpss xhci_pci(+) xhci_pci_renesas i2c_smbus sdhci i2c_algo_bit idma64 ahci(+) xhci_hcd libahci r8169 dca realtek video wmi pinctrl_geminilake aesni_intel crypto_simd cryptd
Apr 19 14:52:18 isis kernel: CPU: 3 PID: 131 Comm: (udev-worker) Not tainted 6.8.4-2-pve #1
Apr 19 14:52:18 isis kernel: Hardware name: Dell Inc. Wyse 5070 Extended Thin Client/012KND, BIOS 1.29.0 02/05/2024
Apr 19 14:52:18 isis kernel: RIP: 0010:igb_rd32+0x93/0xb0 [igb]
Apr 19 14:52:18 isis kernel: Code: c7 c6 03 e4 53 c0 e8 8c 13 8e d9 48 8b bb 28 ff ff ff e8 c0 9d 3c d9 84 c0 74 c1 44 89 e6 48 c7 c7 f8 f0 53 c0 e8 bd 3a be d8 <0f> 0b eb ae b8 ff ff ff ff 31 d2 31 f6 31 ff c3 cc cc cc cc 66 0f
Apr 19 14:52:18 isis kernel: RSP: 0018:ffffbcffc0363848 EFLAGS: 00010246
Apr 19 14:52:18 isis kernel: RAX: 0000000000000000 RBX: ffff97f712364f38 RCX: 0000000000000000
Apr 19 14:52:18 isis kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 19 14:52:18 isis kernel: RBP: ffffbcffc0363858 R08: 0000000000000000 R09: 0000000000000000
Apr 19 14:52:18 isis kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000018
Apr 19 14:52:18 isis kernel: R13: ffff97f701e7a0c0 R14: ffff97f7123649e0 R15: ffff97f712364000
Apr 19 14:52:18 isis kernel: FS: 0000701e554e48c0(0000) GS:ffff97fb6bd80000(0000) knlGS:0000000000000000
Apr 19 14:52:18 isis kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 19 14:52:18 isis kernel: CR2: 0000605acc650098 CR3: 00000001120c0000 CR4: 0000000000350ef0
Apr 19 14:52:18 isis kernel: Call Trace:

For full details, see the attached boot log (isis_journalctl-b-1.txt).
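For anyone who wants to go through the attached journal dumps without reading all of them, a small Python sketch like this can pull out only the lines mentioning the igb driver, the PCIe link loss, or the NVM. The regex is my own assumption of what is interesting here, and the script name is hypothetical; it expects a gzip-compressed dump such as isis_journalctl-b-1.txt.gz.

```python
import gzip
import re
import sys
from typing import Iterable, Iterator

# Assumed patterns of interest: the igb driver, PCIe link loss, NVM complaints.
# \bNVM\b avoids matching unrelated "nvme" lines.
PATTERN = re.compile(r"\bigb\b|PCIe link lost|\bNVM\b", re.IGNORECASE)


def relevant_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines that match PATTERN, without trailing newlines."""
    for line in lines:
        if PATTERN.search(line):
            yield line.rstrip("\n")


def main(path: str) -> None:
    # The attached dumps are gzip-compressed text; decode leniently.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in relevant_lines(fh):
            print(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage: `python3 igb_log_filter.py isis_journalctl-b-1.txt.gz`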

Details of the two NICs can be found in isis_lspci-vv.txt and osiris_lspci-vv.txt, but I think any differences are probably due to the driver changes between kernels 5.15 and 6.5/6.8.
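To compare the two lspci -vv dumps, a unified diff is usually enough. Below is a minimal Python sketch using the standard difflib module; the script name is hypothetical, and note that the attached files are actually named *_lscpi-vv.txt.gz. Any difference it shows (for example in the LnkSta lines) is only a hint, not a diagnosis.

```python
import difflib
import gzip
import sys
from typing import List


def read_lines(path: str) -> List[str]:
    """Read a text file, transparently decompressing it if it ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.read().splitlines()


def diff_lspci(a: List[str], b: List[str], a_name: str = "a", b_name: str = "b") -> List[str]:
    """Return a unified diff (lines without trailing newlines) between two lspci dumps."""
    return list(difflib.unified_diff(a, b, fromfile=a_name, tofile=b_name, lineterm=""))


if __name__ == "__main__" and len(sys.argv) == 3:
    for line in diff_lspci(read_lines(sys.argv[1]), read_lines(sys.argv[2]), sys.argv[1], sys.argv[2]):
        print(line)
```

Usage: `python3 lspci_diff.py osiris_lscpi-vv.txt.gz isis_lscpi-vv.txt.gz`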

I have tried the following to solve the issue, without success:

- flashed the NIC with the latest firmware
- triggered a rebuild of the NIC NVM storage (one of the errors complains that the NVM is corrupted) by resetting the NIC to its defaults (bootutil64e -ALL -DEFAULTCONFIG)
- disabled/enabled the NIC WOL/PXE boot
- updated the Wyse 5070 system BIOS
- installed the PVE 6.8 kernel (isis_journalctl-b.txt)
- tried kernel boot parameters that disable PCIe power-saving features (pcie_port_pm=off pcie_aspm=off)
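A common pitfall with kernel parameters on Proxmox is that the change never reaches the running kernel: depending on whether the host boots with GRUB (/etc/default/grub, then update-grub) or systemd-boot (/etc/kernel/cmdline, then proxmox-boot-tool refresh), the parameters live in a different file. After a reboot, /proc/cmdline shows what the kernel actually received. The Python sketch below only checks that the two parameters tried above are present; it says nothing about whether they affect the NIC.

```python
from __future__ import annotations

from pathlib import Path

# Parameters tried in this thread to disable PCIe power saving.
WANTED = ("pcie_port_pm=off", "pcie_aspm=off")


def missing_params(cmdline: str, wanted: tuple[str, ...] = WANTED) -> list[str]:
    """Return the entries of `wanted` that are not tokens of the kernel command line."""
    tokens = set(cmdline.split())
    return [p for p in wanted if p not in tokens]


def main() -> None:
    path = Path("/proc/cmdline")  # command line of the currently running kernel
    if not path.exists():
        print("/proc/cmdline not found (not running on Linux?)")
        return
    missing = missing_params(path.read_text())
    print("all parameters active" if not missing else "not active: " + " ".join(missing))


if __name__ == "__main__":
    main()
```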

Suspecting a hardware issue, I also moved the NIC to the other host running PVE 7.4, where it worked like a charm.
No NVM corruption messages or any other issue.
The NIC worked at full speed.

I think this might be an igb driver issue: somehow the driver has become pickier about the board and no longer likes mine.

But I'm not an expert, and I'm quite stuck, since I would like to have both hosts on PVE 8.

Do you have any suggestions?

Thanks,

Max
 

Attachments

  • isis_journalctl-b-1.txt.gz (40.5 KB)
  • isis_journalctl-b.txt.gz (32.5 KB)
  • isis_lscpi-vv.txt.gz (2.5 KB)
  • osiris_lscpi-vv.txt.gz (2.5 KB)
I'm having issues with the I350 too. My system isn't crashing, but the NIC carrier LED comes on during boot, then drops and stays off thereafter.

It's not you.
 
An update from further investigation: I tried booting other Linux distributions to see whether the NIC behaves the same:

  • booting Arch Linux (2024.04.01 ISO), which is based on kernel 6.8.2: the NIC works!!!
  • Ubuntu Server 24.04, on the other hand, behaves like Proxmox and spews the same kernel trace.

What is even more puzzling is that the igb driver version (checked with "lsmod igb") is the same as the one in the PVE 6.8.4 kernel.
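One thing worth keeping in mind when comparing kernels: as far as I know, since roughly kernel 5.8 the in-tree Intel drivers no longer set their own version string, so a matching "version" does not mean the igb code is identical. The modinfo fields srcversion (only present for some builds) and vermagic say a bit more. A minimal Python sketch that runs modinfo and prints just those fields, so the output from Arch, Ubuntu and PVE can be put side by side; the script name is hypothetical.

```python
from __future__ import annotations

import subprocess
import sys


def parse_modinfo(text: str) -> dict[str, list[str]]:
    """Parse `modinfo` output ("key:   value" lines) into key -> list of values.

    Keys like "alias" and "parm" repeat, so every key maps to a list.
    """
    info: dict[str, list[str]] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info.setdefault(key.strip(), []).append(value.strip())
    return info


def summary(info: dict[str, list[str]]) -> dict[str, str]:
    """Pick the fields that identify a module build; missing ones become '<absent>'."""
    fields = ("filename", "version", "srcversion", "vermagic")
    return {f: ", ".join(info.get(f, ["<absent>"])) for f in fields}


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. `python3 modinfo_summary.py igb` on each system, then compare the output
    out = subprocess.run(["modinfo", sys.argv[1]], capture_output=True, text=True, check=True).stdout
    for key, value in summary(parse_modinfo(out)).items():
        print(f"{key:11} {value}")
```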

There must be something else that eludes my analysis capabilities! :)

Possibly something else in the kernel?
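One way to look for "something else in the kernel" is to compare the build configurations of the Arch and Ubuntu/PVE kernels. Debian-based kernels (including PVE) ship theirs as /boot/config-<release>, and Arch typically exposes it as /proc/config.gz (copy one file to the other machine first). The Python sketch below lists options that differ, filtered by a substring; PCIE as the default filter is my own arbitrary choice, and the script name is hypothetical. It only narrows down candidates, it does not prove a cause.

```python
from __future__ import annotations

import gzip
import re
import sys
from pathlib import Path

# "CONFIG_FOO=value" lines and "# CONFIG_FOO is not set" lines.
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text: str) -> dict[str, str]:
    """Map each CONFIG_ option to its value; explicitly unset options map to 'n'."""
    opts: dict[str, str] = {}
    for line in text.splitlines():
        m = SET_RE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(a: dict[str, str], b: dict[str, str], substring: str = "") -> list[tuple[str, str, str]]:
    """Return (option, value_in_a, value_in_b) for differing options whose name contains substring."""
    rows = []
    for key in sorted(a.keys() | b.keys()):
        if substring not in key:
            continue
        va, vb = a.get(key, "<absent>"), b.get(key, "<absent>")
        if va != vb:
            rows.append((key, va, vb))
    return rows


def read_config(path: str) -> str:
    """Read a kernel config; /proc/config.gz-style files are gzip-compressed."""
    if path.endswith(".gz"):
        with gzip.open(path, "rt") as fh:
            return fh.read()
    return Path(path).read_text()


if __name__ == "__main__" and len(sys.argv) >= 3:
    substring = sys.argv[3] if len(sys.argv) > 3 else "PCIE"
    cfg_a, cfg_b = (parse_config(read_config(p)) for p in sys.argv[1:3])
    for key, va, vb in diff_configs(cfg_a, cfg_b, substring):
        print(f"{key}: {va} -> {vb}")
```

Usage: `python3 kconfig_diff.py config-arch.gz /boot/config-6.8.4-2-pve PCIE`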

I really don't know; help from someone on the Proxmox team is needed here.

I hope they keep an eye on this thread.

If you have the same issue, please "vote" for it.

Max
 
Code:
igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost

Did you notice this line? It might suggest that you have a loose or broken PCIe link.
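To follow up on the link theory: the kernel exposes the negotiated and maximum PCIe link speed and width of each device in sysfs, and a link that trained narrower or slower than the card supports (or that cannot be read at all) would point towards hardware. A minimal Python sketch; 0000:01:00.0 is the address from the log above, and treating unreadable values as suspicious is my own assumption. The same information appears as LnkCap/LnkSta in lspci -vv.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Standard sysfs attributes of a PCIe device (negotiated vs. maximum link).
ATTRS = ("current_link_speed", "current_link_width", "max_link_speed", "max_link_width")


def link_status(address: str = "0000:01:00.0",
                root: Path = Path("/sys/bus/pci/devices")) -> dict[str, str | None]:
    """Read the link attributes of one PCI device; unreadable ones become None."""
    dev = root / address
    status: dict[str, str | None] = {}
    for attr in ATTRS:
        try:
            status[attr] = (dev / attr).read_text().strip()
        except OSError:  # device missing, attribute absent, or read error
            status[attr] = None
    return status


def degraded(status: dict[str, str | None]) -> bool:
    """True if the link is narrower/slower than the maximum, or could not be parsed."""
    try:
        cur_w = int(status["current_link_width"])
        max_w = int(status["max_link_width"])
        # Speeds look like "5.0 GT/s PCIe" (older kernels: "5 GT/s"); keep the number.
        cur_s = float(status["current_link_speed"].split()[0])
        max_s = float(status["max_link_speed"].split()[0])
    except (KeyError, TypeError, ValueError, AttributeError, IndexError):
        return True  # assumption: anything unreadable counts as suspicious
    return cur_w < max_w or cur_s < max_s


if __name__ == "__main__":
    st = link_status(sys.argv[1] if len(sys.argv) > 1 else "0000:01:00.0")
    for name, value in st.items():
        print(f"{name:20} {value}")
    print("degraded or unreadable:", degraded(st))
```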
 
the error message
I'm having issues with the I350 too. My system isn't crashing, but the NIC carrier LED comes on during boot, then drops and stays off thereafter.

It's not you.
Do you also have similar Dell hardware?
 
Code:
igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost

Did you notice this line? It might suggest that you have a loose or broken PCIe link.
Thanks for the reply.

Yes, I thought about that, but then how could the NIC work after switching the OS to Arch Linux?
The same board also works when moved to the other host running PVE 7.

Maybe there is something wrong with the hardware, but it is not just the hardware.
 
I'm having issues with the I350 too. My system isn't crashing, but the NIC carrier LED comes on during boot, then drops and stays off thereafter.

It's not you.
Sorry to read that your system is also affected by this issue.

My system does not crash either; the four interfaces of the I350-T4 simply no longer appear.

It is only the igb driver that crashes (see the logs).

Max
 
