PVE8 Intel I350-T4 NIC driver issues

bluemax

New Member
Apr 19, 2024
Hello,

I have 2 thin clients (Dell Wyse 5070) with the same hardware configuration, each with a 4-port Intel I350 NIC.

One host (named osiris) is still at PVE 7.4 and works as expected.

The other one (named isis) has been reinstalled with PVE 8 (first 8.0 then 8.1, now at 8.1.10).

Since installing PVE 8 the Intel I350 stopped working.

The kernel spews some errors when the igb module gets loaded and refuses to use the NIC:

Apr 19 14:52:18 isis kernel: igb: Intel(R) Gigabit Ethernet Network Driver
Apr 19 14:52:18 isis kernel: igb: Copyright (c) 2007-2014 Intel Corporation.
Apr 19 14:52:18 isis kernel: ahci 0000:00:12.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
Apr 19 14:52:18 isis kernel: ahci 0000:00:12.0: flags: 64bit ncq sntf pm clo only pmp pio slum part deso sadm sds apst
Apr 19 14:52:18 isis kernel: idma64 idma64.0: Found Intel integrated DMA 64-bit
Apr 19 14:52:18 isis kernel: igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost
Apr 19 14:52:18 isis kernel: ------------[ cut here ]------------
Apr 19 14:52:18 isis kernel: igb: Failed to read reg 0x18!
Apr 19 14:52:18 isis kernel: WARNING: CPU: 3 PID: 131 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x93/0xb0 [igb]
Apr 19 14:52:18 isis kernel: Modules linked in: intel_lpss_pci(+) cqhci igb(+) i2c_i801 intel_lpss xhci_pci(+) xhci_pci_renesas i2c_smbus sdhci i2c_algo_bit idma64 ahci(+) xhci_hcd libahci r8169 dca realtek video wmi pinctrl_geminilake aesni_intel crypto_simd cryptd
Apr 19 14:52:18 isis kernel: CPU: 3 PID: 131 Comm: (udev-worker) Not tainted 6.8.4-2-pve #1
Apr 19 14:52:18 isis kernel: Hardware name: Dell Inc. Wyse 5070 Extended Thin Client/012KND, BIOS 1.29.0 02/05/2024
Apr 19 14:52:18 isis kernel: RIP: 0010:igb_rd32+0x93/0xb0 [igb]
Apr 19 14:52:18 isis kernel: Code: c7 c6 03 e4 53 c0 e8 8c 13 8e d9 48 8b bb 28 ff ff ff e8 c0 9d 3c d9 84 c0 74 c1 44 89 e6 48 c7 c7 f8 f0 53 c0 e8 bd 3a be d8 <0f> 0b eb ae b8 ff ff ff ff 31 d2 31 f6 31 ff c3 cc cc cc cc 66 0f
Apr 19 14:52:18 isis kernel: RSP: 0018:ffffbcffc0363848 EFLAGS: 00010246
Apr 19 14:52:18 isis kernel: RAX: 0000000000000000 RBX: ffff97f712364f38 RCX: 0000000000000000
Apr 19 14:52:18 isis kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 19 14:52:18 isis kernel: RBP: ffffbcffc0363858 R08: 0000000000000000 R09: 0000000000000000
Apr 19 14:52:18 isis kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000018
Apr 19 14:52:18 isis kernel: R13: ffff97f701e7a0c0 R14: ffff97f7123649e0 R15: ffff97f712364000
Apr 19 14:52:18 isis kernel: FS: 0000701e554e48c0(0000) GS:ffff97fb6bd80000(0000) knlGS:0000000000000000
Apr 19 14:52:18 isis kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 19 14:52:18 isis kernel: CR2: 0000605acc650098 CR3: 00000001120c0000 CR4: 0000000000350ef0
Apr 19 14:52:18 isis kernel: Call Trace:

For full details see the attached boot log (isis_journalctl-b-1.txt).

Details of the 2 NICs can be seen in both isis_lspci-vv.txt and osiris_lspci-vv.txt.
But I think the differences might be due to the driver changes between kernel 5.15 and 6.5/6.8.

I have tried the following to solve the issue, without success:

- flashing the NIC with the latest firmware
- triggering a recreation of the NIC NVM storage (one of the complaints is that the NVM is corrupted) by resetting the NIC to its defaults (bootutil64e -ALL -DEFAULTCONFIG)
- disabling/enabling the NIC WOL/PXE boot
- updating the Wyse 5070 system BIOS
- installing PVE kernel 6.8 (isis_journalctl-b.txt)
- some kernel boot parameters to disable power-saving features of PCIe (pcie_port_pm=off pcie_aspm=off)
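For anyone wanting to repeat the last test, a minimal sketch of how those parameters can be set on a GRUB-booted Proxmox host (on UEFI systems with ZFS root that boot via systemd-boot, edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead):

```shell
# Sketch, assuming a GRUB-booted host: append the PCIe power-management
# knobs to the kernel command line in /etc/default/grub.
sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 pcie_port_pm=off pcie_aspm=off"/' /etc/default/grub

# Regenerate the boot configuration, then reboot.
update-grub

# After the reboot, verify the parameters actually took effect:
cat /proc/cmdline
```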

Suspecting a hardware issue, I also moved the NIC to the other host running PVE 7.4; there the NIC worked like a charm.
No NVM corruption messages or any issue.
NIC worked at full speed.

I think this might be an igb driver issue; somehow it has become more picky about the board and no longer likes mine.

But I'm not an expert and I'm quite stuck, since I would like to have both hosts on PVE 8.

Do you have suggestions?

Thanks,

Max
 

Attachments

  • isis_journalctl-b-1.txt.gz (40.5 KB)
  • isis_journalctl-b.txt.gz (32.5 KB)
  • isis_lscpi-vv.txt.gz (2.5 KB)
  • osiris_lscpi-vv.txt.gz (2.5 KB)
I too am having issues with the I350; my system isn't crashing, but the NIC carrier LED comes on during boot, then drops and thereafter stays off.

It's not you.
 
An update from further investigations: I tried booting other Linux distributions to see if the NIC behaves the same:

  • booting Arch Linux (2024.04.01 ISO), which is based on kernel 6.8.2, the NIC works!!!
  • Ubuntu Server 24.04 instead behaves like Proxmox, spewing the same kernel trace.

What is even more puzzling is that the igb driver version (checked with "modinfo igb") is the same as the one in PVE kernel 6.8.4.
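For anyone wanting to compare driver builds the same way, the commands I mean are roughly these (a sketch; note that on newer kernels the in-tree igb may no longer carry an explicit "version" field, in which case "vermagic" is the thing to compare):

```shell
# Is the igb module loaded at all?
lsmod | grep igb

# Driver version / build info for comparison across distributions.
modinfo igb | grep -E '^(filename|version|vermagic)'

# Running kernel release, for comparing e.g. Arch vs. PVE vs. Ubuntu.
uname -r
```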

There must be something else which eludes my analysis capabilities! :)

Possibly something else in the kernel?

I really do not know; help from somebody on the Proxmox team is needed here.

Hope they keep an eye on this thread.

If you have the same issue please "vote" it.

Max
 
Code:
igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost

Did you notice this line, which might suggest you have a loose/broken PCIe link?
 
I too am having issues with the I350; my system isn't crashing, but the NIC carrier LED comes on during boot, then drops and thereafter stays off.

It's not you.
Do you also have similar Dell hardware?
 
Code:
igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost

Did you notice this line, which might suggest you have a loose/broken PCIe link?
Thanks for the reply.

Yes, I thought about that, but then how could it be that the NIC works when changing the OS to Arch Linux?
Also, the same board works when moved to the other host running PVE 7.

Maybe there is something wrong in the hardware, but it is not just hardware.
 
I too am having issues with the I350; my system isn't crashing, but the NIC carrier LED comes on during boot, then drops and thereafter stays off.

It's not you.
Sad to read that your system is also impacted by this issue.

My system does not crash either; it's just that the 4 interfaces from the I350-T4 are no longer there.

It is only the igb driver that crashes (see the logs).

Max
 
Did anyone find a solution to this? I just purchased a new HP DL380 Gen 10 which HP fitted with the HP Ethernet 1 Gb 4-port 366FLR Adapter, but when I install the latest version of Proxmox, the installation fails as it is unable to find a working Ethernet adaptor.

I really do not want to use VMWare with all the uncertainty there, but if there is no workable solution to this, I may have to do that.
 
Hello Mark,

unfortunately no solution yet.
Are you sure you are having the same issue I describe?
Check the kernel messages and look for the ones written by igb, the I350 kernel driver:
journalctl -b (then search for igb)
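A quick way to isolate just the igb lines, both on a live system and in a saved log like the attachments in this thread (the file name below is simply my attachment from the first post):

```shell
# On the live system: kernel messages from the current boot, igb lines only.
journalctl -b -k | grep -i igb

# The same filter applied to a saved boot log:
grep -i igb isis_journalctl-b-1.txt
```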

If you have the same issue, hopefully we might get some attention from Proxmox.

Given that, as I reported, Arch Linux with a 6.8 kernel and the same I350 driver version works, I fear it could be a very odd combination of factors that creates this issue.

Cheers,

Max
 
Same issue here with a I350-T4V2. Other distros have no problem using the NIC.

The NIC also works in Proxmox 7.4
 
Having the exact same issue here. Tried 8.2 and it sees the NIC during install, but no go after install. Trying 7.4 now.

any other solution out there?
 
Having the exact same issue here. Tried 8.2 and it sees the NIC during install, but no go after install. Trying 7.4 now.

any other solution out there?
I ended up using a different NIC (Realtek RTL8125BG). Proxmox 7.4 was also a no-go for me, because I had issues with PCI passthrough there which I didn't have in 8.2.
 
Still no solution.

I cannot and will not switch back to PVE 7.4 nor change NIC, I'm out of re$ource$ :)

I've found a possible hint about where the issue might be in a kernel mailing-list thread which, if I got it right, says this could be due to the ACPI driver which, for its own reasons, decides to power off the board before igb starts because it thinks it is not being used.
See:
https://lore.kernel.org/netdev/35dfc9e8-431c-362d-450e-4c6ac1e55434@molgen.mpg.de/T/

This might also explain why, in my test on Arch Linux with the same igb driver, the board works, and possibly also why it works during Proxmox setup, as reported in this thread: the installer kernel is different.

Too much time has passed since I last fiddled with kernel compilation; I do not know if I would be able to recompile the ACPI driver with the proposed patch, and currently I do not have the time to try.
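If somebody wants to check whether the board is indeed being powered down, a possible way to inspect its power and link state from a working boot (the PCI address is the one from my logs; adjust it to your own device):

```shell
# 0000:01:00.0 is the first I350 port as seen in my kernel logs.
DEV=0000:01:00.0

# Runtime-PM status of the device ("active" while the driver uses it).
cat /sys/bus/pci/devices/$DEV/power/runtime_status

# PCI power state as reported by the kernel (D0 = fully on, D3 = off).
cat /sys/bus/pci/devices/$DEV/power_state

# PCIe link status reported by the device itself.
lspci -vv -s $DEV | grep -i 'LnkSta:'
```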

Hope somebody more skilled than me might be able to give it a check.

Cheers,

Max
 
Still no solution.

I cannot and will not switch back to PVE 7.4 nor change NIC, I'm out of re$ource$ :)
[...]
Thanks.. better to swap out the NIC, can't deal with this.
 
Same issue with the Asrockrack B650D4U motherboard and the Intel i210 NIC, Proxmox 8. Has anyone found a solution? I have clients losing access to their machines randomly.
 
Hello fellows experiencing a misfortune similar to mine :cool:

Still no solution, but I found out other conditions where the issue is not present:

If I boot Proxmox with the kernel parameter
pci=noacpi
which tells the kernel not to use ACPI for PCI probing and IRQ routing,

the I350 NIC works. Unfortunately the system no longer finishes booting, since with ACPI hampered the kernel does not see the eMMC with the root ZFS (not to mention that the power-off command, reboot via CTRL-ALT-DEL, and who knows what else no longer work).

I have not found anything useful to change in the BIOS that might help.

I still think there must be a way out of this condition, but somebody more skilled than this old cat is needed (hey Proxmox support gurus, what about this challenge? ;-))

I can live with this condition since I still have the onboard Realtek NIC; the I350 is needed by my virtual pfSense, which for the time being keeps running happily on the other twin system, still on PVE 7.4.

Cheers,

Max
 
No improvement on kernel 6.8.8-3-pve / PVE 8.2.4.
Pinning the kernel to 6.5 (6.5.13-5-pve) works, but that's not exactly a long-term solution.
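For anyone choosing the same workaround, pinning can be done with proxmox-boot-tool (a sketch; the version string is the one from the post above, and the package name assumes PVE 8's proxmox-kernel naming):

```shell
# Install the older kernel series if it is not already present.
apt install proxmox-kernel-6.5

# See which kernels are available, then pin the known-good one.
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.5.13-5-pve

# Later, to return to booting the newest installed kernel:
proxmox-boot-tool kernel unpin
```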
 
Same issue with the Asrockrack B650D4U motherboard and the Intel i210 NIC, Proxmox 8. Has anyone found a solution? I have clients losing access to their machines randomly.
@marciglesias17: I have issues with an onboard i210 on a Supermicro board, too. The NIC is available in Proxmox 8.2.4 with kernel Linux 6.8.8-4-pve (2024-07-26T11:15Z), but there are random connection losses.
 
Same issue here. After failing to update the igb driver, I ended up passing the entire card through to a Mikrotik CHR.
 
