Hello,
I have two thin clients (Dell Wyse 5070) with the same hardware configuration, each with a 4-port Intel I350 NIC.
One host (named osiris) is still at PVE 7.4 and works as expected.
The other one (named isis) has been reinstalled with PVE 8 (first 8.0 then 8.1, now at 8.1.10).
Since installing PVE 8, the Intel I350 has stopped working.
The kernel logs errors when the igb module is loaded, and the driver refuses to use the NIC:
Apr 19 14:52:18 isis kernel: igb: Intel(R) Gigabit Ethernet Network Driver
Apr 19 14:52:18 isis kernel: igb: Copyright (c) 2007-2014 Intel Corporation.
Apr 19 14:52:18 isis kernel: ahci 0000:00:12.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
Apr 19 14:52:18 isis kernel: ahci 0000:00:12.0: flags: 64bit ncq sntf pm clo only pmp pio slum part deso sadm sds apst
Apr 19 14:52:18 isis kernel: idma64 idma64.0: Found Intel integrated DMA 64-bit
Apr 19 14:52:18 isis kernel: igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost
Apr 19 14:52:18 isis kernel: ------------[ cut here ]------------
Apr 19 14:52:18 isis kernel: igb: Failed to read reg 0x18!
Apr 19 14:52:18 isis kernel: WARNING: CPU: 3 PID: 131 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x93/0xb0 [igb]
Apr 19 14:52:18 isis kernel: Modules linked in: intel_lpss_pci(+) cqhci igb(+) i2c_i801 intel_lpss xhci_pci(+) xhci_pci_renesas i2c_smbus sdhci i2c_algo_bit idma64 ahci(+) xhci_hcd libahci r8169 dca realtek video wmi pinctrl_geminilake aesni_intel crypto_simd cryptd
Apr 19 14:52:18 isis kernel: CPU: 3 PID: 131 Comm: (udev-worker) Not tainted 6.8.4-2-pve #1
Apr 19 14:52:18 isis kernel: Hardware name: Dell Inc. Wyse 5070 Extended Thin Client/012KND, BIOS 1.29.0 02/05/2024
Apr 19 14:52:18 isis kernel: RIP: 0010:igb_rd32+0x93/0xb0 [igb]
Apr 19 14:52:18 isis kernel: Code: c7 c6 03 e4 53 c0 e8 8c 13 8e d9 48 8b bb 28 ff ff ff e8 c0 9d 3c d9 84 c0 74 c1 44 89 e6 48 c7 c7 f8 f0 53 c0 e8 bd 3a be d8 <0f> 0b eb ae b8 ff ff ff ff 31 d2 31 f6 31 ff c3 cc cc cc cc 66 0f
Apr 19 14:52:18 isis kernel: RSP: 0018:ffffbcffc0363848 EFLAGS: 00010246
Apr 19 14:52:18 isis kernel: RAX: 0000000000000000 RBX: ffff97f712364f38 RCX: 0000000000000000
Apr 19 14:52:18 isis kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 19 14:52:18 isis kernel: RBP: ffffbcffc0363858 R08: 0000000000000000 R09: 0000000000000000
Apr 19 14:52:18 isis kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000018
Apr 19 14:52:18 isis kernel: R13: ffff97f701e7a0c0 R14: ffff97f7123649e0 R15: ffff97f712364000
Apr 19 14:52:18 isis kernel: FS: 0000701e554e48c0(0000) GS:ffff97fb6bd80000(0000) knlGS:0000000000000000
Apr 19 14:52:18 isis kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 19 14:52:18 isis kernel: CR2: 0000605acc650098 CR3: 00000001120c0000 CR4: 0000000000350ef0
Apr 19 14:52:18 isis kernel: Call Trace:
For full details, see the attached boot log (isis_journalctl-b-1.txt).
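The two key lines above ("PCIe link lost" followed by the "igb: Failed to read reg 0x18!" warning) both come from igb_rd32() in igb_main.c, the helper named in the WARNING line. As far as I can tell from the upstream driver, it treats a register read that returns all ones (0xFFFFFFFF, which is what a PCIe memory read returns when the device does not answer) as a lost link, re-reading register 0 to confirm, and only emits the WARN if the device still responds to config-space reads (pci_device_is_present()). The Python below is a simplified model of that logic, not the driver code; the function name model_igb_read and its parameters are hypothetical. Register 0x18 should be CTRL_EXT in the e1000-family register map.

```python
from typing import List

ALL_ONES = 0xFFFFFFFF  # what a PCIe memory read returns when the device does not answer


def model_igb_read(value: int, reg: int, reg0_value: int, config_space_present: bool) -> List[str]:
    """Return the messages this simplified model emits for one register read.

    Simplified model only (hypothetical, not the driver code):
      value: 32-bit value read from `reg`
      reg0_value: a fresh read of register 0, used to confirm the failure
      config_space_present: stand-in for pci_device_is_present()
    """
    messages: List[str] = []
    # All ones alone could be legitimate data, so register 0 is re-read to confirm
    if value == ALL_ONES and (reg == 0 or reg0_value == ALL_ONES):
        messages.append("PCIe link lost")
        # The warning only fires if config space still answers
        if config_space_present:
            messages.append(f"igb: Failed to read reg 0x{reg:x}!")
    return messages


# The combination seen in the isis log: register reads fail, config space answers
print(model_igb_read(ALL_ONES, 0x18, ALL_ONES, True))
# ['PCIe link lost', 'igb: Failed to read reg 0x18!']
```

If this reading of the driver is right, getting the warning at all would mean the card still answered config-space reads while its register window did not, so the "Region 0: Memory at ..." lines of the two lspci -vv dumps may be worth comparing.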
Details of the two NICs can be found in isis_lspci-vv.txt and osiris_lspci-vv.txt.
I think the differences might be due to driver changes between kernel 5.15 and 6.5/6.8.
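To narrow down what actually differs between the two lspci -vv dumps, it can help to keep only the PCIe link lines (LnkCap, LnkCtl, LnkSta and their "2" variants) and the BAR "Region" lines, then diff them. This is a minimal sketch using Python's difflib; the choice of which lines to keep is my assumption about what is relevant to a "PCIe link lost" error, and the function names are hypothetical.

```python
import difflib
from typing import List

# Substrings of lspci -vv lines describing PCIe link state and BAR mapping
# ("LnkCap" also matches "LnkCap2", and so on)
KEYS = ("LnkCap", "LnkCtl", "LnkSta", "Region ")


def link_lines(lspci_text: str) -> List[str]:
    """Keep only the stripped lspci -vv lines that mention one of KEYS."""
    return [line.strip() for line in lspci_text.splitlines()
            if any(key in line for key in KEYS)]


def diff_link_lines(a_text: str, b_text: str,
                    a_name: str = "isis", b_name: str = "osiris") -> List[str]:
    """Unified diff of the link-related lines of two lspci -vv dumps."""
    return list(difflib.unified_diff(link_lines(a_text), link_lines(b_text),
                                     fromfile=a_name, tofile=b_name, lineterm=""))
```

For example, pass it the contents of isis_lspci-vv.txt and osiris_lspci-vv.txt and print the result joined with newlines; an empty list means the selected lines are identical. Memory addresses in the Region lines can legitimately differ between hosts, so those differences are only a hint.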
I have tried the following, without success:
- flash the NIC with the latest firmware
- trigger a rebuild of the NIC NVM (one of the errors says the NVM is corrupted) by resetting the NIC to its defaults (bootutil64e -ALL -DEFAULTCONFIG)
- disable/enable WOL/PXE boot on the NIC
- update the Wyse 5070 system BIOS
- install PVE kernel 6.8 (isis_journalctl-b.txt)
- try kernel boot parameters that disable PCIe power-saving features (pcie_port_pm=off pcie_aspm=off)
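One thing worth double-checking is that the boot parameters were actually applied: on Proxmox they go in /etc/default/grub (GRUB) or /etc/kernel/cmdline (systemd-boot), followed by proxmox-boot-tool refresh, and editing the wrong file has no effect. A minimal sketch, assuming you feed it the contents of /proc/cmdline after a reboot; the helper names are hypothetical, and quoted values containing spaces are not handled.

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent
        params[name.replace("-", "_")] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return wanted parameters that are absent (None) or set to another value."""
    params = parse_cmdline(cmdline)
    return {name: params.get(name) for name, value in wanted.items()
            if params.get(name) != value}


# The two parameters tried on isis
WANTED = {"pcie_aspm": "off", "pcie_port_pm": "off"}
```

An empty result from missing_params(open("/proc/cmdline").read(), WANTED) means both parameters reached the running kernel.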
Suspecting a hardware issue, I also moved the NIC to the other host running PVE 7.4, where it worked like a charm.
No NVM corruption messages or any other issue.
The NIC worked at full speed.
I think this might be an igb driver issue: somehow the driver has become pickier about the board and no longer likes mine.
But I'm not an expert, and I'm quite stuck, since I would like to have both hosts on PVE 8.
Do you have suggestions?
Thanks,
Max