Server network card sporadically goes link down

DonaldK

Hello everyone,

we have a seemingly systemic problem: on two identical servers the network card drops out from time to time. We first investigated at customer 1, swapped the cable and moved to a different port on the switch. Nothing conspicuous. Today the same problem hit customer 2 in the middle of the day.

The board has onboard dual 1GbE LAN (Intel® I210-AT).

I found the following in the syslog:

------------------------------------------------------------------------------------------------------------------------------------------------

Sep 25 10:36:50 prx kernel: igb 0000:07:00.0 eno1: PCIe link lost
Sep 25 10:36:50 prx kernel: ------------[ cut here ]------------
Sep 25 10:36:50 prx kernel: igb: Failed to read reg 0xc030!
Sep 25 10:36:50 prx kernel: WARNING: CPU: 1 PID: 3289658 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x93/0xb0 [igb]
Sep 25 10:36:50 prx kernel: Modules linked in: usblp tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables sunrpc bonding tls softdog binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd amdgpu ipmi_ssif kvm_amd amdxcp kvm iommu_v2 drm_buddy gpu_sched drm_suballoc_helper drm_ttm_helper ttm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel drm_display_helper aesni_intel cec crypto_simd cryptd ast rc_core rapl video drm_shmem_helper pcspkr k10temp wmi ccp drm_kms_helper acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor rndis_host cdc_ether usbnet mii usbmouse raid6_pq libcrc32c hid_generic usbkbd usbhid uas hid usb_storage xhci_pci nvme xhci_pci_renesas crc32_pclmul igb ahci nvme_core xhci_hcd i2c_piix4 i2c_algo_bit libahci dca
Sep 25 10:36:50 prx kernel: nvme_common
Sep 25 10:36:50 prx kernel: CPU: 1 PID: 3289658 Comm: kworker/1:1 Tainted: P O 6.5.11-4-pve #1
Sep 25 10:36:50 prx kernel: Hardware name: Supermicro AS -3015A-I/H13SAE-MF, BIOS 1.2 12/18/2023
Sep 25 10:36:50 prx kernel: Workqueue: events igb_watchdog_task [igb]
Sep 25 10:36:50 prx kernel: RIP: 0010:igb_rd32+0x93/0xb0 [igb]
Sep 25 10:36:50 prx kernel: Code: c7 c6 03 64 3d c0 e8 1c 4d 99 c1 48 8b bb 28 ff ff ff e8 a0 16 4f c1 84 c0 74 c1 44 89 e6 48 c7 c7 f8 70 3d c0 e8 dd 8d d4 c0 <0f> 0b eb ae b8 ff ff ff ff 31 d2 31 f6 31 ff e9 69 00 ce c1 66 0f
Sep 25 10:36:50 prx kernel: RSP: 0018:ffffa7b3413abd98 EFLAGS: 00010246
Sep 25 10:36:50 prx kernel: RAX: 0000000000000000 RBX: ffff902213d7cf18 RCX: 0000000000000000
Sep 25 10:36:50 prx kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 25 10:36:50 prx kernel: RBP: ffffa7b3413abda8 R08: 0000000000000000 R09: 0000000000000000
Sep 25 10:36:50 prx kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000c030
Sep 25 10:36:50 prx kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff902212d75340
Sep 25 10:36:50 prx kernel: FS: 0000000000000000(0000) GS:ffff90291e440000(0000) knlGS:0000000000000000
Sep 25 10:36:50 prx kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 10:36:50 prx kernel: CR2: 0000335002023000 CR3: 0000000456e34000 CR4: 0000000000750ee0
Sep 25 10:36:50 prx kernel: PKRU: 55555554
Sep 25 10:36:50 prx kernel: Call Trace:
Sep 25 10:36:50 prx kernel: <TASK>
Sep 25 10:36:50 prx kernel: ? show_regs+0x6d/0x80
Sep 25 10:36:50 prx kernel: ? __warn+0x89/0x160
Sep 25 10:36:50 prx kernel: ? igb_rd32+0x93/0xb0 [igb]
Sep 25 10:36:50 prx kernel: ? report_bug+0x17e/0x1b0
Sep 25 10:36:50 prx kernel: ? handle_bug+0x46/0x90
Sep 25 10:36:50 prx kernel: ? exc_invalid_op+0x18/0x80
Sep 25 10:36:50 prx kernel: ? asm_exc_invalid_op+0x1b/0x20
Sep 25 10:36:50 prx kernel: ? igb_rd32+0x93/0xb0 [igb]
Sep 25 10:36:50 prx kernel: ? igb_rd32+0x93/0xb0 [igb]
Sep 25 10:36:50 prx kernel: igb_update_stats+0x89/0x830 [igb]
Sep 25 10:36:50 prx kernel: igb_watchdog_task+0x12d/0x880 [igb]
Sep 25 10:36:50 prx kernel: ? psi_avgs_work+0x6b/0xf0
Sep 25 10:36:50 prx kernel: process_one_work+0x23b/0x450
Sep 25 10:36:50 prx kernel: worker_thread+0x50/0x3f0
Sep 25 10:36:50 prx kernel: ? __pfx_worker_thread+0x10/0x10
Sep 25 10:36:50 prx kernel: kthread+0xef/0x120
Sep 25 10:36:50 prx kernel: ? __pfx_kthread+0x10/0x10
Sep 25 10:36:50 prx kernel: ret_from_fork+0x44/0x70
Sep 25 10:36:50 prx kernel: ? __pfx_kthread+0x10/0x10
Sep 25 10:36:50 prx kernel: ret_from_fork_asm+0x1b/0x30
Sep 25 10:36:50 prx kernel: </TASK>
Sep 25 10:36:50 prx kernel: ---[ end trace 0000000000000000 ]---
Sep 25 10:36:55 prx kernel: ------------[ cut here ]------------
Sep 25 10:36:55 prx kernel: NETDEV WATCHDOG: eno1 (igb): transmit queue 0 timed out 6112 ms
Sep 25 10:36:55 prx kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x260/0x270
Sep 25 10:36:55 prx kernel: Modules linked in: usblp tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables sunrpc bonding tls softdog binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd amdgpu ipmi_ssif kvm_amd amdxcp kvm iommu_v2 drm_buddy gpu_sched drm_suballoc_helper drm_ttm_helper ttm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel drm_display_helper aesni_intel cec crypto_simd cryptd ast rc_core rapl video drm_shmem_helper pcspkr k10temp wmi ccp drm_kms_helper acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor rndis_host cdc_ether usbnet mii usbmouse raid6_pq libcrc32c hid_generic usbkbd usbhid uas hid usb_storage xhci_pci nvme xhci_pci_renesas crc32_pclmul igb ahci nvme_core xhci_hcd i2c_piix4 i2c_algo_bit libahci dca
Sep 25 10:36:55 prx kernel: nvme_common
Sep 25 10:36:55 prx kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: P W O 6.5.11-4-pve #1
Sep 25 10:36:55 prx kernel: Hardware name: Supermicro AS -3015A-I/H13SAE-MF, BIOS 1.2 12/18/2023
Sep 25 10:36:55 prx kernel: RIP: 0010:dev_watchdog+0x260/0x270
Sep 25 10:36:55 prx kernel: Code: ff ff 48 89 df c6 05 77 3b 78 01 01 e8 b9 80 f9 ff 44 8b 45 cc 44 89 f9 48 89 de 48 89 c2 48 c7 c7 b0 9e a3 82 e8 70 ce 33 ff <0f> 0b e9 1d ff ff ff 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
Sep 25 10:36:55 prx kernel: RSP: 0018:ffffa7b34024ce40 EFLAGS: 00010246
Sep 25 10:36:55 prx kernel: RAX: 0000000000000000 RBX: ffff902213d7c000 RCX: 0000000000000000
Sep 25 10:36:55 prx kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 25 10:36:55 prx kernel: RBP: ffffa7b34024ce78 R08: 0000000000000000 R09: 0000000000000000
Sep 25 10:36:55 prx kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff902213d7c4c8
Sep 25 10:36:55 prx kernel: R13: ffff902213d7c41c R14: 0000000000000000 R15: 0000000000000000
Sep 25 10:36:55 prx kernel: FS: 0000000000000000(0000) GS:ffff90291e440000(0000) knlGS:0000000000000000
Sep 25 10:36:55 prx kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 10:36:55 prx kernel: CR2: 0000335002023000 CR3: 0000000456e34000 CR4: 0000000000750ee0
Sep 25 10:36:55 prx kernel: PKRU: 55555554
Sep 25 10:36:55 prx kernel: Call Trace:
Sep 25 10:36:55 prx kernel: <IRQ>
Sep 25 10:36:55 prx kernel: ? show_regs+0x6d/0x80
Sep 25 10:36:55 prx kernel: ? __warn+0x89/0x160
Sep 25 10:36:55 prx kernel: ? dev_watchdog+0x260/0x270
Sep 25 10:36:55 prx kernel: ? report_bug+0x17e/0x1b0
Sep 25 10:36:55 prx kernel: ? handle_bug+0x46/0x90
Sep 25 10:36:55 prx kernel: ? exc_invalid_op+0x18/0x80
Sep 25 10:36:55 prx kernel: ? asm_exc_invalid_op+0x1b/0x20
Sep 25 10:36:55 prx kernel: ? dev_watchdog+0x260/0x270
Sep 25 10:36:55 prx kernel: ? __pfx_dev_watchdog+0x10/0x10
Sep 25 10:36:55 prx kernel: call_timer_fn+0x29/0x160
Sep 25 10:36:55 prx kernel: ? __pfx_dev_watchdog+0x10/0x10
Sep 25 10:36:55 prx kernel: __run_timers+0x259/0x310
Sep 25 10:36:55 prx kernel: run_timer_softirq+0x1d/0x40
Sep 25 10:36:55 prx kernel: __do_softirq+0xd1/0x303
Sep 25 10:36:55 prx kernel: __irq_exit_rcu+0x75/0xa0
Sep 25 10:36:55 prx kernel: irq_exit_rcu+0xe/0x20
Sep 25 10:36:55 prx kernel: sysvec_apic_timer_interrupt+0x92/0xd0
Sep 25 10:36:55 prx kernel: </IRQ>
Sep 25 10:36:55 prx kernel: <TASK>
Sep 25 10:36:55 prx kernel: asm_sysvec_apic_timer_interrupt+0x1b/0x20
Sep 25 10:36:55 prx kernel: RIP: 0010:cpuidle_enter_state+0xce/0x470
Sep 25 10:36:55 prx kernel: Code: 28 10 ff e8 64 f6 ff ff 8b 53 04 49 89 c6 0f 1f 44 00 00 31 ff e8 22 25 0f ff 80 7d d7 00 0f 85 e7 01 00 00 fb 0f 1f 44 00 00 <45> 85 ff 0f 88 83 01 00 00 49 63 d7 4c 89 f1 48 8d 04 52 48 8d 04
Sep 25 10:36:55 prx kernel: RSP: 0018:ffffa7b34019fe50 EFLAGS: 00000246
Sep 25 10:36:55 prx kernel: RAX: 0000000000000000 RBX: ffff90220b459400 RCX: 0000000000000000
Sep 25 10:36:55 prx kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
Sep 25 10:36:55 prx kernel: RBP: ffffa7b34019fe88 R08: 0000000000000000 R09: 0000000000000000
Sep 25 10:36:55 prx kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Sep 25 10:36:55 prx kernel: R13: ffffffff83477c60 R14: 0011d13d78487ac3 R15: 0000000000000001
Sep 25 10:36:55 prx kernel: cpuidle_enter+0x2e/0x50
Sep 25 10:36:55 prx kernel: call_cpuidle+0x23/0x60
Sep 25 10:36:55 prx kernel: do_idle+0x202/0x260
Sep 25 10:36:55 prx kernel: cpu_startup_entry+0x2a/0x30
Sep 25 10:36:55 prx kernel: start_secondary+0x119/0x140
Sep 25 10:36:55 prx kernel: secondary_startup_64_no_verify+0x17e/0x18b
Sep 25 10:36:55 prx kernel: </TASK>
Sep 25 10:36:55 prx kernel: ---[ end trace 0000000000000000 ]---
Sep 25 10:36:55 prx kernel: igb 0000:07:00.0 eno1: Reset adapter
Sep 25 10:36:55 prx kernel: vmbr0: port 1(eno1) entered disabled state

------------------------------------------------------------------------------------------------------------------------------------------------
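
To compare when and how often this happens on both servers, the first line of the trace can be pulled out of the syslog. A minimal Python sketch, assuming the log lines have the same format as above; the name `find_link_lost_events` and the regular expression are illustrative only and not part of any existing tool:

```python
import re

# Matches lines such as:
#   Sep 25 10:36:50 prx kernel: igb 0000:07:00.0 eno1: PCIe link lost
# The pattern is an assumption derived from the single log excerpt above.
LINK_LOST = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"igb\s+(?P<pci_address>[0-9a-f:.]+)\s+(?P<interface>\S+):\s+PCIe link lost"
)


def find_link_lost_events(lines):
    """Return one dict per 'PCIe link lost' line found in the given log lines."""
    events = []
    for line in lines:
        match = LINK_LOST.match(line)
        if match:
            events.append(match.groupdict())
    return events


# Two lines copied from the excerpt above; only the first one is a match.
sample = [
    "Sep 25 10:36:50 prx kernel: igb 0000:07:00.0 eno1: PCIe link lost",
    "Sep 25 10:36:55 prx kernel: igb 0000:07:00.0 eno1: Reset adapter",
]
for event in find_link_lost_events(sample):
    print(event["timestamp"], event["interface"], event["pci_address"])
```

On a real system the function could be fed an open log file instead of the `sample` list, since it only iterates over lines.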

The system keeps running when the error occurs. It can be rebooted from the console, after which the card is back. Restarting only the network does not help.
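
One thing that might be worth trying before a full reboot is removing the device from the PCI bus and rescanning it through sysfs. This is an untested assumption for this board, and it may well fail if the device has really dropped off the bus. A minimal Python sketch, using the PCI address `0000:07:00.0` from the log; the function name `rescan_pci_device` and the `sysfs_root` parameter are hypothetical and exist only for illustration:

```python
import os


def rescan_pci_device(pci_address, sysfs_root="/sys"):
    """Remove a PCI device via sysfs (if present) and trigger a bus rescan.

    Needs root privileges on a real system. sysfs_root is a parameter only
    so the function can be tried against a scratch directory first.
    Returns True if the device's 'remove' file existed and was written.
    """
    remove_path = os.path.join(
        sysfs_root, "bus", "pci", "devices", pci_address, "remove"
    )
    rescan_path = os.path.join(sysfs_root, "bus", "pci", "rescan")

    removed = False
    if os.path.exists(remove_path):
        # Writing "1" asks the kernel to detach the device and its driver.
        with open(remove_path, "w") as remove_file:
            remove_file.write("1")
        removed = True

    # Writing "1" asks the kernel to re-enumerate all PCI buses.
    with open(rescan_path, "w") as rescan_file:
        rescan_file.write("1")
    return removed


# Example call as root (not executed here):
#   rescan_pci_device("0000:07:00.0")
```

If the interface comes back after this, it would still have to be brought up again and re-added to the bridge (`vmbr0` in the log).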

Is anyone aware of this issue? Who can help?

Best regards, Donald
 
Hi, is that by any chance a Supermicro X10 board? Someone here in the forum recently reported problems with the new kernel on those.
 
