Got this last night and lost the connection to the server. The server is mostly idle, with one VM doing nothing.
Network is 2 x 1 Gb/s onboard Intel i210.
Dec 04 03:03:30 PVE-02 kernel: igb 0000:07:00.0 eno1: PCIe link lost
Dec 04 03:03:30 PVE-02 kernel: ------------[ cut here ]------------
Dec 04 03:03:30 PVE-02 kernel: igb: Failed to read reg 0xc030!
Dec 04 03:03:30 PVE-02 kernel: WARNING: CPU: 24 PID: 2007428 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x93/0xb0 [igb]
Dec 04 03:03:30 PVE-02 kernel: Modules linked in: veth 8021q garp mrp ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables sunrpc bonding tls softdog nfnetlink_log nfnetlink binfmt_misc ipmi_ssif intel_rapl_msr intel_rapl_common amdgpu snd_sof_amd_rembrandt amd64_edac snd_sof_amd_renoir snd_sof_amd_acp edac_mce_amd snd_sof_pci snd_sof_xtensa_dsp snd_hda_codec_realtek snd_sof kvm_amd snd_hda_codec_generic snd_sof_utils ledtrig_audio snd_hda_codec_hdmi snd_soc_core kvm amdxcp snd_hda_intel iommu_v2 snd_compress snd_intel_dspcfg drm_buddy ac97_bus snd_intel_sdw_acpi gpu_sched snd_pcm_dmaengine drm_suballoc_helper snd_hda_codec snd_pci_ps irqbypass drm_ttm_helper crct10dif_pclmul snd_rpl_pci_acp6x snd_hda_core ttm snd_acp_pci polyval_clmulni polyval_generic snd_hwdep snd_pci_acp6x ghash_clmulni_intel drm_display_helper aesni_intel snd_pcm snd_pci_acp5x cec ast snd_rn_pci_acp3x crypto_simd snd_timer snd_acp_config rc_core cryptd acpi_ipmi drm_shmem_helper snd snd_soc_acpi
Dec 04 03:03:30 PVE-02 kernel: ipmi_si drm_kms_helper soundcore rapl snd_pci_acp3x ccp pcspkr ipmi_devintf k10temp joydev input_leds ipmi_msghandler mac_hid vhost_net vhost vhost_iotlb tap efi_pstore drm dmi_sysfs ip_tables x_tables autofs4 rndis_host cdc_ether usbnet mii hid_generic usbmouse usbhid hid zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c simplefb xhci_pci nvme xhci_pci_renesas crc32_pclmul igb ahci xhci_hcd nvme_core i2c_algo_bit i2c_piix4 libahci dca nvme_common video wmi
Dec 04 03:03:30 PVE-02 kernel: CPU: 24 PID: 2007428 Comm: kworker/24:2 Tainted: P O 6.5.11-6-pve #1
Dec 04 03:03:30 PVE-02 kernel: Hardware name: Supermicro AS -1015A-MT/H13SAE-MF, BIOS 1.1a 10/19/2023
Dec 04 03:03:30 PVE-02 kernel: Workqueue: events igb_watchdog_task [igb]
Dec 04 03:03:30 PVE-02 kernel: RIP: 0010:igb_rd32+0x93/0xb0 [igb]
Dec 04 03:03:30 PVE-02 kernel: Code: c7 c6 03 f4 56 c0 e8 1c bd 7f d7 48 8b bb 28 ff ff ff e8 a0 86 35 d7 84 c0 74 c1 44 89 e6 48 c7 c7 f8 00 57 c0 e8 dd fd ba d6 <0f> 0b eb ae b8 ff ff ff ff 31 d2 31 f6 31 ff e9 69 70 b4 d7 66 0f
Dec 04 03:03:30 PVE-02 kernel: RSP: 0018:ffffa97c01afbd98 EFLAGS: 00010246
Dec 04 03:03:30 PVE-02 kernel: RAX: 0000000000000000 RBX: ffff8fd90461cf18 RCX: 0000000000000000
Dec 04 03:03:30 PVE-02 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Dec 04 03:03:30 PVE-02 kernel: RBP: ffffa97c01afbda8 R08: 0000000000000000 R09: 0000000000000000
Dec 04 03:03:30 PVE-02 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000c030
Dec 04 03:03:30 PVE-02 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff8fd9170feb40
Dec 04 03:03:30 PVE-02 kernel: FS: 0000000000000000(0000) GS:ffff8ff798800000(0000) knlGS:0000000000000000
Dec 04 03:03:30 PVE-02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 04 03:03:30 PVE-02 kernel: CR2: 00007f5ba5afd0a0 CR3: 0000000d28034000 CR4: 0000000000750ee0
Dec 04 03:03:30 PVE-02 kernel: PKRU: 55555554
Dec 04 03:03:30 PVE-02 kernel: Call Trace:
Dec 04 03:03:30 PVE-02 kernel: <TASK>
Dec 04 03:03:30 PVE-02 kernel: ? show_regs+0x6d/0x80
Dec 04 03:03:30 PVE-02 kernel: ? __warn+0x89/0x160
Dec 04 03:03:30 PVE-02 kernel: ? igb_rd32+0x93/0xb0 [igb]
Dec 04 03:03:30 PVE-02 kernel: ? report_bug+0x17e/0x1b0
Dec 04 03:03:30 PVE-02 kernel: ? handle_bug+0x46/0x90
Dec 04 03:03:30 PVE-02 kernel: ? exc_invalid_op+0x18/0x80
Dec 04 03:03:30 PVE-02 kernel: ? asm_exc_invalid_op+0x1b/0x20
Dec 04 03:03:30 PVE-02 kernel: ? igb_rd32+0x93/0xb0 [igb]
Dec 04 03:03:30 PVE-02 kernel: ? igb_rd32+0x93/0xb0 [igb]
Dec 04 03:03:30 PVE-02 kernel: igb_update_stats+0x89/0x830 [igb]
Dec 04 03:03:30 PVE-02 kernel: igb_watchdog_task+0x12d/0x880 [igb]
Dec 04 03:03:30 PVE-02 kernel: process_one_work+0x23b/0x450
Dec 04 03:03:30 PVE-02 kernel: worker_thread+0x50/0x3f0
Dec 04 03:03:30 PVE-02 kernel: ? __pfx_worker_thread+0x10/0x10
Dec 04 03:03:30 PVE-02 kernel: kthread+0xef/0x120
Dec 04 03:03:30 PVE-02 kernel: ? __pfx_kthread+0x10/0x10
Dec 04 03:03:30 PVE-02 kernel: ret_from_fork+0x44/0x70
Dec 04 03:03:30 PVE-02 kernel: ? __pfx_kthread+0x10/0x10
Dec 04 03:03:30 PVE-02 kernel: ret_from_fork_asm+0x1b/0x30
Dec 04 03:03:30 PVE-02 kernel: </TASK>
Dec 04 03:03:30 PVE-02 kernel: ---[ end trace 0000000000000000 ]---
Got another one, on a different server with the same hardware.
Dec 07 02:28:22 PVE-01 kernel: igb 0000:07:00.0 eno1: PCIe link lost
Dec 07 02:28:22 PVE-01 kernel: ------------[ cut here ]------------
Dec 07 02:28:22 PVE-01 kernel: igb: Failed to read reg 0xc030!
Dec 07 02:28:22 PVE-01 kernel: WARNING: CPU: 13 PID: 1845536 at drivers/net/ethernet/intel/igb/igb_main.c:745 igb_rd32+0x93/0xb0 [igb]
Dec 07 02:28:22 PVE-01 kernel: Modules linked in: veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables softdog sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common amd64_edac snd_sof_amd_rembrandt edac_mce_amd snd_sof_amd_renoir snd_sof_amd_acp amdgpu snd_sof_pci snd_sof_xtensa_dsp snd_hda_codec_realtek snd_sof kvm_amd snd_hda_codec_generic snd_sof_utils ledtrig_audio snd_hda_codec_hdmi snd_soc_core kvm amdxcp snd_hda_intel iommu_v2 snd_intel_dspcfg drm_buddy snd_compress snd_intel_sdw_acpi ac97_bus gpu_sched snd_pcm_dmaengine snd_hda_codec drm_suballoc_helper irqbypass snd_pci_ps drm_ttm_helper ipmi_ssif crct10dif_pclmul ttm snd_rpl_pci_acp6x polyval_clmulni polyval_generic snd_acp_pci snd_hda_core ghash_clmulni_intel snd_pci_acp6x snd_hwdep drm_display_helper aesni_intel snd_pcm cec snd_pci_acp5x crypto_simd snd_timer ast cryptd rc_core snd_rn_pci_acp3x snd drm_shmem_helper snd_acp_config rapl snd_soc_acpi pcspkr soundcore
Dec 07 02:28:22 PVE-01 kernel: snd_pci_acp3x ccp drm_kms_helper k10temp acpi_ipmi ipmi_si joydev input_leds ipmi_devintf ipmi_msghandler mac_hid vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 rndis_host cdc_ether usbnet mii usbmouse zfs(PO) spl(O) btrfs blake2b_generic xor uas usb_storage hid_generic usbhid hid raid6_pq libcrc32c xhci_pci nvme xhci_pci_renesas crc32_pclmul igb ahci xhci_hcd nvme_core i2c_piix4 i2c_algo_bit libahci dca video nvme_common wmi
Dec 07 02:28:22 PVE-01 kernel: CPU: 13 PID: 1845536 Comm: kworker/13:1 Tainted: P O 6.5.11-4-pve #1
Dec 07 02:28:22 PVE-01 kernel: Hardware name: Supermicro AS -3015A-I/H13SAE-MF, BIOS 1.1a 10/19/2023
Dec 07 02:28:22 PVE-01 kernel: Workqueue: events igb_watchdog_task [igb]
Dec 07 02:28:22 PVE-01 kernel: RIP: 0010:igb_rd32+0x93/0xb0 [igb]
Dec 07 02:28:22 PVE-01 kernel: Code: c7 c6 03 64 5f c0 e8 1c 4d b7 d5 48 8b bb 28 ff ff ff e8 a0 16 6d d5 84 c0 74 c1 44 89 e6 48 c7 c7 f8 70 5f c0 e8 dd 8d f2 d4 <0f> 0b eb ae b8 ff ff ff ff 31 d2 31 f6 31 ff e9 69 00 ec d5 66 0f
Dec 07 02:28:22 PVE-01 kernel: RSP: 0018:ffffb9e6c45bfd98 EFLAGS: 00010246
Dec 07 02:28:22 PVE-01 kernel: RAX: 0000000000000000 RBX: ffff9ef896278f18 RCX: 0000000000000000
Dec 07 02:28:22 PVE-01 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Dec 07 02:28:22 PVE-01 kernel: RBP: ffffb9e6c45bfda8 R08: 0000000000000000 R09: 0000000000000000
Dec 07 02:28:22 PVE-01 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000c030
Dec 07 02:28:22 PVE-01 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: ffff9ef8962f2b40
Dec 07 02:28:22 PVE-01 kernel: FS: 0000000000000000(0000) GS:ffff9f1718740000(0000) knlGS:0000000000000000
Dec 07 02:28:22 PVE-01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 07 02:28:22 PVE-01 kernel: CR2: 00007f6946566438 CR3: 0000001b7ec34000 CR4: 0000000000750ee0
Dec 07 02:28:22 PVE-01 kernel: PKRU: 55555554
Dec 07 02:28:22 PVE-01 kernel: Call Trace:
Dec 07 02:28:22 PVE-01 kernel: <TASK>
Dec 07 02:28:22 PVE-01 kernel: ? show_regs+0x6d/0x80
Dec 07 02:28:22 PVE-01 kernel: ? __warn+0x89/0x160
Dec 07 02:28:22 PVE-01 kernel: ? igb_rd32+0x93/0xb0 [igb]
Dec 07 02:28:22 PVE-01 kernel: ? report_bug+0x17e/0x1b0
Dec 07 02:28:22 PVE-01 kernel: ? handle_bug+0x46/0x90
Dec 07 02:28:22 PVE-01 kernel: ? exc_invalid_op+0x18/0x80
Dec 07 02:28:22 PVE-01 kernel: ? asm_exc_invalid_op+0x1b/0x20
Dec 07 02:28:22 PVE-01 kernel: ? igb_rd32+0x93/0xb0 [igb]
Dec 07 02:28:22 PVE-01 kernel: ? igb_rd32+0x93/0xb0 [igb]
Dec 07 02:28:22 PVE-01 kernel: igb_update_stats+0x89/0x830 [igb]
Dec 07 02:28:22 PVE-01 kernel: igb_watchdog_task+0x12d/0x880 [igb]
Dec 07 02:28:22 PVE-01 kernel: process_one_work+0x23b/0x450
Dec 07 02:28:22 PVE-01 kernel: worker_thread+0x50/0x3f0
Dec 07 02:28:22 PVE-01 kernel: ? __pfx_worker_thread+0x10/0x10
Dec 07 02:28:22 PVE-01 kernel: kthread+0xef/0x120
Dec 07 02:28:22 PVE-01 kernel: ? __pfx_kthread+0x10/0x10
Dec 07 02:28:22 PVE-01 kernel: ret_from_fork+0x44/0x70
Dec 07 02:28:22 PVE-01 kernel: ? __pfx_kthread+0x10/0x10
Dec 07 02:28:22 PVE-01 kernel: ret_from_fork_asm+0x1b/0x30
Dec 07 02:28:22 PVE-01 kernel: </TASK>
Dec 07 02:28:22 PVE-01 kernel: ---[ end trace 0000000000000000 ]---
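Since this has now hit two hosts, it may help to pull every "PCIe link lost" event out of saved kernel logs so the timestamps can be compared across machines. Below is a minimal sketch, assuming the log lines look like the ones pasted above (syslog-style timestamp, hostname, then `kernel:`). The names `LinkLost`, `PATTERN` and `find_link_lost` are made up for illustration and are not part of any tool.

```python
import re
from typing import Iterable, List, NamedTuple


class LinkLost(NamedTuple):
    """One 'PCIe link lost' event reported by the igb driver."""
    timestamp: str
    host: str
    pci_addr: str
    iface: str


# Assumption: lines follow the format seen in the logs above, e.g.
# "Dec 04 03:03:30 PVE-02 kernel: igb 0000:07:00.0 eno1: PCIe link lost"
PATTERN = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"igb (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<iface>\S+): PCIe link lost$"
)


def find_link_lost(lines: Iterable[str]) -> List[LinkLost]:
    """Return every igb 'PCIe link lost' event found in the given log lines."""
    events = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            events.append(
                LinkLost(match["ts"], match["host"], match["pci"], match["iface"])
            )
    return events


if __name__ == "__main__":
    # Sample lines copied from the two traces in this post.
    sample = [
        "Dec 04 03:03:30 PVE-02 kernel: igb 0000:07:00.0 eno1: PCIe link lost",
        "Dec 04 03:03:30 PVE-02 kernel: igb: Failed to read reg 0xc030!",
        "Dec 07 02:28:22 PVE-01 kernel: igb 0000:07:00.0 eno1: PCIe link lost",
    ]
    for ev in find_link_lost(sample):
        print(f"{ev.timestamp} {ev.host}: {ev.iface} ({ev.pci_addr}) lost its PCIe link")
```

The function takes any iterable of lines, so it should also work on an open file containing saved kernel log output from each host, as long as the timestamp format matches the one shown here.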