I have been experiencing random crashes of Proxmox v8.0.4 (Linux version 6.2.16-3-pve) for many months, ever since installing it on a Tuofudun 2.5GbE Firewall Micro Appliance. It has taken significant effort, but I now believe I have found at least one repeatable trigger: every time my Windows laptop resumes from sleep, Proxmox crashes. Please find an example of the kernel syslog below:
Code:
.........
Aug 13 17:05:03 pve pvestatd[924]: auth key pair too old, rotating..
Aug 13 17:17:01 pve CRON[196694]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 13 17:17:01 pve CRON[196695]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Aug 13 17:17:01 pve CRON[196694]: pam_unix(cron:session): session closed for user root
Aug 13 18:17:01 pve CRON[206380]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 13 18:17:01 pve CRON[206381]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Aug 13 18:17:01 pve CRON[206380]: pam_unix(cron:session): session closed for user root
Aug 13 18:34:23 pve kernel: perf: interrupt took too long (3919 > 3917), lowering kernel.perf_event_max_sample_rate to 51000
WINDOWS LAPTOP RESUMES FROM SLEEP
Aug 13 18:59:48 pve kernel: igc 0000:04:00.0 enp4s0: NIC Link is Up 2500 Mbps Full Duplex, Flow Control: RX/TX
Aug 13 18:59:48 pve kernel: vmbr0: port 2(enp4s0) entered blocking state
Aug 13 18:59:48 pve kernel: vmbr0: port 2(enp4s0) entered forwarding state
Aug 13 18:59:52 pve kernel: igc 0000:05:00.0 enp5s0: NIC Link is Up 2500 Mbps Full Duplex, Flow Control: RX/TX
Aug 13 18:59:52 pve kernel: vmbr0: port 3(enp5s0) entered blocking state
Aug 13 18:59:52 pve kernel: vmbr0: port 3(enp5s0) entered forwarding state
Aug 13 19:00:03 pve kernel: igc 0000:05:00.0 enp5s0: NIC Link is Down
Aug 13 19:00:03 pve kernel: vmbr0: port 3(enp5s0) entered disabled state
Aug 13 19:00:03 pve kernel: igc 0000:04:00.0 enp4s0: NIC Link is Down
Aug 13 19:00:04 pve kernel: vmbr0: port 2(enp4s0) entered disabled state
Aug 13 19:00:07 pve kernel: igc 0000:04:00.0 enp4s0: NIC Link is Up 2500 Mbps Full Duplex, Flow Control: RX/TX
Aug 13 19:00:07 pve kernel: vmbr0: port 2(enp4s0) entered blocking state
Aug 13 19:00:07 pve kernel: vmbr0: port 2(enp4s0) entered forwarding state
Aug 13 19:00:07 pve kernel: igc 0000:05:00.0 enp5s0: NIC Link is Up 2500 Mbps Full Duplex, Flow Control: RX/TX
Aug 13 19:00:07 pve kernel: vmbr0: port 3(enp5s0) entered blocking state
Aug 13 19:00:07 pve kernel: vmbr0: port 3(enp5s0) entered forwarding state
Aug 13 19:02:44 pve kernel: ------------[ cut here ]------------
Aug 13 19:02:44 pve kernel: refcount_t: underflow; use-after-free.
Aug 13 19:02:44 pve kernel: WARNING: CPU: 3 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0xa3/0x150
Aug 13 19:02:44 pve kernel: Modules linked in: tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog sunrpc nfnetlink_log binfmt_misc nfnetlink snd_hda_codec_hdmi snd_sof_pci_intel_icl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus x86_pkg_temp_thermal intel_powerclamp i915 coretemp snd_soc_core snd_compress drm_buddy ac97_bus snd_pcm_dmaengine ttm kvm_intel drm_display_helper cec snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm cmdlinepart crct10dif_pclmul snd_hda_codec polyval_generic ghash_clmulni_intel sha512_ssse3 rc_core snd_hda_core spi_nor aesni_intel intel_rapl_msr drm_kms_helper crypto_simd snd_hwdep processor_thermal_device_pci_legacy i2c_algo_bit cryptd processor_thermal_device snd_pcm
Aug 13 19:02:44 pve kernel: processor_thermal_rfim processor_thermal_mbox processor_thermal_rapl intel_rapl_common mei_me snd_timer int340x_thermal_zone syscopyarea snd sysfillrect intel_cstate mtd pcspkr wmi_bmof sysimgblt soundcore mei intel_soc_dts_iosf zfs(PO) mac_hid zunicode(PO) acpi_pad acpi_tad zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb uas usb_storage dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c i2c_i801 spi_intel_pci crc32_pclmul spi_intel i2c_smbus igc sdhci_pci nvme xhci_pci cqhci xhci_pci_renesas sdhci nvme_core nvme_common xhci_hcd video wmi pinctrl_jasperlake
Aug 13 19:02:44 pve kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: P W O 6.2.16-3-pve #1
Aug 13 19:02:44 pve kernel: Hardware name: Default string Default string/Default string, BIOS 5.19 11/11/2022
Aug 13 19:02:44 pve kernel: RIP: 0010:refcount_warn_saturate+0xa3/0x150
Aug 13 19:02:44 pve kernel: Code: cc cc 0f b6 1d 30 7f e0 01 80 fb 01 0f 87 e9 8b 88 00 83 e3 01 75 dd 48 c7 c7 08 e5 96 9d c6 05 14 7f e0 01 01 e8 ad bb 93 ff <0f> 0b eb c6 0f b6 1d 07 7f e0 01 80 fb 01 0f 87 a9 8b 88 00 83 e3
Aug 13 19:02:44 pve kernel: RSP: 0018:ffffb49b401d4d78 EFLAGS: 00010246
Aug 13 19:02:44 pve kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Aug 13 19:02:44 pve kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Aug 13 19:02:44 pve kernel: RBP: ffffb49b401d4d80 R08: 0000000000000000 R09: 0000000000000000
Aug 13 19:02:44 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff913bd18a5c00
Aug 13 19:02:44 pve kernel: R13: ffff913bca2afff0 R14: ffff913bca2af000 R15: 00000000ffffffff
Aug 13 19:02:44 pve kernel: FS: 0000000000000000(0000) GS:ffff913d38180000(0000) knlGS:0000000000000000
Aug 13 19:02:44 pve kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 13 19:02:44 pve kernel: CR2: 00003d7f66d00a20 CR3: 0000000116eca000 CR4: 0000000000352ee0
Aug 13 19:02:44 pve kernel: Call Trace:
Aug 13 19:02:44 pve kernel: <IRQ>
Aug 13 19:02:44 pve kernel: napi_consume_skb+0x15b/0x180
Aug 13 19:02:44 pve kernel: igc_poll+0x874/0x17d0 [igc]
Aug 13 19:02:44 pve kernel: ? __mod_timer+0x28c/0x400
Aug 13 19:02:44 pve kernel: __napi_poll+0x30/0x1f0
Aug 13 19:02:44 pve kernel: net_rx_action+0x180/0x2d0
Aug 13 19:02:44 pve kernel: ? __napi_schedule+0x71/0xa0
Aug 13 19:02:44 pve kernel: __do_softirq+0xd6/0x346
Aug 13 19:02:44 pve kernel: ? handle_irq_event+0x52/0x80
Aug 13 19:02:44 pve kernel: ? handle_edge_irq+0xda/0x250
Aug 13 19:02:44 pve kernel: __irq_exit_rcu+0xa2/0xd0
Aug 13 19:02:44 pve kernel: irq_exit_rcu+0xe/0x20
Aug 13 19:02:44 pve kernel: common_interrupt+0xa4/0xb0
Aug 13 19:02:44 pve kernel: </IRQ>
Aug 13 19:02:44 pve kernel: <TASK>
Aug 13 19:02:44 pve kernel: asm_common_interrupt+0x27/0x40
Aug 13 19:02:44 pve kernel: RIP: 0010:native_safe_halt+0xb/0x10
Aug 13 19:02:44 pve kernel: Code: 20 5f 25 9e e8 a6 ee 7d ff e9 3e ff ff ff cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d a9 d4 37 00 fb f4 <c3> cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66
Aug 13 19:02:44 pve kernel: RSP: 0018:ffffb49b4012fde0 EFLAGS: 00000246
Aug 13 19:02:44 pve kernel: RAX: 0000000000004800 RBX: ffff913bc12cf464 RCX: 0000000000000000
Aug 13 19:02:44 pve kernel: RDX: 0000000000000001 RSI: ffff913bc12cf400 RDI: 0000000000000001
Aug 13 19:02:44 pve kernel: RBP: ffffb49b4012fdf0 R08: 0000000000000000 R09: 0000000000000000
Aug 13 19:02:44 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff913bc12cf464
Aug 13 19:02:44 pve kernel: R13: 0000000000000003 R14: ffffffff9e4d6600 R15: ffff913d38180000
Aug 13 19:02:44 pve kernel: ? acpi_idle_do_entry+0x82/0xc0
Aug 13 19:02:44 pve kernel: acpi_idle_enter+0xbb/0x180
Aug 13 19:02:44 pve kernel: cpuidle_enter_state+0x9a/0x6f0
Aug 13 19:02:44 pve kernel: cpuidle_enter+0x2e/0x50
Aug 13 19:02:44 pve kernel: do_idle+0x216/0x2a0
Aug 13 19:02:44 pve kernel: cpu_startup_entry+0x1d/0x20
Aug 13 19:02:44 pve kernel: start_secondary+0x122/0x160
Aug 13 19:02:44 pve kernel: secondary_startup_64_no_verify+0xe5/0xeb
Aug 13 19:02:44 pve kernel: </TASK>
Aug 13 19:02:44 pve kernel: ---[ end trace 0000000000000000 ]---
Aug 13 19:02:44 pve kernel: ------------[ cut here ]------------
Aug 13 19:02:44 pve kernel: kernel BUG at lib/dynamic_queue_limits.c:27!
-- Reboot --
Aug 13 19:06:33 pve kernel: Linux version 6.2.16-3-pve (tom@sbuild) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) ()
Aug 13 19:06:33 pve kernel: Command line: B
.........
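In case it helps anyone compare this against their own logs, here is a minimal sketch of how the lines that matter for this crash could be pulled out of a much longer log. The function name filter_crash_lines and the choice of patterns are my own assumptions, based only on the messages shown above (igc link changes, the refcount warning and the "kernel BUG" marker); other crashes may need different patterns.

```shell
#!/bin/sh
# Keep only the lines relevant to this crash: igc link up/down events,
# the refcount_t warning, and "kernel BUG at" markers.
# Reads log text on stdin, so it works on journalctl output or a saved file.
# NOTE: filter_crash_lines is a hypothetical helper name, not an existing tool.
filter_crash_lines() {
    grep -E 'igc .*NIC Link is (Up|Down)|refcount_t:|kernel BUG at'
}

# Example using three lines copied from the log above;
# the CRON line is dropped, the other two are printed.
printf '%s\n' \
  'Aug 13 18:17:01 pve CRON[206380]: pam_unix(cron:session): session closed for user root' \
  'Aug 13 19:00:07 pve kernel: igc 0000:05:00.0 enp5s0: NIC Link is Up 2500 Mbps Full Duplex, Flow Control: RX/TX' \
  'Aug 13 19:02:44 pve kernel: refcount_t: underflow; use-after-free.' \
  | filter_crash_lines
```

On the Proxmox host itself, the same filter could be fed from the previous boot's kernel messages with journalctl -k -b -1 | filter_crash_lines, assuming the journal is kept across reboots.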
Please also find the output from lspci -nn | grep Ethernet below:
Code:
02:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I226-V [8086:125c] (rev 04)
03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I226-V [8086:125c] (rev 04)
04:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I226-V [8086:125c] (rev 04)
05:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I226-V [8086:125c] (rev 04)
I suspect that there is either a hardware problem with the Intel I226-V NIC or a software problem with its kernel driver.
I am new to Proxmox and, in fact, to Linux, so I don't know how much help I can be, but if you could suggest a troubleshooting strategy, I will do my best to follow it.
Regards,
David.
edit: correct typo.