Hi there,
I've had two hard crashes within two days. I suspect higher I/O load is the trigger, although I can't be sure. The system is up to date and running kernel 6.2.16-4-pve.
I found this thread, which seems to suggest the problem hasn't been fixed:
https://lore.kernel.org/lkml/6c389fde-4c8d-300b-8c3c-300d6105c30a@eikelenboom.it/T/
It seems nothing can be done about it for the moment.
Kernel log:
Jul 20 20:07:19 server kernel: ------------[ cut here ]------------
Jul 20 20:07:19 server kernel: NETDEV WATCHDOG: enp6s0 (igc): transmit queue 2 timed out
Jul 20 20:07:19 server kernel: WARNING: CPU: 11 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x23a/0x250
Jul 20 20:07:19 server kernel: Modules linked in: tcp_diag inet_diag ipt_REJECT nf_reject_ipv4 nft_chain_nat nft_compat nf_conntrack_netlink xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE xfr>
Jul 20 20:07:19 server kernel: ledtrig_audio intel_rapl_msr nouveau snd_pcm_dmaengine snd_hda_codec_hdmi intel_rapl_common x86_pkg_temp_thermal intel_powerclamp drm_ttm_helper snd_hda_int>
Jul 20 20:07:19 server kernel: polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel xhci_pci crypto_simd nvme spi_intel_pci xhci_pci_renesas cryptd i2c_i801 hpsa ahci e1000e spi_i>
Jul 20 20:07:19 server kernel: CPU: 11 PID: 0 Comm: swapper/11 Tainted: P O 6.2.16-4-pve #1
Jul 20 20:07:19 server kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D09/Z590-A PRO (MS-7D09), BIOS 1.80 09/29/2022
Jul 20 20:07:19 server kernel: RIP: 0010:dev_watchdog+0x23a/0x250
Jul 20 20:07:19 server kernel: Code: 00 e9 2b ff ff ff 48 89 df c6 05 4a 6c 7d 01 01 e8 6b 08 f8 ff 44 89 f1 48 89 de 48 c7 c7 98 64 80 9b 48 89 c2 e8 86 a6 30 ff <0f> 0b e9 1c ff ff ff 66>
Jul 20 20:07:19 server kernel: RSP: 0018:ffffa387403d0e38 EFLAGS: 00010246
Jul 20 20:07:19 server kernel: RAX: 0000000000000000 RBX: ffff8e3693b6c000 RCX: 0000000000000000
Jul 20 20:07:19 server kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jul 20 20:07:19 server kernel: RBP: ffffa387403d0e68 R08: 0000000000000000 R09: 0000000000000000
Jul 20 20:07:19 server kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e3693b6c4c8
Jul 20 20:07:19 server kernel: R13: ffff8e3693b6c41c R14: 0000000000000002 R15: 0000000000000000
Jul 20 20:07:19 server kernel: FS: 0000000000000000(0000) GS:ffff8e45bfcc0000(0000) knlGS:0000000000000000
Jul 20 20:07:19 server kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 20 20:07:19 server kernel: CR2: 00007fa8d0036a90 CR3: 000000030f89e006 CR4: 0000000000772ee0
Jul 20 20:07:19 server kernel: PKRU: 55555554
Jul 20 20:07:19 server kernel: Call Trace:
Jul 20 20:07:19 server kernel: <IRQ>
Jul 20 20:07:19 server kernel: ? __pfx_dev_watchdog+0x10/0x10
Jul 20 20:07:19 server kernel: call_timer_fn+0x29/0x160
Jul 20 20:07:19 server kernel: ? __pfx_dev_watchdog+0x10/0x10
Jul 20 20:07:19 server kernel: __run_timers+0x259/0x310
Jul 20 20:07:19 server kernel: run_timer_softirq+0x1d/0x40
Jul 20 20:07:19 server kernel: __do_softirq+0xd6/0x346
Jul 20 20:07:19 server kernel: ? hrtimer_interrupt+0x11f/0x250
Jul 20 20:07:19 server kernel: __irq_exit_rcu+0xa2/0xd0
Jul 20 20:07:19 server kernel: irq_exit_rcu+0xe/0x20
Jul 20 20:07:19 server kernel: sysvec_apic_timer_interrupt+0x92/0xd0
Jul 20 20:07:19 server kernel: </IRQ>
Jul 20 20:07:19 server kernel: <TASK>
Jul 20 20:07:19 server kernel: asm_sysvec_apic_timer_interrupt+0x1b/0x20
Jul 20 20:07:19 server kernel: RIP: 0010:cpuidle_enter_state+0xde/0x6f0
Jul 20 20:07:19 server kernel: Code: 27 57 65 e8 d4 79 4a ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 02 82 49 ff 80 7d d0 00 0f 85 eb 00 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 12 02 00>
Jul 20 20:07:19 server kernel: RSP: 0018:ffffa3874018be38 EFLAGS: 00000246
Jul 20 20:07:19 server kernel: RAX: 0000000000000000 RBX: ffffc3873fcc0300 RCX: 0000000000000000
Jul 20 20:07:19 server kernel: RDX: 000000000000000b RSI: 0000000000000000 RDI: 0000000000000000
Jul 20 20:07:19 server kernel: RBP: ffffa3874018be88 R08: 0000000000000000 R09: 0000000000000000
Jul 20 20:07:19 server kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9c2c33a0
Jul 20 20:07:19 server kernel: R13: 0000000000000002 R14: 0000000000000002 R15: 000020cf14dd4384
Jul 20 20:07:19 server kernel: ? cpuidle_enter_state+0xce/0x6f0
Jul 20 20:07:19 server kernel: cpuidle_enter+0x2e/0x50
Jul 20 20:07:19 server kernel: do_idle+0x216/0x2a0
Jul 20 20:07:19 server kernel: cpu_startup_entry+0x1d/0x20
Jul 20 20:07:19 server kernel: start_secondary+0x122/0x160
Jul 20 20:07:19 server kernel: secondary_startup_64_no_verify+0xe5/0xeb
Jul 20 20:07:19 server kernel: </TASK>
Jul 20 20:07:19 server kernel: ---[ end trace 0000000000000000 ]---
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: Register Dump
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: Register Name Value
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: CTRL 081c0641
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: STATUS 40380683
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: CTRL_EXT 10000040
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: MDIC 180a3800
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: ICR 000000c1
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: RCTL 0440803a
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: RDLEN[0-3] 00001000 00001000 00001000 00001000
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: RDH[0-3] 000000e0 000000e2 0000005a 00000094
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: RDT[0-3] 000000df 000000e1 00000058 00000093
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: RXDCTL[0-3] 02040808 02040808 02040808 02040808
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: RDBAL[0-3] 0fa78000 17940000 1511c000 19fc6000
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: RDBAH[0-3] 00000001 00000001 00000001 00000001
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: TCTL a503f0fa
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: TDBAL[0-3] 18f7d000 0fbaa000 19b99000 0cc5d000
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: TDBAH[0-3] 00000001 00000001 00000001 00000001
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: TDLEN[0-3] 00001000 00001000 00001000 00001000
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: TDH[0-3] 00000059 00000002 0000003f 000000df
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: TDT[0-3] 00000059 00000002 0000003f 000000df
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: TXDCTL[0-3] 02100108 02100108 02100108 02100108
Jul 20 20:07:19 server kernel: igc 0000:06:00.0 enp6s0: Reset adapter
Jul 20 20:07:20 server kernel: vmbr0: port 1(enp6s0) entered disabled state
Jul 20 20:07:20 server kernel: vmbr0v20: port 1(enp6s0.20) entered disabled state
Jul 20 20:07:20 server kernel: vmbr0v50: port 1(enp6s0.50) entered disabled state
Jul 20 20:07:20 server kernel: vmbr0v1000: port 1(enp6s0.1000) entered disabled state
Jul 20 20:07:23 server kernel: igc 0000:06:00.0 enp6s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Jul 20 20:07:23 server kernel: vmbr0: port 1(enp6s0) entered blocking state
Jul 20 20:07:23 server kernel: vmbr0: port 1(enp6s0) entered forwarding state
Jul 20 20:07:23 server kernel: vmbr0v20: port 1(enp6s0.20) entered blocking state
Jul 20 20:07:23 server kernel: vmbr0v20: port 1(enp6s0.20) entered forwarding state
Jul 20 20:07:23 server kernel: vmbr0v50: port 1(enp6s0.50) entered blocking state
Jul 20 20:07:23 server kernel: vmbr0v50: port 1(enp6s0.50) entered forwarding state
Jul 20 20:07:23 server kernel: vmbr0v1000: port 1(enp6s0.1000) entered blocking state
Jul 20 20:07:23 server kernel: vmbr0v1000: port 1(enp6s0.1000) entered forwarding state
Jul 20 20:07:27 server kernel: ------------[ cut here ]------------
Jul 20 20:07:27 server kernel: kernel BUG at lib/dynamic_queue_limits.c:27!