Thanks for your report. Can you please post the full error output from the kernel (e.g. check journalctl -b -1
for the system log of the previous boot)?
Yes, thank you.
bnxt_en 0000:3d:00.0 (unnamed net_device) (uninitialized): Device requests max timeout of 100 seconds, may trigger hung task watchdog
bnxt_en 0000:3d:00.0: Unable to read VPD
Apr 07 00:07:44 fbo-vmh-024 kernel: ------------[ cut here ]------------
Apr 07 00:07:44 fbo-vmh-024 kernel: UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
Apr 07 00:07:44 fbo-vmh-024 kernel: shift exponent 64 is too large for 64-bit type 'long unsigned int'
Apr 07 00:07:44 fbo-vmh-024 kernel: CPU: 45 PID: 1471 Comm: (udev-worker) Tainted: P O 6.8.1-1-pve #1
Apr 07 00:07:44 fbo-vmh-024 kernel: Hardware name: Supermicro Super Server/X13DEI-T, BIOS 2.1 12/13/2023
Apr 07 00:07:44 fbo-vmh-024 kernel: Call Trace:
Apr 07 00:07:44 fbo-vmh-024 kernel: <TASK>
Apr 07 00:07:44 fbo-vmh-024 kernel: dump_stack_lvl+0x48/0x70
Apr 07 00:07:44 fbo-vmh-024 kernel: dump_stack+0x10/0x20
Apr 07 00:07:44 fbo-vmh-024 kernel: __ubsan_handle_shift_out_of_bounds+0x1ac/0x360
Apr 07 00:07:44 fbo-vmh-024 kernel: bnxt_qplib_alloc_init_hwq.cold+0x8c/0xd7 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: bnxt_qplib_create_qp+0x1d5/0x8c0 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: bnxt_re_create_qp+0x71d/0xf30 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? bnxt_qplib_create_cq+0x247/0x330 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __kmalloc+0x1ab/0x400
Apr 07 00:07:44 fbo-vmh-024 kernel: create_qp+0x17a/0x290 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? create_qp+0x17a/0x290 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: ib_create_qp_kernel+0x3b/0xe0 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: create_mad_qp+0x8e/0x100 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __pfx_qp_event_handler+0x10/0x10 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: ib_mad_init_device+0x2c2/0x8a0 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: add_client_context+0x127/0x1c0 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: enable_device_and_get+0xe6/0x1e0 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: ib_register_device+0x506/0x610 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: bnxt_re_probe+0xe7d/0x11a0 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: auxiliary_bus_probe+0x3e/0xa0
Apr 07 00:07:44 fbo-vmh-024 kernel: really_probe+0x1c9/0x430
Apr 07 00:07:44 fbo-vmh-024 kernel: __driver_probe_device+0x8c/0x190
Apr 07 00:07:44 fbo-vmh-024 kernel: driver_probe_device+0x24/0xd0
Apr 07 00:07:44 fbo-vmh-024 kernel: __driver_attach+0x10b/0x210
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __pfx___driver_attach+0x10/0x10
Apr 07 00:07:44 fbo-vmh-024 kernel: bus_for_each_dev+0x8a/0xf0
Apr 07 00:07:44 fbo-vmh-024 kernel: driver_attach+0x1e/0x30
Apr 07 00:07:44 fbo-vmh-024 kernel: bus_add_driver+0x156/0x260
Apr 07 00:07:44 fbo-vmh-024 kernel: driver_register+0x5e/0x130
Apr 07 00:07:44 fbo-vmh-024 kernel: __auxiliary_driver_register+0x73/0xf0
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: do_one_initcall+0x5b/0x340
Apr 07 00:07:44 fbo-vmh-024 kernel: do_init_module+0x97/0x290
Apr 07 00:07:44 fbo-vmh-024 kernel: load_module+0x213a/0x22a0
Apr 07 00:07:44 fbo-vmh-024 kernel: init_module_from_file+0x96/0x100
Apr 07 00:07:44 fbo-vmh-024 kernel: ? init_module_from_file+0x96/0x100
Apr 07 00:07:44 fbo-vmh-024 kernel: idempotent_init_module+0x11c/0x2b0
Apr 07 00:07:44 fbo-vmh-024 kernel: __x64_sys_finit_module+0x64/0xd0
Apr 07 00:07:44 fbo-vmh-024 kernel: do_syscall_64+0x84/0x180
Apr 07 00:07:44 fbo-vmh-024 kernel: ? syscall_exit_to_user_mode+0x86/0x260
Apr 07 00:07:44 fbo-vmh-024 kernel: ? do_syscall_64+0x93/0x180
Apr 07 00:07:44 fbo-vmh-024 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
Apr 07 00:07:44 fbo-vmh-024 kernel: RIP: 0033:0x7ac146137719
Apr 07 00:07:44 fbo-vmh-024 kernel: Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 f>
Apr 07 00:07:44 fbo-vmh-024 kernel: RSP: 002b:00007ffc8a83b208 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Apr 07 00:07:44 fbo-vmh-024 kernel: RAX: ffffffffffffffda RBX: 00005f4b75018a80 RCX: 00007ac146137719
Apr 07 00:07:44 fbo-vmh-024 kernel: RDX: 0000000000000000 RSI: 00007ac1462caefd RDI: 000000000000000f
Apr 07 00:07:44 fbo-vmh-024 kernel: RBP: 00007ac1462caefd R08: 0000000000000000 R09: 00005f4b74fd8720
Apr 07 00:07:44 fbo-vmh-024 kernel: R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
Apr 07 00:07:44 fbo-vmh-024 kernel: R13: 0000000000000000 R14: 00005f4b7500f170 R15: 00005f4b74858ec1
Apr 07 00:07:44 fbo-vmh-024 kernel: </TASK>
Apr 07 00:07:44 fbo-vmh-024 kernel: ---[ end trace ]---
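
A side note on the UBSAN report above, for anyone trying to read it: "shift exponent 64 is too large for 64-bit type" at include/linux/log2.h:57 most likely comes from the kernel's round-up-to-power-of-two helper, which (as far as I know, and assuming the 6.8 header layout) computes 1UL << fls_long(n - 1). The Python sketch below only emulates that arithmetic to show how an input of 0 would yield a shift count of 64. The function names are illustrative, not kernel or driver code, and which value bnxt_re actually passed in is not visible in this log.

```python
BITS = 64                      # width of 'unsigned long' on x86_64
MASK = (1 << BITS) - 1         # emulate unsigned 64-bit wrap-around


def fls_long(x: int) -> int:
    """1-based index of the highest set bit in a 64-bit value, 0 if x == 0."""
    return (x & MASK).bit_length()


def roundup_shift(n: int) -> int:
    """Shift count used by an expression of the form 1UL << fls_long(n - 1)."""
    return fls_long((n - 1) & MASK)


if __name__ == "__main__":
    for n in (0, 1, 5, 4096):
        shift = roundup_shift(n)
        if shift >= BITS:
            # n - 1 wrapped to 0xffff...ffff, so the shift equals the type width
            print(f"n={n}: shift by {shift} is out of range for a {BITS}-bit type")
        else:
            print(f"n={n}: shift by {shift} gives {1 << shift}")
```

If this reading is right, the warning by itself points at a zero-sized argument during queue setup rather than at the later firmware stall, but the two may still be related.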
Apr 07 00:08:45 fbo-vmh-024 systemd-udevd[1463]: bnxt_en.rdma.0: Worker [1642] processing SEQNUM=18223 is taking a long time
Apr 07 00:08:45 fbo-vmh-024 systemd-udevd[1463]: bnxt_en.rdma.1: Worker [1471] processing SEQNUM=18226 is taking a long time
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.0: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102422 > 100000) msec active 1
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.0 bnxt_re0: Failed to modify HW QP
Apr 07 00:09:26 fbo-vmh-024 kernel: infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
Apr 07 00:09:26 fbo-vmh-024 kernel: infiniband bnxt_re0: Couldn't start port
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.0 bnxt_re0: Failed to destroy HW QP
Apr 07 00:09:26 fbo-vmh-024 kernel: ------------[ cut here ]------------
Apr 07 00:09:26 fbo-vmh-024 kernel: WARNING: CPU: 11 PID: 1471 at drivers/infiniband/core/cq.c:322 ib_free_cq+0x109/0x150 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: Modules linked in: ipmi_ssif intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit x86_pk>
Apr 07 00:09:26 fbo-vmh-024 kernel: nvme_auth i2c_i801 spi_intel_pci megaraid_sas xhci_hcd libahci i2c_smbus spi_intel i2c_ismt wmi pinctrl_emmitsburg
Apr 07 00:09:26 fbo-vmh-024 kernel: CPU: 11 PID: 1471 Comm: (udev-worker) Tainted: P O 6.8.1-1-pve #1
Apr 07 00:09:26 fbo-vmh-024 kernel: Hardware name: Supermicro Super Server/X13DEI-T, BIOS 2.1 12/13/2023
Apr 07 00:09:26 fbo-vmh-024 kernel: RIP: 0010:ib_free_cq+0x109/0x150 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: Code: e8 fc 9c 02 00 65 ff 0d 9d 87 e5 3e 0f 85 70 ff ff ff 0f 1f 44 00 00 e9 66 ff ff ff 48 8d 7f 50 e8 0c 3a 33 df e9 35 ff ff ff <0f> 0b 31 c0 3>
Apr 07 00:09:26 fbo-vmh-024 kernel: RSP: 0018:ff6fb876ceb3b6f0 EFLAGS: 00010202
Apr 07 00:09:26 fbo-vmh-024 kernel: RAX: 0000000000000002 RBX: 0000000000000001 RCX: 0000000000000000
Apr 07 00:09:26 fbo-vmh-024 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff4118c220ef4400
Apr 07 00:09:26 fbo-vmh-024 kernel: RBP: ff6fb876ceb3b760 R08: 0000000000000000 R09: 0000000000000000
Apr 07 00:09:26 fbo-vmh-024 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff4118c235c00000
Apr 07 00:09:26 fbo-vmh-024 kernel: R13: ff4118c209bb8500 R14: 00000000ffffff92 R15: ff4118c22e88f000
Apr 07 00:09:26 fbo-vmh-024 kernel: FS: 00007ac145a2a8c0(0000) GS:ff4118e0ff780000(0000) knlGS:0000000000000000
Apr 07 00:09:26 fbo-vmh-024 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 07 00:09:26 fbo-vmh-024 kernel: CR2: 00005f4b7509f1e8 CR3: 0000000131da2003 CR4: 0000000000f71ef0
Apr 07 00:09:26 fbo-vmh-024 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 07 00:09:26 fbo-vmh-024 kernel: DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
Apr 07 00:09:26 fbo-vmh-024 kernel: PKRU: 55555554
Apr 07 00:09:26 fbo-vmh-024 kernel: Call Trace:
Apr 07 00:09:26 fbo-vmh-024 kernel: <TASK>
Apr 07 00:09:26 fbo-vmh-024 kernel: ? show_regs+0x6d/0x80
Apr 07 00:09:26 fbo-vmh-024 kernel: ? __warn+0x89/0x160
Apr 07 00:09:26 fbo-vmh-024 kernel: ? ib_free_cq+0x109/0x150 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: ? report_bug+0x17e/0x1b0
Apr 07 00:09:26 fbo-vmh-024 kernel: ? handle_bug+0x46/0x90
Apr 07 00:09:26 fbo-vmh-024 kernel: ? exc_invalid_op+0x18/0x80
Apr 07 00:09:26 fbo-vmh-024 kernel: ? asm_exc_invalid_op+0x1b/0x20
Apr 07 00:09:26 fbo-vmh-024 kernel: ? ib_free_cq+0x109/0x150 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: ? ib_mad_init_device+0x54c/0x8a0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: add_client_context+0x127/0x1c0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: enable_device_and_get+0xe6/0x1e0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: ? ib_mad_init_device+0x54c/0x8a0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: add_client_context+0x127/0x1c0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: enable_device_and_get+0xe6/0x1e0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: ib_register_device+0x506/0x610 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_re_probe+0xe7d/0x11a0 [bnxt_re]
Apr 07 00:09:26 fbo-vmh-024 kernel: ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
Apr 07 00:09:26 fbo-vmh-024 kernel: auxiliary_bus_probe+0x3e/0xa0
Apr 07 00:09:26 fbo-vmh-024 kernel: really_probe+0x1c9/0x430
Apr 07 00:09:26 fbo-vmh-024 kernel: __driver_probe_device+0x8c/0x190
Apr 07 00:09:26 fbo-vmh-024 kernel: driver_probe_device+0x24/0xd0
Apr 07 00:09:26 fbo-vmh-024 kernel: __driver_attach+0x10b/0x210
Apr 07 00:09:26 fbo-vmh-024 kernel: ? __pfx___driver_attach+0x10/0x10
Apr 07 00:09:26 fbo-vmh-024 kernel: bus_for_each_dev+0x8a/0xf0
Apr 07 00:09:26 fbo-vmh-024 kernel: driver_attach+0x1e/0x30
Apr 07 00:09:26 fbo-vmh-024 kernel: bus_add_driver+0x156/0x260
Apr 07 00:09:26 fbo-vmh-024 kernel: driver_register+0x5e/0x130
Apr 07 00:09:26 fbo-vmh-024 kernel: __auxiliary_driver_register+0x73/0xf0
Apr 07 00:09:26 fbo-vmh-024 kernel: ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
Apr 07 00:09:26 fbo-vmh-024 kernel: ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Apr 07 00:09:26 fbo-vmh-024 kernel: do_one_initcall+0x5b/0x340
Apr 07 00:09:26 fbo-vmh-024 kernel: do_init_module+0x97/0x290
Apr 07 00:09:26 fbo-vmh-024 kernel: load_module+0x213a/0x22a0
Apr 07 00:09:26 fbo-vmh-024 kernel: init_module_from_file+0x96/0x100
Apr 07 00:09:26 fbo-vmh-024 kernel: ? init_module_from_file+0x96/0x100
Apr 07 00:09:26 fbo-vmh-024 kernel: idempotent_init_module+0x11c/0x2b0
Apr 07 00:09:26 fbo-vmh-024 kernel: __x64_sys_finit_module+0x64/0xd0
Apr 07 00:09:26 fbo-vmh-024 kernel: do_syscall_64+0x84/0x180
Apr 07 00:09:26 fbo-vmh-024 kernel: ? syscall_exit_to_user_mode+0x86/0x260
Apr 07 00:09:26 fbo-vmh-024 kernel: ? do_syscall_64+0x93/0x180
Apr 07 00:09:26 fbo-vmh-024 kernel: ? exc_page_fault+0x94/0x1b0
Apr 07 00:09:26 fbo-vmh-024 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
Apr 07 00:09:26 fbo-vmh-024 kernel: RIP: 0033:0x7ac146137719
Apr 07 00:09:26 fbo-vmh-024 kernel: Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 f>
Apr 07 00:09:26 fbo-vmh-024 kernel: RSP: 002b:00007ffc8a83b208 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Apr 07 00:09:26 fbo-vmh-024 kernel: RAX: ffffffffffffffda RBX: 00005f4b75018a80 RCX: 00007ac146137719
Apr 07 00:09:26 fbo-vmh-024 kernel: RDX: 0000000000000000 RSI: 00007ac1462caefd RDI: 000000000000000f
Apr 07 00:09:26 fbo-vmh-024 kernel: RBP: 00007ac1462caefd R08: 0000000000000000 R09: 00005f4b74fd8720
Apr 07 00:09:26 fbo-vmh-024 kernel: R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
Apr 07 00:09:26 fbo-vmh-024 kernel: R13: 0000000000000000 R14: 00005f4b7500f170 R15: 00005f4b74858ec1
Apr 07 00:09:26 fbo-vmh-024 kernel: </TASK>
Apr 07 00:09:26 fbo-vmh-024 kernel: ---[ end trace 0000000000000000 ]---
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.0 bnxt_re0: Free MW failed: 0xffffff92
Apr 07 00:09:26 fbo-vmh-024 kernel: infiniband bnxt_re0: Couldn't open port 1
Apr 07 00:11:09 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.1: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102345 > 100000) msec active 1
Apr 07 00:11:09 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.1 bnxt_re1: Failed to modify HW QP
Apr 07 00:11:09 fbo-vmh-024 kernel: infiniband bnxt_re1: Couldn't change QP1 state to INIT: -110
Apr 07 00:11:09 fbo-vmh-024 kernel: infiniband bnxt_re1: Couldn't start port
Apr 07 00:11:09 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.1 bnxt_re1: Failed to destroy HW QP
Apr 07 00:11:09 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.1 bnxt_re1: Free MW failed: 0xffffff92
Apr 07 00:11:09 fbo-vmh-024 kernel: infiniband bnxt_re1: Couldn't open port 1
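
For reference, the two error values in the log are the same thing: 0xffffff92 in "Free MW failed" is -110 when read as a signed 32-bit integer, which matches "Couldn't change QP1 state to INIT: -110". On Linux, errno 110 is ETIMEDOUT, which fits the "FW STALL Detected" lines. A small Python sketch to decode such values follows; the helper names are my own, and the errno name lookup assumes it is run on Linux.

```python
import errno


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def describe(value: int) -> str:
    """Return the signed value plus the errno name known to this platform."""
    code = to_signed32(value)
    if code >= 0:
        return f"{code} (not a negative errno)"
    # errno.errorcode is platform specific; on Linux 110 maps to ETIMEDOUT
    name = errno.errorcode.get(-code, "unknown errno")
    return f"{code} ({name})"


if __name__ == "__main__":
    print(describe(0xFFFFFF92))
```

The same value also shows up in R14 (00000000ffffff92) of the ib_free_cq warning's register dump.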