Hey all,
I have a 3-node Proxmox/Ceph cluster and decided to update it. After updating the first node, networking no longer works on it. Others have said their 8.2.x upgrade changed the device name, but that's not what I'm seeing. After booting up and logging in, I can manually run
systemctl restart networking
and everything seems to start connecting and working as it should. I've tested disabling all virtualization capabilities in the BIOS, but that did not help. I also installed intel-microcode just to see if that would fix it.
Any help would be appreciated.
Thanks!
Code:
#dmesg -l warn
[ 2.032488] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 2.032492] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
[ 2.032493] MMIO Stale Data CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html for more details.
[ 2.143659] Invalid PCCT: 0 PCC subspaces
[ 2.962993] i8042: probe of i8042 failed with error -5
[ 2.991094] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[ 2.991773] platform eisa.0: EISA: Cannot allocate resource for mainboard
[ 2.991774] platform eisa.0: Cannot allocate resource for EISA slot 1
[ 2.991776] platform eisa.0: Cannot allocate resource for EISA slot 2
[ 2.991777] platform eisa.0: Cannot allocate resource for EISA slot 3
[ 2.991779] platform eisa.0: Cannot allocate resource for EISA slot 4
[ 2.991780] platform eisa.0: Cannot allocate resource for EISA slot 5
[ 2.991782] platform eisa.0: Cannot allocate resource for EISA slot 6
[ 2.991783] platform eisa.0: Cannot allocate resource for EISA slot 7
[ 2.991784] platform eisa.0: Cannot allocate resource for EISA slot 8
[ 3.057920] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[ 3.593471] lpc_ich 0000:00:1f.0: No MFD cells added
[ 3.627784] bnxt_en 0000:29:00.0 (unnamed net_device) (uninitialized): Device requests max timeout of 100 seconds, may trigger hung task watchdog
[ 3.658620] bnxt_en 0000:29:00.1 (unnamed net_device) (uninitialized): Device requests max timeout of 100 seconds, may trigger hung task watchdog
[ 4.777073] device-mapper: thin: Data device (dm-7) discard unsupported: Disabling discard passdown.
[ 5.955566] ERST: [Firmware Warn]: too many record IDs!
[ 6.204355] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit uses KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update the service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
[ 6.205157] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit uses KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update the service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
[ 6.205560] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit uses KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update the service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
[ 6.205955] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit uses KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update the service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
[ 6.320226] pstore: backend 'erst' already in use: ignoring 'efi_pstore'
[ 6.346943] spl: loading out-of-tree module taints kernel.
[ 6.393504] zfs: module license 'CDDL' taints kernel.
[ 6.393508] Disabling lock debugging due to kernel taint
[ 6.393528] zfs: module license taints kernel.
[ 7.139662] power_meter ACPI000D:00: Ignoring unsafe software power cap!
[ 7.139666] power_meter ACPI000D:00: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
[ 7.197992] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
[ 7.256194] ------------[ cut here ]------------
[ 7.256200] CPU: 14 PID: 2304 Comm: (udev-worker) Tainted: P O 6.8.4-2-pve #1
[ 7.256203] Hardware name: HPE ProLiant DL580 Gen10/ProLiant DL580 Gen10, BIOS U34 07/20/2023
[ 7.256204] Call Trace:
[ 7.256206] <TASK>
[ 7.256208] dump_stack_lvl+0x48/0x70
[ 7.256214] dump_stack+0x10/0x20
[ 7.256215] __ubsan_handle_shift_out_of_bounds+0x1ac/0x360
[ 7.256221] bnxt_qplib_alloc_init_hwq.cold+0x8c/0xd7 [bnxt_re]
[ 7.256238] bnxt_qplib_create_qp+0x1d5/0x8c0 [bnxt_re]
[ 7.256250] ? bnxt_re_create_qp+0x5f4/0xf30 [bnxt_re]
[ 7.256264] bnxt_re_create_qp+0x71d/0xf30 [bnxt_re]
[ 7.256273] ? __kmalloc+0x1ab/0x400
[ 7.256278] create_qp+0x17a/0x290 [ib_core]
[ 7.256310] ? create_qp+0x17a/0x290 [ib_core]
[ 7.256336] ib_create_qp_kernel+0x3b/0xe0 [ib_core]
[ 7.256361] create_mad_qp+0x8e/0x100 [ib_core]
[ 7.256393] ? __pfx_qp_event_handler+0x10/0x10 [ib_core]
[ 7.256423] ib_mad_init_device+0x2c2/0x8a0 [ib_core]
[ 7.256454] add_client_context+0x127/0x1c0 [ib_core]
[ 7.256482] enable_device_and_get+0xe6/0x1e0 [ib_core]
[ 7.256509] ib_register_device+0x506/0x610 [ib_core]
[ 7.256539] bnxt_re_probe+0xe7d/0x11a0 [bnxt_re]
[ 7.256550] ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
[ 7.256559] auxiliary_bus_probe+0x3e/0xa0
[ 7.256562] really_probe+0x1c9/0x430
[ 7.256566] __driver_probe_device+0x8c/0x190
[ 7.256568] driver_probe_device+0x24/0xd0
[ 7.256571] __driver_attach+0x10b/0x210
[ 7.256573] ? __pfx___driver_attach+0x10/0x10
[ 7.256576] bus_for_each_dev+0x8a/0xf0
[ 7.256578] driver_attach+0x1e/0x30
[ 7.256580] bus_add_driver+0x156/0x260
[ 7.256583] driver_register+0x5e/0x130
[ 7.256586] __auxiliary_driver_register+0x73/0xf0
[ 7.256589] ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
[ 7.256597] bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
[ 7.256605] ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
[ 7.256612] do_one_initcall+0x5b/0x340
[ 7.256617] do_init_module+0x97/0x290
[ 7.256620] load_module+0x213a/0x22a0
[ 7.256627] init_module_from_file+0x96/0x100
[ 7.256630] ? init_module_from_file+0x96/0x100
[ 7.256634] idempotent_init_module+0x11c/0x2b0
[ 7.256639] __x64_sys_finit_module+0x64/0xd0
[ 7.256640] do_syscall_64+0x84/0x180
[ 7.256643] ? do_syscall_64+0x93/0x180
[ 7.256646] ? syscall_exit_to_user_mode+0x86/0x260
[ 7.256648] ? do_syscall_64+0x93/0x180
[ 7.256650] ? do_syscall_64+0x93/0x180
[ 7.256651] ? exc_page_fault+0x94/0x1b0
[ 7.256653] entry_SYSCALL_64_after_hwframe+0x73/0x7b
[ 7.256656] RIP: 0033:0x749e762ff719
[ 7.256667] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48
[ 7.256669] RSP: 002b:00007ffee1332be8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 7.256671] RAX: ffffffffffffffda RBX: 000062fd9e78b920 RCX: 0000749e762ff719
[ 7.256673] RDX: 0000000000000000 RSI: 0000749e76492efd RDI: 000000000000000f
[ 7.256674] RBP: 0000749e76492efd R08: 0000000000000000 R09: 000062fd9e7399b0
[ 7.256675] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
[ 7.256676] R13: 0000000000000000 R14: 000062fd9e770e70 R15: 000062fd9dc51ec1
[ 7.256679] </TASK>
[ 7.256680] ---[ end trace ]---
[ 109.288123] bnxt_en 0000:29:00.0: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102032 > 100000) msec active 1
[ 109.288328] ------------[ cut here ]------------
[ 109.288331] WARNING: CPU: 14 PID: 2304 at drivers/infiniband/core/cq.c:322 ib_free_cq+0x109/0x150 [ib_core]
[ 109.288436] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl bnxt_re(+) ib_uverbs intel_cstate pcspkr acpi_power_meter mgag200 mei_me ib_core ipmi_si ioatdma acpi_ipmi mei intel_pch_thermal i2c_algo_bit hpilo dca ipmi_devintf acpi_tad ipmi_msghandler joydev input_leds mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq hid_generic usbmouse usbkbd usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ses enclosure uas usb_storage xhci_pci xhci_pci_renesas crc32_pclmul smartpqi ehci_pci bnxt_en scsi_transport_sas xhci_hcd ehci_hcd lpc_ich wmi
[ 109.288581] CPU: 14 PID: 2304 Comm: (udev-worker) Tainted: P O 6.8.4-2-pve #1
[ 109.288583] Hardware name: HPE ProLiant DL580 Gen10/ProLiant DL580 Gen10, BIOS U34 07/20/2023
[ 109.288585] RIP: 0010:ib_free_cq+0x109/0x150 [ib_core]
[ 109.288610] Code: e8 fc 9c 02 00 65 ff 0d 9d 07 1f 3e 0f 85 70 ff ff ff 0f 1f 44 00 00 e9 66 ff ff ff 48 8d 7f 50 e8 6c ba cc e2 e9 35 ff ff ff <0f> 0b 31 c0 31 f6 31 ff c3 cc cc cc cc 0f 0b eb 80 44 0f b6 25 64
[ 109.288612] RSP: 0018:ffffacb261967630 EFLAGS: 00010202
[ 109.288614] RAX: 0000000000000002 RBX: 0000000000000001 RCX: 0000000000000000
[ 109.288616] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9d4cde98ac00
[ 109.288617] RBP: ffffacb2619676a0 R08: 0000000000000000 R09: 0000000000000000
[ 109.288618] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9d4d07200000
[ 109.288620] R13: ffff9d4cc2245300 R14: 00000000ffffff92 R15: ffff9d4ce79e8000
[ 109.288621] FS: 0000749e7610e8c0(0000) GS:ffff9d588f300000(0000) knlGS:0000000000000000
[ 109.288623] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 109.288624] CR2: 000062fd9e76c0c8 CR3: 0000000c6e1c6004 CR4: 00000000007706f0
[ 109.288626] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 109.288627] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 109.288628] PKRU: 55555554
[ 109.288629] Call Trace:
[ 109.288631] <TASK>
[ 109.288632] ? show_regs+0x6d/0x80
[ 109.288638] ? __warn+0x89/0x160
[ 109.288643] ? ib_free_cq+0x109/0x150 [ib_core]
[ 109.288668] ? report_bug+0x17e/0x1b0
[ 109.288673] ? handle_bug+0x46/0x90
[ 109.288678] ? exc_invalid_op+0x18/0x80
[ 109.288680] ? asm_exc_invalid_op+0x1b/0x20
[ 109.288685] ? ib_free_cq+0x109/0x150 [ib_core]
[ 109.288709] ? ib_mad_init_device+0x54c/0x8a0 [ib_core]
[ 109.288739] add_client_context+0x127/0x1c0 [ib_core]
[ 109.288765] enable_device_and_get+0xe6/0x1e0 [ib_core]
[ 109.288791] ib_register_device+0x506/0x610 [ib_core]
[ 109.288819] bnxt_re_probe+0xe7d/0x11a0 [bnxt_re]
[ 109.288832] ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
[ 109.288841] auxiliary_bus_probe+0x3e/0xa0
[ 109.288845] really_probe+0x1c9/0x430
[ 109.288848] __driver_probe_device+0x8c/0x190
[ 109.288851] driver_probe_device+0x24/0xd0
[ 109.288854] __driver_attach+0x10b/0x210
[ 109.288856] ? __pfx___driver_attach+0x10/0x10
[ 109.288859] bus_for_each_dev+0x8a/0xf0
[ 109.288861] driver_attach+0x1e/0x30
[ 109.288863] bus_add_driver+0x156/0x260
[ 109.288866] driver_register+0x5e/0x130
[ 109.288869] __auxiliary_driver_register+0x73/0xf0
[ 109.288871] ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
[ 109.288880] bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
[ 109.288887] ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
[ 109.288894] do_one_initcall+0x5b/0x340
[ 109.288899] do_init_module+0x97/0x290
[ 109.288903] load_module+0x213a/0x22a0
[ 109.288909] init_module_from_file+0x96/0x100
[ 109.288912] ? init_module_from_file+0x96/0x100
[ 109.288916] idempotent_init_module+0x11c/0x2b0
[ 109.288921] __x64_sys_finit_module+0x64/0xd0
[ 109.288923] do_syscall_64+0x84/0x180
[ 109.288925] ? do_syscall_64+0x93/0x180
[ 109.288927] ? syscall_exit_to_user_mode+0x86/0x260
[ 109.288930] ? do_syscall_64+0x93/0x180
[ 109.288931] ? do_syscall_64+0x93/0x180
[ 109.288933] ? exc_page_fault+0x94/0x1b0
[ 109.288935] entry_SYSCALL_64_after_hwframe+0x73/0x7b
[ 109.288937] RIP: 0033:0x749e762ff719
[ 109.288952] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48
[ 109.288953] RSP: 002b:00007ffee1332be8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 109.288955] RAX: ffffffffffffffda RBX: 000062fd9e78b920 RCX: 0000749e762ff719
[ 109.288957] RDX: 0000000000000000 RSI: 0000749e76492efd RDI: 000000000000000f
[ 109.288958] RBP: 0000749e76492efd R08: 0000000000000000 R09: 000062fd9e7399b0
[ 109.288959] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
[ 109.288961] R13: 0000000000000000 R14: 000062fd9e770e70 R15: 000062fd9dc51ec1
[ 109.288964] </TASK>
[ 109.288965] ---[ end trace 0000000000000000 ]---
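The lines that stand out to me in the output above are the bnxt_re trace at about 7.25 s and the "FW STALL Detected" message at about 109.29 s, both from the same udev-worker (PID 2304). I can't say whether that is what keeps networking from working at boot. For anyone who wants to check their own nodes for the same message, here is a minimal Python sketch that pulls those stall lines out of saved dmesg text. The function name `find_fw_stalls` is made up for illustration, and the regular expression assumes the exact message format shown in my log; other driver versions may word it differently.

```python
import re

# Matches lines shaped like the one in the log above, e.g.:
# [ 109.288123] bnxt_en 0000:29:00.0: QPLIB: bnxt_re_is_fw_stalled:
#   FW STALL Detected. cmdq[0xe]=0x3 waited (102032 > 100000) msec active 1
# Assumption: the message format is the same on other systems.
STALL_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"                                  # kernel timestamp
    r"(?P<driver>\S+)\s+"                                          # driver name, e.g. bnxt_en
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"        # PCI address
    r".*FW STALL Detected"
    r".*waited \((?P<waited>\d+) > (?P<limit>\d+)\) msec"          # waited vs. limit in ms
)


def find_fw_stalls(dmesg_text):
    """Return one dict per 'FW STALL Detected' line found in dmesg text."""
    stalls = []
    for line in dmesg_text.splitlines():
        match = STALL_RE.search(line)
        if match:
            stalls.append({
                "timestamp_s": float(match.group("ts")),
                "driver": match.group("driver"),
                "pci_address": match.group("pci"),
                "waited_ms": int(match.group("waited")),
                "limit_ms": int(match.group("limit")),
            })
    return stalls
```

To use it, save the output of `dmesg` to a file, read the file into a string, and pass that string to `find_fw_stalls`. An empty list means no line matched the pattern, which only says this particular message was not found in that text.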