Networking won't come up on a fresh install of Proxmox 8.2 [Solved]

koiji


tl;dr - there seems to be a bug in Proxmox 8.2 and Linux kernel 6.8. Installing Proxmox 8.1 with kernel 6.5 solved the issue. See this thread for more info.

-------------------------

Hi,

I have a brand new Supermicro WIO A+ Server AS -2015SV-WTNRT with the following hardware:
  • Motherboard: H13SVW-NT (latest BIOS version, 1.1b)
  • AMD EPYC 8124P
  • 2x Micron 7450 PRO 7.68 TB SSD (NVMe PCIe 4.0)
  • 2x onboard Broadcom 10 Gbps RJ45 network interfaces (BCM57416)
  • 1x dual-port Intel PCIe 10 Gbps SFP+ network card (Intel X520-DA2, 82599ES)
I intend to use the Intel SFP+ network card for our networking, so after installation I switched the interface in vmbr0 to the one assigned to one of the SFP+ card's ports and rebooted, but I can't get the interface to come up on its own after a reboot.
If I log in locally via IPMI and issue systemctl restart networking, the bridge comes up with no problem.
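Until the root cause is fixed, the manual restart could be automated with a small systemd oneshot unit that re-runs networking once, shortly after boot. This is an untested sketch; the unit name, the 30-second settle delay, and the ordering are my assumptions, not anything from Proxmox:

```shell
# Hypothetical stopgap: retry "systemctl restart networking" once after boot.
# Unit name, delay, and ordering are assumptions; adjust to taste.
cat <<'EOF' > /etc/systemd/system/networking-retry.service
[Unit]
Description=Retry networking once after boot (workaround for bridge not coming up)
After=networking.service
Wants=networking.service

[Service]
Type=oneshot
# Give the NIC firmware some time to settle before retrying
ExecStartPre=/bin/sleep 30
ExecStart=/usr/bin/systemctl restart networking

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable networking-retry.service
```

This only papers over the symptom, but it would at least let the host come up unattended.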

I have updated to the newest version of everything and I am currently on Proxmox 8.2.4 with kernel 6.8.8-1-pve.

Here's the output from lshw:
Code:
# lshw -C network -short
H/W path              Device           Class          Description
=================================================================
/0/100/5.1/0          enp3s0f0np0      network        BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
/0/100/5.1/0.1        enp3s0f1np1      network        BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
/0/10f/1.1/0          ens2f0           network        82599ES 10-Gigabit SFI/SFP+ Network Connection
/0/10f/1.1/0.1        ens2f1           network        82599ES 10-Gigabit SFI/SFP+ Network Connection

Here's the /etc/network/interfaces:
Code:
# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!


auto lo
iface lo inet loopback


iface enp3s0f0np0 inet manual


iface enp3s0f1np1 inet manual


iface enxbe3af2b6059f inet manual


iface ens2f0 inet manual


iface ens2f1 inet manual


auto vmbr0
iface vmbr0 inet static
        address 192.168.101.101/23
        gateway 192.168.100.1
        bridge-ports ens2f1
        bridge-stp off
        bridge-fd 0


source /etc/network/interfaces.d/*

I also tried to use the integrated NIC (enp3s0f0np0) and the behavior is the same (i.e. no network after boot, manually restarting networking fixes it up).

There is a whole bunch of errors in dmesg, but this is the point where I'm at the end of my Linux-fu, so any help would be appreciated.

During boot (dmesg):

Code:
[  112.292718] bnxt_en 0000:03:00.0: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102195 > 100000) msec active 1
[  112.293297] bnxt_en 0000:03:00.0 bnxt_re0: Failed to modify HW QP
[  112.293557] infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
[  112.293818] infiniband bnxt_re0: Couldn't start port
[  112.294162] bnxt_en 0000:03:00.0 bnxt_re0: Failed to destroy HW QP
[  112.294602] ------------[ cut here ]------------
[  112.294873] WARNING: CPU: 14 PID: 1048 at drivers/infiniband/core/cq.c:322 ib_free_cq+0x109/0x150 [ib_core]
[  112.295156] Modules linked in: intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd ipmi_ssif kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd bnxt_re(+) acpi_ipmi ipmi_si dax_hmem ib_uverbs ast cxl_acpi ipmi_devintf rapl cxl_core i2c_algo_bit pcspkr k10temp ib_core ccp ipmi_msghandler joydev input_leds mac_hid vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbmouse rndis_host usbhid cdc_ether usbnet hid mii ixgbe xhci_pci nvme xhci_pci_renesas xfrm_algo crc32_pclmul nvme_core ahci dca bnxt_en nvme_auth xhci_hcd mdio libahci i2c_piix4
[  112.296804] CPU: 14 PID: 1048 Comm: (udev-worker) Tainted: P           O       6.8.8-1-pve #1
[  112.297080] Hardware name: Supermicro AS -2015SV-WTNRT/H13SVW-NT, BIOS 1.1a 10/27/2023
[  112.297360] RIP: 0010:ib_free_cq+0x109/0x150 [ib_core]
[  112.297657] Code: e8 fc 9c 02 00 65 ff 0d 9d 77 27 3f 0f 85 70 ff ff ff 0f 1f 44 00 00 e9 66 ff ff ff 48 8d 7f 50 e8 4c 94 35 cb e9 35 ff ff ff <0f> 0b 31 c0 31 f6 31 ff e9 f5 74 54 cc 0f 0b eb 80 44 0f b6 25 64
[  112.298252] RSP: 0018:ff6c60f440f5b660 EFLAGS: 00010202
[  112.298547] RAX: 0000000000000002 RBX: 0000000000000001 RCX: 0000000000000000
[  112.298850] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff4bfa85de79ec00
[  112.299151] RBP: ff6c60f440f5b6d0 R08: 0000000000000000 R09: 0000000000000000
[  112.299454] R10: 0000000000000000 R11: 0000000000000000 R12: ff4bfa8614800000
[  112.299764] R13: ff4bfa85ca5e8300 R14: 00000000ffffff92 R15: ff4bfa85d0586000
[  112.300071] FS:  00007228b03c58c0(0000) GS:ff4bfae38cf00000(0000) knlGS:0000000000000000
[  112.300385] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  112.300703] CR2: 000057bb6650dccc CR3: 0000000113dda001 CR4: 0000000000f71ef0
[  112.301024] PKRU: 55555554
[  112.301342] Call Trace:
[  112.301662]  <TASK>
[  112.301665]  ? show_regs+0x6d/0x80
[  112.301671]  ? __warn+0x89/0x160
[  112.301675]  ? ib_free_cq+0x109/0x150 [ib_core]
[  112.301699]  ? report_bug+0x17e/0x1b0
[  112.307231]  ? handle_bug+0x46/0x90
[  112.307237]  ? exc_invalid_op+0x18/0x80
[  112.307240]  ? asm_exc_invalid_op+0x1b/0x20
[  112.307246]  ? ib_free_cq+0x109/0x150 [ib_core]
[  112.307264]  ? ib_mad_init_device+0x54c/0x8a0 [ib_core]
[  112.307288]  add_client_context+0x127/0x1c0 [ib_core]
[  112.307308]  enable_device_and_get+0xe6/0x1e0 [ib_core]
[  112.307326]  ib_register_device+0x506/0x610 [ib_core]
[  112.307344]  ? srso_alias_return_thunk+0x5/0xfbef5
[  112.307350]  ? srso_alias_return_thunk+0x5/0xfbef5
[  112.307355]  bnxt_re_probe+0xe7d/0x11a0 [bnxt_re]
[  112.307370]  ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
[  112.307379]  auxiliary_bus_probe+0x3e/0xa0
[  112.307384]  really_probe+0x1c9/0x430
[  112.307389]  __driver_probe_device+0x8c/0x190
[  112.307392]  driver_probe_device+0x24/0xd0
[  112.307396]  __driver_attach+0x10b/0x210
[  112.307399]  ? __pfx___driver_attach+0x10/0x10
[  112.307402]  bus_for_each_dev+0x8a/0xf0
[  112.307407]  driver_attach+0x1e/0x30
[  112.307410]  bus_add_driver+0x156/0x260
[  112.307414]  driver_register+0x5e/0x130
[  112.317972]  __auxiliary_driver_register+0x73/0xf0
[  112.317980]  ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
[  112.317988]  bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
[  112.317996]  ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
[  112.318003]  do_one_initcall+0x5b/0x340
[  112.318009]  do_init_module+0x97/0x290
[  112.318015]  load_module+0x213a/0x22a0
[  112.318023]  init_module_from_file+0x96/0x100
[  112.318025]  ? init_module_from_file+0x96/0x100
[  112.318027]  ? srso_alias_return_thunk+0x5/0xfbef5
[  112.318036]  idempotent_init_module+0x11c/0x2b0
[  112.318040]  __x64_sys_finit_module+0x64/0xd0
[  112.318043]  x64_sys_call+0x169c/0x24b0
[  112.318045]  do_syscall_64+0x81/0x170
[  112.318049]  ? srso_alias_return_thunk+0x5/0xfbef5
[  112.318052]  ? do_syscall_64+0x8d/0x170
[  112.318055]  ? srso_alias_return_thunk+0x5/0xfbef5
[  112.318057]  ? restore_fpregs_from_fpstate+0x47/0xf0
[  112.318061]  ? srso_alias_return_thunk+0x5/0xfbef5
[  112.318064]  ? switch_fpu_return+0x55/0xf0
[  112.318067]  ? srso_alias_return_thunk+0x5/0xfbef5
[  112.318070]  ? syscall_exit_to_user_mode+0x89/0x260
[  112.318073]  ? srso_alias_return_thunk+0x5/0xfbef5
[  112.318076]  ? do_syscall_64+0x8d/0x170
[  112.318079]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[  112.318081] RIP: 0033:0x7228b0ad2719
[  112.318099] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b7 06 0d 00 f7 d8 64 89 01 48
[  112.318101] RSP: 002b:00007ffdd54b75a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  112.318105] RAX: ffffffffffffffda RBX: 000057bb665475d0 RCX: 00007228b0ad2719
[  112.318106] RDX: 0000000000000000 RSI: 00007228b0c65efd RDI: 000000000000000f
[  112.318107] RBP: 00007228b0c65efd R08: 0000000000000000 R09: 000057bb664cba20
[  112.318108] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
[  112.318109] R13: 0000000000000000 R14: 000057bb665049a0 R15: 000057bb65467ec1
[  112.318114]  </TASK>
[  112.318115] ---[ end trace 0000000000000000 ]---
[  112.318120] bnxt_en 0000:03:00.0 bnxt_re0: Free MW failed: 0xffffff92
[  112.318130] infiniband bnxt_re0: Couldn't open port 1
[  112.318371] infiniband bnxt_re0: Device registered with IB successfully
[  130.448732] audit: type=1400 audit(1718896637.740:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="pve-container-mounthotplug" pid=1277 comm="apparmor_parser"
[  130.459814] audit: type=1400 audit(1718896637.740:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-start" pid=1280 comm="apparmor_parser"
[  130.459822] audit: type=1400 audit(1718896637.740:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/lxc-copy" pid=1279 comm="apparmor_parser"
[  130.459827] audit: type=1400 audit(1718896637.740:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1275 comm="apparmor_parser"
[  130.480369] audit: type=1400 audit(1718896637.740:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1275 comm="apparmor_parser"
[  130.480375] audit: type=1400 audit(1718896637.740:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="swtpm" pid=1283 comm="apparmor_parser"
[  130.480379] audit: type=1400 audit(1718896637.740:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lsb_release" pid=1273 comm="apparmor_parser"
[  130.480382] audit: type=1400 audit(1718896637.742:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=1281 comm="apparmor_parser"
[  130.480385] audit: type=1400 audit(1718896637.742:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=1281 comm="apparmor_parser"
[  130.480388] audit: type=1400 audit(1718896637.742:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=1281 comm="apparmor_parser"
[  130.573230] softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
[  130.573737] softdog:              soft_reboot_cmd=<not set> soft_active_on_boot=0
[  131.434969] RPC: Registered named UNIX socket transport module.
[  131.435344] RPC: Registered udp transport module.
[  131.435660] RPC: Registered tcp transport module.
[  131.435978] RPC: Registered tcp-with-tls transport module.

After issuing systemctl restart networking (dmesg):
Code:
[  214.692740] bnxt_en 0000:03:00.1: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102322 > 100000) msec active 1
[  214.692778] bnxt_en 0000:03:00.1 bnxt_re1: Failed to modify HW QP
[  214.692793] infiniband bnxt_re1: Couldn't change QP1 state to INIT: -110
[  214.692808] infiniband bnxt_re1: Couldn't start port
[  214.692880] bnxt_en 0000:03:00.1 bnxt_re1: Failed to destroy HW QP
[  214.692941] bnxt_en 0000:03:00.1 bnxt_re1: Free MW failed: 0xffffff92
[  214.692963] infiniband bnxt_re1: Couldn't open port 1
[  214.693195] infiniband bnxt_re1: Device registered with IB successfully
[  215.460890] vmbr0: port 1(enp3s0f0np0) entered blocking state
[  215.460912] vmbr0: port 1(enp3s0f0np0) entered disabled state
[  215.460944] bnxt_en 0000:03:00.0 enp3s0f0np0: entered allmulticast mode
[  215.461017] bnxt_en 0000:03:00.0 enp3s0f0np0: entered promiscuous mode
[  215.520579] bnxt_en 0000:03:00.0 enp3s0f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
[  215.520611] bnxt_en 0000:03:00.0 enp3s0f0np0: EEE is not active
[  215.520624] bnxt_en 0000:03:00.0 enp3s0f0np0: FEC autoneg off encoding: None
[  215.535647] bnxt_en 0000:03:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  215.535676] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  215.535691] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:6565 error=-110
[  215.535718] bnxt_en 0000:03:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  215.535731] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  215.535745] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:6565 error=-110
[  215.536031] vmbr0: port 1(enp3s0f0np0) entered blocking state
[  215.536069] vmbr0: port 1(enp3s0f0np0) entered forwarding state
[  215.536173] bnxt_en 0000:03:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  215.537967] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  215.538299] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:6565 error=-110
[  215.541063] bnxt_en 0000:03:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  215.543072] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  215.546741] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:6565 error=-110

The server also seems to "freeze" or get stuck on reboot, and I have to reset it manually.
Here's a screenshot from IPMI (imgur link):
[IPMI console screenshot]

I also tried installing Ubuntu Server 24.04:
  • During installation network interfaces were detected correctly and got IP addresses from the DHCP server (i.e. one onboard, one SFP+)
  • Installation to the disk also went fine
  • When it was time to reboot, I got pretty much the same messages as in the screenshot above and the server didn't really reboot; I had to reset it manually again.
  • After a very lengthy boot (after POST) the network came online, though without a bridge configured. The same kernel error messages appear in dmesg (kernel version 6.8.0-35-generic).
-- EDIT --

So I tried installing Windows Server 2022 and everything worked out of the box without any issues whatsoever (both integrated and Intel X520-DA2). So it seems to be a Linux issue.

-- EDIT #2 --

I tried installing Proxmox 8.1 and it seems to be working. There were still some AMD-related errors in dmesg but networking was working as expected. Once I manually upgraded to Proxmox 8.2 and rebooted, the issue returned.

So it seems it's somehow related to the new kernel and therefore might be an upstream issue? (kernel versions 6.8 vs. 6.5)

In any case, the issue is solved for me for the moment. I have to make a note not to upgrade this particular machine to Proxmox 8.2, and I guess I'll try to find some time to file a report with Supermicro.
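To reduce the chance of an accidental upgrade pulling the host onto the 6.8 series, something like the following might help on a Proxmox 8.1 install. The exact kernel version string and package names here are examples, not values from my system; check what's actually installed first:

```shell
# Hedged sketch: keep a Proxmox 8.1 host on the known-good 6.5 kernel.
# Version strings and package names below are examples; verify them first.

proxmox-boot-tool kernel list               # show the kernels currently installed

# Pin a specific 6.5 kernel so it stays the boot default even if a newer
# kernel gets installed later (example version string):
proxmox-boot-tool kernel pin 6.5.13-5-pve

# Optionally tell apt not to upgrade the kernel metapackage
# (name may differ on your system):
apt-mark hold proxmox-kernel-6.5
```

The pin can be undone later with `proxmox-boot-tool kernel unpin` once a fixed kernel is available.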

Thank you
 
@Neobin Yeah, that seems to be the case. It's interesting that the drivers malfunction in such a spectacular way that it also affects the PCI-e Intel card and the networking seems to be b0rked in general.

At any rate staying on Proxmox 8.1 and kernel 6.5 seems to be the way to go for the moment.

Closing this thread and will follow the one you linked.

Thank you
 
I have to agree with koiji; we experienced the same spectacular driver failure for the same reasons, see the attached picture. I tried all the possible solutions, including blacklisting the InfiniBand module, etc.; nothing worked and I had to fall back to Proxmox 8.1. I'm not going to mess with niccli to try to make things work; I'm hoping a Proxmox kernel release will fix this issue.
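For reference, the usual way to ban the RoCE/InfiniBand part of the Broadcom driver (which, as noted above, did not resolve the issue here, but is worth documenting) is a modprobe blacklist entry. File name is arbitrary:

```shell
# Blacklist the bnxt_re RoCE module so only the plain bnxt_en Ethernet
# driver loads. This did NOT fix the problem for the posters above, but
# it is the standard way to keep the module from loading.
cat <<'EOF' > /etc/modprobe.d/blacklist-bnxt_re.conf
blacklist bnxt_re
# "blacklist" alone does not stop explicit loads (e.g. via udev);
# the install override makes modprobe refuse to load it at all:
install bnxt_re /bin/false
EOF

# Rebuild the initramfs so the blacklist also applies during early boot:
update-initramfs -u -k all
```

Since the module is loaded by a udev worker during boot (visible in the stack trace above), the `install` override plus an initramfs rebuild is needed, not just the `blacklist` line.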

Thank you for posting this issue.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!