Network interfaces down on reboot

GrigoriOh

Aug 16, 2024
Hello there,

I just installed Proxmox VE 8.2.2 on new Supermicro server hardware with a Broadcom BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet controller. The installation worked like a charm, and Proxmox boots up to the point where I am shown the IP and port of the web interface.

Bash:
pveversion
pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-2-pve)
Unfortunately, the web interface is unreachable, so I started looking into the cause.

So far I have found that the network interface underlying the bridge is down on boot. As a result, the bridge does not show up in ip a at all.
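
This state can also be confirmed from the local console right after boot. The following is only a sketch under a few assumptions: the helper name iface_status is made up for illustration, the interface names in the comment are the ones used in this thread, and the check relies on the operstate file that the Linux kernel exposes under /sys/class/net for each network device.

```shell
# Hypothetical helper: report whether a network interface exists and, if so,
# the link state the kernel reports for it.
# SYSFS_NET can be overridden for testing; the default is the kernel's sysfs tree.
iface_status() {
    local name="$1"
    local dir="${SYSFS_NET:-/sys/class/net}/$name"
    local state
    if [ ! -d "$dir" ]; then
        echo "$name: missing"
        return 1
    fi
    state="$(cat "$dir/operstate" 2>/dev/null || echo unknown)"
    echo "$name: $state"
}

# Example on the affected host (names taken from this thread):
#   iface_status eno1np0
#   iface_status vlan290
```

A "missing" result for the bridge together with "down" for the underlying port would match the symptom described above.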

1.jpg


Running systemctl restart networking results in a fully functional Proxmox VE with functioning interfaces and bridge.

2.jpg

Rebooting the system brings the same error back, so the networking stack has to be restarted manually (or via cron) on every boot, which is obviously sub-optimal.

Furthermore, I noticed that during boot the ifupdown2-pre and systemd-udev-settle units do not start properly.

boot_1.jpg

boot_2.jpg

This is my interfaces config:
Code:
auto lo
iface lo inet loopback

iface eno1np0 inet manual

#auto vmbr0
#iface vmbr0 inet static
auto vlan290
iface vlan290 inet static
        address 10.28.252.1/25
        gateway 10.28.252.126
        bridge-ports eno1np0
        bridge-stp off
        bridge-fd 0

iface eno2np1 inet manual

iface enxbe3af2b6059f inet manual

iface enp129s0f0 inet manual

iface enp129s0f1 inet manual

iface enp129s0f2 inet manual

iface enp129s0f3 inet manual


source /etc/network/interfaces.d/*

In case some specific logs are of interest, please let me know, and I will do my best to assist further bug tracing.

Best regards
Grigori
 
Welcome to the forum Grigori!

Your interfaces setup looks fine, but threads from other users suggest that there are known issues with your NIC. It would help to know whether the syslog shows any errors while setting up the NICs (e.g. journalctl -g bnxt) and which errors occurred while starting the two systemd units you mentioned (systemctl status ifupdown2-pre.service and systemctl status systemd-udev-settle.service).
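
If the host is only reachable from the local console, it may be easier to save this output to a file and copy it off later. A minimal sketch, assuming a root shell: the function name collect_nic_diag and the default output path are made up for illustration, and the -b (current boot only) and --no-pager options are additions to the commands named above.

```shell
# Hypothetical helper: write the requested diagnostics to a text file.
collect_nic_diag() {
    local out="${1:-/root/nic-diag.txt}"
    local unit
    {
        echo "== journalctl -b -g bnxt =="
        # "|| true": a missing tool or an empty result should not abort the collection
        journalctl -b --no-pager -g bnxt 2>&1 || true
        for unit in ifupdown2-pre.service systemd-udev-settle.service; do
            echo "== systemctl status $unit =="
            # systemctl status exits non-zero for failed or inactive units
            systemctl status --no-pager "$unit" 2>&1 || true
        done
    } > "$out"
    echo "wrote $out"
}

# Example: collect_nic_diag /root/nic-diag.txt
```

Running it before restarting the networking service should capture the unit status as it was right after boot.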

I have found the following workarounds in the forum. Please check whether your situation matches and act accordingly.

1. Update the firmware of your NIC. [1]
2a. Disable the RDMA feature on the NIC itself if you don't need it (it is enabled by default; this requires installing the niccli tool). [2]
Code:
niccli -i 1 nvm -setoption support_rdma -scope 0 -value 0
niccli -i 1 reset
2b. Disable loading the RDMA driver (if you don't need it). [1]
Code:
echo "blacklist bnxt_re" >> /etc/modprobe.d/blacklist-bnxt_re.conf
update-initramfs -u
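
Note that the echo line in 2b appends a new entry every time it is run. A sketch of a variant that adds the line only once is shown here; the helper name ensure_blacklisted is hypothetical, and the default file path simply mirrors the one used in 2b.

```shell
# Hypothetical helper: add "blacklist <module>" to a modprobe.d file only once.
ensure_blacklisted() {
    local module="$1"
    local conf="${2:-/etc/modprobe.d/blacklist-$module.conf}"
    local line="blacklist $module"
    # -x matches the whole line, -F treats the pattern as a fixed string, -q is silent
    if grep -qxF "$line" "$conf" 2>/dev/null; then
        echo "already present in $conf"
    else
        echo "$line" >> "$conf"
        echo "added to $conf"
    fi
}

# On the affected host (as root), followed by the initramfs rebuild from 2b:
#   ensure_blacklisted bnxt_re && update-initramfs -u
```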

[1] https://forum.proxmox.com/threads/broadcom-nics-down-after-pve-8-2-kernel-6-8.146185/
[2] https://forum.proxmox.com/threads/o...est-no-subscription.144557/page-3#post-652507
 
Hello dakralex,

thank you for the warm and helpful welcome!

Regarding the NIC specific logs:
Code:
-- Boot 3e0f8cb38374461bb781609c57b1b483 --
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 (unnamed net_device) (uninitialized): Device requests max timeout of 100 seconds, may trigger hung task watchdog
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0: Unable to read VPD
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 eth0: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet found at mem 28080110000, node addr 3c:ec:ef:a1:df:7c
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 (unnamed net_device) (uninitialized): Device requests max timeout of 100 seconds, may trigger hung task watchdog
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1: Unable to read VPD
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 eth2: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet found at mem 28080100000, node addr 3c:ec:ef:a1:df:7d
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 eno1np0: renamed from eth0
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 eno2np1: renamed from eth2
Aug 16 14:50:00 sy-prxmx-sd-1 kernel: bnxt_re: Broadcom NetXtreme-C/E RoCE Driver
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  bnxt_qplib_alloc_init_hwq.cold+0x8c/0xd7 [bnxt_re]
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  bnxt_qplib_create_qp+0x1d5/0x8c0 [bnxt_re]
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  ? bnxt_re_create_qp+0x5f4/0xf30 [bnxt_re]
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  bnxt_re_create_qp+0x71d/0xf30 [bnxt_re]
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  ? bnxt_qplib_create_cq+0x247/0x330 [bnxt_re]
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  ? __pfx_bnxt_re_create_qp+0x10/0x10 [bnxt_re]
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  bnxt_re_probe+0xe7d/0x11a0 [bnxt_re]
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
Aug 16 14:50:00 sy-prxmx-sd-1 kernel:  ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Aug 16 14:51:01 sy-prxmx-sd-1 systemd-udevd[1606]: bnxt_en.rdma.0: Worker [1691] processing SEQNUM=21407 is taking a long time
Aug 16 14:51:01 sy-prxmx-sd-1 systemd-udevd[1606]: bnxt_en.rdma.1: Worker [1627] processing SEQNUM=21410 is taking a long time
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102891 > 100000) msec active 1
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 bnxt_re0: Failed to modify HW QP
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: infiniband bnxt_re0: Couldn't start port
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 bnxt_re0: Failed to destroy HW QP
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: Modules linked in: ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul polyval>
Aug 16 14:51:43 sy-prxmx-sd-1 kernel:  bnxt_re_probe+0xe7d/0x11a0 [bnxt_re]
Aug 16 14:51:43 sy-prxmx-sd-1 kernel:  ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
Aug 16 14:51:43 sy-prxmx-sd-1 kernel:  ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Aug 16 14:51:43 sy-prxmx-sd-1 kernel:  bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
Aug 16 14:51:43 sy-prxmx-sd-1 kernel:  ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 bnxt_re0: Free MW failed: 0xffffff92
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: infiniband bnxt_re0: Couldn't open port 1
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: infiniband bnxt_re0: Device registered with IB successfully
Aug 16 14:53:01 sy-prxmx-sd-1 systemd-udevd[1606]: bnxt_en.rdma.0: Worker [1691] processing SEQNUM=21407 killed
Aug 16 14:53:01 sy-prxmx-sd-1 systemd-udevd[1606]: bnxt_en.rdma.1: Worker [1627] processing SEQNUM=21410 killed
Aug 16 14:53:25 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102332 > 100000) msec active 1
Aug 16 14:53:25 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 bnxt_re1: Failed to modify HW QP
Aug 16 14:53:25 sy-prxmx-sd-1 kernel: infiniband bnxt_re1: Couldn't change QP1 state to INIT: -110
Aug 16 14:53:25 sy-prxmx-sd-1 kernel: infiniband bnxt_re1: Couldn't start port
Aug 16 14:53:25 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 bnxt_re1: Failed to destroy HW QP
Aug 16 14:53:25 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 bnxt_re1: Free MW failed: 0xffffff92
Aug 16 14:53:25 sy-prxmx-sd-1 kernel: infiniband bnxt_re1: Couldn't open port 1
Aug 16 14:53:25 sy-prxmx-sd-1 kernel: infiniband bnxt_re1: Device registered with IB successfully
Aug 16 14:53:25 sy-prxmx-sd-1 systemd-udevd[1606]: bnxt_en.rdma.0: Worker [1691] terminated by signal 9 (KILL).
Aug 16 14:53:25 sy-prxmx-sd-1 systemd-udevd[1606]: bnxt_en.rdma.1: Worker [1627] terminated by signal 9 (KILL).
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 eno1np0: entered allmulticast mode
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 eno1np0: entered promiscuous mode
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 eno1np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 eno1np0: EEE is not active
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 eno1np0: FEC autoneg off encoding: None
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 bnxt_re0: Failed to add GID: 0xffffff92
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 bnxt_re0: Failed to add GID: 0xffffff92
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 bnxt_re0: Failed to add GID: 0xffffff92
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 bnxt_re0: Failed to add GID: 0xffffff92
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 eno2np1: entered allmulticast mode
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 eno2np1: entered promiscuous mode
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 eno2np1: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 eno2np1: EEE is not active
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 eno2np1: FEC autoneg off encoding: None
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=2
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=2
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=4
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.1 bnxt_re1: Failed to add GID: 0xffffff92
Aug 16 14:57:09 sy-prxmx-sd-1 kernel: infiniband bnxt_re1: add_roce_gid GID add failed port=1 index=4

Text extraction for systemctl status is a bit finicky: it only shows the errors before systemctl restart networking, and at that point I cannot reach the host via the web interface, hence the following screenshots.

ifupdown2-pre.png

systemctl-udev-settle.png

As soon as I am able to try the suggested workarounds, I will post an update. Thanks again for the fast and helpful reply.

Bests,
Grigori
 
Aug 16 14:51:01 sy-prxmx-sd-1 systemd-udevd[1606]: bnxt_en.rdma.0: Worker [1691] processing SEQNUM=21407 is taking a long time
Aug 16 14:51:01 sy-prxmx-sd-1 systemd-udevd[1606]: bnxt_en.rdma.1: Worker [1627] processing SEQNUM=21410 is taking a long time
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102891 > 100000) msec active 1
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: bnxt_en 0000:43:00.0 bnxt_re0: Failed to modify HW QP
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
Aug 16 14:51:43 sy-prxmx-sd-1 kernel: infiniband bnxt_re0: Couldn't start port
Good to see the logs for verification! It seems clear to me that you are hitting the same issue as described in the linked threads. I would recommend disabling the RDMA feature on the NIC itself and/or blacklisting the RDMA driver, whichever suits you best.
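
For anyone who wants to check their own logs for the same signature, here is a rough sketch. The function name bnxt_fw_stalls is made up; it only looks for the "FW STALL Detected" message quoted above and prints the PCI address of each affected port.

```shell
# Hypothetical helper: read kernel log text on stdin and print each PCI address
# that reported a bnxt_re firmware stall, once per address.
bnxt_fw_stalls() {
    grep -F 'bnxt_re_is_fw_stalled: FW STALL Detected' \
        | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' \
        | sort -u
}

# Example on the affected host (kernel messages of the current boot):
#   journalctl -k -b --no-pager | bnxt_fw_stalls
```

Empty output only means that this particular message was not logged; it does not rule out other NIC problems.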
 
Thanks again for your kind assistance!

Disabling the RDMA driver helped and sped up booting by a lot:

Bash:
echo "blacklist bnxt_re" >> /etc/modprobe.d/blacklist-bnxt_re.conf
update-initramfs -u
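
To confirm after a reboot that the blacklist actually took effect, one option is to check the kernel's list of loaded modules. This is a sketch only; the helper name module_loaded is hypothetical, and it assumes the usual /proc/modules layout where each line starts with the module name followed by a space.

```shell
# Hypothetical helper: succeed if a kernel module is listed in /proc/modules.
# The second argument allows pointing at a different file for testing.
module_loaded() {
    local module="$1"
    local list="${2:-/proc/modules}"
    # Anchor at line start and require the trailing space so that
    # e.g. "bnxt_re" does not match a longer module name.
    grep -q "^$module " "$list"
}

# Example after a reboot:
#   if module_loaded bnxt_re; then echo "bnxt_re still loaded"; else echo "bnxt_re not loaded"; fi
```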

I will install Checkmk to monitor the stability and will keep you updated if the issue returns.
 
