ifupdown2 defunct after updating to 8.2

itkfm

Member
Apr 1, 2021
Long story short, with the new kernel of PVE 8.2 this machine here…
  • hangs at the systemd-udev thing during boot for a while.
  • boots with all network interfaces being down.
  • takes forever to shutdown because of Failed to deinitialize RCFW followed by a stream of AMD-Vi IO_PAGE_FAULTs.

Please note that the interface names did not change between the kernel versions; they are still the same.

ifupdown2 does not apply the network configuration at boot; bridges like vmbr0 are never created.
ifreload -a does not work either: error: Another instance of this program is already running.

However, configuring the network interfaces manually does work (although each command is followed by a bunch of errors) and lets me reach the web interface:
Code:
ip addr add 192.168.70.185/24 dev eno1np0
ip link set eno1np0 up
ip route add default via 192.168.70.1

Code:
[  208.744897] bnxt_en 0000:45:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  208.744943] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  208.744977] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:46b9 error=-110
[  208.745020] bnxt_en 0000:45:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  208.745052] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  208.745083] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:46b9 error=-110
[  210.130377] bnxt_en 0000:45:00.1: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102231 > 100000) msec active 1
[  210.130433] bnxt_en 0000:45:00.1 bnxt_re1: Failed to modify HW QP
[  210.130465] infiniband bnxt_re1: Couldn't change QP1 state to INIT: -110
[  210.130496] infiniband bnxt_re1: Couldn't start port
[  210.130614] bnxt_en 0000:45:00.1 bnxt_re1: Failed to destroy HW QP
[  210.131811] bnxt_en 0000:45:00.1 bnxt_re1: Free MW failed: 0xffffff92
[  210.132341] infiniband bnxt_re1: Couldn't open port 1
[  210.133872] infiniband bnxt_re1: Device registered with IB successfully
[  214.163648] bnxt_en 0000:45:00.0 eno1np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
[  214.163714] bnxt_en 0000:45:00.0 eno1np0: EEE is not active
[  214.163731] bnxt_en 0000:45:00.0 eno1np0: FEC autoneg off encoding: None
[  214.179839] bnxt_en 0000:45:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  214.179865] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  214.179886] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:46b9 error=-110
[  214.179911] bnxt_en 0000:45:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  214.179931] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  214.179950] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:46b9 error=-110
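
The numbers in the log above can be decoded by hand. A minimal sketch, assuming a POSIX shell with 64-bit arithmetic; the helper names `to_signed32` and `gid_to_ipv4` are made up for illustration. Read as a signed 32-bit value, 0xffffff92 is -110, which matches the `error=-110` lines (on Linux, errno 110 is ETIMEDOUT), and the last two groups of the GID are the IPv4 address that was being assigned.

```shell
#!/bin/sh
# Hypothetical helpers for reading the bnxt_re log lines; not part of any tool.

# Interpret a 32-bit hex value (e.g. 0xffffff92) as a signed integer.
to_signed32() {
    v=$(( $1 ))
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))
    fi
    printf '%d\n' "$v"
}

# Convert the last two 16-bit groups of an IPv4-mapped GID
# (e.g. c0a8 46b9) into dotted-quad notation.
gid_to_ipv4() {
    printf '%d.%d.%d.%d\n' \
        $(( 0x$1 >> 8 )) $(( 0x$1 & 0xff )) \
        $(( 0x$2 >> 8 )) $(( 0x$2 & 0xff ))
}

# Example calls:
#   to_signed32 0xffffff92
#   gid_to_ipv4 c0a8 46b9
```

So the driver is reporting timeouts while trying to register the address 192.168.70.185 as a RoCE GID, which fits the FW STALL message for the second port.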

This should be the error that comes up during boot and prevents ifupdown2 from working:
→ see attachment

The network configuration in /etc/network/interfaces is fairly trivial:
Code:
auto lo
iface lo inet loopback

iface eno1np0 inet manual

iface eno2np1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.70.185/24
        gateway 192.168.70.1
        bridge-ports eno1np0
        bridge-stp off
        bridge-fd 0

iface enxbe3af2b6059f inet manual


This system is essentially a fresh install (8.0 or 8.1) updated to 8.2 via apt.

These issues do not occur when booting the old Linux 6.2 kernel. It only happens with the new 6.8 kernel.
 

Attachments

  • journal82_.txt
    16.5 KB · Views: 5
Same issue here after upgrading Proxmox Backup Server to kernel "6.8.4-2-pve".
No network after boot!
"systemctl restart networking.service" from CLI brings up network-interfaces.

Code:
service systemd-udev-settle status
Failed to start systemd-udev-settle.service - Wait for udev To Complete Device Initialization.
 
Hi All,


Same issue with a fresh install of Proxmox 8.2-1 on a DL380G10.

The terminal shows "Failed to start ifupdown2-pre.service" during boot.
ifipdown2-pre.png

"ip addr show" reports all NICs as down. If I bring them up manually with the right IP address and then reboot, they are down again after the restart.

"journalctl -u systemd-udev-settle" shows a warning with "wait for Complete Device initialization...".

So it looks like exactly the same issue.

With a fresh install of version 8.1-2 it works fine, but after updating to the current version it fails.

Regards.

Foreman21.
 
This looks like a bug in the bnxt_en driver in kernel 6.8 (not related to ifupdown2).
 
  • takes forever to shutdown because of Failed to deinitialize RCFW followed by a stream of AMD-Vi IO_PAGE_FAULTs.
This issue, or more precisely the trace pointing to bnxt_re, looks like the one described in:
https://forum.proxmox.com/threads/o...le-on-test-no-subscription.144557/post-651394

See the following threads for workarounds and fixes:
https://forum.proxmox.com/threads/o...est-no-subscription.144557/page-3#post-652507
https://forum.proxmox.com/threads/broadcom-nics-down-after-pve-8-2-kernel-6-8.146185/

(In short: adding `blacklist bnxt_re` to the modprobe configuration and then running `update-initramfs -k all -u` should fix it; disabling InfiniBand on the NIC itself might be more future-proof.)
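
A minimal sketch of the blacklist workaround described above. The file name `blacklist-bnxt_re.conf` and the helper `write_blacklist` are assumptions chosen for illustration; any `.conf` file under /etc/modprobe.d should be picked up.

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe blacklist entry for bnxt_re
# into the given directory (normally /etc/modprobe.d).
write_blacklist() {
    dir="$1"
    printf 'blacklist bnxt_re\n' > "$dir/blacklist-bnxt_re.conf"
}

# On the affected host, as root (not executed by this sketch):
#   write_blacklist /etc/modprobe.d
#   update-initramfs -u -k all   # rebuild the initramfs for all installed kernels
#   reboot
```

Rebuilding the initramfs matters because the module may otherwise still be loaded from the old image early in boot. The blacklist only stops the RoCE/InfiniBand part (bnxt_re) from loading; the Ethernet driver bnxt_en is left untouched.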
 
Thanks for your advice Stoiko,

Fresh install on new SuperMicro servers and ran into the same issue.
 
TLDR - updating Broadcom ethernet adaptors to latest HPE firmware fixed similar issues for me.

I am currently migrating from VMware to PVE and ran into similar issues after upgrading the first two nodes to 8.2.2. All nodes run nearly identical hardware, each with 2 x Broadcom-based 10/25Gb Ethernet adapters. The two nodes behaved differently: one failed to bring up the network on boot and hung for a long time.

journalctl -u ifupdown2-pre.service

provided:

Code:
-- Boot d8a4df93bcff4236bdc0331d86f03170 --
Jun 06 10:08:46 pve1 systemd[1]: Starting ifupdown2-pre.service - Helper to synchroni>
Jun 06 10:10:46 pve1 udevadm[717]: Timed out for waiting the udev queue being empty.
Jun 06 10:10:46 pve1 systemd[1]: ifupdown2-pre.service: Main process exited, code=exi>
Jun 06 10:10:46 pve1 systemd[1]: ifupdown2-pre.service: Failed with result 'exit-code>

Networking was not available, and ifreload -a failed with error: Another instance of this program is already running. Networking could be restarted with systemctl restart networking, after which the node behaved. This occurred on my test deployment before committing to PVE, and again after a clean install before preparing to migrate production servers.
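
The check and recovery described above can be sketched as follows. The helper name `udev_settle_timed_out` is made up for illustration; the message it looks for is the one from the journal excerpt.

```shell
#!/bin/sh
# Hypothetical helper: read journal lines on stdin and succeed if the
# udev settle timeout reported by ifupdown2-pre.service is present.
udev_settle_timed_out() {
    grep -q 'Timed out for waiting the udev queue being empty'
}

# On the affected host, as root (not executed by this sketch):
#   if journalctl -b -u ifupdown2-pre.service --no-pager | udev_settle_timed_out; then
#       systemctl restart networking
#   fi
```

This only brings the network back for the current boot; it does not address whatever keeps the udev queue busy.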

The other node started fine and behaved but gave numerous network and bnxt_en errors on shutdown.

Each node has 1 x HPE 631SFP28 and 1 x HPE P255p network adaptor; these were new and had been supplied with different firmware versions. I updated all nodes to the latest firmware (228.1.111.0 [P255p] and 228.1.111000 [631SFP28]) and all issues were resolved after a restart.
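
To compare firmware levels across nodes before and after such an update, the running version can be read with `ethtool -i`. A minimal sketch; the helper name `firmware_version` is made up, and the interface name in the example is the one from the first post, so substitute your own.

```shell
#!/bin/sh
# Hypothetical helper: extract the firmware-version field from
# `ethtool -i <interface>` output supplied on stdin.
firmware_version() {
    sed -n 's/^firmware-version: //p'
}

# Example (not executed by this sketch):
#   ethtool -i eno1np0 | firmware_version
```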
 
For me it was 2 failed drives that caused the timeout.
I changed them and everything is fine now.
 
