ifupdown2 defunct after updating to 8.2

itkfm · Apr 29, 2024

Long story short, with the new kernel of PVE 8.2 this machine here…

hangs at the systemd-udev thing during boot for a while.
boots with all network interfaces being down.
takes forever to shut down because of Failed to deinitialize RCFW followed by a stream of AMD-Vi IO_PAGE_FAULTs.

Please note that the interface names did not change between the kernel versions; they are still the same.

ifupdown2 does not apply the network configuration at boot; bridges like vmbr0 are never created.
ifreload -a does not work either: error: Another instance of this program is already running.

However, configuring the network interfaces manually does work (although each command is followed by a bunch of errors) and allows me to reach the web interface:

Code:

ip addr add 192.168.70.185/24 dev eno1np0
ip link set eno1np0 up
ip route add default via 192.168.70.1

Code:

[  208.744897] bnxt_en 0000:45:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  208.744943] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  208.744977] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:46b9 error=-110
[  208.745020] bnxt_en 0000:45:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  208.745052] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  208.745083] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:46b9 error=-110
[  210.130377] bnxt_en 0000:45:00.1: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102231 > 100000) msec active 1
[  210.130433] bnxt_en 0000:45:00.1 bnxt_re1: Failed to modify HW QP
[  210.130465] infiniband bnxt_re1: Couldn't change QP1 state to INIT: -110
[  210.130496] infiniband bnxt_re1: Couldn't start port
[  210.130614] bnxt_en 0000:45:00.1 bnxt_re1: Failed to destroy HW QP
[  210.131811] bnxt_en 0000:45:00.1 bnxt_re1: Free MW failed: 0xffffff92
[  210.132341] infiniband bnxt_re1: Couldn't open port 1
[  210.133872] infiniband bnxt_re1: Device registered with IB successfully
[  214.163648] bnxt_en 0000:45:00.0 eno1np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: none
[  214.163714] bnxt_en 0000:45:00.0 eno1np0: EEE is not active
[  214.163731] bnxt_en 0000:45:00.0 eno1np0: FEC autoneg off encoding: None
[  214.179839] bnxt_en 0000:45:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  214.179865] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  214.179886] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:46b9 error=-110
[  214.179911] bnxt_en 0000:45:00.0 bnxt_re0: Failed to add GID: 0xffffff92
[  214.179931] infiniband bnxt_re0: add_roce_gid GID add failed port=1 index=2
[  214.179950] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:c0a8:46b9 error=-110

This should be the error that comes up during boot and prevents ifupdown2 from working:
→ see attachment

The network configuration in /etc/network/interfaces is fairly trivial:

Code:

auto lo
iface lo inet loopback

iface eno1np0 inet manual

iface eno2np1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.70.185/24
        gateway 192.168.70.1
        bridge-ports eno1np0
        bridge-stp off
        bridge-fd 0

iface enxbe3af2b6059f inet manual

This system is essentially a fresh install (8.0 or 8.1) updated to 8.2 via apt.

These issues do not occur when booting the old Linux 6.2 kernel. It only happens with the new 6.8 kernel.

Julius Glatz GmbH · May 2, 2024

Same issue here after upgrading Proxmox-Backup-Server to kernel "6.8.4-2-pve".
No network after boot!
"systemctl restart networking.service" from CLI brings up network-interfaces.

Code:

service systemd-udev-settle status
Failed to start systemd-udev-settle.service - Wait for udev To Complete Device Initialization.

Foreman21 · May 2, 2024

Hi All,

Same issue with a fresh install of Proxmox 8.2-1 on a DL380G10.

The console shows "Failed to start ifupdown2-pre.service" during boot.

"ip add show" shows all NICs down. If I bring them up with the right IP address and reboot, they are down again after the restart.

"journalctl -u systemd-udev-settle" shows a warning with "wait for Complete Device initialization..."

So it looks like exactly the same issue.

With version 8.1-2 installed it works fine, but after updating to the current version it fails.

Regards.

Foreman21.

spirit · May 2, 2024

This looks like a bug in the bnxt_en driver in kernel 6.8 (not related to ifupdown2).
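
For anyone checking whether a host falls into the affected group, one option is to look for the Broadcom modules among the loaded kernel modules. A minimal sketch, assuming the module names bnxt_en and bnxt_re seen in the log above; the function name module_loaded is made up for illustration, and the second argument only exists so the check can be pointed at a saved copy of /proc/modules:

```shell
#!/bin/sh
# Sketch: report whether a kernel module appears in the loaded-module list.
# module_loaded is a hypothetical helper name, not part of any tool.

module_loaded() {
    # $1: module name to look for
    # $2: module list to read (defaults to /proc/modules)
    # Exit status 0 if the first column of any line equals the module name.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example use on the host:
#   for mod in bnxt_en bnxt_re; do
#       if module_loaded "$mod"; then echo "$mod: loaded"; else echo "$mod: not loaded"; fi
#   done
```

If bnxt_re shows up as loaded on a machine with these symptoms, the RoCE/RDMA part of the driver is a plausible suspect, but this check alone does not prove it.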

epelc · May 2, 2024

Also have this exact same issue after upgrading to 8.2.2 and kernel 6.5 -> 6.8.4-2-pve

tl5k5 · May 2, 2024

I appear to be having the same issue.
https://forum.proxmox.com/threads/8-2-2-upgrade-breaks-1st-node-manual-network-start-needed.146192/

Stoiko Ivanov · May 3, 2024

itkfm said:
takes forever to shut down because of Failed to deinitialize RCFW followed by a stream of AMD-Vi IO_PAGE_FAULTs.

This issue, or more precisely the trace pointing to bnxt_re, seems to be the one described in:
https://forum.proxmox.com/threads/o...le-on-test-no-subscription.144557/post-651394

See the following threads for workarounds and fixes:
https://forum.proxmox.com/threads/o...est-no-subscription.144557/page-3#post-652507
https://forum.proxmox.com/threads/broadcom-nics-down-after-pve-8-2-kernel-6-8.146185/

(In short: adding `blacklist bnxt_re` to the modprobe configuration and running `update-initramfs -k all -u` should fix it; disabling InfiniBand on the NIC itself might be more future-proof.)
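
The blacklist workaround can be sketched as a small shell function. This is only a sketch under assumptions: the file name blacklist-bnxt_re.conf is hypothetical (modprobe reads *.conf files under /etc/modprobe.d), and the target directory is a parameter so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch: keep the bnxt_re (RoCE/RDMA) module from being loaded automatically.
# Only the "blacklist bnxt_re" line comes from the workaround in this thread;
# the file name and the function name are assumptions made for illustration.

write_bnxt_re_blacklist() {
    # $1: modprobe configuration directory (defaults to /etc/modprobe.d)
    conf_dir="${1:-/etc/modprobe.d}"
    conf_file="$conf_dir/blacklist-bnxt_re.conf"
    printf 'blacklist bnxt_re\n' > "$conf_file" || return 1
    # Print the path that was written so the caller can inspect it.
    printf '%s\n' "$conf_file"
}

# On the affected host, as root (deliberately not executed here):
#   write_bnxt_re_blacklist
#   update-initramfs -u -k all   # rebuild the initramfs for all installed kernels
#   reboot
```

Rebuilding the initramfs matters because the module may otherwise still be loaded from the old image early in boot.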

shabsta · May 17, 2024

Thanks for your advice Stoiko,

Fresh install on new SuperMicro servers and ran into the same issue.

skinnzl · Jun 6, 2024

TLDR - updating Broadcom ethernet adaptors to latest HPE firmware fixed similar issues for me.

I am currently migrating from VMware to PVE and encountered issues similar to this after upgrading the first two nodes to 8.2.2. All nodes run nearly identical hardware, each with 2 x Broadcom-based 10/25Gb Ethernet adapters. The two nodes behaved differently: one failed to bring up the network on boot and hung for a long time.

journalctl -u ifupdown2-pre.service

provided:

Code:

-- Boot d8a4df93bcff4236bdc0331d86f03170 --
Jun 06 10:08:46 pve1 systemd[1]: Starting ifupdown2-pre.service - Helper to synchroni>
Jun 06 10:10:46 pve1 udevadm[717]: Timed out for waiting the udev queue being empty.
Jun 06 10:10:46 pve1 systemd[1]: ifupdown2-pre.service: Main process exited, code=exi>
Jun 06 10:10:46 pve1 systemd[1]: ifupdown2-pre.service: Failed with result 'exit-code>

Networking was not available, and ifreload -a gave error: Another instance of this program is already running. Networking could be restarted with systemctl restart networking, after which the node behaved. This occurred on my test deployment before committing to using PVE, and again after a clean install before preparing to migrate production servers.

The other node started fine and behaved but gave numerous network and bnxt_en errors on shutdown.

Each node has 1 x HPE 631SFP28 and 1 x HPE P255p network adaptors that were new and had been supplied with different firmwares. I updated all nodes to the latest firmware (228.1.111.0 [P255p] and 228.1.111000 [631SFP28]) and all issues were resolved after a restart.
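
To compare firmware levels across nodes before and after such an update, the firmware-version line that ethtool -i prints for an interface can be extracted. A minimal sketch: the helper name firmware_version is hypothetical, and the interface name eno1np0 is borrowed from the first post in this thread, so substitute your own.

```shell
#!/bin/sh
# Sketch: pull the firmware version out of `ethtool -i <iface>` output.
# firmware_version is a made-up helper name for illustration.

firmware_version() {
    # Reads `ethtool -i` output on stdin and prints only the value of
    # the "firmware-version:" line (nothing if the line is absent).
    sed -n 's/^firmware-version:[[:space:]]*//p'
}

# Example use on the host (interface name is an assumption):
#   ethtool -i eno1np0 | firmware_version
```

Running this on each node gives a quick way to spot adapters that were supplied with differing firmware.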

florian-n · Jun 25, 2024

I was able to fix this by disabling the RDMA feature of both onboard adapters in the BIOS.

chup5 · Sep 12, 2024

florian-n said:
I was able to fix this by disabling the RDMA feature of both onboard adapters in the BIOS.

you changed my life - i wasted like 4 days with this issue.
thx by a linux newb.

fxandrei · Nov 8, 2024

For me it was 2 failed drives that caused the timeout.
I changed them and everything is fine now.

slot · Nov 18, 2024

florian-n said:
I was able to fix this by disabling the RDMA feature of both onboard adapters in the BIOS.

Hi,
I have the same problem and disabling RDMA fixes it, but does this have any impact on performance?
