5.15.102-1-pve fails to boot on HP DL560

adamb

Famous Member
Mar 1, 2012
The 5.15.102-1-pve kernel fails to boot on HP DL560s. I haven't had a chance to test other hardware, but it definitely has issues on the DL560. I'm hoping this kernel doesn't get released to the enterprise repo, as we have a lot of DL560s in production.

Attached is a screenshot of what happens during boot. There are no problems on the older 5.15.74 or 5.15.83 kernels, and the 6.1.x kernels boot without issues as well.
 

Attachments

  • unnamed.jpg
Could you get the full console output (e.g., via a serial console or similar)?
 

That's it. It was a brand-new install as well. It would sit for 5-10 minutes, then print those hung task messages, and that was it. I let it sit for 10+ minutes, but nothing else appeared after that.

I reinstalled a second time and got the same issue. If there is anything else I can provide, let me know. I have iLO access to the console if anything else would be beneficial.
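If the console output is captured to a file (for example from the iLO remote console or a serial console), the hung-task warnings can be pulled out of it with a few lines of Python. This is a minimal sketch, assuming the standard kernel wording `INFO: task <comm>:<pid> blocked for more than <N> seconds.`; the function name `find_hung_tasks` is hypothetical.

```python
import re

# Kernel hung-task warnings look like (optionally prefixed by a timestamp):
#   INFO: task <comm>:<pid> blocked for more than <N> seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(console_log: str) -> list:
    """Return (command, pid, seconds) for every hung-task warning in a captured log."""
    hits = []
    for line in console_log.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append((match["comm"], int(match["pid"]), int(match["secs"])))
    return hits
```

The list of blocked tasks (often `systemd-udevd` or a network tool during early boot) hints at which subsystem is stuck, which is more useful to report than the final screen alone.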
 
Could you try with the "quiet" parameter removed from the kernel command line? Thanks!
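On a GRUB-booted Proxmox VE host, the default parameters normally sit in the `GRUB_CMDLINE_LINUX_DEFAULT` line of `/etc/default/grub` and take effect after `update-grub` and a reboot; hosts booting via systemd-boot keep the command line in `/etc/kernel/cmdline` instead and need `proxmox-boot-tool refresh`. Editing the file by hand is simplest; the sketch below just shows the transformation on the file contents as a string. It assumes the value is double-quoted on a single line, and `drop_kernel_param` is a hypothetical helper, not a Proxmox tool.

```python
import re

# Matches e.g.: GRUB_CMDLINE_LINUX_DEFAULT="quiet"
LINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"(.*)"\s*$')


def drop_kernel_param(grub_text: str, param: str = "quiet") -> str:
    """Return /etc/default/grub contents with `param` removed from the default cmdline."""
    out = []
    for line in grub_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            # Keep every whitespace-separated token except the one being dropped.
            tokens = [t for t in match.group(2).split() if t != param]
            line = f'{match.group(1)}"{" ".join(tokens)}"'
        out.append(line)
    return "\n".join(out) + "\n"
```

Without "quiet", the kernel prints its full boot log to the console, so the last messages before the hang become visible.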
 
Hmm, which generation are the DL560s, and what kind of CPU/storage do they have?

Is the latest available BIOS installed?

Our HP DL380 G8 works with 5.15.102-1-pve; the only thing it needs is `intremap=off` on the kernel command line (but that has been the case since kernel 5.11, as far as I recall).
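After rebooting, one way to confirm that a parameter such as `intremap=off` actually reached the running kernel is to inspect `/proc/cmdline`. A minimal sketch, assuming simple space-separated `name` or `name=value` tokens (quoted values containing spaces are not handled); `parse_cmdline` and `has_param` are hypothetical helper names.

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_param(cmdline: str, name: str, value: Optional[str] = None) -> bool:
    """True if `name` is present (and equals `value`, when one is given)."""
    params = parse_cmdline(cmdline)
    if name not in params:
        return False
    return value is None or params[name] == value
```

On the host itself this would be called as `has_param(open("/proc/cmdline").read(), "intremap", "off")`.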
 

It's a Gen10 with 4x Intel Xeon Gold 6254s.

The BIOS is a little older but not too bad (2022); there might be one newer revision.

After lots of testing, I have narrowed it down to the NIC below.

13:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
13:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)

If it's removed, the server boots.
 
Thanks! Hmm, that's odd: the other report in our Bugzilla seems to involve Intel's E810 NICs.

Any chance you have a test system where you could check whether the pending BIOS upgrade (and especially a NIC firmware upgrade, if available) changes anything?
 

It just so happens that this server also has Intel E810-based NICs (both 25G and 100G). They are working great with the ice driver; we are moving data over the E810 NICs on the 5.15.74-1-pve kernel as we speak.

root@gppctestprox:~# lspci | grep Ethernet
11:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
11:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
13:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
13:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
28:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
28:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
37:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
37:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
37:00.2 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
37:00.3 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
45:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
45:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-C for QSFP (rev 02)
53:00.0 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
53:00.1 Ethernet controller: Intel Corporation Ethernet Controller E810-XXV for SFP (rev 02)
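With this many NICs in one box, it helps to see which physical cards are installed rather than individual ports. A rough sketch that groups `lspci | grep Ethernet` output like the listing above by model; it assumes the default lspci slot format (optionally with a PCI domain prefix) and that each port of a multi-port card is a separate function of the same PCI device. `cards_by_model` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, List

# e.g. "13:00.1 Ethernet controller: <description>" (domain prefix "0000:" optional)
LSPCI_RE = re.compile(
    r"^(?:[0-9a-f]{4}:)?(?P<slot>[0-9a-f]{2}:[0-9a-f]{2})\.[0-7] "
    r"Ethernet controller: (?P<desc>.+)$"
)


def cards_by_model(lspci_output: str) -> Dict[str, List[str]]:
    """Map each NIC description to the PCI bus:device slots (physical cards) using it."""
    cards: Dict[str, List[str]] = defaultdict(list)
    for line in lspci_output.splitlines():
        match = LSPCI_RE.match(line.strip())
        if not match:
            continue
        slot = match["slot"]
        # Multi-port cards expose one PCI function per port (.0, .1, ...),
        # so each bus:device is counted only once.
        if slot not in cards[match["desc"]]:
            cards[match["desc"]].append(slot)
    return dict(cards)
```

For the listing above this yields two E810-C cards (28:00, 45:00), two E810-XXV cards (11:00, 53:00), one BCM57810 (13:00) and one BCM5719 (37:00), which makes it easier to pull or swap one model at a time while bisecting the problem.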

I just found another Gen10 DL560 that we have which also has the BCM57810 10 Gigabit card, but it doesn't have any of the Intel E810 cards. It's booting fine on the newer kernel.

I never actually removed the BCM57810 card; I simply tested on a server that doesn't have that card and assumed it was the issue.

I'm starting to think it's actually an issue with the E810 cards and newer kernels. These E810 cards are brand new, and the firmware looks to be new as well.

I'm hoping to get into the office soon, and I'll pull the E810 cards and see what happens. That would certainly make more sense than the BCM57810 cards.
 
Can confirm: it is 5.15.102-1-pve with the E810 NIC. More complaints here: https://forum.proxmox.com/threads/opt-in-linux-6-2-kernel-for-proxmox-ve-7-x-available.124189/

I am having the same problem: the server won't boot and hangs at NetworkManager. In rescue mode, network commands like `ip l` hang the system. Note that the 102 version is in the Enterprise repository. I have to manually downgrade and pin the 85 version (pve-kernel-5.15.85-1).

Upstream bug report:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2004262

The apparent fix:
https://github.com/torvalds/linux/commit/248401cb2c4612d83eb0c352ee8103b78b8eb365
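When choosing which kernel to fall back to, version strings need to be compared numerically: as plain strings, `5.15.102-1-pve` sorts below `5.15.85-1-pve`. A minimal sketch for picking the newest installed kernel that is not known to be affected; the installed list would come from something like `proxmox-boot-tool kernel list`, and recent Proxmox VE 7.x releases can then pin the result with `proxmox-boot-tool kernel pin <version>`. The helper names `kernel_key` and `pick_pin_target` are hypothetical.

```python
import re
from typing import List, Tuple


def kernel_key(version: str) -> Tuple[int, ...]:
    """Turn '5.15.85-1-pve' into (5, 15, 85, 1) so versions compare numerically."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def pick_pin_target(installed: List[str], known_bad: List[str]) -> str:
    """Newest installed kernel that is not on the known-bad list."""
    candidates = [v for v in installed if v not in known_bad]
    if not candidates:
        raise ValueError("no known-good kernel installed")
    return max(candidates, key=kernel_key)
```

With `5.15.74-1-pve`, `5.15.85-1-pve` and `5.15.102-1-pve` installed and 102 marked bad, this returns `5.15.85-1-pve`, matching the manual downgrade described above.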
 

Posting this here as well, since it's relevant:
https://bugzilla.proxmox.com/show_bug.cgi?id=4604#c10
 
Thanks for digging into the issue and tracking down the potential fix in https://github.com/torvalds/linux/commit/248401cb2c4612d83eb0c352ee8103b78b8eb365

The patch has been backported to the 5.15 kernel series (it's included in 5.15.104), so we'll see which option is best for including it in our kernel: cherry-picking the single patch, or releasing a new 5.15 version.

I'll update the Bugzilla entry as well and will post further updates there, as it's the more appropriate channel and duplicating the information does not help anyone.
 
