HP DL 380 Gen 9 issues on 5.4.73 & 5.4.78 kernel

adamb

Updated some of my test nodes this morning.

One of them, an HP DL 380 Gen9, boots up without any networking. This is specific to 5.4.73-1; if I go back to 5.4.65-1, all is well. I don't think it's a driver issue, as the host uses both ixgbe and bnx2x.

Ethtool just reports the interfaces as Unknown for the connection state.

On the 5.4.73-1 kernel everything looks ok besides networking. Any suggestions or ideas what the issue could be?
 
hi,

do you see anything interesting in journalctl or dmesg outputs?
 
See lots of these entries as well.

[Tue Dec 1 07:03:03 2020] scsi host10: BC_298 : MBX Cmd Completion timed out
[Tue Dec 1 07:03:03 2020] scsi host10: BG_1108 : MBX CMD get_boot_target Failed
[Tue Dec 1 07:03:31 2020] INFO: task systemd-udevd:504 blocked for more than 120 seconds.
[Tue Dec 1 07:03:31 2020] Tainted: P O 5.4.73-1-pve #1

- If I restart networking once the host has booted, networking starts and looks ok

We have tons of these set up out in the field and really need some attention on this issue.
 
Services that are failing to start on these latest kernels:

root@supprox1:~# systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● pvesr.service loaded failed failed Proxmox VE replication runner
● systemd-udev-settle.service loaded failed failed udev Wait for Complete Device Initialization

LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.

2 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
 
can you post your dmesg and journal?

Ethtool just reports the interfaces as Unknown for the connection state.
what does ethtool -i <interface> say?

root@supprox1:~# systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● pvesr.service loaded failed failed Proxmox VE replication runner
● systemd-udev-settle.service loaded failed failed udev Wait for Complete Device Initialization
and systemctl status for these services can be helpful
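
For anyone collecting the same information, here is a minimal sketch of pulling the unit names out of `systemctl --failed` output so that each one can be passed to `systemctl status` and `journalctl`. The helper name `failed_units` is hypothetical (it is not part of systemd), and the loop in the comments is only a suggested way to use it.

```shell
#!/bin/sh
# failed_units (hypothetical helper): read `systemctl --failed` style output
# on stdin and print one unit name per line. Handles the bulleted form shown
# in this thread as well as output without the leading bullet.
failed_units() {
    awk '$1 == "●" { print $2; next } $1 ~ /\.[a-z]+$/ { print $1 }'
}

# Usage on a live host (not executed here):
#   systemctl --failed --no-legend | failed_units | while read -r unit; do
#       systemctl status --no-pager "$unit"
#       journalctl -b --no-pager -u "$unit"
#   done
```

The header, legend, and summary lines are skipped because their first field does not look like a unit name.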
 
I attached some screenshots of the ethtool and service status outputs, along with the dmesg and journalctl outputs. Keep in mind, networking starts fine once I run systemctl restart networking.service.

I am starting to wonder if it's related to the SCSI HP MSA central storage. The messages below seem to be related to the SCSI storage.

On the 5.4.73 and 5.4.78 kernels I get these messages:

[Tue Dec 1 07:03:03 2020] scsi host10: BC_298 : MBX Cmd Completion timed out
[Tue Dec 1 07:03:03 2020] scsi host10: BG_1108 : MBX CMD get_boot_target Failed

On 5.4.65 I don't see the above messages; the host boots as expected and there are no issues.
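
To compare kernels quickly, a small filter like the sketch below can reduce a kernel log to the lines discussed in this thread: the `MBX` mailbox messages and the hung-task warnings. The function name `filter_boot_errors` is hypothetical, and the patterns are taken only from the log excerpts posted here, so they may miss related messages.

```shell
#!/bin/sh
# filter_boot_errors (hypothetical helper): read kernel log text on stdin and
# keep only the MBX mailbox messages and hung-task warnings seen in this thread.
filter_boot_errors() {
    grep -E 'MBX (Cmd|CMD) |blocked for more than [0-9]+ seconds'
}

# Usage on a live host (not executed here):
#   dmesg -T | filter_boot_errors
#   journalctl -k -b | filter_boot_errors
```

Running it once on an affected kernel and once on 5.4.65 should make the difference between the two boots easy to see.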
 

Attachments

  • ethtool.jpg
  • services.jpg
  • dmesg.txt
  • journalctl.txt
could you also post ethtool -i <interface> (the -i flag is important)
 
Here it is. Keep in mind the same thing is happening for my bnx2x adapters. Networking is 100% once I get on it and run "systemctl restart networking.service"
 

Attachments

  • ethtool_i.jpg
Well, I take that back. After masking "systemd-udev-settle.service" I noticed that "ifupdown-pre.service" was also failing. I masked "ifupdown-pre.service" as well, and now the server boots with networking as expected.

However, it is still throwing the kernel oops and it's booting very slowly.

I can also still go back to 5.4.65 and everything works as expected, no kernel oops, boots fast.

Something just doesn't seem right with these new kernels.
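
For reference, the masking workaround described above could be scripted roughly as follows. This is only a sketch: the function names, the `UNITS` variable, and the `DRY_RUN` switch are hypothetical conveniences, and the workaround only restored networking at boot on the DL 380 Gen9 (it did not cure the kernel oops or the slow boot).

```shell
#!/bin/sh
# Units reported in this thread as failing at boot on the affected kernels.
UNITS="systemd-udev-settle.service ifupdown-pre.service"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $UNITS is left unquoted on purpose so each unit becomes its own argument.
mask_boot_blockers()   { run systemctl mask $UNITS; }
unmask_boot_blockers() { run systemctl unmask $UNITS; }

# Usage as root (not executed here):
#   DRY_RUN=1 mask_boot_blockers    # preview the command
#   mask_boot_blockers              # apply the workaround
#   unmask_boot_blockers            # revert once a fixed kernel is installed
```

Keeping an unmask counterpart next to the mask step makes it harder to forget that these units are still disabled after the underlying kernel problem is resolved.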
 
Updated 3 other HP DL 380 Gen9s that we have in-house, which aren't using HP MSA storage.

Same exact issue.

So far none of my HP Gen10s are doing this. I have some vanilla Supermicro hardware that seems A-OK as well. Trying to find an HP Gen8 in-house, but I don't think we have one anymore.
 
The same issue on HP BL460c Gen9!

Masking the systemd-udev-settle.service doesn't help.
 

Attachments

  • Screenshot_2020-12-02_17-29-46.png
1. can you please ensure that the latest firmware is installed
2. there's a new kernel (pve-kernel-5.4.78-1-pve) available on pvetest and pve-no-subscription; it would be worth testing that one

We have a gen8 in our testlab, which booted up fine with that kernel, IIRC, but we'll do some more tests while we investigate this, just to be sure.
 
Updating the firmware of all components in the blade was the first thing I tried, even before starting to search the internet.
 

Attachments

  • Screenshot_2020-12-02_16-28-16.png

1. All the iLO and BIOS related updates are recent on my one test machine.
2. I did test with the 5.4.78 kernel from testing, and it had the same issue. 5.4.65 does work.
 
pve-kernel-5.4.78-1-pve has the same issue as pve-kernel-5.4.73-1-pve

Last known good version for HP Gen9 servers is pve-kernel-5.4.65-1-pve
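
A quick way to check whether a given host is currently running one of the kernels reported as problematic here is to compare `uname -r` against that list. The sketch below assumes only the two versions named in this thread; the function name `is_affected_kernel` is hypothetical, and other builds may or may not be affected.

```shell
#!/bin/sh
# is_affected_kernel (hypothetical helper): succeed if the given kernel
# release string is one of the versions reported as broken in this thread.
is_affected_kernel() {
    case "$1" in
        5.4.73-1-pve|5.4.78-1-pve) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live host (not executed here):
#   if is_affected_kernel "$(uname -r)"; then
#       echo "Running a kernel reported as affected on HP Gen9"
#   fi
```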

Maybe the attached screenshot helps to find the cause?
 

Attachments

  • Screenshot_2020-12-02_18-35-18.png
