Did Kernel 6.2.16-5 kill IPMI? How to fix?

Riesling.Dry

Hi all,
about three weeks ago, we completely reinstalled the server (HP ProLiant DL160 G6 w. HP iLO100) and installed Proxmox 8 with pve-kernel-6.2 - everything worked perfectly fine. A few days ago, the kernel was updated from version 6.2.16-4 to version 6.2.16-5 (and subsequently to 6.2.16-6 yesterday).

Since 6.2.16-5, the iLO (Integrated Lights-Out) is no longer accessible (it still responds to ping, but http and ssh are "down"), and it seems that the updated kernel somehow cannot communicate with the iLO anymore; of course, another cause cannot be completely ruled out... :°)

The host system shows the following error messages:
Code:
root@pve:~# ipmitool channel info
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
root@pve:~# systemctl status ipmievd.service
× ipmievd.service - IPMI event daemon
     Loaded: loaded (/lib/systemd/system/ipmievd.service; disabled; preset: enabled)
     Active: failed (Result: exit-code) since Wed 2023-07-26 15:53:13 CEST; 10s ago
    Process: 279491 ExecStart=/usr/sbin/ipmievd open daemon (code=exited, status=1/FAILURE)
        CPU: 52ms
root@pve:~#  dmesg |grep ipmi
[   33.533060] ipmi device interface
[   33.646428] ipmi_si: IPMI System Interface driver
[   33.646452] ipmi_si dmi-ipmi-si.0: ipmi_platform: probing via SMBIOS
[   33.646455] ipmi_platform: ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0
[   33.646459] ipmi_si: Adding SMBIOS-specified kcs state machine
[   33.646523] ipmi_si IPI0001:00: ipmi_platform: probing via ACPI
[   33.646571] ipmi_si IPI0001:00: ipmi_platform: [io  0x0ca2] regsize 1 spacing 1 irq 0
[   33.668225] ipmi_si dmi-ipmi-si.0: Removing SMBIOS-specified kcs state machine in favor of ACPI
[   33.668231] ipmi_si: Adding ACPI-specified kcs state machine
[   33.668312] ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x20, irq 0
[   33.998685] ipmi_si IPI0001:00: There appears to be no BMC at this location
[   34.045562] ipmi_ssif: IPMI SSIF Interface driver
root@pve:/etc# systemctl status ipmievd.service
× ipmievd.service - IPMI event daemon
     Loaded: loaded (/lib/systemd/system/ipmievd.service; disabled; preset: enabled)
     Active: failed (Result: exit-code) since Wed 2023-07-26 15:53:13 CEST; 37min ago
    Process: 279491 ExecStart=/usr/sbin/ipmievd open daemon (code=exited, status=1/FAILURE)
        CPU: 52ms


Jul 26 15:53:13 pve systemd[1]: Starting ipmievd.service - IPMI event daemon...
Jul 26 15:53:13 pve ipmievd[279491]: Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Jul 26 15:53:13 pve systemd[1]: ipmievd.service: Control process exited, code=exited, status=1/FAILURE
Jul 26 15:53:13 pve systemd[1]: ipmievd.service: Failed with result 'exit-code'.
Jul 26 15:53:13 pve systemd[1]: Failed to start ipmievd.service - IPMI event daemon.
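
The telling line in the dmesg output is the one at 33.998685: ipmi_si probed the KCS interface at 0xca2 and got no answer from the BMC, which would explain why no /dev/ipmi0 was ever created. A minimal sketch of a check for that message follows; the function name bmc_probe_result is made up for illustration, and the matched string is copied from the log above.

```shell
# Hypothetical helper: reads kernel log lines on stdin and reports whether
# ipmi_si gave up on finding a BMC. The matched text is taken verbatim from
# the dmesg output quoted in this thread.
bmc_probe_result() {
    if grep -q 'There appears to be no BMC at this location'; then
        echo "no-bmc"
    else
        # Absence of the message does not prove the BMC is healthy.
        echo "ok-or-unknown"
    fi
}

# Usage on a live system (reading the kernel log usually needs root):
#   dmesg | bmc_probe_result
```

If this prints no-bmc, the kernel modules are loaded but the controller itself did not respond, so reloading modules or reinstalling ipmitool is unlikely to change anything.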

iLO Reset not possible:
Code:
root@pve:~# ipmitool mc reset warm
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
root@pve:~# ipmitool mc reset cold
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory

The device seems to be present:
Code:
root@pve:~# cat /proc/devices | grep ipmi
241 ipmidev

but apparently the nodes are not generated.
Manually adding one doesn't help:
Code:
root@pve:~# mknod /dev/ipmi0 c 241 0x0
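
A hand-made node only gives userspace a name for major 241, minor 0; it does not give the driver a BMC to talk to. Since ipmi_si reported no BMC, presumably no interface was ever registered, which would explain why the node stays unusable. One way to check is to count the entries under /sys/class/ipmi, where ipmi_devintf normally lists the interfaces it has registered. A sketch, with a hypothetical function name:

```shell
# Count IPMI interfaces registered with the device interface driver.
# The directory defaults to /sys/class/ipmi; the optional argument is an
# assumption added here so the function can be pointed at a test directory.
count_ipmi_interfaces() {
    dir="${1:-/sys/class/ipmi}"
    n=0
    for entry in "$dir"/ipmi[0-9]*; do
        # An unmatched glob stays literal, so test that the entry exists.
        if [ -e "$entry" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Usage on the host:
#   count_ipmi_interfaces
```

A result of 0 would mean there is nothing behind /dev/ipmi0 to open, no matter how the node was created.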

Kernel Modules are loaded:
Code:
root@pve:~#  lsmod | grep ipmi
ipmi_watchdog          32768  0
ipmi_ssif              49152  0
acpi_ipmi              24576  0
ipmi_si                90112  0
ipmi_poweroff          16384  0
ipmi_devintf           20480  0
ipmi_msghandler        86016  6 ipmi_devintf,ipmi_si,ipmi_watchdog,acpi_ipmi,ipmi_ssif,ipmi_poweroff
root@pve:~# find /lib/modules/$(uname -r)/kernel/drivers/char/ipmi/ -type f -name '*.ko*' -exec modinfo {} \; | egrep "^filename|^description"
filename:       /lib/modules/6.2.16-5-pve/kernel/drivers/char/ipmi/ipmi_msghandler.ko
description:    Incoming and outgoing message routing for an IPMI interface.
filename:       /lib/modules/6.2.16-5-pve/kernel/drivers/char/ipmi/ipmi_watchdog.ko
description:    watchdog timer based upon the IPMI interface.
filename:       /lib/modules/6.2.16-5-pve/kernel/drivers/char/ipmi/ipmi_poweroff.ko
description:    IPMI Poweroff extension to sys_reboot
filename:       /lib/modules/6.2.16-5-pve/kernel/drivers/char/ipmi/ipmi_devintf.ko
description:    Linux device interface for the IPMI message handler.
filename:       /lib/modules/6.2.16-5-pve/kernel/drivers/char/ipmi/ipmi_si.ko
description:    Interface to the IPMI driver for the KCS, SMIC, and BT system interfaces.
filename:       /lib/modules/6.2.16-5-pve/kernel/drivers/char/ipmi/ipmi_ssif.ko
description:    IPMI driver for management controllers on a SMBus

No errors on:
Code:
modprobe ipmi_devintf
modprobe ipmi_msghandler
modprobe ipmi_poweroff
modprobe ipmi_si
modprobe ipmi_ssif
modprobe ipmi_watchdog

Manually adding the modules to /etc/modules doesn't seem to make a difference.

Purging ipmitool (apt purge ipmitool) and reinstalling the latest package (ipmitool_1.8.19-6_amd64.deb), or going back to the previous package version (-4), doesn't help.

IPMIUTIL shows:
Code:
ipmiutil health -x
ipmiutil health ver 3.18
ipmi_open: driver type =
ipmi_open_mv: cannot open /dev/ipmi/0
ipmi_open_mv: cannot open /dev/ipmi0
ipmi_open_mv: cannot open /dev/ipmidev0
ipmi_open_mv: cannot open /dev/ipmidev/0
imbapi ipmi_open_ia: open(/dev/imb) failed, No such file or directory
smbios: Driver=7(KCS), sa=20, Base=0x0ca2, Spacing=1
BMC KCS Initialized at 0x0ca2
ipmidir Cmd=01 NetFn=06 Lun=00 Sa=20 Data(0):
Send Netfn=06 Cmd=01, raw: 00 20 18 01
ipmidir Resp(1,1): status=-2 cc=00, Data(250):
open_direct: ProcessMessage(KCS) error = -2
ipmidir Cmd=01 NetFn=06 Lun=00 Sa=20 Data(0):
Send Netfn=06 Cmd=01, raw: 00 20 18 01
ipmidir Resp(1,1): status=-2 cc=00, Data(250):
open_direct: status=-400, KCS drv, ipmi=0
ipmi_open rc = -16 type =
Driver type , open rc = -16
Cannot open an IPMI driver: /dev/imb, /dev/ipmi0, /dev/ipmi/0,
     or direct driverless.
ipmiutil health, cannot open IPMI driver

What else can be tried to fix iLO/IPMI and solve this issue?

Can anybody confirm or rule out that the problem was caused by the upgrade from pve kernel 6.2.16-4 to 6.2.16-5?

Cheers,
~R.
 
Thank you for your responses.

Generally speaking you might have to reset iLO.
Do you use a dedicated IPMI port?
iLO has its own IP and own/dedicated LAN connection.
RESET via ipmitool mc reset cold from pve host does not work as outlined above.

So, you cannot access your iLO anymore, regardless of whether PVE is running or not? Have you tried booting your old, working kernel?
PVE is fine, only the iLO is "down".
I also thought of trying the old kernel, but as I have neither proper remote access nor physical access to the box, and in consequence might "lose" the box if anything goes wrong, I'd rather not try that until I get to the hosting facility.

...but even if: what would it mean, if iLO/IPMI works w. the 6.2.16-4 kernel?
 
Since iLO has its own port, it should work independently of the OS
[...]it should not be affected by host OS.
It "should" yes, but as it uses drivers from the host it still might be affected...
It is quite unlikely, that the kernel-update did NOT cause the outage, yet it is not positively confirmed.
That's why I started this thread.

Are there any other users out there w. a hp iLO100 and the PVE 6.2.16-5 or 6.2.16-6 kernel?

The issue most likely has nothing to do with PVE.
On what do you base this assumption? :°)

But thanks for the train of thought, which made me try resetting it from an independent remote machine:

Code:
:~# ipmitool -H 201.***.***.*** -U *** -P *** bmc reset cold
Error: Unable to establish LAN session
Error: Unable to establish IPMI v1.5 / RMCP session
 
It is quite unlikely, that the kernel-update did NOT cause the outage, yet it is not positively confirmed.
That would really be a shitty thing if those two would be related and totally useless as a server. If this is reproducible, I'd contact the system vendor to get it fixed or swap out the hardware for something more stable.

I have encountered a broken management controller multiple times, and often you have to reset it by pressing the product-specific button on the hardware to get it working again. Try to keep your controller software up to date, but in the end it's just hardware that can and will fail.
 
o.k. - we reproduced the setup locally w. identical server hardware incl. identical iLO100:

IPMITOOL works fine!

Conclusion and to answer the question in the subject of this thread:

Kernel (upgrade to) 6.2.16-5 did NOT kill IPMI/iLO!

Apparently it is/was a sad coincidence, that IPMI went belly-up right when we updated the kernel.
Will travel to the hosting facility when the opportunity arises and see if/how it can be fixed, maybe by replacing the iLO100 board.

Apologies to the proxmox team, many thanks for your great work! Keep it up! :)

</close>
 
Hi!

I have seen that some HPE "servers" share one port between the onboard NIC and the iLO (there is no separate port, only one physical port "bridged" together, like an internal switch).

The network switch may be blocking the iLO port.
 
afaik, when using shared port for bmc, you may not reach bmc from the host os. have seen this with other bmc besides ilo.
Yes, I experienced this with every management board. Always connect the dedicated port to a switch in order to reach it. Unfortunately, I also had to hard-reset the management board (different brands) via the hardware reset button / button press pattern.
 
In this case the iLO100 has a dedicated Ethernet Port/Network Connection on a sep. Board w. a dedicated IP.
Same setup as before Kernel-Update, i.e. the setup that worked before :°)
Will report back as soon as we have news.
 
In this case the iLO100 has a dedicated Ethernet Port/Network Connection on a sep. Board w. a dedicated IP.
That is always the case, but you can configure the ILO port to change it to the first NIC in the system, so that it can technically run on two physical ports. Accessing it on the shared port while using the shared port is not possible (@RolandK already pointed this out and I concur).
 
o.k problem solved! :D
iLO was reset by shutting down the server and removing power (unplugging it) for a couple of minutes.
As soon as it was plugged back in, iLO was accessible again, the server booted up fine and everything is back to normal.
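
To confirm the recovery from the host side as well, one could check that one of the device paths ipmitool complained about earlier now exists before running ipmitool again. A sketch under that assumption; the function name and the optional prefix argument are made up for illustration.

```shell
# Print the first IPMI device path that exists, or return 1 if none does.
# The three paths are the ones listed in the ipmitool error message quoted
# earlier in this thread. The optional prefix argument is hypothetical and
# only exists so the function can be tried against a scratch directory.
find_ipmi_device() {
    prefix="${1:-}"
    for path in /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0; do
        if [ -e "$prefix$path" ]; then
            echo "$path"
            return 0
        fi
    done
    return 1
}

# Usage on the host:
#   find_ipmi_device && ipmitool mc info
```

If a path is printed and ipmitool mc info answers, the host can talk to the BMC again through the kernel driver, independent of the iLO's own network port.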
 