can't boot, modprobe stuck on "acpi:IPI0001"

athompso

Renowned Member
Sep 13, 2013
I noticed that one server in my cluster - the one I typically test updates on first - has been down for a little while.
(Kudos to Proxmox for making a technology where 25% of my cluster has been down for days or possibly weeks without me even noticing.)

The reason it's dead in the water is that the boot process is stuck on:
"udevd[676]: timeout: killing '/sbin/modprobe -b acpi:IPI0001:' [716]"

over and over and over and over... it never stops.


The last thing I remember doing on this server was switching from Linux bridging to Open vSwitch, but it was working fine (even after a reboot) after I did that.
The only references I can find online to this device mention a Dell R300, which is not at all what I have: these are Dell C-series blade servers, with literally nothing in common with the (much newer) R300 servers. I've tried all the various power management options in the BIOS anyway, to no avail.

I am running a 3.x kernel on this system and have been for some time...

Any ideas? I'm going to try booting a 2.x kernel next and see if I can at least get the system back up so I can update it.

-Adam
 
Workaround: if I disable the BMC's local IPMI interface, the system boots fine. IPMI-LAN still works OK but the local channel breaks. Possibly the IPMI card is starting to fail?
 
I just suddenly encountered the same problem on 2 of 4 Dell C6000 blades, after they OOPS'ed recently (presumably thanks to the leap second). No changes whatsoever to the hardware, just *poof*, kernel OOPS, no automatic reboot, and upon manually resetting them through the (functional!) ILOM, they started complaining about this.
I have seen this before on other servers with integrated BMC/ILOM/ALOM/etc. hardware, and one solution I had found in the past was to blacklist *one* of the IPMI drivers, as at that time, that server was trying to load both the KCS driver and the ...something... driver.
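For anyone wanting to try the blacklist route, here is a rough sketch of what I mean. The helper names (show_alias, write_blacklist), the file name blacklist-ipmi.conf, and the module name ipmi_si are my assumptions for illustration, not something confirmed on these blades, so check what the alias actually resolves to on your own system first. Since udev runs modprobe with -b (use blacklist), a blacklist entry should also apply when the module is requested through the acpi:IPI0001: alias.

```shell
#!/bin/sh
# Sketch only: helper names and the module name used below are assumptions.

# Print the module name(s) the stuck modalias resolves to.
# Relies on modprobe's -R (--resolve-alias) option being available.
show_alias() {
    modprobe -R 'acpi:IPI0001:'
}

# Write a one-module blacklist file.
# usage: write_blacklist MODULE CONF_FILE
write_blacklist() {
    module="$1"
    conf="$2"
    printf '# Keep udev ("modprobe -b") from auto-loading this driver\nblacklist %s\n' \
        "$module" > "$conf"
}

# Example, run as root (ipmi_si is an assumed module name; use whatever
# show_alias reports on your system):
#   show_alias
#   write_blacklist ipmi_si /etc/modprobe.d/blacklist-ipmi.conf
```

Only blacklist one driver at a time so you can tell which one was causing the hang. If the module is also loaded from the initramfs, the initramfs may need regenerating before the blacklist takes effect at boot.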
-Adam