On a 5-node cluster (all Dell servers with iDRAC Enterprise) that had been running fine for some months, a node's watchdog timer expired yesterday and the node rebooted. Last night the same thing happened twice, and since the second reboot the logs are full of:
Code:
Jun 14 05:59:59 host01 kernel: [ 3527.075573] ipmi_si ipmi_si.0: Couldn't get irq info: c0.
Jun 14 05:59:59 host01 kernel: [ 3527.075578] ipmi_si ipmi_si.0: Maybe ok, but ipmi might run very slowly
So it seems the driver can't get the IRQ info, and therefore IPMI runs too slowly to reset the watchdog timer in time, so the timer expires and the node reboots. But why couldn't it get the IRQ info?
Any experience with this?
Update: this seems to be a hardware-related issue. I've switched to the software watchdog and the problem is gone. I've asked for a replacement of my iDRAC module. I assume I can switch back to the hardware watchdog without any issue once it's replaced.
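
For anyone wanting to check whether the ipmi_si warning lines up with the reboots on their own nodes, here is a rough sketch that counts the "Couldn't get irq info" messages per host and hour. It assumes syslog-style lines in the same format as the two quoted above; the function name count_irq_warnings and the log path passed on the command line are my own placeholders, not anything from the ipmi tooling.

```python
import re
import sys
from collections import Counter

# Matches the kernel warning in the format quoted in this post, e.g.
# "Jun 14 05:59:59 host01 kernel: [ 3527.075573] ipmi_si ipmi_si.0: Couldn't get irq info: c0."
# Assumption: the log uses the classic "Mon DD HH:MM:SS host kernel:" syslog prefix.
IRQ_WARNING = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:.*"
    r"ipmi_si\s+\S+: Couldn't get irq info"
)


def count_irq_warnings(lines):
    """Count ipmi_si IRQ warnings per (host, 'Mon DD HH') bucket."""
    counts = Counter()
    for line in lines:
        match = IRQ_WARNING.match(line)
        if match:
            hour_bucket = match.group("stamp")[:-6]  # drop the ":MM:SS" part
            counts[(match.group("host"), hour_bucket)] += 1
    return counts


if __name__ == "__main__":
    # Usage (path is whatever file holds your kernel log): python count_irq.py <logfile>
    with open(sys.argv[1], errors="replace") as log_file:
        for (host, hour), total in sorted(count_irq_warnings(log_file).items()):
            print(f"{host} {hour}h: {total} warnings")
```

If the counts only start after a particular reboot, as they did here, that points more towards the BMC/iDRAC side than towards a configuration change.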