We run systems with built-in IPMI support, which essentially lets them restart themselves when they lock up.
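For anyone wanting to check this on their own hardware, here is a minimal sketch. It assumes the restart mechanism meant here is the IPMI watchdog timer on the BMC, and that `ipmitool` is installed on the host. The function names `watchdog_command` and `run_watchdog` are made up for illustration; only the `ipmitool mc watchdog get|reset|off` subcommands come from ipmitool itself.

```python
import shutil
import subprocess
from typing import List, Optional


def watchdog_command(action: str = "get") -> List[str]:
    """Build the ipmitool argv for a BMC watchdog action (hypothetical helper)."""
    allowed = {"get", "reset", "off"}  # subcommands of `ipmitool mc watchdog`
    if action not in allowed:
        raise ValueError(f"unsupported watchdog action: {action!r}")
    return ["ipmitool", "mc", "watchdog", action]


def run_watchdog(action: str = "get") -> Optional[str]:
    """Run the command locally and return its raw output.

    Returns None when ipmitool is not on PATH, so the sketch does not
    crash on machines without IPMI tooling.
    """
    if shutil.which("ipmitool") is None:
        return None
    result = subprocess.run(
        watchdog_command(action),
        capture_output=True,
        text=True,
        check=False,  # leave error handling to the caller
    )
    return result.stdout


if __name__ == "__main__":
    # Show the current watchdog settings and countdown state, if available.
    print(run_watchdog("get"))
```

With the watchdog armed, something on the host has to keep restarting the countdown (`reset`); if the OS locks up and the resets stop, the BMC performs whatever timeout action is configured, typically a hard reset.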
We had an odd situation with a server where the discs ran at half speed when one of the redundant power supplies failed (each is rated at 750 W, with maximum utilisation reported at 372 W) and Ceph was...