Server stuck in recovering journal

blancoreyes

New Member
Nov 12, 2024
Greetings, colleagues.
I have a problem that I haven't been able to solve by searching the Internet. I have a Dell R720 server with Proxmox v7.1-10 installed. It was working fine for a long time, but we lost power and the server shut down. When the power came back on, the server started up, but it never gets past the state shown in the attached photo. Any suggestions?
 

Attachments

  • 9c767de1-747d-4d9e-b55c-d4793e2fdf9c.jpg (54.2 KB)
If it gets stuck on that, you may have a bad drive. Unplug the bad drive and boot from the other one.
Following your response, I ran the hardware diagnostic that Dell offers on its servers through the Lifecycle Controller. The results showed no hardware issues, so I believe the hard drives are physically fine. It must be a file system issue. I just can't find a way to fix it.
 

Attachments

  • server-19032025.jpg (106.6 KB)
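  1. Boot from a live DVD/USB.
  2. Run fdisk -l to determine which disk(s) hold your boot volume(s).
  3. Run fsck -f /dev/sdaX on the affected partition while it is unmounted (sdaX is a placeholder for the partition found in step 2).
Then see what happens.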
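
The steps above could be scripted roughly as follows from the live environment. This is only a sketch: /dev/sdaX is the placeholder from the post and must be replaced with the real partition, and the RUN switch and the function names are made up for illustration. By default the script only prints what it would do, since fsck -f should never be pointed at a mounted file system. On some installs the root file system sits on a logical volume rather than a raw partition, so check the fdisk output before choosing a target.

```shell
#!/bin/sh
# Dry-run wrapper: print each command unless RUN=1 is set (hypothetical switch).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Steps 2 and 3 from the list: list the disks, then force a check of one partition.
check_steps() {
    run fdisk -l          # identify the disk(s) and partitions
    run fsck -f "$1"      # force a check even if the file system is marked clean
}

# /dev/sdaX is a placeholder, not a real device name.
check_steps "${DEVICE:-/dev/sdaX}"
```

Once the printed commands look right, it can be run for real as root with something like RUN=1 DEVICE=/dev/sda2 sh check.sh, where both the device and the script name are examples only.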

The hardware issue may not be visible with a simple check, presuming you even use all the original Dell hardware.

Basically, what the error indicates is that, due to the power outage, the file system on one or more of your drives was corrupted (interrupted writes, etc.).

This should not happen in a proper server system with functional hardware, since any professional SSD or HDD has power-loss protection. Those protections may no longer be functional after the 11-plus years your system has existed. The PERC S110 RAID controller in that system may be masking SMART errors, and your drives are more than 10 years old. I'm not sure what you are expecting, but at some point the bottom is going to fall out of it.
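
If the controller may be hiding SMART errors, one way to look past it is to query the drives directly with smartctl from the live environment. The sketch below assumes smartmontools is installed there and that the drives show up as plain block devices. The device names, the RUN switch and the function name are hypothetical. Like the fsck sketch, it only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Query SMART data for each drive given as an argument.
# Prints the commands unless RUN=1 is set (hypothetical switch).
smart_check() {
    for dev in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            smartctl -H "$dev"   # overall health self-assessment
            smartctl -A "$dev"   # vendor attributes, e.g. reallocated or pending sectors
        else
            echo "would run: smartctl -H $dev"
            echo "would run: smartctl -A $dev"
        fi
    done
}

# Hypothetical device names; replace with the disks reported by fdisk -l.
smart_check /dev/sda /dev/sdb
```

A drive that passes the Lifecycle Controller diagnostic can still show a growing count of reallocated or pending sectors here, which would fit the concern that a simple check does not reveal everything.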