Palerius

May 5, 2020
Hello, community!

I am in trouble. And at the mercy of some smart people in this forum.
Here is the situation:

The server was working fine Monday morning. Two hours later, everything was dead. I connected a monitor to the server and was greeted by the grub rescue prompt.
"Error: attempt to read or write outside of disk 'hd0' "

The ls command usually returns:
(lvm/pve-root) (lvm/pve-swap)

Sometimes followed by
error:failure reading sector 0xf010000 from 'hd0'
(sometimes this error even appears multiple times for one ls command)

ls command with set debug=all returns this:

Code:
disk/i386/pc/biosdisk.c:304: Read error when probing drive 0x80
(lvm/pve-root) kern/disk.c:196: Opening 'lvm/pve-root' . . .
kern/disk.c:384: hd0 read failed
kern/disk.c:384: lvm/pve-root read failed
kern/disk.c:295: Closing 'lvm/pve-root' .

(lvm/pve-swap) kern/disk.c:196: Opening 'lvm/pve-swap' . . .
kern/disk.c:384: hd0 read failed
kern/disk.c:384: lvm/pve-swap read failed
kern/disk.c:295: Closing 'lvm/pve-swap' .
disk/i386/pc/biosdisk.c:304: Read error when probing drive 0x80

I started the Proxmox installer from USB to go into debug mode, but that fails as well. Same for the rescue option in the installer.

SSD has been checked. It is functioning and fine. (at least showing up as healthy in another system)
Drive has an EFI partition as well as the actual partition with the system (sdb)

I already read a lot of related posts but all of them had hd0 as a return from their ls which is not the case for me.

Tbh I am kinda out of my depth here. No clue where to continue from here.
I would appreciate any help.

Further system infos:
Supermicro H8DGU-F
1x SSD > system drive
1x HDD > VM storage
4x HDD in Raid10 > cloud storage

Thanks in advance!

EDIT1:
I am able to boot into an Ubuntu live environment, which confirmed that the system drive is available and ok.
I mounted it successfully.
Currently trying to follow the steps on this page: https://pve.proxmox.com/wiki/Recover_From_Grub_Failure
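
For anyone landing here later, the chroot-style repair from a live system boils down to roughly the following. This is a sketch and not a copy of the wiki page: it assumes the default volume group name pve (matching the lvm/pve-root shown by grub above), /dev/sdb as the system disk and /mnt as the mount point. DRY_RUN is a switch made up for this example so that the commands are only printed unless it is set to 0.

```shell
#!/bin/sh
# Sketch of a chroot-based GRUB repair from a live system.
# Assumptions: volume group "pve", system disk /dev/sdb, mount point /mnt.
# DRY_RUN is a hypothetical safety switch: 1 (default) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
DISK="${DISK:-/dev/sdb}"
TARGET="${TARGET:-/mnt}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_grub() {
    # Find and activate the LVM volumes, then mount the root volume.
    run vgscan
    run vgchange -ay
    run mount /dev/pve/root "$TARGET"
    # Make the live system's device and kernel interfaces visible in the chroot.
    for fs in dev proc sys run; do
        run mount --bind "/$fs" "$TARGET/$fs"
    done
    # Regenerate the grub config and reinstall the boot loader from inside the chroot.
    run chroot "$TARGET" update-grub
    run chroot "$TARGET" grub-install "$DISK"
}

recover_grub
```

Run it once as is to review the printed commands, then with DRY_RUN=0 as root if they match the actual disk layout.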

EDIT2:
Following the instructions in the link from EDIT1, I am now booting into grub, with ls showing me hd0 to hd5 including (hd0,gpt3), (hd0,gpt2) and (hd0,gpt1).
I keep doing some research, but my guess would be that I have to tell grub which partition it needs to load. Help still appreciated!

EDIT3:
Well, that's it for what I know. I do have grub running but can't figure out how to boot from there. I seem to fail to mount things the right way.
With sdb2 being my EFI system partition and sdb3 my actual Linux LVM, I fail to start grub the right way.
I am finding a vmlinuz and an initrd file in hd0,gpt2, which I assume to be sdb2 (i.e. my EFI boot partition?), but with the system being on hd0,gpt3 I always get an error that root is not defined or could not be mounted.
Maybe by tomorrow someone will have had the time to read into this mess and have an idea.

EDIT4:
The system could be fixed and is booting up again!
As it turns out, the combination of a slow USB connection (apparently USB 1.0) and a lot of drives in the system results in a VERY long "update-grub" runtime. After letting it run for almost an hour I at least got output: there was a problem with requesting "udev databases".
After more research, I came across this GitHub issue. Answer #4 gave the final change in lvm.conf that let update-grub run successfully.
Rebooted the system and everything is back as it was.
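
Note for anyone retracing this: the exact setting from that answer is not reproduced here. Purely as an illustration, one lvm.conf option in the devices section that controls whether LVM asks udev for its device list is obtain_device_list_from_udev. The sketch below sets it to 0 in whatever file is passed in and keeps a backup. That this is the relevant option is an assumption, and disable_udev_device_list is a name made up for this example.

```shell
#!/bin/sh
# Sketch: set obtain_device_list_from_udev = 0 in an lvm.conf-style file.
# Assumption: this is the relevant setting; the linked answer may differ.
# disable_udev_device_list is a hypothetical helper name.
disable_udev_device_list() {
    conf="$1"
    # Keep a backup next to the original, then rewrite from the backup.
    cp "$conf" "$conf.bak" || return 1
    # Matches the setting whether or not it is commented out; keeps indentation.
    sed 's/^\([[:space:]]*\)#\{0,1\}[[:space:]]*obtain_device_list_from_udev[[:space:]]*=.*/\1obtain_device_list_from_udev = 0/' \
        "$conf.bak" > "$conf"
}

# Usage inside the chroot (not run here):
#   disable_udev_device_list /etc/lvm/lvm.conf
```

Restoring the .bak file once the system boots normally again puts the original behaviour back.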

I hope whoever comes across this might find some help and comfort in this post. It is a shitty situation. Just take your time. Do your research and take it step by step!
 
Code:
Sometimes followed by
error:failure reading sector 0xf010000 from 'hd0'
(sometimes this error even appears multiple times for one ls command)

To me this looks like the system drive has some defect: maybe a bad cable connection, or the drive is broken. Check the SMART info maybe?

(something like smartctl -A /dev/sda)
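
A small sketch of what that check could look like. smartctl -H prints the overall health verdict and smartctl -A the attribute table; the helper below pulls the raw value of Reallocated_Sector_Ct out of the -A output. The name reallocated_count is made up for this example, /dev/sda is only the example device from above, and not every SSD reports this attribute.

```shell
#!/bin/sh
# Print the raw value (10th column) of Reallocated_Sector_Ct
# from `smartctl -A` output read on stdin.
# reallocated_count is a hypothetical helper name.
reallocated_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Usage on the real machine (needs root and smartmontools, not run here):
#   smartctl -H /dev/sda
#   smartctl -A /dev/sda | reallocated_count
```

Anything other than 0 here would back up the suspicion of a failing drive; an empty result just means the attribute is not in the table.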
 
I appreciate the input. I'll run smartctl as soon as I get access to the machine today. I would assume it is fine. I actually checked the cables (since this would be a super stupid and easy fix), which seemed fine, and Ubuntu live showed the drive as healthy from what I saw.
 
smartctl reports the drive as being in perfect health. No issues whatsoever.

Really the question at this point is, with

grub on sdb2
linux LVM on sdb3

how to get grub to boot the OS. The instructions from the link in my original post work except for "update-grub" which only shows a blinking cursor in the next line. No error, no progress. It just stays there forever.

I am actually confident that once grub is figured out, everything should be up and running again.
 
Are there errors in the output of dmesg / journalctl? Did smartctl report any reallocated sectors?

update-grub afaik scans disks for installed operating systems: when this takes very long, it indicates some disk problem, too.
 
smartctl reports 0 reallocated sectors.

In terms of errors for dmesg... Errors I get are:
- Can not request [mem 0xdfe9e000-0xdfe9fff] for ERST
- Can not request [mem 0xdfeb0780-0xdfeb07d3] for APEI BERT register
- Some cache error for the connected live USB (seems not important)
- More errors related to the connected USB (sdg) for fat_get_cluster
- SQUASHFS error: zlib decompression failed, followed by a couple of lines about being unable to read blocks and fragment cache entries

journalctl does not bring up any errors tho.

I can't say too much about it, but there doesn't seem to be anything related to the sdb drive. I did see output for sdb at some point, but there were no errors in that output.

about that 'update-grub' problem:
shouldn't it still be quick since I chroot into the drive and it only has to look at literally the OS it is on at that time?
I could disconnect the other drives to make sure it is not going around all the 8+ TB of drives but idk if that makes a difference. I feel like there should at least be some output.
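
On why update-grub would look beyond the chroot at all: it runs the scripts in /etc/grub.d, and if os-prober is installed, one of them (30_os-prober) probes the other disks for installed systems. As a sketch, assuming the delay comes from that probing (which may well not be the case here), setting GRUB_DISABLE_OS_PROBER=true in /etc/default/grub skips it. The helper name disable_os_prober is made up for this example.

```shell
#!/bin/sh
# Sketch: make sure GRUB_DISABLE_OS_PROBER=true is set in a grub defaults file.
# disable_os_prober is a hypothetical helper; pass the file to edit.
disable_os_prober() {
    file="$1"
    if grep -q '^#\{0,1\}GRUB_DISABLE_OS_PROBER=' "$file"; then
        # Replace an existing (possibly commented-out) setting.
        sed 's/^#\{0,1\}GRUB_DISABLE_OS_PROBER=.*/GRUB_DISABLE_OS_PROBER=true/' \
            "$file" > "$file.new" && mv "$file.new" "$file"
    else
        # No setting yet: append one.
        echo 'GRUB_DISABLE_OS_PROBER=true' >> "$file"
    fi
}

# Usage inside the chroot (not run here):
#   disable_os_prober /etc/default/grub && update-grub
```

Disconnecting the other drives would test the same theory without touching any config.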
 
Did you try reinstalling grub from the rescue system (with your device mounted)?
Something like grub-install --root-directory=/mnt/sdb2 /dev/sdb2
Doing a backup before couldn't hurt.
 
Yes. I did grub-install from the live OS by mounting the drive (as in the link above), changing into it with chroot and using grub-install.
Starting the system now brings up grub instead of grub rescue.

Grub finds the drive (hd0) with the 3 partitions. It can only access sdb2, which has grub as well as some vmlinuz and initrd images in it.
But giving the linux command the root argument with /dev/sdb3 does not work. Neither does writing (hd0,gpt3), which should be the Linux LVM.
gpt3 generally can not be used by grub and shows up as unknown file type.

If there is any way to either point grub to the right partition (and its root directory) or get update-grub working on the live OS, this should, from what I understand so far, get me into the OS.
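
A sketch of a manual boot sequence that fits this layout. Two points matter: grub has to load its lvm module before it can see (lvm/pve-root), and the kernel's root= argument expects a Linux device path for the root volume, such as /dev/mapper/pve-root, not a grub name like (hd0,gpt3) and not the raw LVM partition /dev/sdb3. The script only prints the lines to type at the grub> prompt. The volume name pve-root is taken from the ls output in the first post, and the kernel version is a placeholder, so take the real file names from ls (hd0,gpt2)/.

```shell
#!/bin/sh
# Print the commands to enter by hand at the grub> prompt.
# Assumptions: kernel and initrd live on (hd0,gpt2) as described above,
# the root filesystem is the LVM volume pve/root, and the version is a placeholder.
# print_grub_commands and KVER are hypothetical names for this example.
print_grub_commands() {
    kver="$1"
    cat <<EOF
insmod part_gpt
insmod lvm
set root=(hd0,gpt2)
linux /vmlinuz-$kver root=/dev/mapper/pve-root ro
initrd /initrd.img-$kver
boot
EOF
}

print_grub_commands "${KVER:-KERNEL_VERSION}"
```

If this boots, running update-grub from the running system should write a proper menu entry so the manual steps are only needed once.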
 
The problem is solved!
Changed the Thread prefix as well as added the solution to my original post. See "EDIT4".
 