Hello, community!
I am in trouble. And at the mercy of some smart people in this forum.
Here is the situation:
The server was working fine Monday morning; two hours later, everything was dead. I connected a monitor to the server and was greeted by the GRUB rescue prompt:
"Error: attempt to read or write outside of disk 'hd0' "
The ls command most of the time returns:
(lvm/pve-root) (lvm/pve-swap)
Sometimes followed by
error:failure reading sector 0xf010000 from 'hd0'
(sometimes this error even appears multiple times for one ls command)
The ls command with set debug=all returns this:
Code:
disk/i386/pc/biosdisk.c:304: Read error when probing drive 0x80
(lvm/pve-root) kern/disk.c:196: Opening 'lvm/pve-root' . . .
kern/disk.c:384: hd0 read failed
kern/disk.c:384: lvm/pve-root read failed
kern/disk.c:295: Closing 'lvm/pve-root' .
(lvm/pve-swap) kern/disk.c:196: Opening 'lvm/pve-swap' . . .
kern/disk.c:384: hd0 read failed
kern/disk.c:384: lvm/pve-swap read failed
kern/disk.c:295: Closing 'lvm/pve-swap' .
disk/i386/pc/biosdisk.c:304: Read error when probing drive 0x80
I started the Proxmox installer from USB to go into debug mode, but that fails as well. Same for the rescue option in the installer.
The SSD has been checked and is functioning fine (at least it shows up as healthy in another system).
The drive (sdb) has an EFI partition as well as the actual system partition.
I have already read a lot of related posts, but all of them had hd0 in their ls output, which is not the case for me.
Tbh, I am kinda out of my depth here. No clue where to continue from here.
I would appreciate any help.
Further system info:
Supermicro H8DGU-F
1x SSD > system drive
1x HDD > VM storage
4x HDD in Raid10 > cloud storage
Thanks in advance!
EDIT1:
I am able to boot into an Ubuntu live environment, which confirmed the system drive is available and OK.
I mounted it successfully.
Currently trying to follow the steps on this page: https://pve.proxmox.com/wiki/Recover_From_Grub_Failure
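For anyone following along, the wiki's procedure from the live environment boils down to roughly the following (a sketch only; the device names /dev/sdb and /dev/sdb2 and the default pve volume group are assumptions based on my setup, so adjust them to yours):
Code:
sudo vgscan
sudo vgchange -ay pve
sudo mount /dev/pve/root /mnt
sudo mount /dev/sdb2 /mnt/boot/efi
for d in /dev /dev/pts /proc /sys /run; do sudo mount --bind $d /mnt$d; done
sudo chroot /mnt
grub-install /dev/sdb
update-grub
exit
This assumes a legacy/BIOS boot (which the biosdisk.c errors above suggest); a UEFI setup would need grub-install with an EFI target instead.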
EDIT2:
Following the instructions in the link from EDIT1, I am now booting into GRUB, with ls showing me hd0 to hd5, including (hd0,gpt1), (hd0,gpt2) and (hd0,gpt3).
I keep doing research, but my guess is that I need to tell GRUB which partition it should load from. Help still appreciated!
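In case someone else lands in the same spot: from a working grub> prompt you can usually point GRUB at the partition holding its files by hand. A sketch, assuming the modules and config live on (hd0,gpt2) (check with ls (hd0,gpt2)/ first):
Code:
grub> set root=(hd0,gpt2)
grub> set prefix=(hd0,gpt2)/grub
grub> insmod normal
grub> normal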
EDIT3:
Well, that's the limit of what I know. I do have GRUB running but can't figure out how to boot from there; I seem to fail to mount things the right way.
With sdb2 being my EFI system partition and sdb3 my actual Linux LVM, I can't get GRUB to boot correctly.
I am finding vmlinuz and initrd files in (hd0,gpt2), which I assume is sdb2 (i.e. my EFI boot partition?), but when booting with the system on (hd0,gpt3) I always get an error that root is not defined or could not be mounted.
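For the record, booting the system manually from the grub> prompt should look roughly like this (a sketch; the exact kernel/initrd file names and the pve-root mapper path are assumptions from a default Proxmox LVM install, so check them with ls first):
Code:
grub> insmod lvm
grub> set root=(hd0,gpt2)
grub> linux /vmlinuz root=/dev/mapper/pve-root ro
grub> initrd /initrd.img
grub> boot
A "root is not defined" error at this stage usually means the root= parameter is missing from the linux line.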
Maybe by tomorrow someone will have had the time to read through this mess and has an idea.
EDIT4:
The system has been fixed and is booting up again!
As it appears, the combination of a slow (apparently USB 1.0) USB connection and a lot of drives in the system results in a VERY long "update-grub" runtime. After letting it run for almost an hour, I at least got output: there was a problem requesting the "udev database".
After more research, I came across this GitHub issue. Answer #4 gave the final change to lvm.conf that let update-grub run successfully.
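For anyone hunting for the same fix: the setting in question is most likely the udev option in the devices section of /etc/lvm/lvm.conf (I am going from memory here, so double-check against the issue):
Code:
devices {
    # don't query udev for the device list; scan devices directly instead
    obtain_device_list_from_udev = 0
}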
Rebooted the system and everything is back as it was.
I hope whoever comes across this finds some help and comfort in this post. It is a shitty situation; just take your time, do your research, and take it step by step!