Sudden Boot Failure - "Grub LoadingRead Error"

Darkhog

Member
Aug 16, 2021
My Proxmox host stopped suddenly, and when I try to boot it I see "Grub LoadingRead Error". My boot disk is ZFS, spread across two disks: a WD Green 120GB and a Crucial BX 240GB, both SSDs.

It took me a while to find the Proxmox rescue boot, so before finding it I did the following -
  • Tried to boot an Ubuntu live CD from USB, then Windows, on that machine (call it pve01), but booting from USB consistently hung.
  • At this point I probably should have run the Proxmox rescue boot, but I didn't find it until later.
  • Loaded Ubuntu on another machine; call it ubuntu01.
  • Swapped my two boot disks into ubuntu01.
  • Ran `zpool import -f rpool` and then scrubbed the pool (no errors).
    • I needed -f because I hadn't exported the pool first (because I couldn't boot).
  • Short and long SMART tests on both disks are OK.
  • Checked the SATA cables on pve01 - all seem fine.
  • Found Rescue Boot.
  • Booted the Proxmox install disk on ubuntu01 and ran rescue boot.
  • I can boot into the Proxmox command line, but since it is on another machine some things can't work, and I'm getting disk errors.
    • I'm not sure: are the disk errors because these disks are in another machine, or are they my actual problem?
  • Swapped the disks back into pve01.
  • Tried to boot - same error, "Grub LoadingRead Error".
  • Ran rescue boot from the Proxmox 7 install disk.
    • "error: compression algorithm 68 not supported" displayed 8 times.
    • "ERROR: unable to find boot disk automatically." then "Press any key to continue..."
  • Ran rescue boot from the Proxmox 8 install disk (I think I'm on 7, but honestly I don't know - maybe I upgraded to 8).
    • "error: compression algorithm inherit not supported" displayed 8 times.
    • "ERROR: unable to find boot disk automatically." then "Press any key to continue..."
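For anyone retracing these steps, the ZFS and SMART checks in the list above correspond roughly to the commands in the comments below (run as root on ubuntu01; /dev/sdX is a placeholder for each boot SSD). `pool_reports_no_errors` is a hypothetical helper, not part of ZFS: it reads `zpool status` output on stdin and succeeds only if ZFS prints its "No known data errors" line. A sketch, not a definitive procedure.

```shell
#!/usr/bin/env bash
# Roughly the checks from the list above, run as root on ubuntu01.
# /dev/sdX is a placeholder; substitute each boot SSD in turn.
#
#   zpool import -f rpool        # -f because the pool was never exported
#                                # (-R /mnt is an option if you'd rather its
#                                #  datasets mount under /mnt)
#   zpool scrub rpool
#   zpool status -v rpool        # repeat until the scrub has finished
#   smartctl -t short /dev/sdX   # start a short self-test (long: -t long)
#   smartctl -a /dev/sdX         # read the results once the test completes

# Hypothetical helper (not part of ZFS): read `zpool status` output on
# stdin and succeed only if ZFS reports no known data errors.
pool_reports_no_errors() {
  grep -Eq '^[[:space:]]*errors: No known data errors'
}

# Example: zpool status -v rpool | pool_reports_no_errors && echo "rpool is clean"
```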
Other noteworthy things about my setup -
  • I do have PCIe passthrough configured. I set it up around August 2021 (the instructions look different now).
  • I do have a copy of my /etc/default/grub from when I did this.

I'm not sure what to do next.
  • Is there a way to help rescue boot find the boot disk on the actual host, pve01?
  • Is there some repair I can do from the command line on ubuntu01 after booting into the Proxmox 7 rescue boot?
  • Because I'm seeing 'auto reallocate failed' in the disk errors, I'm considering cloning the disk to another SSD. Would that help?

Thanks in advance.

Disk errors when running rescue boot with the disks in another machine:

[Attachment: diskErrors.jpg]

Rescue boot output:
[Attachments: proxmox8RescueBoot.jpg, proxmox7RescueBoot.jpg]
 
I have made some progress. First, these error messages -
  • proxmox 7 - "error: compression algorithm 68 not supported" displayed 8 times.
  • proxmox 8 - "error: compression algorithm inherit not supported" displayed 8 times
I don't know why these occur, but if I move the disk to a different SATA port, they don't appear. I tried this on two different machines, and on both, changing which port the disk was plugged into resolved the error. Not a different SATA cable in the same port, just a different port.

Edit - this doesn't always work either. I can't predict when the error will appear, and it often does.
Next: the disk is failing, so I bought a new, larger SSD and booted into an Ubuntu live USB. Then I cloned the failing disk onto the new disk -

Find the failing disk and the new disk with lsblk -
- new disk is the unused sda
- my proxmox boot disk is the one with rpool, sdb.

Code:
ubuntu@ubuntu:~$ lsblk -f
NAME FSTYPE FSVER LABEL                    UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
...
 <trim some things>
...
sda                                                                                            
sdb                                                                                            
├─sdb1
│                                                                                              
├─sdb2
│    vfat   FAT32                          F423-B9C1                                           
└─sdb3
     zfs_me 5000  rpool                    12034959561025552289                                
sdc  iso966 Jolie Ubuntu 24.04.1 LTS amd64 2024-08-27-16-23-26-00                              
├─sdc1
│    iso966 Jolie Ubuntu 24.04.1 LTS amd64 2024-08-27-16-23-26-00                     0   100% /cdrom
├─sdc2
│    vfat   FAT12 ESP                      3C53-CAEB                                           
├─sdc3
│                                                                                              
└─sdc4
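Before cloning with dd, it is worth double-checking that the source and target are what lsblk suggests, since swapping them overwrites the source. The function below is a hypothetical pre-flight helper (not part of Ubuntu or Proxmox): it refuses when both paths resolve to the same device, when either is not a block device, or when the target is smaller than the source. It only reads device sizes via lsblk and never writes anything.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before `dd if=SRC of=DST`.
# It only inspects the devices; it never writes to them.
check_clone_pair() {
  local src=$1 dst=$2 src_bytes dst_bytes

  # Same device (e.g. /dev/sdb and a /dev/disk/by-id link to it)?
  if [ "$(readlink -f "$src")" = "$(readlink -f "$dst")" ]; then
    echo "refusing: source and target are the same device" >&2
    return 1
  fi

  # Both must be block devices such as /dev/sdb.
  if [ ! -b "$src" ] || [ ! -b "$dst" ]; then
    echo "refusing: $src and $dst must both be block devices" >&2
    return 1
  fi

  # Sizes in bytes: -b bytes, -d no partitions, -n no header.
  src_bytes=$(lsblk -bdno SIZE "$src" | tr -d ' ')
  dst_bytes=$(lsblk -bdno SIZE "$dst" | tr -d ' ')
  if [ -z "$src_bytes" ] || [ -z "$dst_bytes" ]; then
    echo "refusing: could not read device sizes with lsblk" >&2
    return 1
  fi
  if [ "$dst_bytes" -lt "$src_bytes" ]; then
    echo "refusing: target ($dst_bytes bytes) is smaller than source ($src_bytes bytes)" >&2
    return 1
  fi

  echo "ok: $src ($src_bytes bytes) -> $dst ($dst_bytes bytes)"
}

# Usage on the machine above: check_clone_pair /dev/sdb /dev/sda
```

The check deliberately stops short of anything destructive; it is only a guard against the mix-up the dd warnings are about.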

Then clone the disk using dd. I need the conv=noerror,sync flags because the disk is failing and there are read failures: noerror keeps dd going past read errors, and sync pads each unreadable block with zeros so everything after it stays at the right offset. The input file (if=, the source disk, which is corrupted) is sdb. The output file (of=) is the new disk, sda (figured out above). I feel compelled to repeat the warnings I found on the internet - if you do this backwards you'll wipe out your source disk! Be careful to get if and of right!
Code:
ubuntu@ubuntu:~$ sudo dd if=/dev/sdb of=/dev/sda status=progress conv=noerror,sync 
dd: error reading '/dev/sdb': Input/output error
104+0 records in
104+0 records out
53248 bytes (53 kB, 52 KiB) copied, 0.164118 s, 324 kB/s
dd: error reading '/dev/sdb': Input/output error
104+1 records in
105+0 records out
53760 bytes (54 kB, 52 KiB) copied, 0.228242 s, 236 kB/s
...
< --  SNIP - many more of these cut out  -- >
...

dd: error reading '/dev/sdb': Input/output error
18696+79 records in
18775+0 records out
9612800 bytes (9.6 MB, 9.2 MiB) copied, 8.1765 s, 1.2 MB/s
120039346688 bytes (120 GB, 112 GiB) copied, 6733 s, 17.8 MB/s
234454960+80 records in
234455040+0 records out
120040980480 bytes (120 GB, 112 GiB) copied, 6738.5 s, 17.8 MB/s
ubuntu@ubuntu:~$
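The final dd summary can be sanity-checked with a little arithmetic. No bs= was given, so dd used its default 512-byte block size; with conv=noerror,sync, a failed read is counted as a partial record (the "+80") and written out as a full block of zeros. The sketch below only replays numbers copied from the output above; the "zero-filled" figure is an upper bound that assumes all 80 partial reads were failed reads.

```shell
#!/usr/bin/env bash
# Numbers copied from the final dd summary above.
bs=512                  # dd's default block size (no bs= was passed)
full_in=234454960       # "234454960+80 records in": full reads
partial_in=80           # partial reads; with noerror,sync, failed reads land here
records_out=234455040   # "234455040+0 records out"
bytes_out=120040980480  # "120040980480 bytes ... copied"

# With conv=sync every partial record is padded to a full block,
# so records out = full + partial, and bytes = records out * bs.
echo "full + partial records: $((full_in + partial_in)) (dd reported $records_out)"
echo "records out * bs:       $((records_out * bs)) bytes (dd reported $bytes_out)"

# Upper bound on data replaced by zeros, assuming every partial read failed.
zeroed=$((partial_in * bs))
echo "zero-filled at most:    $zeroed bytes ($((zeroed / 1024)) KiB)"
```

So at most about 40 KiB of the 120 GB disk came across as zeros. The record counts printed with each error also locate it: the first two errors show 104 records already read, so the unreadable blocks were sectors 104 and 105 (bytes 53,248 to 54,271), inside the first MiB of the disk. Assuming the usual Proxmox ZFS layout, where the first partition (sdb1 above, which shows no filesystem) is a small BIOS boot partition at the start of the disk, that is boot-loader territory, but the trimmed output doesn't show where the other 78 errors fall.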

Then verify that the copy happened
Code:
ubuntu@ubuntu:~$ lsblk -f
NAME FSTYPE FSVER LABEL                    UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
...
<  --  SNIP  --  >
...
sda                                                                                            
├─sda1
│                                                                                              
├─sda2
│    vfat   FAT32                          F423-B9C1                                           
└─sda3
     zfs_me 5000  rpool                    12034959561025552289                                
sdb                                                                                            
├─sdb1
│                                                                                              
├─sdb2
│    vfat   FAT32                          F423-B9C1                                           
└─sdb3
     zfs_me 5000  rpool                    12034959561025552289                                
sdc  iso966 Jolie Ubuntu 24.04.1 LTS amd64 2024-08-27-16-23-26-00                              
├─sdc1
│    iso966 Jolie Ubuntu 24.04.1 LTS amd64 2024-08-27-16-23-26-00                     0   100% /cdrom
├─sdc2
│    vfat   FAT12 ESP                      3C53-CAEB                                           
├─sdc3
│                                                                                              
└─sdc4
                                                                                               
sr0                                                                                            
ubuntu@ubuntu:~$

Now I have a cloned disk. However, when I boot from it, there are many errors or it hangs. That makes some sense: I know that rpool is error-free, but there are errors on the boot partition that cause the boot to fail.
 
Now I'm stuck again. I can boot Proxmox 7.4 from a USB stick, choose Advanced, then Rescue Boot, log in, and get to a terminal. How do I repair GRUB on my corrupted disk? I also need to edit my grub file, as well as /etc/modules, since I had edited both to get PCIe passthrough working.

Here's my old Grub file -
Code:
root@pve:~# cat /etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'
 
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox VE"
 
# MY NOTE - enable IOMMU which allows PCIe Passthrough per https://pve.proxmox.com/wiki/Pci_passthrough
#GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs"
 
# Disable os-prober, it might add menu entries for each guest
GRUB_DISABLE_OS_PROBER=true
 
# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"
 
# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console
 
# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480
 
# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true
 
# Disable generation of recovery mode menu entries
GRUB_DISABLE_RECOVERY="true"
 
# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

root@graypve:~#
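As the header of the file says, changes to /etc/default/grub only take effect after running update-grub. Once the host boots again, one way to confirm the passthrough options actually reached the kernel is to look at /proc/cmdline. `check_iommu_cmdline` is a hypothetical helper: it looks for the two options set in GRUB_CMDLINE_LINUX_DEFAULT above, reading /proc/cmdline unless a command line is passed as an argument.

```shell
#!/usr/bin/env bash
# After editing /etc/default/grub, regenerate the config (as the file's
# header says):   update-grub
#
# Hypothetical helper: check that the IOMMU options from
# GRUB_CMDLINE_LINUX_DEFAULT appear on a kernel command line.
# With no argument it reads the running kernel's /proc/cmdline.
check_iommu_cmdline() {
  local cmdline=${1:-$(cat /proc/cmdline)}
  local opt missing=0
  for opt in amd_iommu=on iommu=pt; do
    # Pad with spaces so only whole options match.
    case " $cmdline " in
      *" $opt "*) echo "found:   $opt" ;;
      *)          echo "missing: $opt"; missing=1 ;;
    esac
  done
  return "$missing"
}

# Usage after rebooting the restored host: check_iommu_cmdline
```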

And changes I made to modules -
Code:
root@pve:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
 
root@pve:~# nano /etc/modules
root@pve:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
 
# MY NOTE ==  add iommu modules per https://pve.proxmox.com/wiki/Pci_passthrough
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

root@pve:~#
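The Pci_passthrough wiki page linked in the note also has you rebuild the initramfs after changing /etc/modules (`update-initramfs -u -k all`). As far as I know, newer revisions of that page drop vfio_virqfd for kernel 6.2 and later (Proxmox VE 8), where it was folded into vfio. The hypothetical helper below checks a modules file for the remaining three entries; commented-out lines never match.

```shell
#!/usr/bin/env bash
# After restoring /etc/modules, rebuild the initramfs as the
# Pci_passthrough wiki describes:   update-initramfs -u -k all
#
# Hypothetical helper: report which passthrough modules a modules file
# lists (default /etc/modules). Comment lines and blank lines never match.
# vfio_virqfd is left out on the assumption of kernel 6.2+ (Proxmox VE 8),
# where it is no longer a separate module.
check_vfio_modules() {
  local file=${1:-/etc/modules}
  local mod missing=0
  for mod in vfio vfio_iommu_type1 vfio_pci; do
    # Whole-line match, so "vfio" does not match "vfio_pci".
    if grep -Eq "^[[:space:]]*${mod}[[:space:]]*$" "$file"; then
      echo "present: $mod"
    else
      echo "missing: $mod"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_vfio_modules            (checks /etc/modules)
#        lsmod | grep vfio              (after reboot, confirms they loaded)
```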
 
Which files should I copy back afterwards? If I do a reinstall over the top of my existing install, will it preserve all my VMs and configs?
 
