PVE 7.3-3: Configuring grub-pc: GRUB failed to install to the following devices

verulian
Was doing a "routine" update to the system after having not updated things for a few months, but am on PVE 7.3-3.

Hit the following message:
(screenshot attached: 2023-02-27_080559_UTC.jpg)

For searchability, here's the body of text:
Code:
Configuring grub-pc:

GRUB failed to install to the following devices:
/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

Do you want to continue anyway? If you do, your computer may not start up properly.

Writing GRUB to boot device failed continue?

<Yes>     <No>

What should one do in this situation? I can't risk doing anything that might provoke something serious and prevent booting at this point, but I'd like to be sure I also diagnose and resolve this situation properly.
 
Just been allowing the system to run without a reboot for fear that something might blow up. I did find some additional info that I thought I should add here. The following clearly indicates there's a storage-space problem on the ESP volume:
Code:
[root@pve1 ~]$ dpkg --configure -a
Setting up initramfs-tools (0.140) ...
update-initramfs: deferring update (trigger activated)
Setting up pve-kernel-5.15.85-1-pve (5.15.85-1) ...
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 5.15.85-1-pve /boot/vmlinuz-5.15.85-1-pve
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 5.15.85-1-pve /boot/vmlinuz-5.15.85-1-pve
update-initramfs: Generating /boot/initrd.img-5.15.85-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/cmdline found - falling back to /proc/cmdline
Copying and configuring kernels on /dev/disk/by-uuid/C621-B008
        Copying kernel 5.13.19-6-pve
        Copying kernel 5.15.74-1-pve
        Copying kernel 5.15.85-1-pve
cp: error writing '/var/tmp/espmounts/C621-B008/initrd.img-5.15.85-1-pve': No space left on device
run-parts: /etc/initramfs/post-update.d//proxmox-boot-sync exited with return code 1
run-parts: /etc/kernel/postinst.d/initramfs-tools exited with return code 1
Failed to process /etc/kernel/postinst.d at /var/lib/dpkg/info/pve-kernel-5.15.85-1-pve.postinst line 19.
dpkg: error processing package pve-kernel-5.15.85-1-pve (--configure):
 installed pve-kernel-5.15.85-1-pve package post-installation script subprocess returned error exit status 2
dpkg: dependency problems prevent configuration of pve-kernel-5.15:
 pve-kernel-5.15 depends on pve-kernel-5.15.85-1-pve; however:
  Package pve-kernel-5.15.85-1-pve is not configured yet.

dpkg: error processing package pve-kernel-5.15 (--configure):
 dependency problems - leaving unconfigured
Processing triggers for initramfs-tools (0.140) ...
update-initramfs: Generating /boot/initrd.img-5.15.85-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/cmdline found - falling back to /proc/cmdline
Copying and configuring kernels on /dev/disk/by-uuid/C621-B008
        Copying kernel 5.13.19-6-pve
        Copying kernel 5.15.74-1-pve
        Copying kernel 5.15.85-1-pve
cp: error writing '/var/tmp/espmounts/C621-B008/initrd.img-5.15.85-1-pve': No space left on device
run-parts: /etc/initramfs/post-update.d//proxmox-boot-sync exited with return code 1
dpkg: error processing package initramfs-tools (--configure):
 installed initramfs-tools package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 pve-kernel-5.15.85-1-pve
 pve-kernel-5.15
 initramfs-tools

Yet when I do some more digging, it says this server is using legacy BIOS. I don't recall the details from when I set it up years ago, but it is running on a Dell R710 unit with a kind of strange BIOS, as I recall:
Code:
[root@pve1 ~]$ proxmox-boot-tool status
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with legacy bios
C621-B008 is configured with: uefi (versions: 5.3.18-3-pve, 5.4.41-1-pve), grub (versions: 5.13.19-6-pve, 5.15.35-1-pve, 5.15.74-1-pve, 5.15.85-1-pve)
C622-3C13 is configured with: uefi (versions: 5.3.18-3-pve, 5.4.41-1-pve), grub (versions: 5.13.19-6-pve, 5.15.35-1-pve, 5.15.74-1-pve)
C622-C74E is configured with: uefi (versions: 5.3.18-3-pve, 5.4.41-1-pve), grub (versions: 5.13.19-6-pve, 5.15.35-1-pve, 5.15.74-1-pve)
C623-9E17 is configured with: uefi (versions: 5.3.18-3-pve, 5.4.41-1-pve), grub (versions: 5.13.19-6-pve, 5.15.35-1-pve, 5.15.74-1-pve)
C624-24D0 is configured with: uefi (versions: 5.3.18-3-pve, 5.4.41-1-pve), grub (versions: 5.13.19-6-pve, 5.15.35-1-pve, 5.15.74-1-pve)
C624-F719 is configured with: uefi (versions: 5.3.18-3-pve, 5.4.41-1-pve), grub (versions: 5.13.19-6-pve, 5.15.35-1-pve, 5.15.74-1-pve)

Even though the tool reports a legacy BIOS boot, it's possible this is actually an old UEFI system.
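To double-check which mode the machine actually booted in (independent of how the ESPs are configured), one can test for the EFI variables directory that the kernel only exposes on a UEFI boot; a minimal sketch:

```shell
# The kernel creates /sys/firmware/efi only when booted via UEFI,
# so its absence indicates a legacy BIOS boot.
if [ -d /sys/firmware/efi ]; then
    echo "booted via UEFI"
else
    echo "booted via legacy BIOS"
fi
```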

I'm a little bit afraid to do as @oguz suggested here without getting more details / specifics to know how this is actually functioning:
https://forum.proxmox.com/threads/dpkg-hanging-when-upgrading-pve-kernel.95077/#post-412898

Any thoughts or suggestions on how to determine more details before deleting kernels or going further in trying to resolve the situation?
 
Anyone with some thoughts here? I really want to be sure I have a safety net in place before rebooting the server - quite scared/worried.
 
you can either mount the ESPs manually one by one and clean them up (remove at least the uefi part, since you are booting in legacy mode), or you can reformat them with proxmox-boot-tool. both variants should solve your "running out of space" issue, and allow package configuration to finish.
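For reference, the two options might look roughly like this; this is a sketch only, not verified on this box, and /dev/sda2 is an example device that must be replaced with each actual ESP partition:

```shell
# Option 1: mount one ESP, inspect it, and remove only the stale UEFI
# loader files by hand (repeat for each ESP)
mount /dev/sda2 /mnt
ls -R /mnt            # inspect carefully before deleting anything
# ...remove the obsolete UEFI files/directories here...
umount /mnt

# Option 2: reformat the ESP and hand it back to proxmox-boot-tool,
# which re-copies the currently installed kernels
proxmox-boot-tool format /dev/sda2 --force
proxmox-boot-tool init /dev/sda2
proxmox-boot-tool status   # verify the result
```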
 
Thank you for your suggestion @fabian. Unfortunately this seemed to be a lot more involved than I had hoped and took me a bit of poking around to - maybe - do this correctly. I believe that what I have found must be a bug OR some kind of edge-case that we need to explore further.

Again, I don't know if what I've done here is right, and I would very much appreciate a review and some input to figure out how to truly and properly resolve this the "Proxmox way". In particular: am I moving enough of the "superfluous" kernels out of the way, am I doing it the "right" way, and is it reproducible? Because it appears this will happen AGAIN in the future...

Code:
# 1. List block devices to get all `vfat` volumes
lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL


# 2. Create mount points for each drive
sudo mkdir /mnt/sda2 /mnt/sdb2 /mnt/sdc2 /mnt/sdd2 /mnt/sde2 /mnt/sdf2


# 3. Mount the drives
sudo mount /dev/sda2 /mnt/sda2
sudo mount /dev/sdb2 /mnt/sdb2
sudo mount /dev/sdc2 /mnt/sdc2
sudo mount /dev/sdd2 /mnt/sdd2
sudo mount /dev/sde2 /mnt/sde2
sudo mount /dev/sdf2 /mnt/sdf2


# 4. Check the mounted drives
df -h /mnt/sda2 /mnt/sdb2 /mnt/sdc2 /mnt/sdd2 /mnt/sde2 /mnt/sdf2


# 5. Create temporary directories for old kernels instead of rm'ing them
mkdir -p ~/tmp_old_kernels/sda2 ~/tmp_old_kernels/sdb2 ~/tmp_old_kernels/sdc2 ~/tmp_old_kernels/sdd2 ~/tmp_old_kernels/sde2 ~/tmp_old_kernels/sdf2


# 6. Get list of all the seemingly superfluous kernels from the volumes that are out of space
ls /mnt/sda2 /mnt/sdb2 /mnt/sdc2 /mnt/sdd2 /mnt/sde2 /mnt/sdf2


# 7. Move old kernels to the temporary directories
# Repeat this step for all drives, replacing X with the drive letter (sda2, sdb2, sdc2, sdd2, sde2, and sdf2)


mv /mnt/sdX2/initrd.img-5.13.19-6-pve ~/tmp_old_kernels/sdX2/
mv /mnt/sdX2/vmlinuz-5.13.19-6-pve ~/tmp_old_kernels/sdX2/
mv /mnt/sdX2/initrd.img-5.15.35-1-pve ~/tmp_old_kernels/sdX2/
mv /mnt/sdX2/vmlinuz-5.15.35-1-pve ~/tmp_old_kernels/sdX2/


# NOTE: I think there are more that should be moved, but these at least get us back to having a LITTLE space....


# 8. Verify volume space available now for each
df -h /mnt/sda2 /mnt/sdb2 /mnt/sdc2 /mnt/sdd2 /mnt/sde2 /mnt/sdf2


# 9. Try to re-execute the system's upgrade path
dpkg --configure -a

I suspect I can generalize the above into a script, though I don't think this should really be necessary:
Code:
#!/usr/bin/env bash

# Get all vfat volumes (list output without headings, so there are no
# tree-drawing characters to strip)
volumes=($(lsblk -ln -o NAME,FSTYPE | awk '$2 == "vfat" {print $1}'))

# Ensure the temporary directory for old kernels exists
mkdir -p ~/tmp_old_kernels

for volume in "${volumes[@]}"; do
    echo "Working on volume /dev/${volume}..."

    # Create and mount the volume's mount point
    mkdir -p /mnt/$volume
    mount /dev/$volume /mnt/$volume

    # Create a temporary directory for this volume's old kernels
    mkdir -p ~/tmp_old_kernels/$volume

    # Get all the kernel files, oldest first
    vmlinuz_files=($(ls -tr /mnt/$volume | grep -E '^vmlinuz'))
    initrd_files=($(ls -tr /mnt/$volume | grep -E '^initrd'))

    # Move older vmlinuz files
    if [ ${#vmlinuz_files[@]} -gt 2 ]; then
        for i in $(seq 0 $((${#vmlinuz_files[@]} - 3))); do
            echo "• Moving /mnt/$volume/${vmlinuz_files[$i]} to ~/tmp_old_kernels/$volume/"
            mv /mnt/$volume/${vmlinuz_files[$i]} ~/tmp_old_kernels/$volume/
        done
    fi

    # Move older initrd files
    if [ ${#initrd_files[@]} -gt 2 ]; then
        for i in $(seq 0 $((${#initrd_files[@]} - 3))); do
            echo "• Moving /mnt/$volume/${initrd_files[$i]} to ~/tmp_old_kernels/$volume/"
            mv /mnt/$volume/${initrd_files[$i]} ~/tmp_old_kernels/$volume/
        done
    fi

    # Unmount the current volume we just operated on
    umount /mnt/$volume

    echo "------------------------------------------------------------------"
done

# Re-execute the system's upgrade path
dpkg --configure -a
 
Whelp, I spoke too soon. I got some aspects to work, but then I ran out of drive space again after I did another apt update; apt upgrade. It turns out that I just keep hitting this wall. I thought I'd attempt to purge the old kernels, but still hit a flipping wall:
Code:
apt-get purge pve-kernel-5.3.18-2-pve
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  pve-headers-5.15.35-1-pve pve-headers-5.15.74-1-pve pve-headers-5.15.85-1-pve sse3-support
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  pve-kernel-5.13.19-1-pve pve-kernel-5.3.18-2-pve*
0 upgraded, 0 newly installed, 2 to remove and 2 not upgraded.
4 not fully installed or removed.
After this operation, 689 MB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 217285 files and directories currently installed.)
Removing pve-kernel-5.13.19-1-pve (5.13.19-3) ...
Examining /etc/kernel/postrm.d.
run-parts: executing /etc/kernel/postrm.d/initramfs-tools 5.13.19-1-pve /boot/vmlinuz-5.13.19-1-pve
update-initramfs: Deleting /boot/initrd.img-5.13.19-1-pve
run-parts: executing /etc/kernel/postrm.d/proxmox-auto-removal 5.13.19-1-pve /boot/vmlinuz-5.13.19-1-pve
run-parts: executing /etc/kernel/postrm.d/zz-proxmox-boot 5.13.19-1-pve /boot/vmlinuz-5.13.19-1-pve
Re-executing '/etc/kernel/postrm.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/cmdline found - falling back to /proc/cmdline
Copying and configuring kernels on /dev/disk/by-uuid/C621-B008
    Copying kernel 5.15.102-1-pve
    Copying kernel 5.15.74-1-pve
    Copying kernel 5.15.85-1-pve
    Copying kernel 5.3.18-3-pve
cp: error writing '/var/tmp/espmounts/C621-B008/initrd.img-5.3.18-3-pve': No space left on device
run-parts: /etc/kernel/postrm.d/zz-proxmox-boot exited with return code 1
Failed to process /etc/kernel/postrm.d at /var/lib/dpkg/info/pve-kernel-5.13.19-1-pve.postrm line 14.
dpkg: error processing package pve-kernel-5.13.19-1-pve (--remove):
 installed pve-kernel-5.13.19-1-pve package post-removal script subprocess returned error exit status 1
dpkg: too many errors, stopping
Errors were encountered while processing:
 pve-kernel-5.13.19-1-pve
Processing was halted because there were too many errors.
E: Sub-process /usr/bin/dpkg returned an error code (1)

Ultimately I plan on purging all of these:
Code:
apt-get purge \
pve-kernel-5.3.18-2-pve \
pve-kernel-5.3.18-3-pve \
pve-kernel-5.4.41-1-pve \
pve-kernel-5.4.143-1-pve \
pve-kernel-5.13.19-1-pve \
pve-kernel-5.13.19-6-pve \
pve-kernel-5.15.35-1-pve \
pve-kernel-5.15.85-1-pve

It just seems I cannot get enough free space to actually do anything here....

But I kept pressing on and did find some additional older kernel files that seemed to be consuming a good amount of space under each of sda2 through sdf2:
Code:
/mnt/sd*2/EFI/proxmox/5.3.18-3-pve:
initrd.img-5.3.18-3-pve  vmlinuz-5.3.18-3-pve

/mnt/sd*2/EFI/proxmox/5.4.41-1-pve:
initrd.img-5.4.41-1-pve  vmlinuz-5.4.41-1-pve

I proceeded to move these folders to a backup location as well.
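To see where the space on each ESP is actually going before moving anything, a small helper along these lines can be used (a sketch; `esp_usage` is a hypothetical helper name of my own, and it assumes the ESPs are mounted under /mnt/sdX2 as in the earlier steps):

```shell
# Print the size of each kernel directory inside an ESP's EFI/proxmox
# tree, followed by the overall fill level of that filesystem.
esp_usage() {
    local mp="$1"                              # mount point of one ESP
    du -sh "$mp"/EFI/proxmox/* 2>/dev/null     # per-kernel directory sizes
    df -h "$mp" | tail -n 1                    # total/used/free for the ESP
}

# Example usage:
# for d in /mnt/sd?2; do echo "== $d =="; esp_usage "$d"; done
```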

I was then able to execute the following, seemingly successfully:
Code:
apt-get purge pve-kernel-5.3.18-2-pve
apt-get purge pve-kernel-5.3.18-3-pve
apt-get purge pve-kernel-5.4.41-1-pve
apt-get purge pve-kernel-5.4.143-1-pve
apt-get purge pve-kernel-5.13.19-1-pve
apt-get purge pve-kernel-5.13.19-6-pve
apt-get purge pve-kernel-5.15.35-1-pve
apt-get purge pve-kernel-5.15.85-1-pve

One thing that bothers me after all of this is that when I did an apt update and apt upgrade, I got some messages about what appears to be a package rename:
Code:
[root@pve1 tmp]$ apt upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages were automatically installed and are no longer required:
  pve-headers-5.15.35-1-pve pve-headers-5.15.74-1-pve pve-headers-5.15.85-1-pve sse3-support
Use 'apt autoremove' to remove them.
The following packages have been kept back:
  proxmox-ve pve-kernel-helper
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
[root@pve1 tmp]$ apt upgrade proxmox-ve pve-kernel-helper
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 proxmox-kernel-helper : Breaks: pve-kernel-helper
E: Broken packages
 
It just seems I cannot get enough free space to actually do anything here....
proxmox-boot-tool wants to write the current/new kernels before removing old/deleted ones, so that won't work. Do what the Proxmox staff suggested instead:
you can either mount the ESPs manually one by one and clean them up (remove at least the uefi part, since you are booting in legacy mode), or you can reformat them with proxmox-boot-tool. both variants should solve your "running out of space" issue, and allow package configuration to finish.
 
@leesteken if you review what I wrote above, I did indeed do that. I then had to apt-get purge each one, and so on. After all of the above I finally did a dist-upgrade for good measure, which seems to have worked, and then I scrubbed through a lot of additional details, such as verifying that the GRUB config menus were sane and that systemd, etc., tested out. Since this is a remote server that's not easily accessed, I then prayed quite a bit and restarted the server, and it did start back up without any apparent problems.

I still believe I have hit some kind of edge case, and I strongly suspect this is going to happen again, which is why I wrote the above script. I will probably modify it further to also purge other folders and analyze the size of everything. HECK, Proxmox should probably have some kind of ESP free-space check as well, just to watch for this and warn about it, or automatically purge, before anything like this happens in such strange scenarios.

I'm happy to help, test, and collaborate, but I don't see reformatting as a safe option when it feels like something could go sideways, unless there's some kind of dry-run option first. That would be safer, or at least feel safer, than nuclear suggestions.
 
apt upgrade

First: only ever use apt full-upgrade or apt dist-upgrade with Proxmox products! [1]

Regarding the "No space left on device" on the ESP(s) topic: since you have now cleaned up manually once, you should not need any fancy script in the future; simply running apt autoremove or apt autoremove --purge on a regular basis should (hopefully) be sufficient. See [2] for which kernels are covered by this and which are not.
But even with this, you should keep an eye on the number of installed kernels, especially when a newer kernel series (e.g. 5.13 -> 5.15) becomes the new default, and/or if you are installing the opt-in kernels, and/or if you update the host regularly but only reboot it once in a while (so that the most recently installed kernel actually gets booted).
(Intentionally pinning an older kernel would be an additional topic, of course.)

This is what I mainly use:
Bash:
apt update && apt list --upgradable && apt full-upgrade && apt autoremove --purge && apt clean
Maybe one would rather use apt autoclean instead of apt clean, if at all...

[1] https://forum.proxmox.com/threads/proxmox-ve-7-1-released.99847/post-463941
[2] https://forum.proxmox.com/threads/clean-old-kernels.42040/post-257792
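As a lightweight check between updates, one can also list the installed kernel packages and compare them against the running kernel (a sketch using the pve-kernel-* naming from above):

```shell
# List installed pve-kernel packages so you can see how many have
# accumulated; stderr is silenced in case no package matches.
dpkg -l 'pve-kernel-[0-9]*' 2>/dev/null | awk '/^ii/ {print $2, $3}'

# The currently running kernel -- never remove this one.
uname -r
```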
 
