Help with UEFI boot issue after PVE 9 upgrade

Kevo

After upgrading to PVE 9, our backup server (a hardware copy of the main server, which gets VM backups rsynced to it) would not boot. I managed to get it back by enabling CSM in the BIOS, enabling legacy boot, and setting the boot drive to the non-UEFI entry in the boot priority list.

After reading about some of the potential upgrade issues, I am not sure why this occurred. I verified that I didn't have systemd-boot installed and never saw any warnings about it while running the pve8to9 script. I also have the grub-efi-amd64 package installed, and I'm not using ZFS for the boot drive.

Running proxmox-boot-tool status shows 'E: /etc/kernel/proxmox-boot-uuids does not exist.'

I'm thinking that maybe initializing the existing EFI partition with proxmox-boot-tool might fix the issue, but I am gun-shy since this server is remote and I am not certain my current legacy boot will be safe after running the tool. I don't believe these machines ever used proxmox-boot-tool.

Our main server is still on PVE 8 and boots via UEFI, as our backup server did, and I want to make sure I understand this issue before I attempt an upgrade on it. From everything I've checked on the main server, I fully expect the same issue to occur when we attempt the upgrade.

Any ideas what happened and what I should do to fix it? I thought we had a pretty basic config on these machines, as I don't think we've done anything special to the boot config since the original installation, but something appears to be unusual enough to cause this problem.

TIA
 
I managed to get a screen recording of the failed boot, and I saw the 'Welcome to GRUB!' banner right before it dropped back into the BIOS. Hunting around, I found this thread.

https://forum.proxmox.com/threads/stuck-on-welcome-to-grub-after-update.164133/

That led to more searching, and I found this page, which seems to confirm that this is a known potential failure.

https://github.com/jacrook/PVE8-9

I followed the advice for GRUB/boot issues and got an error message during the reinstall. That error pointed me to manually mounting the EFI boot partition and rerunning the reinstall of grub-efi-amd64, which led to another error about not being able to set EFI variables, since I was booted in legacy mode.

After a reboot with manual selection of the UEFI drive option in the BIOS, which booted successfully, I reinstalled grub-efi-amd64 again with no errors.
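
For what it's worth, once I was booted in UEFI mode the fix boiled down to something like this - the device and mount point are from my machine and my memory, so double-check yours before copying anything:

Code:
mkdir -p /boot/efi
mount /dev/nvme1n1p2 /boot/efi           # adjust the device to whatever your ESP actually is
apt install --reinstall grub-efi-amd64
grub-install --target=x86_64-efi --efi-directory=/boot/efi
update-grub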

I rebooted again, switched the BIOS to UEFI-only mode, disabled CSM, made sure to select the right boot drive, and everything seems to be back to normal now.

Not sure why I hadn't gotten bitten by this grub-efi-amd64 issue before now, but I am glad it is working normally at this point. I think I will wait for another kernel update and reboot cycle before upgrading my main server, just to be safe.
 
Hey, I think I am experiencing issues similar to yours. How did you do the reinstall? Can you do it without losing any LXC/LVM configs?
 
The two links I posted explain most of it, but you will probably have to do a bit of further research based on your particular circumstances. I'd suggest reading through that first link and checking how things match up on your system. Depending on what you find, you may need to investigate which partitions are on your drives to find the correct one to mount before reinstalling grub-efi-amd64.
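
If it helps, something like this should show which partition is the ESP - look for the small vfat partition with the "EFI System" partition type:

Code:
lsblk -o NAME,SIZE,FSTYPE,PARTTYPENAME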

Also keep in mind, I am a complete newb on bootloaders and only started looking into all this when the problem occurred on my backup server. Don't follow any of my advice without first double-checking everything and making sure it is appropriate for the state your system is in.

None of this should have any effect on your existing VMs or containers, but you should be careful to make sure you understand what the necessary commands do, so you don't mount the wrong partition or something during the process. Overall I thought it was pretty safe, but I also had another similarly configured machine to look at for sanity checks.

HTH
 
After upgrading to PVE 9, our backup server (a hardware copy of the main server, which gets VM backups rsynced to it) would not boot. I managed to get it back by enabling CSM in the BIOS, enabling legacy boot, and setting the boot drive to the non-UEFI entry in the boot priority list.
Hmm we had a few similar reports - could you please share /var/log/apt/term.log (or the rotated variant that covers the upgrade from 8 to 9)?

my guess is - you still had `systemd-boot` installed while upgrading - see https://pve.proxmox.com/wiki/Upgrad...ation_automatically_and_should_be_uninstalled

I hope this helps!
 
Hmm we had a few similar reports - could you please share /var/log/apt/term.log (or the rotated variant that covers the upgrade from 8 to 9)?

my guess is - you still had `systemd-boot` installed while upgrading - see https://pve.proxmox.com/wiki/Upgrad...ation_automatically_and_should_be_uninstalled

I hope this helps!
I checked. It does not appear to have ever been installed - I couldn't find it on either machine. The problem appears to have been an artifact of an earlier, incomplete install/fix of the grub-efi-amd64 package. On our backup server, 'grub-efi-amd64 grub2/force_efi_extra_removable' was set to false. The newer install on our main server does not appear to have the same issue; that setting was true when I checked.

I haven't upgraded it to 9 yet, so I won't know for sure until I get around to upgrading that one. It may be a while, as it is remote as well, and I will probably wait for at least one kernel upgrade on the backup server to make sure things are working properly before upgrading our main server.
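
(In case anyone wants to check the same setting on their own machines, I believe something along these lines shows it - debconf-get-selections comes from the debconf-utils package:)

Code:
apt install debconf-utils
debconf-get-selections | grep force_efi_extra_removable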
 
I checked. It does not appear to have ever been installed - I couldn't find it on either machine. The problem appears to have been an artifact of an earlier, incomplete install/fix of the grub-efi-amd64 package.
Hm - could I ask you for the term.log from the dist-upgrade in this case?
Maybe you ran into something we haven't considered yet - or something we could add a warning about to pve8to9 (and thus help users who have yet to upgrade)

Thanks!
 
term.log received via different channel.
from the one you sent - looking through the large output - you can find:
Code:
Installing for x86_64-efi platform.
grub-install.real: error: cannot find EFI directory.
Failed: grub-install --target=x86_64-efi  
WARNING: Bootloader is not properly installed, system may not be bootable

We had 2-3 cases of this:
* please post `proxmox-boot-tool status`
* is your ESP (/boot/efi, for systems installed with our ISO usually /dev/sdX2, /dev/nvmeXn1p2 ) mounted via /etc/fstab? - `mount |grep -i efi` should tell you
* if not - do you see anything in your journal indicating that it might need a fsck? (`fsck.vfat /dev/sdX2`)

Thanks again!
 
I've checked a handful of servers, a combination of Intel and AMD, all installed from a USB of the Proxmox ISO, and every one responds like this to 'proxmox-boot-tool status':

Code:
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
E: /etc/kernel/proxmox-boot-uuids does not exist.

On all servers except the two I have in use with actual server boards (ASRockRack) with IPMI, the EFI partition is mounted like you would expect. On those two, it looks like it was not. The one server I haven't upgraded yet shows:

Code:
root@server1:~# mount | grep -i efi
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

The one I upgraded and "fixed" now shows:

Code:
root@backup1:~# mount | grep -i efi
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
systemd-1 on /efi type autofs (rw,relatime,fd=63,pgrp=1,timeout=120,minproto=5,maxproto=5,direct,pipe_ino=972)

I could not find anything in the journal on either machine indicating a problem needing fsck on the /dev/nvme1n1p2 partition, which is the EFI partition on both machines.
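
(For anyone following along, by searching the journal I mean something along these lines:)

Code:
journalctl --no-pager | grep -i nvme1n1p2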

After the update on the one machine I do see a couple of related lines:

Code:
Aug 14 14:15:18 coreprox2 systemd-fsck[1883]: /dev/nvme1n1p2: 5 files, 86/261628 clusters
Aug 15 15:16:09 coreprox2 systemd-fsck[1891]: /dev/nvme1n1p2: 5 files, 88/261628 clusters

On my one other AMD server, a regular cheap consumer motherboard, I find lots of fsck-related logs in the journal, going back quite a few years. They look like this:

Code:
Jul 10 12:50:19 tycho systemd-fsck[799]: /dev/nvme0n1p2: 5 files, 86/130812 clusters
-- Boot dd14b7bc5d984b0a8957881986140777 --
Jul 19 22:52:13 tycho systemd-fsck[780]: /dev/nvme0n1p2: 5 files, 86/130812 clusters
-- Boot f364d781528a413da06ec25b7885b04b --
Aug 06 10:25:52 tycho systemd-fsck[1344]: /dev/nvme0n1p2: 5 files, 88/130812 clusters
-- Boot c43ac6d457344015b07706be137605c0 --
Aug 06 10:34:29 tycho systemd-fsck[1325]: /dev/nvme0n1p2: 5 files, 88/130812 clusters
-- Boot 36545b72eda6485491ddea983e13f60c --
Aug 06 10:54:16 tycho systemd-fsck[1134]: /dev/nvme0n1p2: 5 files, 88/130812 clusters
-- Boot bfb43f9c84814d3493fa8e82a42aecce --
Aug 13 15:49:19 tycho systemd-fsck[1128]: /dev/nvme0n1p2: 5 files, 88/130812 clusters

Actually, now that I see the '-- Boot' lines, I realize I did see those on the other servers, but without the associated fsck lines.

When I search for the partition in journalctl, I get a lot of back-to-back '-- Boot' lines but no systemd-fsck lines. This is from the server I haven't upgraded yet:

Code:
-- Boot 060bd69af6a345479f212d0835813073 --
-- Boot 3d1350b8ce1640a3bed6eb2bb1ed0bd5 --
-- Boot 120bc85901dc4051a8c8ef9065baa63a --
-- Boot 7473792168934fb48e800c8f929fd11c --
-- Boot 6b1a0b80ff2248dc98d0c1145853ccb2 --
-- Boot a701d2bbcd4e4d0c8f87fc64fbab83f8 --
 
On all servers except the two I have in use with actual server boards (ASRockRack) with IPMI, the EFI partition is mounted like you would expect. On those two, it looks like it was not. The one server I haven't upgraded yet shows:
don't think that it's related to IPMI/Server board vs. Desktop board - but maybe it's something related to when the servers were set up - or secure-boot related... - do you happen to remember when you set them up? - and does `dmesg|grep -i secure` indicate that secure-boot is in use?

anyways - if you don't use proxmox-boot-tool I would really recommend explicitly mounting the ESP at `/boot/efi` - it's what our tooling expects - the automount at /efi caused some issues for users who accidentally installed (or failed to remove) systemd-boot upon upgrade of the systemd-boot package
(we created a fix for that specific case - but there might be other ones we haven't encountered yet).
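
For an explicit mount, an /etc/fstab entry would look roughly like this - the UUID here is just a placeholder, take the real one from `blkid` on your ESP:

Code:
# e.g. blkid /dev/nvme1n1p2 to get the UUID, then in /etc/fstab:
UUID=ABCD-1234  /boot/efi  vfat  defaults  0  1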
 
don't think that it's related to IPMI/Server board vs. Desktop board - but maybe it's something related to when the servers were set up - or secure-boot related... - do you happen to remember when you set them up? - and does `dmesg|grep -i secure` indicate that secure-boot is in use?
Secure boot is not in use. The main server was installed in late June or early July of 2023.

The original server, which is now the backup that I upgraded, was installed in June of 2021. It had a CPU failure in June of 2023, and that's when we got the new server. After we got an RMA CPU, we repaired the original one and put it in service as a backup; it was upgraded to match the new server at that time. I think it was Proxmox version 7 then, but I'm not certain.


anyways - if you don't use proxmox-boot-tool I would really recommend explicitly mounting the ESP at `/boot/efi` - it's what our tooling expects - the automount at /efi caused some issues for users who accidentally installed (or failed to remove) systemd-boot upon upgrade of the systemd-boot package
(we created a fix for that specific case - but there might be other ones we haven't encountered yet).
I'm not sure what the recommended config is. It doesn't seem that a default install sets up proxmox-boot-tool. Is it recommended to use that tool for a basic install, or is that only for things like ZFS installations or other "special" configs?

I do not have systemd-boot on my backup server, but I do have this mount:

Code:
systemd-1 on /efi type autofs (rw,relatime,fd=63,pgrp=1,timeout=120,minproto=5,maxproto=5,direct,pipe_ino=972)

Are you saying I should change it so it's mounted manually from the fstab?
 
Are you saying I should change it so it's mounted manually from the fstab?
Yes! (systems not using proxmox-boot-tool should have it mounted on /boot/efi)


I'm not sure what the recommended config is. It doesn't seem that a default install sets up proxmox-boot-tool. Is it recommended to use that tool for a basic install, or is that only for things like ZFS installations or other "special" configs?
proxmox-boot-tool is (currently) used for all installations which are not using Ext4/XFS (with LVM) - it was initially added to handle ESPs for machines with multiple disks (and to add sensible support to UEFI systems with ZFS). Switching over to it is possible - but needs a bit of care
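
Very roughly, the switch looks something like the following - a sketch only, where /dev/sdX2 is a placeholder for your ESP, and `format` re-creates the filesystem on it, so read the reference documentation first:

Code:
umount /boot/efi                     # if mounted there; also remove/comment the fstab entry
proxmox-boot-tool format /dev/sdX2   # WARNING: wipes the existing contents of the ESP
proxmox-boot-tool init /dev/sdX2
proxmox-boot-tool status             # should now list the ESP by UUID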
 
in case it helps anyone... For me, I had to boot into recovery and install grub-efi-amd64.

Code:
apt install grub-efi-amd64
reboot