Advice needed - Upgrade PVE 8 to 9 using systemd-boot with ZFS rpool, UEFI, and no secure-boot

TheDJ

New Member
Nov 10, 2024
I planned to upgrade from PVE 8 to 9 this weekend. In preparation, I ran pve8to9 --full and received the bootloader info message:

Code:
INFO: Checking bootloader configuration...
   
FAIL: systemd-boot meta-package installed. This will cause problems on upgrades of other boot-related packages. Remove 'systemd-boot' See https://pve.proxmox.com/wiki/Upgrade_from_8_to_9#sd-boot-warning for more information.

So I investigated the issue further and found this post in the forum, which only added to my confusion: all of the criteria listed there are true for my installation (ZFS on root, UEFI boot, and no secure boot). efibootmgr also reported (and still reports) systemd-boot (see below). Still, I assumed that pve8to9 would not show the message if it were unsafe to remove the package. So I removed it with apt remove, which succeeded and also cleared the FAIL from the checklist, and proceeded with the upgrade. After a reboot, PVE 9 booted successfully, but with the old 6.8.12 kernel. Because I could not fix this, I rolled the rpool back to a previous snapshot. So I am back on PVE 8.4.17 with the 6.8.12-2-pve kernel.

Before I make another upgrade attempt, I would like to know how to deal with this situation: should I remove systemd-boot before the upgrade or not? If I leave systemd-boot on the system now, how should I deal with future upgrades? Do I just leave it on there until the end of time? Can/should I migrate to GRUB?

These are my current outputs:
Bash:
~# proxmox-boot-tool status
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi

~# efibootmgr -v
BootCurrent: 0000
Timeout: 3 seconds
BootOrder: 0000,0001
Boot0000* Linux Boot Manager    HD(2,GPT,428ad388-5235-43e8-89a4-ead630164050,0x800,0x200000)/File(\EFI\systemd\systemd-bootx64.efi)
Boot0001* UEFI OS       HD(2,GPT,428ad388-5235-43e8-89a4-ead630164050,0x800,0x200000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
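
The status output above lists no ESPs at all, so one thing that could be checked is whether proxmox-boot-tool has any ESP registered. A minimal sketch, assuming the tool keeps the UUIDs of the ESPs it manages in /etc/kernel/proxmox-boot-uuids (one per line); count_registered_esps is a hypothetical helper name, not part of any Proxmox tooling:

```shell
#!/bin/sh
# Hypothetical helper: count the ESP UUIDs listed in a uuid file.
# The default path is an assumption about where proxmox-boot-tool keeps its list.
count_registered_esps() {
    file="${1:-/etc/kernel/proxmox-boot-uuids}"
    # A missing file means nothing is registered.
    [ -f "$file" ] || { echo 0; return 0; }
    # Count lines that are neither empty nor comments.
    # grep -c still prints 0 when nothing matches, but exits non-zero.
    grep -cvE '^[[:space:]]*(#|$)' "$file" || true
}
```

Calling count_registered_esps without arguments on the host prints the number of registered ESPs; a result of 0 would be consistent with the short status output shown above.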
 
As far as I can remember, I followed the exact commands from https://pve.proxmox.com/wiki/Upgrade_from_8_to_9

Code:
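# 1. Bring PVE 8 fully up to date
apt update
apt dist-upgrade
# 2. Switch the repositories from bookworm to trixie
sed -i 's/bookworm/trixie/g' /etc/apt/sources.list
sed -i 's/bookworm/trixie/g' /etc/apt/sources.list.d/pve-enterprise.list
# 3. Refresh the package index
apt update
# 4. Upgrade to PVE 9
apt dist-upgrade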

During the last apt dist-upgrade, while the config file prompts were being shown, my terminal was unfortunately flooded with unrelated messages, so I believe I pressed Ctrl-C a few times. That did not stop the process, and once I noticed that the prompts were about accepting configuration changes, I selected "Yes" for everything.
The apt dist-upgrade finished successfully, and I ran pve8to9 again. As far as I can tell, there was no FAIL or WARN in that run. Then I rebooted. After the reboot, pve8to9 reported PVE 9, but also reported the old kernel.

Based on my further reading today, I assume that a new attempt should be somewhat safe™ (because if it failed catastrophically, my bootloader would not even have booted PVE 9) - following the commands exactly and reviewing every config change.

For clarification: pve8to9 would have warned me before that final reboot if systemd-boot-efi and systemd-boot-tools had been required, right?
Based on that further reading, I guess both packages are probably required in my case.
Or would it only warn me after the reboot into PVE 9?
 
I retried the upgrade with the exact same outcome: it booted into PVE 9 with the old kernel. This time, I made sure to execute the steps mentioned above exactly. During the source list update, I was warned that /etc/apt/sources.list.d/pve-install-repo.list also needed the trixie sources, so I updated it accordingly. After removing systemd-boot, pve8to9 remained green at all times (I ran it after each command and again before the reboot). I was never prompted to install systemd-boot-efi and/or systemd-boot-tools.
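
To avoid missing a repository file like pve-install-repo.list again, the remaining bookworm references can be searched for before running apt update. A minimal sketch, assuming GNU grep; find_stale_sources is a hypothetical helper name and the default directory /etc/apt is an assumption:

```shell
#!/bin/sh
# Hypothetical helper: list APT source files below a directory that still
# mention a given release codename (default: bookworm).
find_stale_sources() {
    dir="${1:-/etc/apt}"
    codename="${2:-bookworm}"
    # -r: recurse, -l: print file names only, -s: ignore unreadable files.
    # Only *.list and *.sources files are considered.
    grep -rls --include='*.list' --include='*.sources' -- "$codename" "$dir" | sort
}
```

Note that this also reports files where the codename only appears in a commented-out line, so each hit still needs a manual look.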

However, I did get an error, which I unfortunately did not copy from the terminal. It only flashed briefly and did not stop the upgrade.
This was during the initramfs steps, and read something like "cannot determine root device rpool/pve-1..." (I really don't remember the exact wording unfortunately)
So, I guess that I already have a problem with initramfs in my current PVE8, which somehow is masked by systemd-boot.
I never had problems with kernel updates in PVE8 (although I might have missed this error previously).

How can I investigate and troubleshoot this?
Should I try to update the kernel in PVE8 and migrate to PVE9 afterwards?

EDIT:
Because I was curious, I ran bootctl status in the PVE 8 environment (after rolling the rpool back to a previous snapshot). Maybe this helps (I obfuscated some parts)?
The message "Couldn't find EFI system partition. It is recommended to mount it to /boot or /efi." at the top seems weird to me.

Code:
~# bootctl status
Couldn't find EFI system partition. It is recommended to mount it to /boot or /efi.
Alternatively, use --esp-path= to specify path to mount point.
System:
      Firmware: UEFI 2.80 (American Megatrends 5.27)
 Firmware Arch: x64
   Secure Boot: disabled (disabled)
  TPM2 Support: yes
  Boot into FW: supported

Current Boot Loader:
      Product: systemd-boot 252.22-1~deb12u1
     Features: ✓ Boot counting
               ✓ Menu timeout control
               ✓ One-shot menu timeout control
               ✓ Default entry control
               ✓ One-shot entry control
               ✓ Support for XBOOTLDR partition
               ✓ Support for passing random seed to OS
               ✓ Load drop-in drivers
               ✓ Support Type #1 sort-key field
               ✓ Support @saved pseudo-entry
               ✓ Support Type #1 devicetree field
               ✓ Boot loader sets ESP information
          ESP: /dev/disk/by-partuuid/428ad388-***********************4050
         File: └─/EFI/systemd/systemd-bootx64.efi

Random Seed:
 Passed to OS: no
 System Token: not set

Boot Loaders Listed in EFI Variables:
        Title: Linux Boot Manager
           ID: 0x0000
       Status: active, boot-order
    Partition: /dev/disk/by-partuuid/428ad388-***********************4050
         File: └─/EFI/systemd/systemd-bootx64.efi

        Title: UEFI OS
           ID: 0x0001
       Status: active, boot-order
    Partition: /dev/disk/by-partuuid/428ad388-***********************4050
         File: └─/EFI/BOOT/BOOTX64.EFI
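
Because bootctl reports that the ESP is not mounted, its contents cannot be inspected directly. A minimal sketch for listing which kernel versions the boot menu entries on the ESP refer to, assuming the ESP has been mounted manually (the mount point /mnt/esp is hypothetical) and that the entries are Boot Loader Specification Type #1 files under loader/entries/ with a version line; list_entry_versions is an illustrative name:

```shell
#!/bin/sh
# Hypothetical helper: print the "version" value of every boot entry found
# below <esp>/loader/entries, sorted by version.
list_entry_versions() {
    esp="${1:-/mnt/esp}"
    for entry in "$esp"/loader/entries/*.conf; do
        # Skip the unexpanded glob when no entry files exist.
        [ -f "$entry" ] || continue
        awk '$1 == "version" { print $2 }' "$entry"
    done | sort -V
}
```

If a kernel shows up in proxmox-boot-tool kernel list but not in this output, that would suggest the ESP the firmware boots from is not being refreshed.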
 
I managed to reproduce the initramfs error on my running PVE 8:

Code:
~# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-6.8.12-20-pve
cryptsetup: ERROR: Couldn't resolve device rpool/ROOT/pve-1
cryptsetup: WARNING: Couldn't determine root device
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
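
The cryptsetup hook complains about rpool/ROOT/pve-1, which is a ZFS dataset rather than a block device, so one sanity check is whether the root filesystem is referenced in /etc/crypttab at all. A minimal sketch that prints the configured mappings; crypttab_entries is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print "<target> <source device>" for every active
# line of a crypttab file (default: /etc/crypttab).
crypttab_entries() {
    file="${1:-/etc/crypttab}"
    [ -f "$file" ] || return 0
    # Skip comment lines and lines with fewer than two fields.
    awk '!/^[[:space:]]*#/ && NF >= 2 { print $1, $2 }' "$file"
}
```

On this machine the output would be expected to show only the two data devices and nothing related to rpool.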

Regarding cryptsetup: I do have two LUKS-encrypted devices in the machine that get unlocked automatically on boot via /etc/crypttab.
However, the boot disk with Proxmox is not encrypted, so it should not be affected by cryptsetup. I know this setup does not provide the intended security at the moment, but that is out of scope for this post.
 
I tried to investigate further, but I didn't get any closer to a solution: apt dist-upgrade installs the new kernel, and proxmox-boot-tool kernel list reports it as available.
But, and this is the weird part, it is not selected for booting. I also tried to pin the kernel, but this did not help.
The boot selection screen also lists only old kernels. So proxmox-boot-tool knows about the new kernel, but the boot menu does not seem to get updated. I don't know enough about bootloaders to fix this issue.

On further investigation, I also noticed that my old PVE 8 installation is not using the newest available 6.8 kernel either:

Code:
~# proxmox-boot-tool kernel list
Manually selected kernels:
None.

Automatically selected kernels:
6.8.12-18-pve
6.8.12-20-pve
6.8.12-2-pve

while
Code:
~# pveversion
pve-manager/8.4.17/c8c39014680186a7 (running kernel: 6.8.12-2-pve)
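
The mismatch between the two outputs above can be checked mechanically. A minimal sketch, assuming GNU sort (for version sort) and kernel names of the form x.y.z-n-pve as in the listing; newest_kernel is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read "proxmox-boot-tool kernel list" style output on
# stdin and print the highest kernel version found.
newest_kernel() {
    # Keep only lines that look like a PVE kernel version, then version-sort
    # so that 6.8.12-2 < 6.8.12-18 < 6.8.12-20.
    grep -E '^[0-9]+\.[0-9]+\.[0-9]+-[0-9]+-pve$' | sort -V | tail -n 1
}

# Usage on the host (not executed here):
#   newest=$(proxmox-boot-tool kernel list | newest_kernel)
#   [ "$(uname -r)" = "$newest" ] || echo "running $(uname -r), newest is $newest"
```

A plain lexical sort would rank 6.8.12-2-pve above 6.8.12-18-pve, which is why version sort is used here.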

I have a feeling that the boot partition was somehow changed in the past. nvme2n1 is the boot SSD, and the other two NVMe drives are the LUKS-encrypted devices mentioned above, which back the ZFS pool secondpool.
bootctl seems to see nvme2n1p2 as the boot partition (see previous posts). But shouldn't this be zd32p1 instead? As mentioned above: I don't know enough about bootloaders to fix this myself.

Code:
~# lsblk -o NAME,FSTYPE,LABEL,UUID,PARTUUID,MOUNTPOINT
NAME                FSTYPE      LABEL       UUID                                 PARTUUID                             MOUNTPOINT
zd0
zd16
zd32
├─zd32p1            vfat                    87E3-1814                            dc50b92e-***********************ecc2
├─zd32p2                                                                         dc52c722-***********************ecc2
├─zd32p3                                                                         dc55587c-***********************ecc2
└─zd32p4            zfs_member  zroot       1209************4500                 dc57944a-***********************ecc2
zd48
├─zd48p1            ext4                    443e9b2d-***********************fb0e 5789***3-01
├─zd48p2                                                                         5789***3-02
└─zd48p5            swap                    3256361b-***********************681d 5789***3-05
zd64
nvme1n1
└─nvme1n1p1         crypto_LUKS             3102dc0c-***********************6583 d8f5f490-***********************e38a
  └─luks_secndnvme1 zfs_member  secondpool  7610***********0817
nvme0n1
└─nvme0n1p1         crypto_LUKS             ca058e6c-***********************fb44 2faa0637-***********************52bc
  └─luks_secndnvme2 zfs_member  secondpool  7610***********0817
nvme2n1
├─nvme2n1p1                                                                      781a2b76-***********************43ab
├─nvme2n1p2                                                                      428ad388-***********************4050
└─nvme2n1p3         zfs_member  rpool       5294***********0075                  f8cd4961-***********************b263
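
To tie the firmware boot entries to a block device, the partition GUID in the efibootmgr output can be matched against the PARTUUID column above. A minimal sketch; esp_partuuids is a hypothetical helper name, and it assumes the HD(partition,GPT,GUID,...) device path format shown in the efibootmgr -v output earlier:

```shell
#!/bin/sh
# Hypothetical helper: read "efibootmgr -v" output on stdin and print the
# unique GPT partition GUIDs referenced by the boot entries.
esp_partuuids() {
    # Match the start of each HD(...) device path and keep the third field.
    grep -oE 'HD\([0-9]+,GPT,[0-9A-Fa-f-]+' | cut -d, -f3 | sort -u
}

# Usage on the host (not executed here):
#   for uuid in $(efibootmgr -v | esp_partuuids); do
#       lsblk -n -o NAME,PARTUUID | grep -i "$uuid"
#   done
```

In the outputs above, both boot entries carry the GUID starting with 428ad388, which matches the PARTUUID shown for nvme2n1p2 in this listing.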

Any help would be appreciated.