Booting a ZFS root file system via UEFI - Has something changed with Proxmox VE 5.3?

geppi
Nov 27, 2018
After operating a ZFS "all in one" ESXi/OmniOS/napp-it server for the past 6 years I was curious to see how ZoL and Open Source virtualization have developed in the meantime. Proxmox seemed to be a very good solution to investigate.

Since my OmniOS storage server is booting from a mirrored ZFS root I was going to use the same setup for Proxmox.

The new motherboard (Supermicro A2SDi-H-TF) boots via UEFI by factory default, so after installing Proxmox VE 5.3 from a USB key onto a ZFS root in RAID1 configuration I was left with a system that didn't boot.

In the Proxmox VE wiki I found the explanation and a workaround:

https://pve.proxmox.com/wiki/Booting_a_ZFS_root_file_system_via_UEFI

Also the more general article:

https://pve.proxmox.com/wiki/ZFS_on_Linux

does explicitly state: "It is not possible to use ZFS as root file system with UEFI boot."

However, I was curious why that is and what's possible. So here we go.


1. Changed the boot mode from UEFI to Legacy, which made it possible to boot the installed Proxmox VE from the mirrored ZFS root.

root@server ~ > lsblk -f -l -o NAME,PARTUUID,PARTTYPE
NAME   PARTUUID                              PARTTYPE
loop0
sda
sda1 3b3d3d80-6ad3-46e0-81e9-23b4077bfb8b 21686148-6449-6e6f-744e-656564454649
sda2 07841ffd-27f8-4bf3-b948-12758255edef c12a7328-f81f-11d2-ba4b-00a0c93ec93b
sda3 c7c76cbb-a1a4-4f3c-a6fb-2beedea2b88c 6a898cc3-1dd2-11b2-99a6-080020736631
sdb
sdb1 f3d4146f-bb93-4802-b374-3003139bc8f0 21686148-6449-6e6f-744e-656564454649
sdb2 fa908cbf-e539-4206-9a5b-974dce166504 c12a7328-f81f-11d2-ba4b-00a0c93ec93b
sdb3 dd9a339d-c382-4e2f-977b-cd09ac60731e 6a898cc3-1dd2-11b2-99a6-080020736631


and

root@server ~ > gdisk -l /dev/sda
GPT fdisk (gdisk) version 1.0.1

Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sda: 976773168 sectors, 465.8 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): FA3C30F4-4F99-42A2-BC58-528691159F96
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 976773134
Partitions will be aligned on 8-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)  End (sector)  Size        Code  Name
     1              34          2047  1007.0 KiB  EF02
     2            2048       1050623  512.0 MiB   EF00
     3         1050624     976773134  465.3 GiB   BF01


/dev/sdb looks similar.

The Proxmox installer had obviously installed a working grub bootloader into BIOS boot partition 1.
Interestingly, it did not create a filesystem on EFI System Partition 2, so the ESP was not mountable.
The Proxmox VE system resides on Solaris partition 3, which was confirmed by zpool status.
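
For anyone retracing this, a quick way to confirm both observations (a sketch only; rpool is the default pool name used by the Proxmox installer, adjust device names to your system):

# the root pool should sit on the third partition of both SSDs
zpool status rpool

# blkid prints nothing for the ESP because it holds no filesystem yet
blkid /dev/sda2 || echo "no filesystem on /dev/sda2"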


2. Created a rEFInd Boot Manager USB Key

While still in the Proxmox VE system booted in Legacy mode I followed the procedure in:

https://pve.proxmox.com/wiki/Booting_a_ZFS_root_file_system_via_UEFI

to create a rEFInd Boot Manager USB key.
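
The wiki article has the authoritative steps; very roughly, and assuming /dev/sdc is the USB key and the rEFInd install script is available, it boils down to something like this sketch:

# wipe the stick and create a single EFI System Partition on it
sgdisk -Z /dev/sdc
sgdisk -n1:0:0 -t1:EF00 /dev/sdc
mkfs.vfat -F32 /dev/sdc1

# copy rEFInd to the fallback path EFI/BOOT/bootx64.efi on that partition
refind-install --usedefault /dev/sdc1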


3. Changed the boot mode from Legacy back to UEFI and booted the system from the rEFInd Boot Manager USB Key.

For reference:
/dev/sda is /dev/disk/by-id/wwn-0x500a0751e14fcd34 the first SSD
/dev/sdb is /dev/disk/by-id/wwn-0x500a0751e14fcd79 the second SSD
/dev/sdc is the rEFInd Boot Manager USB Key

The following procedure is mostly taken from:

https://github.com/zfsonlinux/zfs/wiki/Ubuntu-18.04-Root-on-ZFS

Create filesystems on the ESP for both SSDs:

root@server ~ > mkfs.vfat -F32 /dev/sda2
mkfs.fat 4.1 (2017-01-24)
root@server ~ > mkfs.vfat -F32 /dev/sdb2
mkfs.fat 4.1 (2017-01-24)

Generate a mount entry in /etc/fstab for the ESP and mount it:

root@server ~ > echo PARTUUID=$(blkid -s PARTUUID -o value /dev/disk/by-id/wwn-0x500a0751e14fcd34-part2) /boot/efi vfat noatime,nofail,x-systemd.device-timeout=1 0 1 >> /etc/fstab

root@server ~ > mount /boot/efi
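
A quick sanity check that the ESP really ended up mounted where grub-efi expects it:

findmnt /boot/efi      # should show the vfat partition referenced in the fstab entry above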


Change from grub-pc to grub-efi-amd64:

root@server ~ > apt-get install grub-efi-amd64
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
grub-pc
The following NEW packages will be installed:
grub-efi-amd64
0 upgraded, 1 newly installed, 1 to remove and 21 not upgraded.
Need to get 73.1 kB of archives.
After this operation, 362 kB disk space will be freed.
Do you want to continue? [Y/n] Y
Get:1 http://download.proxmox.com/debian/pve stretch/pve-no-subscription amd64 grub-efi-amd64 amd64 2.02-pve6 [73.1 kB]
Fetched 73.1 kB in 0s (595 kB/s)
Preconfiguring packages ...
dpkg: grub-pc: dependency problems, but removing anyway as you requested:
pve-kernel-4.15.18-9-pve depends on grub-pc | grub-efi-amd64 | grub-efi-ia32 | grub-efi-arm64; however:
Package grub-pc is to be removed.
Package grub-efi-amd64 is not installed.
Package grub-efi-ia32 is not installed.
Package grub-efi-arm64 is not installed.
(Reading database ... 40562 files and directories currently installed.)
Removing grub-pc (2.02-pve6) ...
Selecting previously unselected package grub-efi-amd64.
(Reading database ... 40553 files and directories currently installed.)
Preparing to unpack .../grub-efi-amd64_2.02-pve6_amd64.deb ...
Unpacking grub-efi-amd64 (2.02-pve6) ...
Setting up grub-efi-amd64 (2.02-pve6) ...
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.18-9-pve
Found initrd image: /boot/initrd.img-4.15.18-9-pve
Found memtest86+ image: /ROOT/pve-1@/boot/memtest86+.bin
Found memtest86+ multiboot image: /ROOT/pve-1@/boot/memtest86+_multiboot.bin
Adding boot menu entry for EFI firmware configuration
done
Processing triggers for man-db (2.7.6.1-2) ...

root@server ~ > update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-4.15.18-9-pve

root@server ~ > update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.18-9-pve
Found initrd image: /boot/initrd.img-4.15.18-9-pve
Found memtest86+ image: /ROOT/pve-1@/boot/memtest86+.bin
Found memtest86+ multiboot image: /ROOT/pve-1@/boot/memtest86+_multiboot.bin
Adding boot menu entry for EFI firmware configuration
done


The following command will not only install grub for UEFI booting into the ESP but also update the EFI NVRAM with the boot entry:

root@server ~ > grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=proxmox --recheck --no-floppy
Installing for x86_64-efi platform.
Installation finished. No error reported.



Now I shut the system down and removed the rEFInd USB key.
Powering up again booted straight into Proxmox VE via the grub bootloader in the EFI System Partition.
We definitely booted in EFI mode, because the Compatibility Support Module (CSM) was disabled in the BIOS and we have access to the EFI NVRAM from within the OS:

root@server ~ > efibootmgr -v
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0000,0001
Boot0000* proxmox HD(2,GPT,07841ffd-27f8-4bf3-b948-12758255edef,0x800,0x100000)/File(\EFI\proxmox\grubx64.efi)
Boot0001* UEFI: Built-in EFI Shell VenMedia(5023b95c-db26-429b-a648-bd47664c8012)..BO
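
Another quick way to confirm an EFI boot from within the OS is to check for the EFI variables interface exposed by the kernel:

# the directory only exists when the kernel was started via UEFI
[ -d /sys/firmware/efi ] && echo "booted via UEFI" || echo "booted via legacy BIOS"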


Now also copy the bootloader to the ESP of the second SSD and generate a boot entry for booting from this disk:

root@server ~ > umount /boot/efi

root@server ~ > dd if=/dev/disk/by-id/wwn-0x500a0751e14fcd34-part2 of=/dev/disk/by-id/wwn-0x500a0751e14fcd79-part2


root@server ~ > efibootmgr -c -g -d /dev/disk/by-id/wwn-0x500a0751e14fcd79 -p 2 -L "proxmox-2" -l '\EFI\proxmox\grubx64.efi'
BootCurrent: 0000
Timeout: 0 seconds
BootOrder: 0002,0000,0001
Boot0000* proxmox
Boot0001* UEFI: Built-in EFI Shell
Boot0002* proxmox-2

root@server ~ > mount /boot/efi



I can now boot this system in UEFI mode from either drive of the root mirror, even with the other one failed.


So what is the reason that the wiki says it is not possible?

Why is there only this halfway workaround with the rEFInd boot stick in the wiki?

What will happen when there is an upgrade of Proxmox VE with a new kernel?

Couldn't the Proxmox VE installer write the grub EFI bootloader into the ESPs of all ZFS mirror members? That seems not much different from writing the grub-pc bootloader into the BIOS boot partitions of all the disks.

As stated in the intro, I'm a Proxmox rookie and I only have a rough understanding of grub and the whole Linux boot process, so I'm afraid that I'm missing something important.
 
So what is the reason that the wiki says it is not possible?

Why is there only this halfway workaround with the rEFInd boot stick in the wiki?

What will happen when there is an upgrade of Proxmox VE with a new kernel?

Couldn't the Proxmox VE installer write the grub EFI bootloader into the ESPs of all ZFS mirror members? That seems not much different from writing the grub-pc bootloader into the BIOS boot partitions of all the disks.

In short: the sync issue IS the main problem. A lot can go very wrong there, so it's not done automatically.
 
Sorry, that's a little bit too short for me to understand.

What needs to be kept in sync ?
The two ESP partitions ?

Don't you also have to keep the two BIOS Boot Partitions in sync for the Legacy mode booting ?
 
What needs to be kept in sync ? The two ESP partitions ?

Yes, and it is much more complicated than what grub handles by itself. You also have to register both ESPs in the EFI NVRAM to be able to boot from them.
More on that decision can be found on the forums, e.g. https://forum.proxmox.com/threads/p...m-zfs-raid1-with-uefi-only.34762/#post-170311

Don't you also have to keep the two BIOS Boot Partitions in sync for the Legacy mode booting ?

Grub just regenerates them by itself on install/update, so that is a very simple process that grub already takes care of.
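
For comparison, in legacy mode that regeneration amounts to grub writing its core image into the BIOS boot partition of every configured disk, which can also be triggered by hand (a sketch using the disk IDs from earlier in the thread):

# reinstall the BIOS-mode bootloader to both mirror members
grub-install /dev/disk/by-id/wwn-0x500a0751e14fcd34
grub-install /dev/disk/by-id/wwn-0x500a0751e14fcd79
# or let debconf remember the device list so package upgrades reinstall automatically
dpkg-reconfigure grub-pc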
 
Thank you for the link. Now I think I understand the concerns.

If I understand it correctly, my setup will be safe as long as there is no Proxmox kernel upgrade, because:
- no changes will be made to the ESP by Proxmox
- no other software will modify the ESP in my case because I'm not booting any other OS from these SSDs

However, I will have to remember to run a post-installation task after every Proxmox kernel upgrade (see the sketch after this list), which is:
- umount /boot/efi
- dd if=<partition that was just unmounted in the step before> of=<ESP partition of the mirror SSD>
- mount /boot/efi
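
A minimal sketch of that sync step, using the disk IDs from further up (adjust if your layout differs):

# copy the primary ESP to the ESP of the second mirror member
umount /boot/efi
dd if=/dev/disk/by-id/wwn-0x500a0751e14fcd34-part2 of=/dev/disk/by-id/wwn-0x500a0751e14fcd79-part2 bs=1M
mount /boot/efi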

That seems to be manageable. Or am I still missing something ?

Regarding the question of why the Proxmox installer does not take care of the steps required to UEFI boot from a ZFS RAID1 root, I can understand Fabian's position of not working around grub's shortcomings in that case.

However, I'm pretty sure we will see server motherboards in the near future that no longer provide a CSM and therefore no longer allow legacy mode booting. I hope that the grub developers are working on a method to deal with the ESP sync problem.
If not, we would lose the ability to boot from a ZFS RAID1 root sometime in the future.
 
However, I will have to remember to run a post-installation task after every Proxmox kernel upgrade, which is:
- umount /boot/efi
- dd if=<partition that was just unmounted in the step before> of=<ESP partition of the mirror SSD>
- mount /boot/efi

That seems to be manageable. Or am I still missing something ?

Yes. This is exactly what you needed a decade ago with Linux mdadm software RAID, until grub gained mdadm support. As far as I remember, you can hook into the update-grub / kernel update step and just do your offline-copy hack there. I'd also include configuring the UEFI boot entry for the second disk.

I think this could still work with UEFI and mdadm RAID1, as it did in the last decade. There are some howtos floating around about this, so maybe give it a try!
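
One way to automate that offline copy would be a Debian kernel hook; a minimal sketch only, assuming a hypothetical executable script at /etc/kernel/postinst.d/zzz-sync-esp and the disk IDs from earlier in the thread:

#!/bin/sh
# hypothetical /etc/kernel/postinst.d/zzz-sync-esp -- runs after every kernel package install
set -e
# copy the freshly updated primary ESP to the ESP of the second mirror member
umount /boot/efi
dd if=/dev/disk/by-id/wwn-0x500a0751e14fcd34-part2 of=/dev/disk/by-id/wwn-0x500a0751e14fcd79-part2 bs=1M
mount /boot/efi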

I hope that the grub developers are working on a method to deal with the ESP sync problem.

As far as I understand it, this is a general shortcoming of UEFI, because UEFI has no RAID support. But yes, hopefully there will be some support eventually.
 
we are aware of this problem, and the next PVE major release will include some kind of solution for this (we are still evaluating different approaches).
 
we are aware of this problem, and the next PVE major release will include some kind of solution for this (we are still evaluating different approaches).
Is this still something that is planned for Proxmox 6?
 
docs are still missing, but @Stoiko Ivanov implemented a mechanism for keeping multiple ESPs synced on kernel updates, and we use this in combination with systemd-boot for EFI-enabled ZFS boot with kernels+initrds on the ESP on each (bootable) vdev.
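
For readers finding this thread later: this mechanism shipped with PVE 6 as the pve-efiboot-tool helper (later renamed proxmox-boot-tool). Roughly, and subject to the official documentation for the exact syntax, registering an additional ESP on a hypothetical /dev/sdb2 looks like this:

# format the ESP of an additional or replaced disk and register it for syncing
pve-efiboot-tool format /dev/sdb2
pve-efiboot-tool init /dev/sdb2
# copy the current kernels/initrds and bootloader config to all registered ESPs
pve-efiboot-tool refresh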
 
Thanks for the update @fabian !

This may be a little off topic, but related - I'm wondering whether this would enable booting PVE from a single ZFS root disk (using UEFI) while also letting that single ZFS root disk hold VMs and act as a replicable ZFS disk?

This would be useful for me since my homelab has some NUCs which only really have a single SSD slot, so being able to run PVE and the VMs off the single SSD and to replicate this to another node would be very useful...
 
Thanks for the update @fabian !

This may be a little off topic, but related - I'm wondering whether this would enable booting PVE from a single ZFS root disk (using UEFI) while also letting that single ZFS root disk hold VMs and act as a replicable ZFS disk?

This would be useful for me since my homelab has some NUCs which only really have a single SSD slot, so being able to run PVE and the VMs off the single SSD and to replicate this to another node would be very useful...

yes, single disk ZFS installations are still supported with the new UEFI mechanism - although for optimal performance and stability, it is a good idea to separate OS storage and guest disk storage if possible.
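
For context, a default single-disk ZFS installation already keeps guest volumes on their own dataset below the root pool, which is what storage replication operates on (a sketch; rpool/data and local-zfs are the installer defaults, verify on your node):

# guest disks live on a separate dataset inside the same pool
zfs list -r rpool/data
# that dataset is registered as the 'local-zfs' zfspool storage
pvesm status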
 
If a single bootable ZFS disk (or at least a partition of one) is able to run the OS and the VMs and to replicate VMs to another node, that would be great!

Yes - I'd prefer to use a second disk to split the local OS and VMs, but with certain NUCs the only option for a second local disk is USB.
 
