[TUTORIAL] Encrypted ZFS Root on Proxmox

kables100 · Apr 5, 2025
The Proxmox installer does not provide a way to set up an encrypted ZFS root, so Debian with an encrypted ZFS root has to be installed first and Proxmox added on top afterwards. This guide covers some of the caveats of doing so.

First, follow the excellent guide written by the OpenZFS developers to install Debian: https://openzfs.github.io/openzfs-docs/Getting Started/Debian/Debian Bookworm Root on ZFS.html

Disk layout for reference; the partitions are created according to the OpenZFS guide:

DISK1:
- part1: MBR compatibility partition, not used.
- part2: EFI partition, FAT32 formatted.
- part3: ZFS boot partition, not encrypted.
- part4: ZFS root partition, encrypted via ZFS native encryption.

DISK2:
- part1: MBR compatibility partition, not used.
- part2: EFI partition, FAT32 formatted.
- part3: ZFS boot partition, not encrypted.
- part4: ZFS root partition, encrypted via ZFS native encryption.
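
For context, the encrypted root pool (part4 on both disks) is created roughly like this in the OpenZFS guide; treat this as a sketch and use the guide's exact, current options. ${DISK1} and ${DISK2} stand for the /dev/disk/by-id paths of the two disks:

Bash:
zpool create \
    -o ashift=12 \
    -o autotrim=on \
    -O encryption=on -O keyformat=passphrase -O keylocation=prompt \
    -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
    -O compression=lz4 \
    -O normalization=formD \
    -O relatime=on \
    -O canmount=off -O mountpoint=/ -R /mnt \
    rpool mirror ${DISK1}-part4 ${DISK2}-part4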

A two-disk mirror with ZFS native encryption was chosen; it is also possible to use LUKS underneath ZFS here. The benefit of ZFS native encryption is that the data only has to be encrypted once before being written to both mirror members, whereas with LUKS on each disk the data is encrypted twice. The disadvantage of ZFS native encryption is that Proxmox replication and migration might not be available, as reported by others:
- https://forum.proxmox.com/threads/a...ion-of-disks-on-zfs-encrypted-storage.117227/
- https://forum.proxmox.com/threads/replication-migration-encrypted-zfs-datasets.70572/
Therefore, LUKS might be a better choice.
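
Whichever route is taken, it is worth verifying afterwards which layer is actually encrypting the data. With native encryption and the pool names used above, something like this should report encryption on rpool and none on bpool:

Bash:
zfs get -r encryption,keystatus,keyformat rpool
zfs get encryption bpool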

After vanilla Debian is able to boot, follow this guide to add Proxmox on top of Debian, but DO NOT reboot yet: https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_12_Bookworm.

This is because, in multiple personal attempts, the system failed to boot after GRUB was upgraded to the version provided by the Proxmox repositories. The following quote provides a hint as to why:
For EFI Systems installed with ZFS as the root filesystem systemd-boot is used, unless Secure Boot is enabled. All other deployments use the standard GRUB bootloader (this usually also applies to systems which are installed on top of Debian). - https://pve.proxmox.com/wiki/Host_Bootloader (Apr 6, 2025)
The hypothesis here is that Proxmox's build of GRUB for some reason does not support an encrypted ZFS root (or ZFS at all), so systemd-boot must be used instead. Follow this guide to install systemd-boot on the system: https://blog.bofh.it/debian/id_465.

But in essence, all that needs to be done is the following; a consolidated sketch is given right after the list:

1. `echo "root=ZFS=rpool/ROOT/debian quiet" > /etc/kernel/cmdline`
2. `apt install systemd-boot`
3. `bootctl set-timeout 4` (Optional, to make debugging easier.)
4. Find the Grub bootloader ID via `efibootmgr`, then remove it: `efibootmgr -b <ID> -B`.
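
The same four steps as one block, assuming the root dataset is named rpool/ROOT/debian as in the OpenZFS guide (the findmnt call is only there to double-check that name):

Bash:
# Confirm the root dataset name; this should print e.g. rpool/ROOT/debian
findmnt -n -o SOURCE /

# Kernel command line that systemd-boot entries will use
echo "root=ZFS=rpool/ROOT/debian quiet" > /etc/kernel/cmdline

# Install systemd-boot into the ESP and set a menu timeout for easier debugging
apt install systemd-boot
bootctl set-timeout 4

# List the EFI boot entries, then delete the old GRUB one
efibootmgr
efibootmgr -b <ID> -B    # replace <ID> with the BootXXXX number of the GRUB entry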

After the system successfully boots via systemd-boot (the default label in NVRAM is "Linux Boot Manager"), follow the rest of https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_12_Bookworm to get Proxmox running.
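
Before continuing, it does not hurt to confirm that the firmware really handed control to systemd-boot and not to a leftover GRUB entry:

Bash:
bootctl status    # the "Current Boot Loader" section should report systemd-boot
efibootmgr        # BootCurrent should point at the "Linux Boot Manager" entry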

Lastly, in order to keep the boot partitions in sync across kernel updates, follow the guide here: https://pve.proxmox.com/wiki/Host_Bootloader. In essence:

1. `proxmox-boot-tool format <DISK2-part2>`
2. `proxmox-boot-tool init <DISK2-part2>`
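
To check that the second ESP is now registered and will be updated on kernel upgrades (the kernel hooks normally run the refresh automatically):

Bash:
proxmox-boot-tool status     # lists the ESPs registered in /etc/kernel/proxmox-boot-uuids
proxmox-boot-tool refresh    # copies the current kernels/initrds onto those ESPs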

Quite an involved process overall, but it does end with a usable system.
 
The disadvantage of using ZFS native encryption is that Proxmox migration might not be available, as reported by others

With a little patch of ZFSPoolPlugin.pm you can enable migration and replication on encrypted ZFS datasets. The procedure is described here and has been tested with the most recent Proxmox 8.4.1:
 
Thanks for this well-written tutorial! I used it to set up a Proxmox 9 (Debian 13) machine, and everything works as expected. It even worked with GRUB; there was no need to replace it with systemd-boot.
 
Hi team,

I have some problems with the installation of ZFS on Debian 13 on a fresh dedicated server, before loading Proxmox on top of it.
After the step
Bash:
zpool create \
    -o ashift=12 \
    -o autotrim=on \
    -o compatibility=grub2 \
    -o cachefile=/etc/zfs/zpool.cache \
    -O devices=off \
    -O acltype=posixacl -O xattr=sa \
    -O compression=lz4 \
    -O normalization=formD \
    -O relatime=on \
    -O canmount=off -O mountpoint=/boot -R /mnt \
    bpool ${DISK}-part3
after adapting it according to this hint from the guide:

For raidz topologies, replace mirror in the above command with raidz, raidz2, or raidz3 and list the partitions from the additional disks.

the command became:

Bash:
zpool create \
    -o ashift=12 \
    -o autotrim=on \
    -o compatibility=grub2 \
    -o cachefile=/etc/zfs/zpool.cache \
    -O devices=off \
    -O acltype=posixacl -O xattr=sa \
    -O compression=lz4 \
    -O normalization=formD \
    -O relatime=on \
    -O canmount=off -O mountpoint=/boot -R /mnt \
    bpool raidz \
    /dev/disk/by-id/ata-KIOXIA-EXCERIA_SATA_SSD_Y57B60BNK0Z5-part3 \
    /dev/disk/by-id/ata-Apacer_AS350_1TB_AFLJ1400100123-part3 \
    /dev/disk/by-id/ata-Apacer_AS350_1TB_1393074A14D000011624-part3

but executing it gives me this output:
Bash:
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/ata-KIOXIA-EXCERIA_SATA_SSD_Y57B60BNK0Z5-part3 is part of potentially active pool 'bpool'
/dev/disk/by-id/ata-Apacer_AS350_1TB_AFLJ1400100123-part3 is part of potentially active pool 'bpool'
/dev/disk/by-id/ata-Apacer_AS350_1TB_1393074A14D000011624-part3 is part of potentially active pool 'bpool'

There is no output when I try to find an active pool.
Of course I erased the disks beforehand with the command suggested in the tutorial, and rebooted a couple of times.
So the disks are completely clean before starting this tutorial.

Any idea?
 
There is no output when I try to find an active pool.
Those disks were used earlier? In a "bpool"?

I did not read the tutorial you mentioned; you can erase them completely with something like "dd if=/dev/zero of=/dev/disktobeerased bs=1M status=progress", or read man zpool-labelclear
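
For reference, a sketch of both cleanup approaches; the device paths are placeholders, adjust them to the disks in question:

Bash:
# Clear all ZFS labels from a partition that belonged to an old pool
zpool labelclear -f /dev/disk/by-id/ata-EXAMPLE-part3
# Or wipe every filesystem/RAID signature on the whole disk
wipefs -a /dev/disk/by-id/ata-EXAMPLE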
 
Found my culprit: I did not take into account the legacy BIOS partition. I was skipping it, so the command I was trying to run did not point at the partitions I thought. My bad, sorry.
 
Well, in the end, on another dedicated server it still does not work as expected, with an error message like this:
Bash:
root@debian:~# zpool create \
    -o ashift=12 \
    -o autotrim=on \
    -o compatibility=grub2 \
    -o cachefile=/etc/zfs/zpool.cache \
    -O devices=off \
    -O acltype=posixacl -O xattr=sa \
    -O compression=lz4 \
    -O normalization=formD \
    -O relatime=on \
    -O canmount=off -O mountpoint=/boot -R /mnt \
> bpool raidz2 \
> /dev/disk/by-id/ata-Apacer_AS350_1TB_1393074A14D000011624-part3 \
> /dev/disk/by-id/ata-KIOXIA-EXCERIA_SATA_SSD_Y57B60BNK0Z5-part3 \
> /dev/disk/by-id/ata-Apacer_AS350_1TB_AFLJ1400100123-part3
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-id/ata-Apacer_AS350_1TB_1393074A14D000011624-part3 is part of potentially active pool 'bpool'
/dev/disk/by-id/ata-KIOXIA-EXCERIA_SATA_SSD_Y57B60BNK0Z5-part3 is part of potentially active pool 'bpool'
/dev/disk/by-id/ata-Apacer_AS350_1TB_AFLJ1400100123-part3 is part of potentially active pool 'bpool'

It does not make any sense to me. I followed the tutorial to the letter at that point. I even erased the disks and wiped the start of each disk before a reboot so they would be probed as completely clean, and still nothing; I have the same problem.
Could it be a problem with some flag on the RAID card they are connected to?
Even though I checked, and plenty of people have succeeded in running ZFS with that generic HPE card.
 
It is going to sound like a stupid question, but is it possible that a reboot revives an old bpool label each time? Because this time, when I ran wipefs and created the pool without rebooting in between, it worked.
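
For what it's worth, that would be consistent with how ZFS labels are stored: there are two label copies at the start and two at the end of each vdev, so wiping only the beginning of a partition can leave the trailing labels behind, and they are detected again on the next scan. A quick way to check, with a placeholder device path:

Bash:
# Dump any ZFS labels still present on the partition (adjust the path)
zdb -l /dev/disk/by-id/ata-EXAMPLE-part3
# If labels show up, zpool labelclear -f removes all copies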