Setting up Encryption using ZFS

Astraea

I have read through the various postings on the forum as it relates to encryption and ZFS and have been able to successfully encrypt the ROOT dataset of the boot disk(s) and when the system restarts it prompts for the passphrase as expected.

However, when I create another pool and subsequent dataset(s) using the remaining disk(s), I am not prompted to unlock them during boot. Proxmox therefore creates/uses a directory on the main file system in place of the storage that should be mounted from the second encrypted dataset(s), and it's not until I manually unlock and mount the dataset(s) that Proxmox starts using the correct dataset(s) for storage. This system does not need replication, nor will it be in a cluster, so those limitations do not affect this installation. For security, though, I would like to have everything encrypted rather than relying on the VMs to do the encryption themselves.

I have tried to create a script and/or service that prompts for the disks to be unlocked before Proxmox is too far into the boot process, with no success: it either gets stuck trying to unlock the dataset or skips it completely. I am not sure how to proceed from here, other than possibly using a key for the subsequent dataset(s), if that would unlock them at boot, or creating something like a TrueNAS VM to manage the other disk(s).
 
Out of curiosity, how did you install on an encrypted ZFS? Did you adapt the Proxmox installer? Did you use the Debian installer? Is there a how-to you used?

On my Ubuntu system with encrypted ZFS I have a small LUKS partition which contains the key for ZFS.
 
Combining ZFS replication with ZFS native encryption is problematic. The replication used by PVE won't work with encrypted zvols/datasets, so features like migration, which also use replication, aren't possible without patching. See here: https://bugzilla.proxmox.com/show_bug.cgi?id=2350

Unlocking using keyfiles:

To unlock datasets using a keyfile you can create a systemd service like this:

Create service to auto unlock keyfile encrypted ZFS pools after boot

  • create service: nano /etc/systemd/system/zfs-load-key.service
    Add there:
    Code:
    [Unit]
    Description=Load encryption keys
    DefaultDependencies=no
    After=zfs-import.target
    Before=zfs-mount.service
    
    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/usr/bin/zfs load-key -a
    StandardInput=tty-force
    
    [Install]
    WantedBy=zfs-mount.service
  • enable service: systemctl enable zfs-load-key.service
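This will unlock all datasets/zvols whose keylocation property points to a key file at the correct time when booting. If the service fails to start, check that the path in ExecStart matches the location of the zfs binary on your system.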
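
For `zfs load-key -a` to unlock anything unattended, each dataset's keylocation has to point at a key file rather than at a prompt. A minimal sketch of that conversion, assuming the dataset is already encrypted and currently unlocked; the function name, dataset name and key path are placeholders and not taken from this thread:

```shell
#!/bin/bash
# Hypothetical helper: switch an encrypted dataset from a passphrase prompt
# to a raw key file so that "zfs load-key -a" can unlock it unattended.
# The dataset's current key must already be loaded for change-key to work.
use_keyfile () {
    local dataset=$1
    local keyfile=$2

    # keyformat=raw expects exactly 32 bytes of key material;
    # the umask keeps the new file readable by its owner only
    ( umask 077 && dd if=/dev/urandom of="$keyfile" bs=32 count=1 status=none ) || return 1

    # re-wrap the dataset's master key with the new key file
    zfs change-key -o keyformat=raw -o keylocation="file://${keyfile}" "$dataset"
}

# Example (placeholder names):
# use_keyfile "YourPool/SomeDataset" "/root/keys/SomeDataset.key"
```

If the key file lives on the passphrase-protected root dataset, the second pool can only be unlocked once the root pool has been unlocked at boot.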

Unlocking using passphrases:

To auto unlock datasets/zvols using a passphrase stored in a file you can run something like this:
Create script: nano /root/zfs_unlock_passphrase.sh && chmod 700 /root/zfs_unlock_passphrase.sh && chown root:root /root/zfs_unlock_passphrase.sh
Add there:
Bash:
#!/bin/bash

# Datasets to unlock and the files holding their passphrases.
# Entries are paired by position: DATASETS[0] uses PWDFILES[0], and so on.
DATASETS=( "YourPool/SomeDataset" "YourPool/AnotherDataset" )
PWDFILES=( "/path/to/file/with/passphrase/of/SomeDataset.pwd" "/path/to/file/with/passphrase/of/AnotherDataset.pwd" )

unlockDataset () {
    local dataset=$1
    local pwdfile=$2
    local type keystatus
    # check if dataset exists (only filesystems pass this check, zvols do not)
    type=$(zfs get -H -o value type "${dataset}")
    if [ "$type" == "filesystem" ]; then
        # check if dataset isn't already unlocked
        keystatus=$(zfs get -H -o value keystatus "${dataset}")
        if [ "$keystatus" == "unavailable" ]; then
            # feed the passphrase to zfs on stdin
            zfs load-key "${dataset}" < "${pwdfile}"
            # check if dataset is now unlocked
            keystatus=$(zfs get -H -o value keystatus "${dataset}")
            if [ "$keystatus" != "available" ]; then
                echo "Error: Unlocking dataset '${dataset}' failed" >&2
                return 1
            fi
        else
            echo "Info: Dataset '${dataset}' already unlocked"
            return 0
        fi
    else
        echo "Error: No valid dataset found at '${dataset}'" >&2
        return 1
    fi
}

unlockAll () {
    local failed=0
    local i
    # check if number of datasets and pwdfiles are equal
    if [ ${#DATASETS[@]} -eq ${#PWDFILES[@]} ]; then
        # loop through each dataset/pwdfile pair, stopping at the first failure
        for (( i=0; i<${#DATASETS[@]}; i++ )); do
            if ! unlockDataset "${DATASETS[$i]}" "${PWDFILES[$i]}"; then
                failed=1
                break
            fi
        done
    else
        echo "Error: Wrong number of datasets/pwdfiles" >&2
        failed=1
    fi
    # mount all datasets, but only if every unlock succeeded
    if [ $failed -eq 0 ]; then
        zfs mount -a
    fi
    return $failed
}

unlockAll

You can then create a systemd service that runs the above script: nano /etc/systemd/system/zfs_unlock_passphrase.service
Add there:
Code:
[Unit]
Description=Unlocks ZFS datasets using passphrases stored in files
After=pve-guests.service

[Service]
Type=oneshot
ExecStart=/bin/bash /root/scripts/zfs_unlock_passphrase.sh
User=root

[Install]
WantedBy=multi-user.target
Enable service: systemctl enable zfs_unlock_passphrase.service

You also need to create the files that contain your passphrases and edit the script so that it points to these files and to the datasets they unlock. The DATASETS and PWDFILES arrays store these; the first element of each array belongs together, and so on.
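
Creating such a passphrase file could look like the sketch below. It assumes the passphrase is supplied on stdin, and the function name and path are hypothetical; the point is only that the file should be readable by root alone.

```shell
#!/bin/bash
# Hypothetical helper: store a passphrase read from stdin in a file that
# only its owner can read, for use in the PWDFILES array.
write_pwdfile () {
    local pwdfile=$1
    local passphrase

    # read one line; -r keeps backslashes literal, IFS= keeps spaces
    IFS= read -r passphrase || return 1

    # create the file under a restrictive umask, then enforce mode 600
    # in case the file already existed with wider permissions
    ( umask 077 && printf '%s\n' "$passphrase" > "$pwdfile" ) || return 1
    chmod 600 "$pwdfile"
}

# Example (placeholder path), typing the passphrase followed by Enter:
# write_pwdfile /root/keys/SomeDataset.pwd
```

Note that a passphrase typed this way is echoed to the terminal; in bash, `read -rs` would hide it.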


Both ways worked fine here but with passphrases you might want to add a global start delay so PVE won't try to start the guests before the datasets actually got unlocked. With key files this isn't a problem.
 
I won't need any zfs replication so that is not an issue. What would I need for a script to enter the pass phrase on the terminal window? Also I assume switching to a script like you posted and a service to call it would be as simple as changing the exec line to call the script?
 
I edited my post you might want to read it again. I used a systemd service to run the script after the last step of the PVE boot.

If you just want to type in the passphrases manually, you can run zfs load-key -a && zfs mount -a. This will ask you for the passphrase of every locked dataset and mount them all afterwards.
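
Before or after unlocking by hand, it can help to see which datasets are still locked. A small sketch, with a hypothetical function name, that filters the keystatus property:

```shell
#!/bin/bash
# Hypothetical helper: print every dataset or zvol whose encryption key is
# not loaded yet. "zfs get -H" emits tab-separated name/value pairs;
# unencrypted datasets report "-" and are skipped by the filter.
list_locked () {
    zfs get -H -o name,value -t filesystem,volume keystatus |
        awk -F '\t' '$2 == "unavailable" { print $1 }'
}

# Example:
# list_locked
```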
 
Really what I am building is a single-node Proxmox install on a laptop that I will use as my administration console for my rack. I plan to have 1 Linux VM that I can remote into for management and a Windows XP machine for older devices that are stuck on older TLS encryption. I then want to add a GUI to Proxmox so that, in a pinch, I could access those machines directly on the laptop and do what I need to if I am not able to remote in from my tablet or desktop.

I do plan to back up those VMs to a separate TrueNAS on proper hardware, which I already have running. I wish PBS supported backing up the host in its web UI, though I know this is coming.

Back on topic, I will give that a try and see if it works. My current solution in testing is a little messy: the NVMe drive is split into 2 partitions, a small one for Proxmox with enough storage to contain a TrueNAS VM, and the rest of that drive is passed through to TrueNAS using HDD (partition in this case) pass-through. The other HDD is passed through using PCIe pass-through via the SATA controller. I then have 2 shares on the TrueNAS VM using ZFS encryption with a key, shared back to Proxmox over NFS using an OVS bridge with its own network that is not connected to a physical NIC.

Because this is a laptop that is easy to pick up, I want encryption set up; it is also why I am stuck with just the 2 drives unless I add external drives over USB.
 
Hi - I've just registered on the Proxmox forum just to say thanks ... this really helped me! :)

Also, though ... I noticed that your suggested zfs_unlock_passphrase.service has a different path to zfs_unlock_passphrase.sh (it has "/scripts" in there). Only an issue if following verbatim and copying/pasting without actually observing it, of course. (I'm presuming there's no convention that validates its apparent inconsistency with the earlier example paths.)

In my case I've just built a new desktop system for myself - the first "desktop" (rather than mobile-chipset based system) I've had in years! I wanted a decent graphics card for AI stuff ... and that was fine with a cheap eGPU from AliExpress ... but I also discovered how much fun Microsoft Flight Simulator was in VR, and my eGPU and mobile-chipset NUC weren't the best companions for the video card trying to perform that function. Anyway - I wanted a "desktop PC", but I love the flexibility of Proxmox and had passed through the eGPU on my NUCs just fine for "desktop" OSes etc. rather than just RDP'ing into them, and figured I could do the same on the bigger machine. On my NUCs the TPM and SSD "self-encrypting" capability seemed best/easiest for encryption, and on my triple-boot laptop I use LUKS under Debian 11 (with Proxmox installed) ... but for this machine ZFS seemed like the way to go, with fast NVMe drives and plenty of RAM and CPU horsepower. I wanted a Windows VM to start up with graphics etc. passed through after the ROOT dataset was decrypted and after PVE got going.

Like Astraea, I'm not looking for replication or clustering with this Proxmox instance. I love Proxmox for making my home-lab interests easier to organise and manage, and it lets me make the most of the hardware I have - but I'm a dev rather than an IT person and this stuff is just for fun or to run things at home ... and I don't really need a "cluster" running 24/7. I prefer to turn off some machines when I'm not using them and still be able to manage Proxmox instances.

I never quite know what I'm doing; I spend most of my time in Windows - mostly 'cause of work, and I'm completely new to zfs, and I rather blindly tried following these instructions. I tried adding a couple of seconds delay for the pve instance and a couple of seconds boot delay for the VM I set to start up after pve. But my experience was that ZFS data - dataset that I'd set in the script, didn't appear to mount. I had to run the script in the GUI's shell window before my chosen VM continued starting-up. I knew the script worked etc. but it didn't seem to be happening at the "right time" by itself despite my clumsy delays. I could stop the VM start process, and then if I manually restarted it, the script would run.

I'm sure it's me not understanding lol. I'm quite inept with Linux unfortunately though I hope to redress that.
Maybe the intent of your script was to start after a manual invocation of a VM start. With no consideration of VMs set to automatically start. Or maybe I implemented it incorrectly.

What made my set-up work for me, was to not use "After=pve-guests.service" for the zfs_unlock_passphrase.service, but instead I just used After=pveproxy.service and that seemed to facilitate the behaviour I was looking for. Maybe that's hacky and I overlooked the real reason for my problems ... but anyway - that's what worked for me for my situation.

But it might have taken me a long time to get this far, what with my ignorance and time constraints, if it weren't for your script posted here ... so ... thanks again. :)
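
The ordering change described in this post can be sketched as an in-place edit of the unit file. This is only an illustration under stated assumptions: the function name is hypothetical, and the unit file is edited directly on the assumption that a systemd drop-in can add an After= dependency but cannot remove an existing one.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the ordering line of the unlock unit so it
# is ordered after pveproxy.service instead of pve-guests.service.
reorder_unlock_unit () {
    local unit=${1:-/etc/systemd/system/zfs_unlock_passphrase.service}
    # only an exact "After=pve-guests.service" line is replaced
    sed -i 's/^After=pve-guests\.service$/After=pveproxy.service/' "$unit"
}

# Afterwards systemd has to re-read the unit:
# reorder_unlock_unit && systemctl daemon-reload
```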
 
Yes, that path is just an example. It doesn't really matter where you store that script, as long as every path is pointing to the same file. "/root/scripts/" is just where I personally like my scripts run by root to reside, so it's easier for me to back them up and not that problematic if a script contains passwords, as other users then can't read that script.

The service should run the script once PVE has finished starting. Then the script should unlock and mount the datasets. I personally disabled autostart for all guests and then wrote a script that will start each VM once the datasets got unlocked.
What's working fine with the default autostart of PVE is using keyfiles instead of passphrases like I described here:
https://forum.proxmox.com/threads/f...s-using-proxmox-installer.127512/#post-557808
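
The approach of disabling autostart and starting guests only after the unlock could be sketched roughly as follows. This is not the script mentioned in the post; the function name, dataset and VM IDs are placeholders, and it assumes the guests are VMs managed with `qm`.

```shell
#!/bin/bash
# Hypothetical helper: start the given VM IDs only if the dataset's key is
# loaded. Assumes guest autostart is disabled in PVE.
start_guests_when_unlocked () {
    local dataset=$1
    shift
    local keystatus vmid

    keystatus=$(zfs get -H -o value keystatus "$dataset") || return 1
    if [ "$keystatus" != "available" ]; then
        echo "Error: '${dataset}' is still locked, not starting guests" >&2
        return 1
    fi

    # remaining arguments are VM IDs
    for vmid in "$@"; do
        qm start "$vmid"
    done
}

# Example (placeholder dataset and VM IDs):
# start_guests_when_unlocked "YourPool/SomeDataset" 100 101
```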
 
Ahhhh - now I understand. I knew it was probably me being thick. :) Thnx.
 
Hi,

I'm researching this topic in view of the NIS2 directive.

Any updates about managing zfs encryption and replication?

It would be nice to have two datasets encrypted and then replicated without using raw sends, so that each node has its own keys.

I think this solution would require replicating properties manually, but it could be a nice addition to the suite.
 
I'm reading the Bugzilla page and the ZFS issue on GitHub that it mentions.

Has anyone tried sending the volume without the properties and then syncing the props in a separate step?

It's a "hack", but it could be a working solution for replication on encrypted pools until ZFS has a proper way of doing that.