I am trying to auto-unlock non-root volumes at boot on PVE. The system boots from an unencrypted 32 GB ZFS RAID1, but VMs are to be stored on a pair of LUKS partitions that together form a second mirrored zpool. The reason for this (as some of you are no doubt aware) is that you can't replicate VMs stored on natively encrypted ZFS. I know the caveats about making sure you don't accidentally decrypt something when replicating it, and I'm OK with managing that risk.
For auto-unlocking the LUKS partitions I have installed a tang server and used "clevis luks bind" to bind the partitions to it. Manual unlocking against the tang server works: every single time I run "clevis luks unlock", it succeeds.
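For reference, the binding and the manual unlock look roughly like the sketch below. The device path, mapping name and tang URL are placeholders for illustration, not my real values.

```shell
# Run as root. /dev/sdb1, vmcrypt1 and the URL are hypothetical placeholders.

# Bind the LUKS partition to the tang server (prompts for an existing LUKS passphrase).
clevis luks bind -d /dev/sdb1 tang '{"url":"http://tang1.example.lan"}'

# List the clevis bindings stored in the LUKS header, to confirm the bind took.
clevis luks list -d /dev/sdb1

# Manual unlock via tang; this is the step that works every time for me.
clevis luks unlock -d /dev/sdb1 -n vmcrypt1
```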
The problem is at boot time. SOMETIMES it works and the LUKS volumes are successfully unlocked by clevis-systemd using the _netdev entries I've put in /etc/crypttab. Other times it doesn't work and the boot just waits for the password, so I have to connect to the server over KVM and type the password interactively to decrypt the devices.
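To make the boot-time setup concrete, this is a sketch of how the relevant pieces can be inspected. The crypttab lines in the comment show only the general shape with placeholder names and UUIDs, and I'm assuming clevis-luks-askpass.path is the clevis-systemd unit that answers the passphrase prompt.

```shell
# Print the network-dependent entries in /etc/crypttab. The expected shape is
# (placeholders, not real names or UUIDs):
#   vmcrypt1  UUID=<uuid-of-first-luks-partition>   none  _netdev
#   vmcrypt2  UUID=<uuid-of-second-luks-partition>  none  _netdev
grep _netdev /etc/crypttab

# Check that the clevis-systemd path unit is enabled and see its current state.
systemctl is-enabled clevis-luks-askpass.path
systemctl status clevis-luks-askpass.path
```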
Unfortunately one of the machines that'll be in the cluster hasn't got IPMI, so I need to get this sorted before I can cluster them. Otherwise that machine will just hang at boot waiting for me to physically attend it. No bueno.
It seems obvious (well, maybe I'm wrong, but it SEEMS obvious) that this is some sort of timing issue: sometimes the crypttab is being processed before networking is available. Like I said, the tang server (in fact, there's more than one of them) works perfectly, without fail, every time I interactively use "clevis luks unlock". It only ever fails during the PVE boot process, presumably because at the instant clevis tries to contact the tang server it can't reach it yet, whereas on another boot it can.
To attempt to resolve it I've added "After=network-online.target" to /lib/systemd/system/remote-cryptsetup.target, but that doesn't seem to have fixed it.
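The same change could also be expressed as a drop-in under /etc, which should survive package upgrades better than editing the file in /lib. This is only a sketch: the drop-in file name is made up, the Wants= line is my assumption (After= only orders units and does not pull the target in), and whether ordering the target itself delays the individual unlock units is part of what I'm unsure about.

```shell
# Run as root. wait-online.conf is a hypothetical file name.
mkdir -p /etc/systemd/system/remote-cryptsetup.target.d
cat > /etc/systemd/system/remote-cryptsetup.target.d/wait-online.conf <<'EOF'
[Unit]
# Assumption: After= alone only orders, so Wants= is added to pull the target in.
Wants=network-online.target
After=network-online.target
EOF

# Reload unit files, then print the unit with drop-ins merged to confirm it was picked up.
systemctl daemon-reload
systemctl cat remote-cryptsetup.target
```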
I'm really in need of someone more familiar than I am with systemd and the Debian boot process to point me in the right direction. How can I make crypttab processing wait until some systemd event, such as a target or service being reached? Failing that, how can I just introduce a fixed delay? That would be a kludge, but I strongly suspect that a few seconds of crude delay before crypttab is processed would be enough to make it work on every boot.
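To illustrate the kind of wait I have in mind, here is a minimal sketch of a retry helper that keeps running a command until it succeeds, instead of sleeping for a fixed time. The function name wait_for is my own invention, and where to hook it into the boot sequence is exactly the part I don't know.

```shell
#!/bin/sh
# wait_for TRIES DELAY CMD...
# Run CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
    tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        # No point sleeping after the final failed attempt.
        if [ "$i" -lt "$tries" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `wait_for 30 2 curl -sf -o /dev/null http://tang1.example.lan/adv` would poll a tang server's advertisement endpoint for up to about a minute (the hostname is a placeholder). Polling for the thing that is actually needed seems less of a kludge than a blind delay.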