Fix stuck initramfs?

Dunuin

Hi,

I wiped my old PVE 7.3, installed a new PVE 7.3, encrypted the ZFS pool, set everything up and then I wanted to add dropbear-initramfs to be able to unlock it through SSH. With the previous PVE installation using LUKS this worked fine. But now initramfs is stuck in a loop and PVE won't boot anymore:
dropbear2.png

What I did:

configure ZFS root unlocking through SSH

  • install packages: apt update && apt install dropbear-initramfs busybox
  • add pub key to dropbear: nano /etc/dropbear-initramfs/authorized_keys and add the pubkey there
  • edit initramfs-dropbear config: nano /etc/dropbear-initramfs/config
    Change
    #DROPBEAR_OPTIONS=
    to
    DROPBEAR_OPTIONS="-p 10022 -j -k -c zfsunlock"
  • add VLAN functionalities to dropbear:
    • run: nano /etc/initramfs-tools/scripts/local-top/vlan
      Add there:
      Code:
      #!/bin/sh
      
      PREREQ=""
      
      prereqs() {
          echo "$PREREQ"
      }
      
      case "$1" in
          prereqs)
              prereqs
              exit 0
          ;;
      esac
      
      . /scripts/functions
      . /conf/initramfs.conf
      . /conf/conf.d/*.conf
      
      if [ -z "$VLAN" ]; then
          exit 0
      fi
      
      modprobe 8021q
      
      for VLAN_IFACE in ${VLAN:-*}; do
          SOURCE_IFACE=$(echo $VLAN_IFACE | cut -d":" -f1)
          VLAN_ID=$(echo $VLAN_IFACE | cut -d":" -f2)
          log_begin_msg "Bringing up $SOURCE_IFACE.$VLAN_ID"
          ip link add link $SOURCE_IFACE name $SOURCE_IFACE.$VLAN_ID type vlan id $VLAN_ID
          ip link set $SOURCE_IFACE up
          ip link set $SOURCE_IFACE.$VLAN_ID up
          log_end_msg
      done
      
      exit 0
    • run: chmod 755 /etc/initramfs-tools/scripts/local-top/vlan
    • run: nano /etc/initramfs-tools/scripts/local-bottom/vlan
      Add there:
      Code:
      #!/bin/sh
      
      PREREQ="ifdown"
      
      prereqs() {
          echo "$PREREQ"
      }
      
      case "$1" in
          prereqs)
              prereqs
              exit 0
          ;;
      esac
      
      . /scripts/functions
      . /conf/initramfs.conf
      . /conf/conf.d/*.conf
      
      if [ -z "$VLAN" ]; then
          exit 0
      fi
      
      for VLAN_IFACE in ${VLAN:-*}; do
          SOURCE_IFACE=$(echo $VLAN_IFACE | cut -d":" -f1)
          VLAN_ID=$(echo $VLAN_IFACE | cut -d":" -f2)
          log_begin_msg "Bringing down $SOURCE_IFACE.$VLAN_ID"
          ip link delete $SOURCE_IFACE.$VLAN_ID
          log_end_msg
      done
    • run: chmod 755 /etc/initramfs-tools/scripts/local-bottom/vlan
    • run: nano /etc/initramfs-tools/hooks/vlan
      Add there:
      Code:
      #!/bin/sh
      PREREQ=""
      prereqs()
      {
           echo "$PREREQ"
      }
      
      case $1 in
      prereqs)
           prereqs
           exit 0
           ;;
      esac
      
      . /usr/share/initramfs-tools/hook-functions
      # Begin real processing below this line
      
      if grep -q ^VLAN= /etc/initramfs-tools/initramfs.conf /etc/initramfs-tools/conf.d/*.conf; then
          manual_add_modules 8021q
      fi
    • run: chmod 755 /etc/initramfs-tools/scripts/local-top/vlan
    • run: nano /etc/initramfs-tools/initramfs.conf
      Add at the bottom:
      Code:
      VLAN="ens3:43"
      IP=192.168.43.50::192.168.43.1:255.255.255.0:EnterpriseUnlock:ens3.43:off:192.168.43.1
    • rebuild initramfs: update-initramfs -u
    • reboot
The hook comes from this repo to add VLAN support to dropbear-initramfs, as all my NICs are tagged VLAN trunks:
https://github.com/stcz/initramfs-tools-network-hook
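One thing that may be worth checking: the steps above run chmod on local-top/vlan twice and never on hooks/vlan. As far as I know, initramfs-tools skips hook and boot scripts that are not executable, in which case the 8021q module would never be added to the image. A minimal sketch of such a check follows; the function name and the root-prefix argument are made up for illustration.

```shell
#!/bin/sh
# Print every VLAN script from the steps above that exists but is not executable.
# $1 is an optional root prefix (hypothetical helper argument): empty on the
# running system, or e.g. /rpool when the pool is mounted at an altroot.
non_executable_vlan_scripts() {
    root="${1:-}"
    for f in \
        /etc/initramfs-tools/hooks/vlan \
        /etc/initramfs-tools/scripts/local-top/vlan \
        /etc/initramfs-tools/scripts/local-bottom/vlan
    do
        if [ -f "$root$f" ] && [ ! -x "$root$f" ]; then
            echo "$f"
        fi
    done
}
```

Calling non_executable_vlan_scripts without arguments on the installed system prints nothing if all three files are executable. Any path it prints would need a chmod 755 followed by update-initramfs -u.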

"ens3" is my MCX-311A-XCAT NIC that previously worked fine with the old PVE installation that was using LUKS.

When I boot the server and look at the physical console, I see this: dropbear-initramfs reports errors that the network isn't working (and I can't reach it through SSH). Then I'm asked for the passphrase, I type it in, the pool unlocks, and then it gets stuck in the "cat: not found...sleep: not found" loop from the picture above:
dropbear1.png

How do I best recover from this? I tried booting the PVE ISO in rescue mode, but it complains that rpool can't be found and aborts (I guess it's because the pool is encrypted).
CTRL+C in that loop also doesn't work. Not sure what it is doing but neither SSH nor the webUI are available, so I guess it hangs before PVE actually boots.

Would booting a live Ubuntu, importing and unlocking my rpool, and then chrooting into PVE help, so I could revert the changes and update the initramfs?
Or is there an easier way to disable my changes to the initramfs?
 
How do I best recover from this? I tried booting the PVE ISO in rescue mode, but it complains that rpool can't be found and aborts (I guess it's because the pool is encrypted).
Booting the PVE (or any recent Proxmox) ISO in debug mode should give you a workable shell with ZFS in kernel and userspace (the second debug shell). There you should be able to import and mount the pool (at an altroot), chroot, and fix the changes if possible.

Otherwise, I'm not sure how dropbear-initramfs is configured, but I see the following issues in the output:
* the ens3 link becomes ready well after local-top tries to bring ens3.43 up, so it might be a timing issue (you could simply add a short sleep for testing)
* is busybox (and all its links, e.g. `/bin/cat`, `/bin/sleep`) added to the initramfs? (see `man unmkinitramfs`)
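To answer the second question without unpacking the image, the file listing from lsinitramfs (part of initramfs-tools) can be searched. The following is only a sketch: the helper name is made up, and the patterns in the usage line are assumptions about where busybox, its applet links, and the VLAN module end up in the image.

```shell
#!/bin/sh
# Read an initramfs file listing on stdin and print every pattern
# (given as arguments, basic regular expressions) that matches no line.
missing_in_initramfs() {
    listing=$(cat)
    for want in "$@"; do
        printf '%s\n' "$listing" | grep -q -- "$want" || echo "$want"
    done
}
```

Possible usage, assuming the kernel version from this thread: lsinitramfs /boot/initrd.img-5.15.83-1-pve | missing_in_initramfs 'bin/busybox$' 'bin/cat$' 'bin/sleep$' '8021q'. Every pattern it prints is absent from the image.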
 
What I tried:
- boot PVE 7.2 ISO, enter Debug Mode Install, go to shell with exit
- Tried to edit keyboard layout to be able to enter special chars with a German layout but I get this:
1672507853897.png
- I also can't add any new packages because no network is configured, so I need to set up my networking first: echo -e "auto ens3\niface ens3 inet manual\nauto ens3.43\niface ens3.43 inet static\address 192.168.43.50/24\ngateway 192.168.43.1\ndns-nameservers 192.168.43.1" >> /etc/network/interfaces should be correct and results in this:
1672509365428.png
But then I can't bring that VLAN interface up:
1672509591818.png

Any hints on how to get network with VLANs running and how to change the keyboard layout in PVE debug mode?

I would first like to get SSH up and running, because my webKVM doesn't support copy & paste and inputs get garbled, as the webKVM uses an English keyboard layout and I'm using a physical German layout.

Edit:
Ok, skipped the idea of using PVE debug mode and installed a PBS 2.3 as ext4 on a pen drive to get ZFS support and an easy access to console using webUI or SSH.

Importing rpool:
Code:
root@RescuePBS:~# zpool import -f -R /rpool rpool

root@RescuePBS:~# zfs list
NAME               USED  AVAIL     REFER  MOUNTPOINT
rpool             1.59G  25.4G      104K  /rpool/rpool
rpool/ROOT        1.59G  25.4G      192K  /rpool/rpool/ROOT
rpool/ROOT/pve-1  1.59G  25.4G     1.59G  /rpool

root@RescuePBS:~# zfs load-key rpool/ROOT
Enter passphrase for 'rpool/ROOT':

root@RescuePBS:~# zfs get keystatus
NAME              PROPERTY   VALUE        SOURCE
rpool             keystatus  -            -
rpool/ROOT        keystatus  available    -
rpool/ROOT/pve-1  keystatus  available    -

root@RescuePBS:~# zfs get mounted
NAME              PROPERTY  VALUE    SOURCE
rpool             mounted   yes      -
rpool/ROOT        mounted   no       -
rpool/ROOT/pve-1  mounted   no       -

root@RescuePBS:~# zfs mount rpool/ROOT/pve-1

root@RescuePBS:~# zfs get mounted
NAME              PROPERTY  VALUE    SOURCE
rpool             mounted   yes      -
rpool/ROOT        mounted   no       -
rpool/ROOT/pve-1  mounted   yes      -

root@RescuePBS:~# ls -la /rpool
total 185
drwxr-xr-x 20 root root   26 Dec 28 23:20 .
drwxr-xr-x 19 root root 4096 Dec 31 20:47 ..
lrwxrwxrwx  1 root root    7 May  4  2022 bin -> usr/bin
drwxr-xr-x  5 root root   15 Dec 29 00:38 boot
drwxr-xr-x  4 root root   16 May  4  2022 dev
drwxr-xr-x  2 root root    2 Dec 28 19:15 dpool
drwxr-xr-x 88 root root  180 Dec 29 00:26 etc
drwxr-xr-x  2 root root    2 Mar 19  2022 home
lrwxrwxrwx  1 root root    7 May  4  2022 lib -> usr/lib
lrwxrwxrwx  1 root root    9 May  4  2022 lib32 -> usr/lib32
lrwxrwxrwx  1 root root    9 May  4  2022 lib64 -> usr/lib64
lrwxrwxrwx  1 root root   10 May  4  2022 libx32 -> usr/libx32
drwxr-xr-x  2 root root    2 May  4  2022 media
drwxr-xr-x  2 root root    2 Dec 28 16:44 mnt
drwxr-xr-x  2 root root    2 May  4  2022 opt
drwxr-xr-x  2 root root    2 Mar 19  2022 proc
drwx------  5 root root   10 Dec 28 19:20 root
drwxr-xr-x  2 root root    2 Dec 28 16:46 rpool
drwxr-xr-x  5 root root    6 May  4  2022 run
lrwxrwxrwx  1 root root    8 May  4  2022 sbin -> usr/sbin
drwxr-xr-x  2 root root    2 May  4  2022 srv
drwxr-xr-x  2 root root    2 Mar 19  2022 sys
drwxrwxrwt  7 root root    7 Dec 29 00:39 tmp
drwxr-xr-x 14 root root   14 May  4  2022 usr
drwxr-xr-x 11 root root   13 May  4  2022 var
drwxr-xr-x  3 root root    3 Dec 28 23:20 VMpool
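When more datasets are involved, the keystatus check above can be scripted. This sketch assumes the tab-separated output of zfs get -H -o name,value keystatus and relies on the keystatus value "unavailable" for a key that has not been loaded; the function name is invented for illustration.

```shell
#!/bin/sh
# Read "name<TAB>keystatus" lines on stdin and print the datasets
# whose encryption key has not been loaded yet.
locked_datasets() {
    awk -F '\t' '$2 == "unavailable" { print $1 }'
}
```

Possible usage on the rescue system: zfs get -H -o name,value keystatus | locked_datasets. An empty result means every encrypted dataset has its key available.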

Verified again that my initramfs hooks and scripts are identical to the github repo. Commented out my lines...
Code:
#VLAN="ens3:43"
#IP=192.168.43.50::192.168.43.1:255.255.255.0:EnterpriseUnlock:ens3.43:off:192.168.43.1
... in "/rpool/etc/initramfs-tools/initramfs.conf".

Doing the chroot and rebuilding initramfs:
Code:
root@RescuePBS:/# mount --bind /dev /rpool/dev
root@RescuePBS:/# mount -t proc proc /rpool/proc
root@RescuePBS:/# chroot /rpool/ /bin/bash -i
root@RescuePBS:/# mount -t sysfs sys /sys
root@RescuePBS:/# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-5.15.83-1-pve
cryptsetup: ERROR: Couldn't resolve device rpool/ROOT/pve-1
cryptsetup: WARNING: Couldn't determine root device
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
Copying and configuring kernels on /dev/disk/by-uuid/3BC6-C2B3
        Copying kernel and creating boot-entry for 5.15.30-2-pve
        Copying kernel and creating boot-entry for 5.15.83-1-pve
Copying and configuring kernels on /dev/disk/by-uuid/3BC7-3962
        Copying kernel and creating boot-entry for 5.15.30-2-pve
        Copying kernel and creating boot-entry for 5.15.83-1-pve
root@RescuePBS:/# exit
root@RescuePBS:/# reboot
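For anyone repeating this recovery, the preparation steps can be collected in one place. The sketch below only reuses the commands shown in this post; the function names and the DRY_RUN switch are additions for illustration, and sysfs is mounted from outside the chroot here rather than inside it.

```shell
#!/bin/sh
# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# $1: altroot, $2: pool, $3: encryption root, $4: root dataset
rescue_chroot_prepare() {
    altroot="$1"; pool="$2"; encroot="$3"; rootfs="$4"
    run zpool import -f -R "$altroot" "$pool" &&
    run zfs load-key "$encroot" &&
    run zfs mount "$rootfs" &&
    run mount --bind /dev "$altroot/dev" &&
    run mount -t proc proc "$altroot/proc" &&
    run mount -t sysfs sys "$altroot/sys"
}
```

With DRY_RUN=1 set, rescue_chroot_prepare /rpool rpool rpool/ROOT rpool/ROOT/pve-1 only prints the six commands. Without it, the commands are executed in order and stop at the first failure; afterwards chroot /rpool /bin/bash -i and update-initramfs -u follow as above.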

Now it at least boots again into PVE.
And I think I identified one problem: I had a typo in the path to my LUKS keyfile in "/etc/cryptsetup". So the initramfs wasn't using my keyfile to unlock a LUKS-encrypted LVM-Thin pool and was therefore asking me for a passphrase in the initramfs stage, after unlocking the rpool. I couldn't see this because the "cat/sleep: not found" messages were spamming the screen.
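A typo like this can be caught before rebooting. The sketch below assumes the keyfile path is listed in the third column of a crypttab-style file (on Debian that is normally /etc/crypttab); the function name and the optional root-prefix argument are hypothetical.

```shell
#!/bin/sh
# Print "name: keyfile" for every crypttab entry whose key file does not exist.
# $1: path to a crypttab-style file
# $2: optional root prefix, e.g. /rpool when working from a rescue system
missing_keyfiles() {
    tab="$1"; root="${2:-}"
    while read -r name dev key opts || [ -n "$name" ]; do
        case "$name" in ''|'#'*) continue ;; esac    # blank lines and comments
        case "$key" in ''|none|-) continue ;; esac   # passphrase prompt, no key file
        [ -e "$root$key" ] || echo "$name: $key"
    done < "$tab"
}
```

Possible usage: missing_keyfiles /etc/crypttab on the running system, or missing_keyfiles /rpool/etc/crypttab /rpool from the rescue environment. No output means every referenced key file was found.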


The question is now how to get that dropbear-initramfs working with my VLAN?

I edited "/rpool/etc/initramfs-tools/initramfs.conf" to the same config that worked with the same NIC on the old PVE installation...just replaced ens5 with ens3 because of the new mainboard:
Code:
VLAN="ens3:43"
IP=192.168.43.50::192.168.43.1:255.255.255.0::ens3.43:off
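Because the IP= line is easy to get wrong by one colon, it can help to print the fields with labels. This sketch assumes the initramfs IP variable follows the field order of the kernel's ip= parameter (client, server, gateway, netmask, hostname, device, autoconf, then DNS and NTP servers); the function name is invented.

```shell
#!/bin/sh
# Split an IP= value at the colons and print each field with its label.
# Empty fields are shown as "(empty)". IPv6 addresses are not handled.
describe_ip_param() {
    printf '%s\n' "$1" | awk -F: '{
        n = split("client-ip server-ip gateway netmask hostname device autoconf dns0 dns1 ntp0", label, " ")
        for (i = 1; i <= NF && i <= n; i++)
            printf "%-10s %s\n", label[i], ($i == "" ? "(empty)" : $i)
    }'
}
```

Running describe_ip_param '192.168.43.50::192.168.43.1:255.255.255.0::ens3.43:off' lists seven fields, with ens3.43 in the device position and empty server and hostname fields.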

But SSH still doesn't work:
1672520474712.png
1672520984015.png
@Stoiko Ivanov: Where would I need to add a delay in the initramfs so it gets more time for the NIC to become ready?

Edited "/etc/initramfs-tools/scripts/local-top/vlan" to this, adding a 15-second wait:
Code:
#!/bin/sh

PREREQ=""

prereqs() {
    echo "$PREREQ"
}

case "$1" in
    prereqs)
        prereqs
        exit 0
    ;;
esac

. /scripts/functions
. /conf/initramfs.conf
. /conf/conf.d/*.conf

if [ -z "$VLAN" ]; then
    exit 0
fi

modprobe 8021q
sleep 15

for VLAN_IFACE in ${VLAN:-*}; do
    SOURCE_IFACE=$(echo $VLAN_IFACE | cut -d":" -f1)
    VLAN_ID=$(echo $VLAN_IFACE | cut -d":" -f2)
    log_begin_msg "Bringing up $SOURCE_IFACE.$VLAN_ID"
    ip link add link $SOURCE_IFACE name $SOURCE_IFACE.$VLAN_ID type vlan id $VLAN_ID
    ip link set $SOURCE_IFACE up
    ip link set $SOURCE_IFACE.$VLAN_ID up
    log_end_msg
done

exit 0

But SSH still fails to work:
1672521548490.png
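Instead of a fixed sleep, the script could poll until the NIC actually exists. The following is a sketch only: it waits for the interface to appear in sysfs, which covers a driver that loads late but not a link that is slow to come up, and the function name and third parameter are additions for illustration.

```shell
#!/bin/sh
# Wait until a network interface shows up in sysfs, polling once per second.
# $1: interface name
# $2: timeout in seconds (default 15)
# $3: sysfs net directory (default /sys/class/net; parameter added for testing)
# Returns 0 once the interface exists, 1 if the timeout expires first.
wait_for_iface() {
    iface="$1"; timeout="${2:-15}"; netdir="${3:-/sys/class/net}"
    waited=0
    while [ ! -e "$netdir/$iface" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In local-top/vlan this could replace the sleep 15 line with something like wait_for_iface "$SOURCE_IFACE" 15 inside the loop, before the ip link add call. It needs a working sleep in the initramfs, so busybox has to be present.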
 
- I also can't add any new packages because no network is configured, so I need to setup my networking first: echo -e "auto ens3\niface ens3 inet manual\nauto ens3.43\niface ens3.43 inet static\address 192.168.43.50/24\ngateway 192.168.43.1\ndns-nameservers 192.168.43.1" >> /etc/network/interfaces should be correct and results in this:
I think the issue here is simply that you need to add some whitespace after an iface line (for address etc.)

The `Operation Not Supported` seems to be the issue when trying to configure the VLAN interface. I assume it's from the `ip link add link ...` line.
Any hint in the initramfs shell (and its `dmesg` output) why this is unsuccessful?
Maybe a module parameter is needed in addition?
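Regarding the interfaces file: in the quoted echo -e command, the sequence \a before "ddress" is read as the bell escape and there is no newline in front of it, so the address ends up glued to the iface line. A sketch that writes the stanza with printf and indented option lines follows; the function name is made up, and the values in the usage line are the ones from this thread.

```shell
#!/bin/sh
# Print an ifupdown configuration for a VLAN interface on top of a parent NIC.
# $1: parent NIC, $2: VLAN ID, $3: address/prefix, $4: gateway, $5: DNS server
vlan_stanza() {
    parent="$1"; vid="$2"; addr="$3"; gw="$4"; dns="$5"
    printf 'auto %s\niface %s inet manual\n\n' "$parent" "$parent"
    printf 'auto %s.%s\niface %s.%s inet static\n' "$parent" "$vid" "$parent" "$vid"
    # Option lines are indented so they belong to the iface line above them.
    printf '\taddress %s\n\tgateway %s\n\tdns-nameservers %s\n' "$addr" "$gw" "$dns"
}
```

Possible usage: vlan_stanza ens3 43 192.168.43.50/24 192.168.43.1 192.168.43.1 >> /etc/network/interfaces. It is worth reviewing the output on the terminal first, before appending it to the file.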
 
