[TUTORIAL] Proxmox VE 9 - Developer Workstation - BTRFS RAID1 - FULL-HA - WAYLAND or X - GRUB-BTRFS - and more

aureladmin

Proxmox VE 9 — Developer Workstation (V14)

This is an AI-assisted English translation of my documentation, which I originally wrote in my native language (French). The original automated scripts, in French, are attached.

Debian 13 (Trixie) · Btrfs RAID1 · UEFI HA · Snapper + grub-btrfs · vmbr0 NAT/DHCP


Date: 2026-03-06
Version: V14.5 (complete manual procedure + S1 → S8 scripts in appendix)

⚠️ Warning — destructive: this procedure completely wipes the two selected disks.


Table of Contents

  • 1. Goals and architecture
  • 2. Prerequisites
  • 3. Manual installation (S1 → S8)
  • 4. Operations (diagnostics, rollback, disk replacement)
  • 5. Automated installation (scripts)
  • 6. Appendices: complete script code V14


1. Goals and architecture

This documentation describes how to set up a developer workstation running Proxmox VE 9, based on:

  • Debian 13 (Trixie) installed via debootstrap
  • Btrfs RAID1 (data + metadata in RAID1)
  • UEFI HA: 2 EFI partitions (/boot/efi + /boot/efi2) synchronized
  • SWAP failover: SWAP1 → SWAP2 via a systemd service
  • Automatic Btrfs disk replacement (missing disk + blank disk detection)
  • Bootable snapshots via Snapper + grub-btrfs (GRUB menu)
  • Network:
      • one "WAN" interface (external network)
      • a vmbr0 bridge without physical interface (NAT + DHCP via dnsmasq) for VM/LXC
  • Boot optimization: reduced NetworkManager-wait-online delay (5 seconds)

Why "persistent" subvolumes in addition to the snapshotable root?
When booting from a snapshot, the root filesystem / is itself a Btrfs snapshot rather than the regular @ subvolume. Some services (Proxmox, the display manager, etc.) still need to write to specific directories at boot.
We therefore isolate these critical directories in dedicated subvolumes, which are mounted separately and so persist whichever root snapshot is booted.

Proxmox subvolumes (created before the proxmox-ve installation):
  • /var/lib/vz → @var_lib_vz
  • /var/lib/pve-cluster → @var_lib_pve_cluster
  • /var/lib/corosync → @var_lib_corosync
  • /var/lib/pve-manager → @var_lib_pve_manager

Display manager subvolume (if graphical interface):
  • GNOME: /var/lib/gdm3 → @gdm3
  • KDE: /var/lib/sddm → @sddm
  • XFCE / LMDE7: /var/lib/lightdm → @lightdm

Important note: /etc/pve is a FUSE mount managed by pve-cluster.
Do not create a Btrfs subvolume for /etc/pve.
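
On an installed system, one way to check that each persistent directory really is a separate subvolume mount is to read subvol= from the mount options reported by findmnt. This is only a sketch: subvol_from_opts is a hypothetical helper name and is not part of the attached scripts.

```shell
#!/bin/bash
# Hypothetical helper: print the value of subvol= from a comma-separated option string.
subvol_from_opts() {
  local opts="$1" o
  local IFS=,
  for o in $opts; do
    case "$o" in
      subvol=*) printf '%s\n' "${o#subvol=}"; return 0 ;;
    esac
  done
  return 1
}

# Report the subvolume behind each persistent Proxmox directory (if mounted).
for mp in /var/lib/vz /var/lib/pve-cluster /var/lib/corosync /var/lib/pve-manager; do
  opts=$(findmnt -no OPTIONS "$mp" 2>/dev/null || true)
  sv=$(subvol_from_opts "$opts" || true)
  echo "$mp -> ${sv:-not mounted as a separate subvolume}"
done
```

The same loop can be extended with the display manager directory (/var/lib/gdm3, /var/lib/sddm or /var/lib/lightdm) when a graphical interface is installed.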


2. Prerequisites

  • UEFI machine
  • 2 physical disks (identical size, or replacement disk ≥ original disk)
  • Internet access
  • Debian 13 Live ISO (Trixie)
  • Information to prepare:
      • short hostname (e.g. prox-dev1)
      • FQDN (e.g. prox-dev1.local)
      • static IP of the Proxmox host
      • WAN interface (e.g. ens18)


3. Manual installation (S1 → S8)

This section is a copy-paste procedure, in S1 → S8 order.

Convention: run as root (sudo su) when indicated.
You can copy-paste block by block.


3.1 (S1) Boot on Debian Live + tools

Code:
sudo su
set -euo pipefail

apt update
apt install -y gdisk btrfs-progs debootstrap bc curl wget vim git sudo iptables rsync \
  grub-efi-amd64 shim-signed efibootmgr udev parted


3.2 (S1) Installation variables (hostname, user, network)

Proxmox needs a static IP, and the hostname must resolve to that address at boot.
We therefore map the hostname to the host's real IP in /etc/hosts, rather than to a loopback address such as 127.0.1.1.


Code:
# TO ADAPT
export HOST_SHORT="prox-dev1"
export HOST_FQDN="prox-dev1.local"
export USER_NAME="your_username"

# WAN (static IP of the Proxmox host)
export INSTALL_NET_IF="ens18"
export INSTALL_NET_ADDR="192.168.199.221/24"
export INSTALL_NET_GW="192.168.199.1"
export INSTALL_NET_DNS="1.1.1.1 8.8.8.8"
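
Since the later steps derive the host IP from INSTALL_NET_ADDR by stripping the /prefix part, a typo in that variable only shows up late. A minimal validation sketch, assuming IPv4 only (check_cidr is a hypothetical helper, not part of the original scripts):

```shell
#!/bin/bash
# Hypothetical helper: succeed if the argument looks like a valid IPv4 address/prefix.
check_cidr() {
  local cidr="$1" ip prefix o
  [[ "$cidr" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$ ]] || return 1
  ip="${cidr%/*}"
  prefix="${cidr#*/}"
  local IFS=.
  for o in $ip; do
    # 10# forces base 10 so that an octet such as "08" is not read as octal
    [ "$((10#$o))" -le 255 ] || return 1
  done
  [ "$((10#$prefix))" -le 32 ]
}

# Falls back to the example value when the variable is not set yet.
ADDR_TO_CHECK="${INSTALL_NET_ADDR:-192.168.199.221/24}"
if check_cidr "$ADDR_TO_CHECK"; then
  echo "Host IP: ${ADDR_TO_CHECK%/*}  prefix: /${ADDR_TO_CHECK#*/}"
else
  echo "ERROR: INSTALL_NET_ADDR must look like 192.168.199.221/24" >&2
fi
```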


3.3 (S1) Interactive disk selection (DESTRUCTIVE)


Code:
echo "=== Physical disk detection ==="

mapfile -t CANDIDATES < <(lsblk -dpno NAME,TYPE | awk '$2=="disk"{print $1}')
if [ "${#CANDIDATES[@]}" -lt 2 ]; then
  echo "ERROR: fewer than two disks detected."
  exit 1
fi

echo "Detected disks:"
MENU_ITEMS=()
for DEV in "${CANDIDATES[@]}"; do
  ID_PATH=""
  for ID in /dev/disk/by-id/*; do
    [ -L "$ID" ] || continue
    TARGET=$(readlink -f "$ID")
    if [ "$TARGET" = "$DEV" ]; then
      ID_PATH="$ID"
      break
    fi
  done
  [ -z "$ID_PATH" ] && ID_PATH="$DEV"

  SIZE=$(lsblk -dn -o SIZE "$DEV")
  MODEL=$(lsblk -dn -o MODEL "$DEV" 2>/dev/null || echo "N/A")
  SERIAL=$(udevadm info --query=property --name="$DEV" 2>/dev/null | awk -F= '/^ID_SERIAL=/{print $2}' || echo "")

  if [ -n "$SERIAL" ]; then
    DESC="$ID_PATH | $SIZE | $MODEL | SN: $SERIAL"
  else
    DESC="$ID_PATH | $SIZE | $MODEL"
  fi
  MENU_ITEMS+=("$DEV::$ID_PATH::$DESC")
done

i=0
for ITEM in "${MENU_ITEMS[@]}"; do
  i=$((i+1))
  # Show the full description (by-id path | size | model | serial): the text after the last "::"
  echo "  $i) ${ITEM##*::}"
done

echo
read -rp "Select disk 1: " CHOICE1
read -rp "Select disk 2: " CHOICE2

if ! [[ "$CHOICE1" =~ ^[0-9]+$ && "$CHOICE2" =~ ^[0-9]+$ ]]; then
  echo "ERROR: invalid choices."
  exit 1
fi
if [ "$CHOICE1" -eq "$CHOICE2" ]; then
  echo "ERROR: choices must be different."
  exit 1
fi

MAX=${#MENU_ITEMS[@]}
if [ "$CHOICE1" -lt 1 ] || [ "$CHOICE1" -gt "$MAX" ] || [ "$CHOICE2" -lt 1 ] || [ "$CHOICE2" -gt "$MAX" ]; then
  echo "ERROR: choice out of range."
  exit 1
fi

SEL1="${MENU_ITEMS[$((CHOICE1-1))]}"
SEL2="${MENU_ITEMS[$((CHOICE2-1))]}"
export DISK1="$(echo "$SEL1" | cut -d':' -f1)"
export DISK2="$(echo "$SEL2" | cut -d':' -f1)"

echo "Disk 1: $DISK1"
echo "Disk 2: $DISK2"


3.4 (S1) GPT + EFI + SWAP + BTRFS_RAID partitioning

Layout:
  • p1: EFI (2048 MiB)
  • p2: SWAP (RAM × 1.5) — label SWAP1 / SWAP2
  • p3: BTRFS_RAID (remaining disk space)

The Btrfs label of the volume (created in the next step) is fixed: prox_raid1.


Code:
RAM_MB=$(awk '/MemTotal/ {printf "%.0f", $2/1024}' /proc/meminfo)
SWAP_MB=$(printf "%.0f" "$(echo "$RAM_MB * 1.5" | bc -l)")
echo "RAM_MB=$RAM_MB  SWAP_MB=$SWAP_MB"

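i=0
for DISK in "$DISK1" "$DISK2"; do
  i=$((i+1))
  # Partition naming: /dev/sda -> /dev/sda1, but /dev/nvme0n1 -> /dev/nvme0n1p1
  case "$DISK" in
    *[0-9]) PART_PREFIX="${DISK}p" ;;
    *)      PART_PREFIX="$DISK" ;;
  esac
  sgdisk -Z "$DISK"
  sgdisk -og "$DISK"
  sgdisk -n 1::+2048M -t 1:ef00 -c 1:"EFI" "$DISK"
  if [ "$i" -eq 1 ]; then SWAPLABEL="SWAP1"; else SWAPLABEL="SWAP2"; fi
  sgdisk -n 2::+${SWAP_MB}M -t 2:8200 -c 2:"$SWAPLABEL" "$DISK"
  sgdisk -n 3:: -t 3:8300 -c 3:"BTRFS_RAID" "$DISK"
  udevadm settle   # wait until the new partition nodes exist
  mkfs.fat -F32 -n EFI "${PART_PREFIX}1"
done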
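
The RAM × 1.5 swap size can also be computed with integer arithmetic only, without bc. The sketch below assumes MemTotal is given in KiB, as in /proc/meminfo; swap_mb_for_kib is a hypothetical name, and at exact halves the rounding may differ by 1 MiB from the printf-based computation above.

```shell
#!/bin/bash
# Hypothetical helper: swap size in MiB for a MemTotal value given in KiB (RAM x 1.5).
swap_mb_for_kib() {
  local mem_kib="$1"
  local ram_mb=$(( (mem_kib + 512) / 1024 ))   # KiB -> MiB, rounded to nearest
  echo $(( (ram_mb * 3 + 1) / 2 ))              # x 1.5, halves rounded up
}

if [ -r /proc/meminfo ]; then
  MEM_KIB=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "RAM: $(( MEM_KIB / 1024 )) MiB -> swap partition: $(swap_mb_for_kib "$MEM_KIB") MiB"
fi
```

Worked example: a MemTotal of 16777216 KiB (16 GiB) gives 16384 MiB of RAM and therefore a 24576 MiB swap partition on each disk.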


3.5 (S1) Create Btrfs RAID1 + base subvolumes

V14: do not manually create /.snapshots (Snapper manages it).

The Btrfs filesystem label is intentionally fixed to prox_raid1 (mkfs.btrfs -L prox_raid1 …).
The commands in this document and in the scripts rely on it (mount -L prox_raid1, blkid -t LABEL="prox_raid1"), so do not change it.


Code:
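# Partition 3 of each disk (an NVMe disk such as /dev/nvme0n1 needs a "p" before the number)
case "$DISK1" in *[0-9]) BTRFS1="${DISK1}p3" ;; *) BTRFS1="${DISK1}3" ;; esac
case "$DISK2" in *[0-9]) BTRFS2="${DISK2}p3" ;; *) BTRFS2="${DISK2}3" ;; esac
mkfs.btrfs -f -L prox_raid1 -d raid1 -m raid1 "$BTRFS1" "$BTRFS2"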

# Create subvolumes at top-level (subvolid=5)
mount -L prox_raid1 -o subvolid=5 /mnt

btrfs subvolume create /mnt/@
btrfs subvolume create /mnt/@home
btrfs subvolume create /mnt/@log
btrfs subvolume create /mnt/@cache
btrfs subvolume create /mnt/@tmp

umount /mnt


3.6 (S1) Mount + debootstrap

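V14: the Btrfs mount options deliberately omit defaults. The root flags for snapshot entries are read from fstab, and a literal defaults there produces a broken rootflags= when booting a snapshot (see 3.12).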


Code:
BTRFS_OPTS="noatime,compress=zstd:1"

mount -o $BTRFS_OPTS,subvol=@ -L prox_raid1 /mnt
mkdir -p /mnt/{home,var/{log,cache,tmp},boot/efi,boot/efi2}

mount -o $BTRFS_OPTS,subvol=@home  -L prox_raid1 /mnt/home
mount -o $BTRFS_OPTS,subvol=@log   -L prox_raid1 /mnt/var/log
mount -o $BTRFS_OPTS,subvol=@cache -L prox_raid1 /mnt/var/cache
mount -o $BTRFS_OPTS,subvol=@tmp   -L prox_raid1 /mnt/var/tmp

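# EFI partition = partition 1 (NVMe disks need a "p" before the number)
case "$DISK1" in *[0-9]) EFI1="${DISK1}p1" ;; *) EFI1="${DISK1}1" ;; esac
case "$DISK2" in *[0-9]) EFI2="${DISK2}p1" ;; *) EFI2="${DISK2}1" ;; esac
mount "$EFI1" /mnt/boot/efi
mount "$EFI2" /mnt/boot/efi2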

debootstrap --arch=amd64 trixie /mnt https://deb.debian.org/debian
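
With the filesystem mounted at /mnt, it can be worth confirming that data and metadata really use the RAID1 profile before going further. The sketch below parses the output of btrfs filesystem df; profiles_all_raid1 is a hypothetical helper name, and the /mnt mount point is the one assumed in this step.

```shell
#!/bin/bash
# Hypothetical helper: read "btrfs filesystem df" output on stdin and succeed only
# if every Data/System/Metadata block group type uses the RAID1 profile.
profiles_all_raid1() {
  awk -F'[,:] *' '
    $1 ~ /^(Data|System|Metadata)$/ { seen = 1; if ($2 != "RAID1") bad = 1 }
    END { exit !(seen && !bad) }
  '
}

# Only runs when btrfs-progs is installed and /mnt is a mount point.
if command -v btrfs >/dev/null 2>&1 && mountpoint -q /mnt 2>/dev/null; then
  if btrfs filesystem df /mnt | profiles_all_raid1; then
    echo "OK: data and metadata are RAID1"
  else
    echo "WARNING: at least one block group type is not RAID1" >&2
  fi
fi
```

On the installed system the same check applies to / instead of /mnt, for example after a disk replacement.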


3.7 (S1) Prepare and run the chroot (Debian + HA GRUB configuration)


Code:
# pseudo-FS
for d in dev proc sys run; do
  mount --rbind "/$d" "/mnt/$d"
  mount --make-rslave "/mnt/$d"
done

mkdir -p /mnt/sys/firmware/efi/efivars
mount -t efivarfs efivarfs /mnt/sys/firmware/efi/efivars || true

cp -L /etc/resolv.conf /mnt/etc/resolv.conf || true

Create the "inside-chroot" script:


Code:
cat > /mnt/root/inside-chroot-base.sh <<'EOF_CHROOT'
#!/bin/bash
set -euo pipefail

: "${HOST_SHORT:?}"
: "${HOST_FQDN:?}"
: "${USER_NAME:?}"
: "${INSTALL_NET_IF:?}"
: "${INSTALL_NET_ADDR:?}"
: "${INSTALL_NET_GW:?}"
: "${INSTALL_NET_DNS:?}"

echo "$HOST_SHORT" > /etc/hostname
HOST_IP="${INSTALL_NET_ADDR%/*}"

# /etc/hosts: map the FQDN and the short hostname to the host's static IP
cat > /etc/hosts <<EOF
127.0.0.1   localhost
$HOST_IP    $HOST_FQDN $HOST_SHORT
::1         localhost
EOF

cat > /etc/apt/sources.list <<EOF
deb http://deb.debian.org/debian trixie main contrib non-free non-free-firmware
deb http://security.debian.org/debian-security trixie-security main contrib non-free non-free-firmware
deb http://deb.debian.org/debian trixie-updates main contrib non-free non-free-firmware
EOF

apt update
apt install -y locales console-setup
dpkg-reconfigure locales
dpkg-reconfigure keyboard-configuration

apt install -y \
  linux-image-amd64 linux-headers-amd64 \
  grub-efi-amd64 shim-signed efibootmgr \
  btrfs-progs smartmontools rsync sudo vim git curl wget iptables bash-completion \
  openssh-server

cat > /etc/network/interfaces <<EOF
auto lo
iface lo inet loopback

auto $INSTALL_NET_IF
iface $INSTALL_NET_IF inet static
    address $INSTALL_NET_ADDR
    gateway $INSTALL_NET_GW
    dns-nameservers $INSTALL_NET_DNS
EOF

BTRFS_UUID=$(blkid -o value -s UUID -t LABEL="prox_raid1" | sort -u)

EFI1_UUID=$(
  blkid -s UUID -o value "$(
    lsblk -rpno NAME,FSTYPE,MOUNTPOINT | awk '$2=="vfat" && $3=="/boot/efi"{print $1;exit}'
  )"
)

EFI2_UUID=$(
  blkid -s UUID -o value "$(
    lsblk -rpno NAME,FSTYPE,MOUNTPOINT | awk '$2=="vfat" && $3=="/boot/efi2"{print $1;exit}'
  )"
)

BTRFS_OPTS="noatime,compress=zstd:1"

cat > /etc/fstab <<EOF
UUID=$BTRFS_UUID /           btrfs $BTRFS_OPTS,subvol=@           0 0
UUID=$BTRFS_UUID /home       btrfs $BTRFS_OPTS,subvol=@home       0 0
UUID=$BTRFS_UUID /var/log    btrfs $BTRFS_OPTS,subvol=@log        0 0
UUID=$BTRFS_UUID /var/cache  btrfs $BTRFS_OPTS,subvol=@cache      0 0
UUID=$BTRFS_UUID /var/tmp    btrfs $BTRFS_OPTS,subvol=@tmp        0 0

UUID=$EFI1_UUID /boot/efi    vfat  defaults,noatime,nofail        0 2
UUID=$EFI2_UUID /boot/efi2   vfat  defaults,noatime,nofail        0 2
EOF

SWAP1_DEV=$(blkid -o device -t PARTLABEL="SWAP1")
SWAP2_DEV=$(blkid -o device -t PARTLABEL="SWAP2")

mkswap -L SWAP1 "$SWAP1_DEV"
mkswap -L SWAP2 "$SWAP2_DEV"
swapon "$SWAP1_DEV"

SWAP1_UUID=$(blkid -s UUID -o value "$SWAP1_DEV")

cat >> /etc/fstab <<EOF
# SWAP-FAILOVER-MANAGED
UUID=$SWAP1_UUID none swap defaults 0 0
EOF

# GRUB: root on Btrfs UUID + subvol + degraded + resume SWAP1
# (the inner double quotes are escaped so that they end up literally in /etc/default/grub)
sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"quiet resume=UUID=$SWAP1_UUID root=UUID=$BTRFS_UUID rootflags=subvol=@,degraded\"|" /etc/default/grub
sed -i "s|^GRUB_CMDLINE_LINUX=.*|GRUB_CMDLINE_LINUX=\"resume=UUID=$SWAP1_UUID root=UUID=$BTRFS_UUID rootflags=subvol=@,degraded\"|" /etc/default/grub

echo btrfs >> /etc/initramfs-tools/modules || true

grub-install --target=x86_64-efi --efi-directory=/boot/efi  --bootloader-id=debian-disk1 --removable --recheck
grub-install --target=x86_64-efi --efi-directory=/boot/efi2 --bootloader-id=debian-disk2 --removable --recheck

update-grub
update-initramfs -u -k all

useradd -m -G sudo,adm -s /bin/bash -c "$USER_NAME" "$USER_NAME"

echo
echo "Password for user $USER_NAME:"
passwd "$USER_NAME"

echo
echo "Password for root:"
passwd root
EOF_CHROOT
chmod +x /mnt/root/inside-chroot-base.sh

Run the chroot:


Code:
export HOST_SHORT HOST_FQDN USER_NAME INSTALL_NET_IF INSTALL_NET_ADDR INSTALL_NET_GW INSTALL_NET_DNS
chroot /mnt /root/inside-chroot-base.sh

Clean exit + reboot:


Code:
umount -Rv /mnt || true
reboot
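
After the first boot, the /etc/hosts requirement from 3.2 can be checked with a few lines of shell. This is a sketch, assuming the name service switch consults /etc/hosts (the usual Debian default); is_loopback_addr is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given address is a loopback address.
is_loopback_addr() {
  case "$1" in
    127.*|::1) return 0 ;;
    *)         return 1 ;;
  esac
}

# Resolve the local hostname through NSS and classify the first address returned.
NAME=$(uname -n)
ADDR=$(getent hosts "$NAME" 2>/dev/null | awk '{print $1; exit}')
if [ -z "$ADDR" ]; then
  echo "WARNING: $NAME does not resolve; check /etc/hosts" >&2
elif is_loopback_addr "$ADDR"; then
  echo "WARNING: $NAME resolves to loopback ($ADDR); Proxmox needs the real host IP" >&2
else
  echo "OK: $NAME resolves to $ADDR"
fi
```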
 


3.8 (S2) Enable HA (EFI sync + swap failover + auto-replace + APT hook) + SSH

This step is done on the installed Debian system (after S1), as root.

3.8.1 Mount EFI partitions if needed


Code:
sudo su
set -euo pipefail

mountpoint -q /boot/efi  || mount /boot/efi
mountpoint -q /boot/efi2 || mount /boot/efi2

3.8.2 Install dependencies


Code:
apt update
apt install -y rsync efibootmgr grub-efi-amd64 shim-signed btrfs-progs

3.8.3 Install efi-sync.sh


Code:
cat > /usr/local/sbin/efi-sync.sh <<'EOF'
#!/bin/bash
#
# EFI SYNC PRODUCTION SCRIPT — 2026 V2.1
#
# - Check that /boot/efi and /boot/efi2 are mounted
# - Rebuild a clean "removable" BOOTX64.efi on the primary ESP
# - Sync EFI1 content -> EFI2 via rsync
#
set -euo pipefail
LOGFILE="/var/log/efi-sync.log"
PRIMARY_EFI="/boot/efi"
SECONDARY_EFI="/boot/efi2"
DATE=$(date "+%Y-%m-%d %H:%M:%S")
log(){ echo "[$DATE] $1" | tee -a "$LOGFILE"; }
log "----- EFI Sync Start -----"
# Check mount points
if ! mountpoint -q "$PRIMARY_EFI"; then
    log "ERROR: primary EFI not mounted ($PRIMARY_EFI)"
    exit 1
fi
if ! mountpoint -q "$SECONDARY_EFI"; then
    log "ERROR: secondary EFI not mounted ($SECONDARY_EFI)"
    exit 1
fi
###############################################
# 1) Clean "removable" BOOTX64.efi
###############################################
BOOT_DIR_PRIMARY="$PRIMARY_EFI/EFI/BOOT"
mkdir -p "$BOOT_DIR_PRIMARY"
SHIM_PATH=$(find "$PRIMARY_EFI/EFI" -maxdepth 3 -type f -iname 'shimx64.efi' 2>/dev/null | head -n1 || true)
GRUB_PATH=$(find "$PRIMARY_EFI/EFI" -maxdepth 3 -type f -iname 'grubx64.efi' 2>/dev/null | head -n1 || true)
if [ -n "$SHIM_PATH" ]; then
    log "Secure Boot (shimx64.efi) detected — updating BOOTX64.efi from $SHIM_PATH"
    cp -f "$SHIM_PATH" "$BOOT_DIR_PRIMARY/BOOTX64.efi"
elif [ -n "$GRUB_PATH" ]; then
    log "shimx64.efi absent — updating BOOTX64.efi from $GRUB_PATH"
    cp -f "$GRUB_PATH" "$BOOT_DIR_PRIMARY/BOOTX64.efi"
else
    log "WARNING: neither shimx64.efi nor grubx64.efi found under $PRIMARY_EFI/EFI — BOOTX64.efi not updated"
fi
###############################################
# 2) Sync EFI1 -> EFI2
###############################################
log "Syncing EFI2 from EFI1..."
if rsync -a --delete --inplace "$PRIMARY_EFI"/ "$SECONDARY_EFI"/ >> "$LOGFILE" 2>&1; then
    log "Sync OK"
else
    log "ERROR: rsync failed"
    exit 1
fi
###############################################
# 3) Secure Boot information
###############################################
SHIM_PATH_LOG=$(find "$PRIMARY_EFI/EFI" -maxdepth 3 -type f -iname 'shimx64.efi' 2>/dev/null | head -n1 || true)
if [ -n "$SHIM_PATH_LOG" ]; then
    log "Secure Boot detected (shim present under $PRIMARY_EFI/EFI)"
else
    log "Shim not detected under $PRIMARY_EFI/EFI"
fi
log "----- EFI Sync End -----"
echo "" >> "$LOGFILE"
exit 0
EOF
chmod +x /usr/local/sbin/efi-sync.sh
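
To confirm that a sync actually left both ESPs identical, the two trees can be compared recursively. A minimal sketch, assuming both partitions are mounted at the paths used above; trees_identical is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed if two directory trees have identical content.
trees_identical() {
  diff -rq "$1" "$2" >/dev/null 2>&1
}

# Only compares when both ESPs are mounted.
if mountpoint -q /boot/efi 2>/dev/null && mountpoint -q /boot/efi2 2>/dev/null; then
  if trees_identical /boot/efi /boot/efi2; then
    echo "OK: /boot/efi2 mirrors /boot/efi"
  else
    echo "WARNING: the two ESPs differ; run /usr/local/sbin/efi-sync.sh" >&2
  fi
fi
```

The history of previous runs is in /var/log/efi-sync.log, the log file written by the script.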

3.8.4 Kernel + GRUB hooks (automatic execution)


Code:
cat > /etc/kernel/postinst.d/zz-sync-efi <<'EOF'
#!/bin/bash
/usr/local/sbin/efi-sync.sh
EOF
chmod +x /etc/kernel/postinst.d/zz-sync-efi

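cat > /etc/grub.d/99_sync_efi <<'EOF'
#!/bin/bash
# grub-mkconfig writes the stdout of /etc/grub.d scripts into grub.cfg,
# so send the sync log to stderr to keep it out of the generated config.
/usr/local/sbin/efi-sync.sh >&2
EOF
chmod +x /etc/grub.d/99_sync_efi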

3.8.5 Install swap-failover.sh + service


Code:
cat > /usr/local/sbin/swap-failover.sh <<'EOF'
#!/bin/bash
#
# SWAP FAILOVER FOR RAID1 — SWAP1 → SWAP2
#
set -euo pipefail
LOGFILE="/run/swap-failover.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")
log() {
    echo "[$DATE] $1" | tee -a "$LOGFILE"
}
log "===== SWAP FAILOVER START ====="
SWAP1_DEV=$(blkid -o device -t PARTLABEL="SWAP1" || true)
SWAP2_DEV=$(blkid -o device -t PARTLABEL="SWAP2" || true)
# Remove the managed block (SWAP-FAILOVER-MANAGED) from /etc/fstab
if grep -q '^# SWAP-FAILOVER-MANAGED' /etc/fstab; then
    sed -i '/^# SWAP-FAILOVER-MANAGED/{N;d;}' /etc/fstab
fi
sed -i '/SWAP1/d;/SWAP2/d;/Primary swap/d' /etc/fstab || true
if [ -n "$SWAP1_DEV" ]; then
    SWAP_UUID=$(blkid -s UUID -o value "$SWAP1_DEV")
    {
        echo "# SWAP-FAILOVER-MANAGED"
        echo "UUID=$SWAP_UUID none swap defaults 0 0"
    } >> /etc/fstab
    log "SWAP1 present → fstab line (SWAP-FAILOVER-MANAGED) updated"
elif [ -z "$SWAP1_DEV" ] && [ -n "$SWAP2_DEV" ]; then
    SWAP_UUID=$(blkid -s UUID -o value "$SWAP2_DEV")
    {
        echo "# SWAP-FAILOVER-MANAGED"
        echo "UUID=$SWAP_UUID none swap defaults 0 0"
    } >> /etc/fstab
    log "SWAP1 absent → switching to SWAP2 in fstab (SWAP-FAILOVER-MANAGED)"
else
    log "WARNING: no SWAP partition detected"
    log "===== SWAP FAILOVER END ====="
    exit 0
fi
swapon -a || true
log "===== SWAP FAILOVER END ====="
exit 0
EOF
chmod +x /usr/local/sbin/swap-failover.sh

cat > /etc/systemd/system/swap-failover.service <<'EOF'
[Unit]
Description=Swap failover for RAID1 (SWAP1 -> SWAP2)
DefaultDependencies=no
After=local-fs-pre.target
Before=swap.target
[Service]
Type=oneshot
ExecStart=/usr/local/sbin/swap-failover.sh
[Install]
WantedBy=sysinit.target
EOF

systemctl daemon-reload
systemctl enable swap-failover.service
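
The decision made by swap-failover.sh (prefer SWAP1, fall back to SWAP2) can be previewed without touching /etc/fstab. The sketch below only mirrors that selection logic; pick_swap is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper mirroring the decision in swap-failover.sh:
# prefer SWAP1, fall back to SWAP2, fail if neither device is present.
pick_swap() {
  local swap1="$1" swap2="$2"
  if [ -n "$swap1" ]; then
    printf '%s\n' "$swap1"
  elif [ -n "$swap2" ]; then
    printf '%s\n' "$swap2"
  else
    return 1
  fi
}

# Dry run: show which partition would be selected.
S1=$(blkid -o device -t PARTLABEL="SWAP1" 2>/dev/null || true)
S2=$(blkid -o device -t PARTLABEL="SWAP2" 2>/dev/null || true)
if CHOSEN=$(pick_swap "$S1" "$S2"); then
  echo "Swap partition that would be used: $CHOSEN"
else
  echo "No SWAP1/SWAP2 partition detected"
fi
```

After a reboot, swapon --show lists the swap that is actually active, and /run/swap-failover.log contains the messages logged by the service.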

3.8.6 Install auto-replace-btrfs-disk.sh + service


Code:
cat > /usr/local/sbin/auto-replace-btrfs-disk.sh <<'EOF'
#!/bin/bash
#
# AUTO BTRFS RAID1 DISK REPLACEMENT — ENTERPRISE EDITION 2026 (V2)
#
set -euo pipefail

LOG="/var/log/btrfs-disk-replace.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")
FS_LABEL="prox_raid1"

log(){ echo "[$DATE] $1" | tee -a "$LOG"; }

log "===== AUTO REPLACE START ====="

# Check that / is on prox_raid1
if ! btrfs filesystem show / | grep -q "$FS_LABEL"; then
  log "INFO: system not mounted on prox_raid1 → stopping"
  exit 0
fi

###############################################
# Helper: detect Btrfs errors (device stats)
###############################################
has_btrfs_errors() {
  # Returns 0 if at least one error counter is non-zero, 1 otherwise.
  # Each line of 'btrfs device stats' is "[device].counter  value" (no header line).
  # The result is decided in END, because an "exit" in a main rule still runs END.
  btrfs device stats / | awk '$2 > 0 {found=1} END {exit !found}'
}

###############################################
# Check for missing devices
###############################################
MISSING_COUNT=$(btrfs filesystem show / | grep -c "missing" || true)

if [ "$MISSING_COUNT" -eq 0 ]; then
  log "No missing disk"

  # If no disk is missing but Btrfs errors exist, launch an auto scrub
  if has_btrfs_errors; then
    log "Btrfs errors detected in 'btrfs device stats /' → launching automatic scrub"
    if btrfs scrub start -B -R /; then
      log "Automatic scrub completed successfully"
    else
      log "WARNING: automatic scrub returned an error code"
    fi
  else
    log "No Btrfs errors detected in 'btrfs device stats /'"
  fi

  log "===== AUTO REPLACE END (no missing disk) ====="
  exit 0
fi

if [ "$MISSING_COUNT" -gt 1 ]; then
  log "ERROR: more than one missing disk"
  exit 1
fi

log "Missing disk detected"

###############################################
# Detect a new blank disk
###############################################
ALL_DISKS=$(lsblk -dpno NAME,TYPE | awk '$2=="disk"{print $1}')
# Disks already part of the RAID
USED_DISKS=$(btrfs filesystem show / | grep "path" | awk '{print $NF}' | sed 's/[0-9]*$//' | sort -u)

NEW_DISK=""
for D in $ALL_DISKS; do
  # Disk not used by the current Btrfs RAID
  if ! echo "$USED_DISKS" | grep -q "$D"; then
    # Disk with no known signature (partition / FS / LVM, etc.)
    if ! blkid "$D" &>/dev/null; then
      NEW_DISK="$D"
      break
    fi
  fi
done

if [ -z "$NEW_DISK" ]; then
  log "ERROR: no blank disk detected for replacement."
  log "       No additional disk was found that is not already a RAID member"
  log "       and contains no signature (partition, filesystem, LVM, etc.)."
  log "TIP: to reuse an old disk, erase its partition table, e.g.:"
  log "     sgdisk -Z /dev/sdX  (replace /dev/sdX with the target disk)"
  log "     then restart the btrfs-auto-replace service."
  log "===== AUTO REPLACE END (no blank disk detected) ====="
  exit 1
fi

log "New disk detected: $NEW_DISK"

###############################################
# Minimal size check on the new disk
###############################################
REF_DISK=$(echo "$USED_DISKS" | head -n1 || true)
if [ -n "$REF_DISK" ]; then
  REF_SIZE=$(lsblk -bdno SIZE "$REF_DISK" 2>/dev/null || echo 0)
  NEW_SIZE=$(lsblk -bdno SIZE "$NEW_DISK" 2>/dev/null || echo 0)

  if [ "$REF_SIZE" -eq 0 ] || [ "$NEW_SIZE" -eq 0 ]; then
    log "WARNING: unable to read the size of $REF_DISK or $NEW_DISK, minimal size check skipped."
  elif [ "$NEW_SIZE" -lt "$REF_SIZE" ]; then
    log "ERROR: the replacement disk ($NEW_DISK, size ${NEW_SIZE} bytes)"
    log "       is smaller than an existing RAID member ($REF_DISK, size ${REF_SIZE} bytes)."
    log "       Use a disk of at least equivalent capacity, then restart the btrfs-auto-replace service."
    log "===== AUTO REPLACE END (new disk too small) ====="
    exit 1
  else
    log "Size check OK: $NEW_DISK (${NEW_SIZE} bytes) >= $REF_DISK (${REF_SIZE} bytes)."
  fi
fi

log "SMART check..."
if ! smartctl -H "$NEW_DISK" | grep -q "PASSED"; then
  log "ERROR: SMART FAILED"
  exit 1
fi
log "SMART OK"

###############################################
# SWAP management (possible recreation on the new disk)
###############################################
SWAP1_EXIST=$(blkid -o device -t LABEL="SWAP1" || true)
SWAP2_EXIST=$(blkid -o device -t LABEL="SWAP2" || true)
SWAP_LABEL_NEW=""
SWAP_REF_DEV=""

if [ -n "$SWAP1_EXIST" ] && [ -z "$SWAP2_EXIST" ]; then
  SWAP_LABEL_NEW="SWAP2"
  SWAP_REF_DEV="$SWAP1_EXIST"
elif [ -z "$SWAP1_EXIST" ] && [ -n "$SWAP2_EXIST" ]; then
  SWAP_LABEL_NEW="SWAP1"
  SWAP_REF_DEV="$SWAP2_EXIST"
elif [ -n "$SWAP1_EXIST" ] && [ -n "$SWAP2_EXIST" ]; then
  log "INFO: both SWAP partitions already exist, no SWAP to recreate"
else
  log "WARNING: no existing SWAP found, SWAP not recreated"
fi

SWAP_MB=0
if [ -n "$SWAP_REF_DEV" ]; then
  SWAP_BYTES=$(lsblk -b -n -o SIZE "$SWAP_REF_DEV")
  SWAP_MB=$(( SWAP_BYTES / 1024 / 1024 ))
  log "Reference SWAP size: ${SWAP_MB}MiB"
fi

###############################################
# Partition the new disk
###############################################
log "Partitioning $NEW_DISK."
sgdisk -Z "$NEW_DISK"
sgdisk -og "$NEW_DISK"
sgdisk -n 1::+2048M -t 1:ef00 -c 1:"EFI" "$NEW_DISK"

if [ "$SWAP_MB" -gt 0 ] && [ -n "$SWAP_LABEL_NEW" ]; then
  sgdisk -n 2::+${SWAP_MB}M -t 2:8200 -c 2:"$SWAP_LABEL_NEW" "$NEW_DISK"
fi

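sgdisk -n 3:: -t 3:8300 -c 3:"BTRFS_RAID" "$NEW_DISK"
udevadm settle   # wait until the new partition nodes exist
# NVMe disks (/dev/nvme0n1) name their partitions with a "p" prefix (/dev/nvme0n1p1)
case "$NEW_DISK" in *[0-9]) NEW_EFI_DEV="${NEW_DISK}p1" ;; *) NEW_EFI_DEV="${NEW_DISK}1" ;; esac
mkfs.fat -F32 -n EFI "$NEW_EFI_DEV"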

if [ "$SWAP_MB" -gt 0 ] && [ -n "$SWAP_LABEL_NEW" ]; then
  NEW_SWAP_DEV=$(blkid -o device -t PARTLABEL="$SWAP_LABEL_NEW")
  mkswap -L "$SWAP_LABEL_NEW" "$NEW_SWAP_DEV"
  log "New SWAP recreated: $NEW_SWAP_DEV ($SWAP_LABEL_NEW)"
fi

###############################################
# Add to Btrfs RAID and rebalance
###############################################
NEW_BTRFS_DEV=$(lsblk -rpno NAME,PARTLABEL | awk -v D="$NEW_DISK" '$1 ~ "^"D && $2=="BTRFS_RAID"{print $1}')

log "Adding BTRFS device."
btrfs device add "$NEW_BTRFS_DEV" /

log "Starting RAID1 rebalance."
btrfs balance start -dconvert=raid1 -mconvert=raid1 /
while btrfs balance status / | grep -q running; do
  sleep 10
done
log "Rebalance complete"

log "Removing missing device."
btrfs device remove missing /

###############################################
# Reinstall GRUB + EFI sync + validation scrub
###############################################
log "Reinstalling GRUB."
grub-install --target=x86_64-efi \
  --efi-directory=/boot/efi \
  --bootloader-id=debian-disk1 \
  --removable --recheck || true

grub-install --target=x86_64-efi \
  --efi-directory=/boot/efi2 \
  --bootloader-id=debian-disk2 \
  --removable --recheck || true

log "Syncing EFI."
/usr/local/sbin/efi-sync.sh || true

log "Starting a validation scrub after replacement."
if btrfs scrub start -B -R /; then
  log "Validation scrub completed successfully"
else
  log "WARNING: post-replace scrub returned an error code"
fi

log "===== AUTO REPLACE SUCCESS ====="
exit 0
EOF
chmod +x /usr/local/sbin/auto-replace-btrfs-disk.sh

cat > /etc/systemd/system/btrfs-auto-replace.service <<'EOF'
[Unit]
Description=Auto BTRFS RAID1 Disk Replacement
After=local-fs.target
DefaultDependencies=no
[Service]
Type=oneshot
ExecStart=/usr/local/sbin/auto-replace-btrfs-disk.sh
[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable btrfs-auto-replace.service
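
The error detection used by the service relies on the per-device counters printed by btrfs device stats. For a quick manual check, those counters can be summarized as follows; this is a sketch, and count_nonzero_stats is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read "btrfs device stats" output on stdin and print
# the number of counters whose value is non-zero.
count_nonzero_stats() {
  awk '$2 + 0 > 0 { n++ } END { print n + 0 }'
}

# Only runs as root on a machine with btrfs-progs installed.
if command -v btrfs >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
  N=$(btrfs device stats / 2>/dev/null | count_nonzero_stats)
  echo "Non-zero Btrfs error counters on /: $N"
fi
```

The service's own messages are appended to /var/log/btrfs-disk-replace.log, and journalctl -u btrfs-auto-replace.service shows the output of the last run.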

3.8.7 Install efi-ha-autosync.sh + APT hook


Code:
cat > /usr/local/sbin/efi-ha-autosync.sh <<'EOF'
#!/bin/bash
#
# EFI HA AUTOSYNC — called automatically after APT/Dpkg
#
set -euo pipefail
LOGFILE="/var/log/efi-ha-autosync.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")
PRIMARY_EFI="/boot/efi"
SECONDARY_EFI="/boot/efi2"
log(){ echo "[$DATE] $1" | tee -a "$LOGFILE"; }
log "===== EFI HA AUTOSYNC START ====="
if ! mountpoint -q "$PRIMARY_EFI"; then
    log "Primary EFI not mounted ($PRIMARY_EFI) → nothing to do"
    log "===== EFI HA AUTOSYNC END (no primary) ====="
    exit 0
fi
if ! mountpoint -q "$SECONDARY_EFI"; then
    log "Secondary EFI not mounted ($SECONDARY_EFI) → sync not possible"
    log "===== EFI HA AUTOSYNC END (no secondary) ====="
    exit 0
fi
log "Reinstalling GRUB on EFI1..."
grub-install --target=x86_64-efi \
  --efi-directory="$PRIMARY_EFI" \
  --bootloader-id=debian-disk1 \
  --removable --recheck >> "$LOGFILE" 2>&1 || log "WARNING: grub-install EFI1 returned an error code"
log "Reinstalling GRUB on EFI2..."
grub-install --target=x86_64-efi \
  --efi-directory="$SECONDARY_EFI" \
  --bootloader-id=debian-disk2 \
  --removable --recheck >> "$LOGFILE" 2>&1 || log "WARNING: grub-install EFI2 returned an error code"
if [ -x /usr/local/sbin/efi-sync.sh ]; then
  log "Calling efi-sync.sh..."
  /usr/local/sbin/efi-sync.sh >> "$LOGFILE" 2>&1 || log "WARNING: efi-sync.sh returned an error code"
else
  log "WARNING: /usr/local/sbin/efi-sync.sh not found or not executable"
fi
log "===== EFI HA AUTOSYNC END ====="
exit 0
EOF
chmod +x /usr/local/sbin/efi-ha-autosync.sh

cat > /etc/apt/apt.conf.d/99-efi-ha-autosync <<'EOF'
// APT hook: after each dpkg invocation,
// run /usr/local/sbin/efi-ha-autosync.sh if present.
DPkg::Post-Invoke {
  "if [ -x /usr/local/sbin/efi-ha-autosync.sh ]; then /usr/local/sbin/efi-ha-autosync.sh || true; fi";
};
EOF

3.8.8 SSH


Code:
apt update
apt install -y openssh-server
systemctl enable ssh
systemctl restart ssh || true
 
3.9 (S3.1) Proxmox kernel + persistent Proxmox subvolumes (BEFORE proxmox-ve)

3.9.1 Pre-configure GRUB


Code:
sudo su

echo 'grub-efi-amd64 grub2/force_efi_extra_removable boolean false' | debconf-set-selections -v -u

3.9.2 Add Proxmox VE repository + key


Code:
cat > /etc/apt/sources.list.d/pve-install-repo.sources << 'EOF'
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF

mkdir -p /usr/share/keyrings
wget -q https://enterprise.proxmox.com/debian/proxmox-archive-keyring-trixie.gpg -O /usr/share/keyrings/proxmox-archive-keyring.gpg

apt update

3.9.3 Install the Proxmox kernel


Code:
apt install -y proxmox-default-kernel

3.9.4 Create Proxmox subvolumes and add them to fstab


Code:
# Source of truth: fixed Btrfs LABEL
FS_UUID="$(blkid -o value -s UUID -t LABEL="prox_raid1" | sort -u)"
[ -n "$FS_UUID" ] || { echo "ERROR: Btrfs FS not found (LABEL=prox_raid1)"; exit 1; }

TOP="/mnt/btrfs-top"
mkdir -p "$TOP"
mount -L prox_raid1 -o subvolid=5 "$TOP"

ensure_sv() {
  local sv="$1" mp="$2" opts="$3"
  if ! btrfs subvolume show "$TOP/$sv" >/dev/null 2>&1; then
    btrfs subvolume create "$TOP/$sv"
  fi
  mkdir -p "$mp"
  if ! grep -qE "^[[:space:]]*UUID=$FS_UUID[[:space:]]+$mp[[:space:]]+btrfs[[:space:]]+.*subvol=$sv" /etc/fstab; then
    echo "UUID=$FS_UUID $mp btrfs $opts,subvol=$sv 0 0" >> /etc/fstab
  fi
  mountpoint -q "$mp" || mount "$mp"
}

PVE_OPTS="noatime,compress=zstd:1,space_cache=v2"
ensure_sv "@var_lib_vz"          "/var/lib/vz"          "$PVE_OPTS"
ensure_sv "@var_lib_pve_cluster" "/var/lib/pve-cluster" "$PVE_OPTS"
ensure_sv "@var_lib_corosync"    "/var/lib/corosync"    "$PVE_OPTS"
ensure_sv "@var_lib_pve_manager" "/var/lib/pve-manager" "$PVE_OPTS"

umount "$TOP"
rmdir "$TOP" 2>/dev/null || true
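
The fstab test inside ensure_sv uses a regular expression built from the mount point. An alternative is to compare fstab fields one by one, which avoids regex metacharacters in paths. The sketch below is one possible form, not the method used by the scripts; fstab_has_subvol is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed if <fstab> has a btrfs entry for UUID=<uuid>
# mounted on <mountpoint> with the option subvol=<subvolume>.
fstab_has_subvol() {
  local fstab="$1" uuid="$2" mp="$3" sv="$4"
  awk -v dev="UUID=$uuid" -v mp="$mp" -v want="subvol=$sv" '
    $1 == dev && $2 == mp && $3 == "btrfs" {
      n = split($4, opt, ",")
      for (i = 1; i <= n; i++) if (opt[i] == want) found = 1
    }
    END { exit !found }
  ' "$fstab"
}

# Check the four Proxmox entries; the UUID is looked up by label as in the step above.
FS_UUID=$(blkid -o value -s UUID -t LABEL="prox_raid1" 2>/dev/null | sort -u)
if [ -n "$FS_UUID" ] && [ -r /etc/fstab ]; then
  for pair in "@var_lib_vz:/var/lib/vz" "@var_lib_pve_cluster:/var/lib/pve-cluster" \
              "@var_lib_corosync:/var/lib/corosync" "@var_lib_pve_manager:/var/lib/pve-manager"; do
    if fstab_has_subvol /etc/fstab "$FS_UUID" "${pair#*:}" "${pair%%:*}"; then
      echo "OK: ${pair#*:} (${pair%%:*})"
    else
      echo "MISSING: ${pair#*:} (${pair%%:*})"
    fi
  done
fi
```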

3.9.5 Reboot


Code:
reboot


3.10 (S3.2) Install Proxmox VE


Code:
sudo su

export DEBIAN_FRONTEND=noninteractive
echo "postfix postfix/main_mailer_type select No configuration" | debconf-set-selections

apt update
apt install -y proxmox-ve postfix open-iscsi chrony

# Clean up generic Debian kernels
apt remove -y linux-image-amd64 || true
apt remove -y 'linux-image-6.*' || true

update-grub
reboot


3.11 (S4) Graphical interface (optional) + Display Manager subvolume

Goal: avoid crashes on snapshot boot (GDM/SDDM/LightDM must write to /var/lib/*).

3.11.1 Create the display manager subvolume + fstab entry


Code:
sudo su

# Source of truth: fixed Btrfs LABEL
FS_UUID="$(blkid -o value -s UUID -t LABEL="prox_raid1" | sort -u)"
[ -n "$FS_UUID" ] || { echo "ERROR: Btrfs FS not found (LABEL=prox_raid1)"; exit 1; }

TOP="/mnt/btrfs-top"
mkdir -p "$TOP"
mount -L prox_raid1 -o subvolid=5 "$TOP"

echo "Graphical interface choice:"
echo "  1) GNOME (gdm3)        -> /var/lib/gdm3     (subvol @gdm3)"
echo "  2) KDE Plasma (sddm)   -> /var/lib/sddm     (subvol @sddm)"
echo "  3) XFCE (lightdm)      -> /var/lib/lightdm  (subvol @lightdm)"
echo "  4) LMDE7 Cinnamon      -> /var/lib/lightdm  (subvol @lightdm)"
echo "  5) COSMIC              -> (managed by cosmic installer)"
read -rp "Choice: " GUI_CHOICE

case "$GUI_CHOICE" in
  1) DM_SV="@gdm3";    DM_MP="/var/lib/gdm3";    PKGS="task-gnome-desktop gdm3" ;;
  2) DM_SV="@sddm";    DM_MP="/var/lib/sddm";    PKGS="task-kde-desktop sddm" ;;
  3) DM_SV="@lightdm"; DM_MP="/var/lib/lightdm"; PKGS="task-xfce-desktop lightdm" ;;
  4) DM_SV="@lightdm"; DM_MP="/var/lib/lightdm"; PKGS="" ;;   # install LMDE7: dedicated block below
  5) DM_SV=""; DM_MP=""; PKGS="" ;;                             # COSMIC: dedicated block below
  *) echo "Cancelled"; umount "$TOP"; exit 0 ;;
esac

GUI_OPTS="noatime,compress=zstd:1,space_cache=v2"

if [ -n "${DM_SV}" ]; then
  if ! btrfs subvolume show "$TOP/$DM_SV" >/dev/null 2>&1; then
    btrfs subvolume create "$TOP/$DM_SV"
  fi
  mkdir -p "$DM_MP"
  if ! grep -qE "^[[:space:]]*UUID=$FS_UUID[[:space:]]+$DM_MP[[:space:]]+btrfs[[:space:]]+.*subvol=$DM_SV" /etc/fstab; then
    echo "UUID=$FS_UUID $DM_MP btrfs $GUI_OPTS,subvol=$DM_SV 0 0" >> /etc/fstab
  fi
  mountpoint -q "$DM_MP" || mount "$DM_MP"
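fi

# Unmount the temporary top-level mount (as in 3.9.4)
umount "$TOP"
rmdir "$TOP" 2>/dev/null || true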

3.11.2 Install the interface

GNOME/KDE/XFCE (choices 1-3):


Code:
apt update
apt install -y $PKGS
apt install -y -f
systemctl set-default graphical.target
reboot

LMDE7 Cinnamon (choice 4):


Code:
# Run your usual LMDE7 Cinnamon procedure.
# Then:
apt install -y -f
systemctl set-default graphical.target
reboot

COSMIC (choice 5):


Code:
apt update
apt install -y curl git
curl -fsSL https://raw.githubusercontent.com/pop-os/cosmic-debian-installer/main/install.sh | bash
apt install -y -f
systemctl set-default graphical.target
reboot


3.12 (S5) Snapper + grub-btrfs (bootable snapshots)

Goal:
  • Snapshots managed by Snapper (root + optional home)
  • GRUB "snapshots" menu via grub-btrfs
  • Essential Proxmox + Btrfs RAID1 fixes:
      • grub-probe returns multiple devices → keep the 1st line only
      • remove defaults from flags read from fstab (avoids broken rootflags=)
      • ensure grub-btrfs.cfg is included in GRUB via a loader (if needed)
      • avoid duplicate menus (executable backups in /etc/grub.d / multiple includes)
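
The first two fixes are simple text transformations. The sketch below only illustrates the idea on sample strings; it is not the patch applied by the scripts, strip_defaults is a hypothetical helper name, and the device names are examples.

```shell
#!/bin/bash
# Hypothetical helper: drop the literal "defaults" entry from a comma-separated option list.
strip_defaults() {
  local opts="$1" o out=""
  local IFS=,
  for o in $opts; do
    [ "$o" = "defaults" ] && continue
    out="${out:+$out,}$o"
  done
  printf '%s\n' "$out"
}

# An fstab option field containing "defaults" becomes usable as rootflags=
strip_defaults "defaults,noatime,compress=zstd:1,subvol=@"

# With two RAID1 members a probe may list one device per line; keep only the first.
printf '%s\n' /dev/sda3 /dev/sdb3 | head -n 1
```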


3.12.1 Required packages (Snapper + grub-btrfs build)

Code:
sudo su

apt-get update -y
apt-get install -y snapper snapper-gui btrfs-assistant btrfs-progs \
  git build-essential make inotify-tools gawk


3.12.2 Snapper: create root and home configurations (if /home exists)

V14: /.snapshots is created and managed by Snapper.
Do not manually create a @snapshots subvolume and do not change /.snapshots permissions.


Code:
if snapper -c root get-config >/dev/null 2>&1; then
  echo "Snapper root: already configured"
else
  snapper -c root create-config /
fi

if [ -d /home ]; then
  if snapper -c home get-config >/dev/null 2>&1; then
    echo "Snapper home: already configured"
  else
    snapper -c home create-config /home
  fi
fi


3.12.3 Snapper ACL (optional): allow a user or the sudo group


Code:
echo "Snapper ACL configuration (non-root access to snapshots):"
echo "  1) Allow a specific user (root + home if present)"
echo "  2) Allow the sudo group (root + home if present)"
echo "  0) Do not modify Snapper ACL configuration"
read -rp "Choice: " ACL_CHOICE

apply_allow_users() {
  local cfg="$1" val="$2" conf="/etc/snapper/configs/${cfg}"
  [ -f "$conf" ] || return 0
  if grep -q '^ALLOW_USERS=' "$conf"; then
    sed -i "s|^ALLOW_USERS=.*|ALLOW_USERS=\"${val}\"|" "$conf"
  else
    echo "ALLOW_USERS=\"${val}\"" >> "$conf"
  fi
  if grep -q '^SYNC_ACL=' "$conf"; then
    sed -i 's|^SYNC_ACL=.*|SYNC_ACL="yes"|' "$conf"
  else
    echo 'SYNC_ACL="yes"' >> "$conf"
  fi
}

apply_allow_groups() {
  local cfg="$1" val="$2" conf="/etc/snapper/configs/${cfg}"
  [ -f "$conf" ] || return 0
  if grep -q '^ALLOW_GROUPS=' "$conf"; then
    sed -i "s|^ALLOW_GROUPS=.*|ALLOW_GROUPS=\"${val}\"|" "$conf"
  else
    echo "ALLOW_GROUPS=\"${val}\"" >> "$conf"
  fi
  if grep -q '^SYNC_ACL=' "$conf"; then
    sed -i 's|^SYNC_ACL=.*|SYNC_ACL="yes"|' "$conf"
  else
    echo 'SYNC_ACL="yes"' >> "$conf"
  fi
}

case "$ACL_CHOICE" in
  1)
    read -rp "User to allow: " SNAP_USER
    if id "$SNAP_USER" >/dev/null 2>&1; then
      apply_allow_users root "$SNAP_USER"
      [ -f /etc/snapper/configs/home ] && apply_allow_users home "$SNAP_USER"
    else
      echo "User does not exist, no changes made."
    fi
    ;;
  2)
    apply_allow_groups root "sudo"
    [ -f /etc/snapper/configs/home ] && apply_allow_groups home "sudo"
    ;;
  *)
    echo "ACL unchanged."
    ;;
esac
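
The two helpers above differ only in the key they write. As a minimal sketch (the function name set_conf_key and the user "alice" are placeholders, not part of the original scripts), the same idempotent "replace or append" logic can be tried on a throwaway file before touching /etc/snapper/configs:

```shell
#!/usr/bin/env bash
# Hypothetical generic helper (not part of the original scripts):
# set KEY="value" in a shell-style config file, adding the key if it is missing.
set_conf_key() {
  local conf="$1" key="$2" val="$3"
  [ -f "$conf" ] || return 0
  if grep -q "^${key}=" "$conf"; then
    sed -i "s|^${key}=.*|${key}=\"${val}\"|" "$conf"
  else
    printf '%s="%s"\n' "$key" "$val" >> "$conf"
  fi
}

# Dry run on a temporary file instead of /etc/snapper/configs/root.
demo_conf="$(mktemp)"
printf '%s\n' 'SUBVOLUME="/"' 'ALLOW_USERS=""' > "$demo_conf"
set_conf_key "$demo_conf" ALLOW_USERS "alice"   # "alice" is a placeholder user
set_conf_key "$demo_conf" SYNC_ACL "yes"
set_conf_key "$demo_conf" SYNC_ACL "yes"        # second call changes nothing
cat "$demo_conf"
```

Values containing | or & would need escaping for sed; user and group names normally do not contain them.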


3.12.4 Snapper timers (root) + disable timeline for /home


Code:
systemctl enable --now snapper-timeline.timer snapper-cleanup.timer

if [ -f /etc/snapper/configs/home ]; then
  sed -i 's/^TIMELINE_CREATE=.*/TIMELINE_CREATE="no"/' /etc/snapper/configs/home
  systemctl restart snapper-timeline.timer || true
fi


3.12.5 Install grub-btrfs from GitHub + rd.live.overlay.overlayfs=1 config


Code:
SRC_DIR="/opt/grub-btrfs"

if [ -d "$SRC_DIR/.git" ]; then
  git -C "$SRC_DIR" pull --ff-only
else
  git clone https://github.com/Antynea/grub-btrfs.git "$SRC_DIR"
fi

cd "$SRC_DIR"

# Required in this context (grub-btrfs.path not used)
if ! grep -q '^GRUB_BTRFS_SNAPSHOT_KERNEL_PARAMETERS="rd\.live\.overlay\.overlayfs=1"' config; then
  sed -i.bkp \
    '/^#GRUB_BTRFS_SNAPSHOT_KERNEL_PARAMETERS=/a \
GRUB_BTRFS_SNAPSHOT_KERNEL_PARAMETERS="rd.live.overlay.overlayfs=1"' \
    config
fi
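
To see what this edit does without touching the cloned sources, here is a small sketch that applies the same "append after the commented line" idea to a sample file. The function name add_overlay_param and the sample file contents are made up for the demonstration; only the parameter line itself comes from the step above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed call above, applied to a sample file.
add_overlay_param() {
  local cfg="$1"
  local line='GRUB_BTRFS_SNAPSHOT_KERNEL_PARAMETERS="rd.live.overlay.overlayfs=1"'
  grep -qxF "$line" "$cfg" && return 0   # already present: nothing to do
  # GNU sed: append the active line right after the commented template line
  sed -i "/^#GRUB_BTRFS_SNAPSHOT_KERNEL_PARAMETERS=/a $line" "$cfg"
}

# Illustrative sample, not the real upstream config file.
sample_cfg="$(mktemp)"
printf '%s\n' '#GRUB_BTRFS_SUBMENUNAME="Example snapshots"' \
              '#GRUB_BTRFS_SNAPSHOT_KERNEL_PARAMETERS="rootflags=ro"' \
              '#GRUB_BTRFS_LIMIT="50"' > "$sample_cfg"

add_overlay_param "$sample_cfg"
add_overlay_param "$sample_cfg"   # running it twice still leaves a single active line
cat "$sample_cfg"
```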


3.12.6 Proxmox + Btrfs RAID1 patch (sources AND installed script)

Applies:
- root_device=$(… | head -n1)
- boot_device=$(… | head -n1) (broad match, not hardcoded to /boot)
- removal of defaults from fstabflags if present


Code:
patch_41_file() {
  local f="$1"
  [ -f "$f" ] || return 0

  mkdir -p /var/backups/grub-btrfs
  local bkp="/var/backups/grub-btrfs/$(basename "$f").bak.$(date +%F-%H%M%S)"
  cp -a "$f" "$bkp"
  chmod 0644 "$bkp" || true

  # root_device head -n1
  if ! grep -q 'root_device=$(printf "%s\n" "$root_device" | head -n1)' "$f"; then
    awk '
      {print}
      !r && $0 ~ /^[[:space:]]*root_device=.*--target=device/ && $0 ~ /[[:space:]]\// {
        print "root_device=$(printf \"%s\\n\" \"$root_device\" | head -n1)";
        r=1
      }
    ' "$f" > "$f.new" && mv "$f.new" "$f"
  fi

  # boot_device head -n1 (broad match)
  if ! grep -q 'boot_device=$(printf "%s\n" "$boot_device" | head -n1)' "$f"; then
    awk '
      {print}
      !b && $0 ~ /^[[:space:]]*boot_device=.*--target=device/ {
        print "boot_device=$(printf \"%s\\n\" \"$boot_device\" | head -n1)";
        b=1
      }
    ' "$f" > "$f.new" && mv "$f.new" "$f"
  fi

  # remove "defaults" from fstabflags (if present)
  if grep -q 'fstabflags=' "$f" && ! grep -q 'PVE/Btrfs: remove "defaults"' "$f"; then
    awk '
      {print}
      !d && $0 ~ /^[[:space:]]*fstabflags=/ {
        print "    # PVE/Btrfs: remove \"defaults\" which breaks rootflags= in some initramfs";
        print "    fstabflags=\"${fstabflags//defaults,/}\"";
        print "    fstabflags=\"${fstabflags//,defaults/}\"";
        print "    fstabflags=\"${fstabflags//defaults/}\"";
        d=1
      }
    ' "$f" > "$f.new" && mv "$f.new" "$f"
  fi

  chmod +x "$f" || true
}

# Patch the sources BEFORE install
patch_41_file "$SRC_DIR/41_snapshots-btrfs"
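
To make the effect of the injected lines concrete, here is a small sketch that runs the same substitutions on sample values. The function names and the device names are illustrative only; the three parameter expansions and the head -n1 call are the ones the patch inserts.

```shell
#!/usr/bin/env bash
# Same three substitutions as the patch, wrapped in a hypothetical function.
strip_defaults() {
  local fstabflags="$1"
  fstabflags="${fstabflags//defaults,/}"
  fstabflags="${fstabflags//,defaults/}"
  fstabflags="${fstabflags//defaults/}"
  printf '%s\n' "$fstabflags"
}

# Same idea as the root_device/boot_device lines: keep the first device only.
first_device() {
  printf '%s\n' "$1" | head -n1
}

strip_defaults "defaults,noatime,compress=zstd"   # prints: noatime,compress=zstd
strip_defaults "noatime,defaults"                 # prints: noatime
# With Btrfs RAID1, grub-probe lists one device per line (sample names below).
first_device "$(printf '/dev/sda3\n/dev/sdb3')"   # prints: /dev/sda3
```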


3.12.7 Compile / install grub-btrfs (with fallback)


Code:
cd /opt/grub-btrfs

set +e
make install
RC=$?
set -e

if [ "$RC" -ne 0 ]; then
  echo "WARNING: make install failed (rc=$RC). Falling back without update-grub…"
  set +e
  make GRUB_UPDATE_EXCLUDE=true install
  RC2=$?
  set -e
  [ "$RC2" -eq 0 ] || { echo "ERROR: grub-btrfs installation failed (rc=$RC, rc2=$RC2)"; exit 1; }
fi

# Patch the installed script
patch_41_file /etc/grub.d/41_snapshots-btrfs


3.12.8 Avoid duplicate menus (executable backups / custom / loader)


Code:
# 1) executable backups: make them non-executable (grub-mkconfig would run them otherwise)
chmod -x /etc/grub.d/41_snapshots-btrfs.bak* 2>/dev/null || true

# 2) remove any old loader
rm -f /etc/grub.d/09_grub-btrfs-loader 2>/dev/null || true

# 3) remove grub-btrfs inclusions from 40_custom if they exist
if [ -f /etc/grub.d/40_custom ] && grep -q "grub-btrfs.cfg" /etc/grub.d/40_custom; then
  CUSTOM_BKP="/etc/grub.d/40_custom.bak.$(date +%F-%H%M%S)"
  cp -a /etc/grub.d/40_custom "$CUSTOM_BKP"
  # cp -a keeps the executable bit: without this, grub-mkconfig would also run the backup
  chmod -x "$CUSTOM_BKP"
  grep -v "grub-btrfs.cfg" /etc/grub.d/40_custom > /etc/grub.d/40_custom.new || true
  mv /etc/grub.d/40_custom.new /etc/grub.d/40_custom
  chmod +x /etc/grub.d/40_custom 2>/dev/null || true
fi


3.12.9 Generate GRUB + create Proxmox loader if needed


Code:
update-grub

# If grub.cfg does not reference grub-btrfs.cfg, install a dedicated loader
if ! grep -q "grub-btrfs.cfg" /boot/grub/grub.cfg; then
  LOADER="/etc/grub.d/09_grub-btrfs-loader"
  cat > "$LOADER" <<'EOF'
#!/bin/sh
set -e
SUBMENUNAME="${GRUB_BTRFS_SUBMENUNAME:-Debian GNU/Linux snapshots}"
# \${prefix} is escaped below so that GRUB expands it at boot, not this script
cat <<EOG
submenu "$SUBMENUNAME" {
  if [ -f \${prefix}/grub-btrfs.cfg ]; then
    source \${prefix}/grub-btrfs.cfg
  fi
}
EOG
EOF
  chmod +x "$LOADER"
  update-grub
fi


3.12.10 Create a first snapshot + verify the menu

Without a snapshot, grub-btrfs.cfg may be empty → this is normal.


Code:
snapper -c root create -d "boot"
update-grub

test -s /boot/grub/grub-btrfs.cfg && echo "OK: grub-btrfs.cfg is not empty" || echo "INFO: grub-btrfs.cfg is empty (no snapshot?)"

grep -n "grub-btrfs.cfg" /boot/grub/grub.cfg | head -20 || true


3.13 (S6) Optional firmware (manual installation)

Proxmox provides pve-firmware, but some firmware (e.g. AMD iGPU) may require manual addition.
This section does not run apt install firmware-*: instead, the .deb files are downloaded from Debian, extracted, and their contents copied to /lib/firmware.

Debian Trixie directory (check for the latest available version):


Code:
https://ftp.debian.org/debian/pool/non-free-firmware/f/firmware-nonfree/


3.13.1 Required tools + helper to detect the latest version


Code:
sudo su

apt-get update -y
apt-get install -y wget dpkg

BASE_URL="https://ftp.debian.org/debian/pool/non-free-firmware/f/firmware-nonfree"

get_latest_deb() {
  # ex: get_latest_deb "amd-graphics" -> firmware-amd-graphics_XXXX_all.deb
  local pattern="$1"
  wget -qO- "$BASE_URL/" | \
    grep -o "firmware-${pattern}_[^\"']*\.deb" | \
    sort -V | tail -n1
}

install_fw_deb() {
  local pattern="$1"
  local debname
  debname="$(get_latest_deb "$pattern")"
  [ -n "$debname" ] || { echo "ERROR: no firmware-${pattern}_*.deb found"; return 1; }

  echo "Downloading: $BASE_URL/$debname"
  wget -q "$BASE_URL/$debname"

  rm -rf extracted
  mkdir -p extracted
  dpkg -x "$debname" extracted

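  # Depending on the package version, files may sit under lib/firmware or usr/lib/firmware
  local fwdir=""
  if [ -d extracted/lib/firmware ]; then
    fwdir="extracted/lib/firmware"
  elif [ -d extracted/usr/lib/firmware ]; then
    fwdir="extracted/usr/lib/firmware"
  fi

  if [ -z "$fwdir" ]; then
    echo "WARNING: no firmware directory found in $debname (unexpected structure)."
    rm -rf extracted "$debname"
    return 1
  fi

  cp -r "$fwdir"/. /lib/firmware/

  rm -rf extracted "$debname"
  echo "OK: firmware ${pattern} copied to /lib/firmware"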
}
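
The version selection in get_latest_deb relies on sort -V (version sort) followed by tail -n1. A minimal sketch of that pipeline, fed with a made-up directory listing instead of a wget call (the function name latest_deb_from_listing and all version strings are illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the pipeline in get_latest_deb: reads an HTML
# directory listing on stdin instead of calling wget.
latest_deb_from_listing() {
  local pattern="$1"
  grep -o "firmware-${pattern}_[^\"']*\.deb" | sort -V | tail -n1
}

# Made-up listing; the version strings are illustrative only.
listing='<a href="firmware-amd-graphics_20230210-5_all.deb">firmware-amd-graphics_20230210-5_all.deb</a>
<a href="firmware-amd-graphics_20250410-2_all.deb">firmware-amd-graphics_20250410-2_all.deb</a>
<a href="firmware-amd-graphics_20240909-2_all.deb">firmware-amd-graphics_20240909-2_all.deb</a>
<a href="firmware-iwlwifi_20250410-2_all.deb">firmware-iwlwifi_20250410-2_all.deb</a>'

# Prints the highest amd-graphics version, whatever the order of the listing.
printf '%s\n' "$listing" | latest_deb_from_listing "amd-graphics"
```

When nothing matches the pattern, the function prints nothing, which is the case install_fw_deb reports as an error.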


3.13.2 Choose and install

AMDGPU (AMD iGPU / dGPU):

Code:
install_fw_deb "amd-graphics"

Intel Wi‑Fi / Bluetooth:

Code:
install_fw_deb "iwlwifi"

Realtek Wi‑Fi / Bluetooth:

Code:
install_fw_deb "realtek"

Other (Mediatek / Broadcom / misc):

Code:
install_fw_deb "misc-nonfree"


3.13.3 Regenerate initramfs + reboot (if at least one firmware was installed)


Code:
update-initramfs -u -k all
reboot
 
3.14 (S7) vmbr0 NAT + DHCP + /etc/hosts + interface cleanup (manual installation)

Goal:
- vmbr0 without physical interface (NAT) for VM/LXC
- DHCP on vmbr0 via dnsmasq
- Automatic NAT toward the WAN interface (dynamic detection)
- /etc/network/interfaces must contain only lo + vmbr*
- /etc/hosts: replace only the IP of the line containing HOST_FQDN HOST_SHORT with the vmbr0 IP


3.14.1 Required packages


Code:
sudo su

apt update
apt install -y ipcalc dnsmasq


3.14.2 Enter vmbr0 parameters (IP/CIDR, DHCP, DNS)


Code:
echo "Example vmbr0: 192.168.200.254/24"
read -rp "IP/CIDR address for vmbr0: " VMBR_ADDR

VMBR_IP="${VMBR_ADDR%/*}"
NAT_NETWORK=$(ipcalc -n "$VMBR_ADDR" | awk '/Network/ {print $2}')
[ -n "$NAT_NETWORK" ] || { echo "ERROR: ipcalc failed"; exit 1; }

DHCP_GW="$VMBR_IP"

echo "DHCP range (within $NAT_NETWORK)"
read -rp "DHCP start (e.g.: 192.168.200.10): " DHCP_START
read -rp "DHCP end   (e.g.: 192.168.200.200): " DHCP_END

read -rp "DHCP DNS [1.1.1.1]: " IN_DNS
DHCP_DNS="${IN_DNS:-1.1.1.1}"

echo "NAT_NETWORK=$NAT_NETWORK"
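
The prompts above accept any start and end address. As an optional sanity check, the sketch below tests whether an address belongs to the network reported by ipcalc, which can catch typos early. The helper names ip_to_int and in_cidr are hypothetical, and the code does no input validation (for example, octets with leading zeros are not handled).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: convert a dotted IPv4 address to an integer and test
# whether it belongs to a network given in CIDR notation.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

in_cidr() {
  local ip="$1" net="${2%/*}" bits="${2#*/}" mask
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Example values taken from the prompts above.
NAT_NETWORK="192.168.200.0/24"
for ip in 192.168.200.10 192.168.200.200 192.168.201.10; do
  if in_cidr "$ip" "$NAT_NETWORK"; then
    echo "$ip is inside $NAT_NETWORK"
  else
    echo "$ip is OUTSIDE $NAT_NETWORK"
  fi
done
```

In the manual procedure this would be called as in_cidr "$DHCP_START" "$NAT_NETWORK" and in_cidr "$DHCP_END" "$NAT_NETWORK".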


3.14.3 Write /etc/network/interfaces (vmbr0) + clean (keep only lo + vmbr*)


Code:
IFACES="/etc/network/interfaces"
cp -a "$IFACES" "${IFACES}.s7-manual.bak.$(date +%Y%m%d%H%M%S)"

# remove old vmbr0 block if present
if grep -q "^auto vmbr0" "$IFACES" 2>/dev/null; then
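  awk '
    BEGIN{skip=0}
    /^auto vmbr0([[:space:]]|$)/{skip=1; next}
    # the block ends at the next unindented line other than "iface vmbr0" (auto, iface, source, ...)
    skip==1 && /^[^[:space:]#]/ && !/^iface vmbr0([[:space:]]|$)/{skip=0}
    skip==0{print}
  ' "$IFACES" > "${IFACES}.tmp" && mv "${IFACES}.tmp" "$IFACES"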
fi

# add vmbr0
cat >> "$IFACES" <<EOF

auto vmbr0
iface vmbr0 inet static
    address $VMBR_ADDR
    bridge-ports none
    bridge-stp off
    bridge-fd 0
EOF

# cleanup: keep only lo + vmbr*
awk '
function keep_name(n){ return (n=="lo" || n ~ /^vmbr[0-9]+$/) }
function emit_auto(kind,    i,n,out){
  out=kind;
  for(i=2;i<=NF;i++){
    n=$i;
    if(keep_name(n)) out=out" "n;
  }
  if(out!=kind) print out;
}
BEGIN{keep=1}
/^[[:space:]]*source(-directory)?[[:space:]]+/ {print; next}
/^[[:space:]]*#/ || /^[[:space:]]*$/ {print; next}
/^[[:space:]]*(auto|allow-hotplug)[[:space:]]+/ { kind=$1; emit_auto(kind); next }
/^[[:space:]]*mapping[[:space:]]+/ { name=$2; keep=keep_name(name); if(keep) print; next }
/^[[:space:]]*iface[[:space:]]+/ { name=$2; keep=keep_name(name); if(keep) print; next }
/^[[:space:]]+/ { if(keep) print; next }
{ print }
' "$IFACES" > "${IFACES}.tmp" && mv "${IFACES}.tmp" "$IFACES"

echo "OK: /etc/network/interfaces updated"
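
As a quick check of the "only lo + vmbr*" goal, the sketch below lists any interface still declared by an iface stanza that is neither lo nor vmbrN. The function name unexpected_ifaces and the sample file are made up for the demonstration; empty output means the file is clean.

```shell
#!/usr/bin/env bash
# Hypothetical check: list interface names declared by "iface" stanzas that are
# neither lo nor vmbrN. Empty output means the cleanup kept only lo + vmbr*.
unexpected_ifaces() {
  awk '$1 == "iface" && $2 != "lo" && $2 !~ /^vmbr[0-9]+$/ { print $2 }' "$1"
}

# Sample file for a dry run (contents are illustrative).
sample_ifaces="$(mktemp)"
cat > "$sample_ifaces" <<'EOF'
auto lo
iface lo inet loopback

iface enp3s0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.200.254/24
    bridge-ports none
EOF

unexpected_ifaces "$sample_ifaces"   # prints: enp3s0
```

On the real system: unexpected_ifaces /etc/network/interfaces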


3.14.4 /etc/hosts: replace only the IP of the hostname line


Code:
VMBR_IP_HOSTS="${VMBR_ADDR%/*}"
HOST_SHORT="$(hostname -s 2>/dev/null || hostname)"
HOST_FQDN="$(hostname -f 2>/dev/null || true)"
[ -z "${HOST_FQDN:-}" ] && HOST_FQDN="$HOST_SHORT"

HOSTS="/etc/hosts"
cp -a "$HOSTS" "${HOSTS}.s7-manual.bak.$(date +%Y%m%d%H%M%S)"

NEWIP="$VMBR_IP_HOSTS" H1="$HOST_FQDN" H2="$HOST_SHORT" \
awk '
BEGIN{updated=0}
{
  # if the line contains "H1 H2" (in that order), replace only the first field (IP)
  if ($0 ~ ("[[:space:]]" ENVIRON["H1"] "[[:space:]]+" ENVIRON["H2"] "([[:space:]]|$)") ) {
    $1 = ENVIRON["NEWIP"];
    updated=1;
  }
  print
}
END{
  if(updated==0){
    exit 1
  }
}
' "$HOSTS" > "${HOSTS}.tmp" && mv "${HOSTS}.tmp" "$HOSTS" || {
  echo "ERROR: no hostname line found in /etc/hosts."
  echo "Fix /etc/hosts manually: ${VMBR_IP_HOSTS} ${HOST_FQDN} ${HOST_SHORT}"
  exit 1
}

echo "OK: /etc/hosts updated (IP replaced)"


3.14.5 dnsmasq: DHCP on vmbr0


Code:
mkdir -p /etc/dnsmasq.d
DNSMASQ_CONF="/etc/dnsmasq.d/vmbr0.conf"

cat > "$DNSMASQ_CONF" <<EOF
# DHCP for vmbr0 (NAT $NAT_NETWORK)
interface=vmbr0
bind-interfaces
dhcp-range=$DHCP_START,$DHCP_END,12h
dhcp-option=3,$DHCP_GW
dhcp-option=6,$DHCP_DNS
EOF

systemctl daemon-reload || true
systemctl enable --now dnsmasq.service
systemctl restart dnsmasq.service


3.14.6 Automatic NAT script + systemd service + udev rule


Code:
AUTO_NAT="/usr/local/bin/auto-nat.sh"

cat > "$AUTO_NAT" <<EOF
#!/bin/bash
set -e

VMBR_ADDR="$VMBR_ADDR"
NAT_NETWORK="$NAT_NETWORK"

# enable ip_forward (runtime + persistent)
sysctl -w net.ipv4.ip_forward=1 >/dev/null
if ! grep -q '^net.ipv4.ip_forward=1' /etc/sysctl.conf 2>/dev/null; then
  echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
fi

# WAN detection (en*/eth* preferred, otherwise wl*)
WAN=\$(ip -o link show | awk -F': ' '{print \$2}' | grep -E '^(en|eth)' | head -n1)
if [ -z "\$WAN" ]; then
  WAN=\$(ip -o link show | awk -F': ' '{print \$2}' | grep -E '^wl' | head -n1)
fi
[ -n "\$WAN" ] || { echo "ERROR: WAN not detected"; exit 1; }

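# clean up old MASQUERADE rules for this network, whatever their outgoing interface
# (iptables -C only matches an identical rule, so a check without -o never finds them)
iptables -t nat -S POSTROUTING | grep -F -- "-s \$NAT_NETWORK " | grep -F -- "-j MASQUERADE" |
  sed 's/^-A /-D /' | while read -r rule; do
    iptables -t nat \$rule
  done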

iptables -t nat -A POSTROUTING -s "\$NAT_NETWORK" -o "\$WAN" -j MASQUERADE
echo "NAT applied: \$NAT_NETWORK -> \$WAN"
EOF

chmod +x "$AUTO_NAT"

cat > /etc/systemd/system/auto-nat.service <<EOF
[Unit]
Description=Automatic NAT for vmbr0 with dynamic WAN detection
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=$AUTO_NAT
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF

cat > /etc/udev/rules.d/99-auto-nat.rules <<EOF
ACTION=="add|remove", SUBSYSTEM=="net", RUN+="$AUTO_NAT"
EOF

udevadm control --reload-rules

systemctl daemon-reload
systemctl enable --now auto-nat.service

# reload networking (tolerant)
systemctl restart networking || true
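
The WAN selection rule used by auto-nat.sh (wired interface first, Wi-Fi as a fallback) can be tried in isolation. In this sketch the function name pick_wan is hypothetical and the interface names are samples; it reads one interface name per line on stdin instead of calling ip.

```shell
#!/usr/bin/env bash
# Hypothetical helper reproducing the selection rule: first en*/eth* name,
# otherwise first wl* name, otherwise nothing (non-zero exit status).
pick_wan() {
  local names wan
  names="$(cat)"
  wan="$(printf '%s\n' "$names" | grep -E '^(en|eth)' | head -n1)"
  [ -n "$wan" ] || wan="$(printf '%s\n' "$names" | grep -E '^wl' | head -n1)"
  [ -n "$wan" ] && printf '%s\n' "$wan"
}

printf '%s\n' lo vmbr0 wlp2s0 enp3s0 | pick_wan   # prints: enp3s0
printf '%s\n' lo vmbr0 wlp2s0        | pick_wan   # prints: wlp2s0
```

On a real host, the input would come from: ip -o link show | awk -F': ' '{print $2}'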


3.14.7 Recommended tests


Code:
cat /proc/sys/net/ipv4/ip_forward
iptables -t nat -vnL POSTROUTING


3.15 (S8) Boot optimization: reduce NetworkManager-wait-online to 5 seconds (manual installation)

Goal: avoid 60–90s of waiting at boot.

3.15.1 Verify the unit exists


Code:
sudo su

UNIT="NetworkManager-wait-online.service"
systemctl list-unit-files "$UNIT" --no-legend || true
systemctl list-unit-files | grep -i 'wait-online' || true


3.15.2 Create a systemd drop-in (override ExecStart)


Code:
UNIT="NetworkManager-wait-online.service"
DROPIN_DIR="/etc/systemd/system/${UNIT}.d"
OVERRIDE="${DROPIN_DIR}/override.conf"
mkdir -p "$DROPIN_DIR"

[ -f "$OVERRIDE" ] && cp -a "$OVERRIDE" "${OVERRIDE}.bak.$(date +%Y%m%d%H%M%S)" || true

cat > "$OVERRIDE" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/nm-online -s -q --timeout=5
EOF

systemctl daemon-reload
systemctl restart "$UNIT" 2>/dev/null || true

systemctl show "$UNIT" -p ExecStart --no-pager | sed 's/^/  /'



4. Operations (diagnostics, rollback, disk replacement)

4.1 Proxmox + Btrfs checks


Code:
pveversion -v
pvesm status

btrfs filesystem show /
btrfs filesystem df /
btrfs device stats /

Manual scrub:


Code:
btrfs scrub start -B -R /
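
The output of btrfs device stats is long when everything is fine. A small filter that keeps only non-zero counters makes problems stand out. This is a sketch: the function name nonzero_stats is hypothetical and the sample values are made up.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print only the counters whose value is not zero.
nonzero_stats() {
  awk '$NF != 0'
}

# Made-up sample in the "[device].counter  value" layout printed by btrfs device stats.
sample='[/dev/sda3].write_io_errs    0
[/dev/sda3].read_io_errs     0
[/dev/sda3].corruption_errs  2
[/dev/sdb3].write_io_errs    0
[/dev/sdb3].generation_errs  0'

# Only the corruption_errs line is printed for this sample.
printf '%s\n' "$sample" | nonzero_stats
```

On the real system (as root): btrfs device stats / | nonzero_stats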

4.2 EFI HA


Code:
efibootmgr -v
/usr/local/sbin/efi-sync.sh
tail -n 200 /var/log/efi-sync.log

4.3 SWAP failover


Code:
systemctl status swap-failover.service --no-pager -l
swapon --show
cat /run/swap-failover.log

4.4 Automatic Btrfs RAID1 disk replacement


Code:
systemctl start btrfs-auto-replace.service
tail -n 200 /var/log/btrfs-disk-replace.log

4.5 Rollback via snapshots (Snapper + GRUB)

Create a manual snapshot, then update GRUB:


Code:
snapper -c root create -d "Manual snapshot"
update-grub

At boot: "Debian GNU/Linux snapshots" menu → select the snapshot.
 
5. Automated installation (scripts)

5.1 Recommended workflow (S1 + S2 from Live, then 1st reboot)


The idea is simple: S1 installs the system in /mnt and S2 adds the HA layer in the same /mnt.
The first reboot happens after S2.

1) Boot from a Debian Live image
2) Copy the scripts to the Live (USB key, wget, scp…), then:


Code:
sudo su
chmod +x S*.sh

./S1-14V1.sh
# DO NOT reboot yet
./S2-14V1.sh

reboot

3) After booting into the installed Debian system:


Code:
sudo su
chmod +x S*.sh

./S3.1-14V2.sh   # Proxmox kernel + PVE subvolumes → reboot requested by the script
# reboot

./S3.2-14V0.sh   # proxmox-ve installation → reboot
# reboot

./S4-14V1.sh     # optional (GUI) → reboot if installation was done
# reboot if necessary

./S5-14V7.4.sh   # Snapper + grub-btrfs
./S6-14V1.sh     # optional (firmware)
./S7-14V6.sh     # vmbr0 NAT/DHCP + /etc/hosts + interface cleanup
./S8-14V2.sh     # boot optimization (wait-online=5s)


5.2 Alternative (if you already rebooted after S1)

If you already booted into the installed Debian after S1, you can still run S2 "post-install":


Code:
sudo su
chmod +x S2-14V1.sh
./S2-14V1.sh
reboot

Then continue with S3.1 → S8 as above.

6. Appendices: complete script code V14

I can't include the scripts; I'm limited by the number of characters. I'm attaching two zip files: one with the original scripts and the other with the English version translated by AI.
 
