Backup rpool/ROOT/pve-1 for Disaster Recovery strategy

gabrimox

Hi all,
I'm a new Proxmox user and I'm trying to work out the best strategy to back up the root PVE partition.
Proxmox is installed on an SSD, and the VM/LXC configs live there too, but external SSDs (passed through) are used for data storage.

I'd like your feedback on the best DR strategy in case of disk failure (RAID is used only for data storage, not for the root disk).
My combo is:
rsync /etc --> external FAT32 disk (slow recovery; sketch below)
zfs send/recv --> external disk (fast recovery: install Proxmox on a new disk, then restore the pool and mount it on /)

The external disks for both are attached to a TrueNAS VM inside Proxmox and shared via NFS.
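
A minimal sketch of the rsync leg (assuming the NFS share from the TrueNAS VM is mounted at /mnt/nfsbackup; path and filenames are examples, adjust to your setup):

Code:
#!/bin/bash
set -e

# destination on the NFS-shared external disk (example path)
DEST="/mnt/nfsbackup/pve-etc"
mkdir -p "$DEST"

# FAT32 cannot store Unix permissions/ownership, so pack /etc into a tarball
# instead of a plain file copy; one archive per day
tar czf "${DEST}/etc-$(date +%F).tar.gz" -C / etc

# alternative if the target filesystem preserves permissions:
# rsync -a --delete /etc/ "${DEST}/etc/"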

I asked ChatGPT for a script for the second point with the following requirements:
  • ✅ You want to backup rpool/ROOT/pve-1 recursively
  • ✅ You want to avoid mounting the replicated datasets (to prevent conflicts like a second / mount)
  • ✅ You still want daily backups at 3 AM, retention of 10 days, and basic verification
  • Snapshot exists on destination
  • ✅ Dataset is not mounted (as intended)
  • ✅ zpool status of backup pool is healthy
  • ✅ zfs scrub is recent or run manually after backup
  • ✅ Optionally: compare checksums of source vs. backup (if a file-level check is needed)
Code:
#!/bin/bash

set -e

# Variables
SRC="rpool/ROOT/pve-1"
DST="nfsbackuppool/backups/pve-1"
SNAP_PREFIX="autosnap"
DATE=$(date +%F)
SNAP_NAME="${SNAP_PREFIX}-${DATE}"
LOG_FILE="/var/log/zfs_nfs_backup.log"
BACKUP_POOL="nfsbackuppool"

log() {
  echo "$(date '+%F %T') $*" | tee -a "$LOG_FILE"
}

# 1. Create recursive snapshot
log "[INFO] Creating snapshot: ${SRC}@${SNAP_NAME}"
zfs snapshot -r "${SRC}@${SNAP_NAME}"

# 2. Send snapshot recursively, unmounted
log "[INFO] Sending snapshot recursively to ${DST}"
zfs send -R "${SRC}@${SNAP_NAME}" | zfs receive -uF "$DST"

# 3. Prune old snapshots on the source (keep the 10 most recent)
log "[INFO] Pruning old snapshots from source"
zfs list -H -t snapshot -o name -s creation "${SRC}" | \
  grep "@${SNAP_PREFIX}-" | \
  head -n -10 | xargs -r -n1 zfs destroy

# 4. Prune old snapshots on the destination (keep the 10 most recent)
log "[INFO] Pruning old snapshots from destination"
zfs list -H -t snapshot -o name -s creation "${DST}" | \
  grep "@${SNAP_PREFIX}-" | \
  head -n -10 | xargs -r -n1 zfs destroy

# 5. Verify snapshot exists on destination
log "[INFO] Verifying snapshot on destination"
if zfs list -t snapshot "${DST}@${SNAP_NAME}" >/dev/null 2>&1; then
  log "[OK] Snapshot exists: ${DST}@${SNAP_NAME}"
else
  log "[ERROR] Snapshot not found: ${DST}@${SNAP_NAME}"
  exit 1
fi

# 6. Check that the backup dataset is not mounted
MOUNTED=$(zfs get -H -o value mounted "${DST}")
if [ "$MOUNTED" = "no" ]; then
  log "[OK] Dataset is not mounted (as expected)"
else
  log "[WARNING] Dataset is mounted unexpectedly"
fi

# 7. Check zpool health
log "[INFO] Checking backup pool status"
# "zpool status -x <pool>" prints "pool '<pool>' is healthy" (or "all pools are healthy") when everything is fine
POOL_STATUS=$(zpool status -x "$BACKUP_POOL")
if [[ "$POOL_STATUS" == *"is healthy" || "$POOL_STATUS" == "all pools are healthy" ]]; then
  log "[OK] Backup zpool '$BACKUP_POOL' is healthy"
else
  log "[ERROR] zpool '$BACKUP_POOL' is not healthy"
  zpool status "$BACKUP_POOL" >> "$LOG_FILE"
  exit 1
fi

# 8. Run scrub and wait (optional - long running)
log "[INFO] Running scrub on backup pool: $BACKUP_POOL"
zpool scrub "$BACKUP_POOL"

# Optional: wait for scrub completion (skip if not wanted)
# log "[INFO] Waiting for scrub to complete..."
# while zpool status "$BACKUP_POOL" | grep -q "scrub in progress"; do sleep 10; done
# log "[OK] Scrub completed"

log "[SUCCESS] Backup completed and verified successfully."
exit 0


What do you think?
Do you have any suggestions?

An important point I found during my tests (and pointed out to ChatGPT): take the snapshot recursively so no dataset is missed, and avoid auto-mounting the replica. During my snapshot tests on the external disk I hit slow performance and other strange behaviour, and then noticed that / was mounted on both the SSD and the external disk!
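
For reference, the two things that prevent that double mount are receiving with -u and making sure the replica can never auto-mount itself over / (a sketch using the dataset names from the script above; the snapshot name is just an example):

Code:
# receive without mounting
zfs send -R rpool/ROOT/pve-1@autosnap-2025-01-01 | zfs receive -uF nfsbackuppool/backups/pve-1

# make sure the replica never mounts itself over / at import or boot
zfs set canmount=noauto nfsbackuppool/backups/pve-1
zfs set mountpoint=/backups/pve-1 nfsbackuppool/backups/pve-1

# check
zfs get -H -o value mounted,mountpoint nfsbackuppool/backups/pve-1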
 
Did you test rolling a snapshot back onto Proxmox on a new HDD?
No, but I made a new, much more reliable script.
I will post it as soon as I'm back from holidays.

Anyway, I'm thinking of using TrueNAS and its Replication Task feature; it looks promising...
 
rootfs or hdd (bootdisk) is broken...
For those scenarios, and as a first level of mitigation, I use
  • snapshots of rpool/ROOT/pve-1 in case a change renders the OS itself unusable, and
  • RAID 1 / ZFS mirror for hardware failures,
which covers both use cases (a minimal sketch follows below). Replication via zfs send/receive or proxmox-backup-client-based backups to PBS form the second level.
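
A sketch of that first level (device paths are examples; on Proxmox the second disk also needs its boot partitions prepared, e.g. with proxmox-boot-tool):

Code:
# before a risky change: local safety snapshot of the root dataset
zfs snapshot rpool/ROOT/pve-1@pre-upgrade

# if the change breaks the OS, roll back (from a rescue environment if it no longer boots)
zfs rollback rpool/ROOT/pve-1@pre-upgrade

# hardware-failure mitigation: attach a second disk to turn the single-disk rpool into a mirror
zpool attach rpool /dev/disk/by-id/ata-OLDDISK-part3 /dev/disk/by-id/ata-NEWDISK-part3
zpool status rpool   # wait for the resilver to finish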
 
OK, final script here, thanks ChatGPT:
- create a snapshot
- retention as you like, delete all older snapshots
- send to a USB disk with integrity/verification checks
- pruning job on both src/dst
- safe mountpoint (not / !) (in my case the pool is backup_usb and it mounts on /backup_usb, so replace with your own paths)
- send a message to Telegram on failure

Code:
#!/bin/bash

set -euo pipefail

# ===== PATH & COMMANDS =====
PATH=/usr/sbin:/usr/bin:/sbin:/bin

ZFS=/usr/sbin/zfs
ZSTREAMDUMP=/usr/sbin/zstreamdump
DATE=/bin/date
CAT=/bin/cat
CURL=/usr/bin/curl

# ===== VARIABLES =====
SRC_DATASET="rpool/ROOT/pve-1"
DST_POOL="backup_usb"
DST_DATASET="${DST_POOL}/${SRC_DATASET#*/}"
SNAP_PREFIX="snap"
DATE_FMT=$($DATE +%d_%m_%y_%H:%M)
SNAP_NAME="${SNAP_PREFIX}-${DATE_FMT}"

DST_MOUNTPOINT="/backup_usb/ROOT/pve-1"

TMP_STREAM="/var/tmp/sendstream-${SNAP_NAME}.dat"   # staged on the local disk; make sure /var/tmp has room for a full stream
LOGFILE="/var/log/zfs-backup.log"

TELEGRAM_TOKEN="xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
TELEGRAM_CHAT_ID="xxxxxxxxxxxxxxxxxxxxxxxxxx"

RETENTION=30  # keep last 30 snapshots

# ===== LOGGING =====
exec > >(tee -a "$LOGFILE") 2>&1

echo "========== $(date) =========="
echo "[INFO] Backup started for ${SRC_DATASET} → ${DST_DATASET}"

# ===== FUNCTIONS =====
send_telegram() {
  local msg="$1"
  echo "[TELEGRAM] $msg"
  $CURL -s -X POST "https://api.telegram.org/bot${TELEGRAM_TOKEN}/sendMessage" \
    -d chat_id="${TELEGRAM_CHAT_ID}" \
    -d text="$msg" >/dev/null
}

error_handler() {
  local last_exit=$?
  local lineno=$1
  echo "[ERROR] Script failed at line $lineno with exit code $last_exit"
  send_telegram "❌ ZFS backup failed at line $lineno with exit code $last_exit.
Source: ${SRC_DATASET}
Destination: ${DST_DATASET}"
  exit $last_exit
}

trap 'error_handler $LINENO' ERR

prune_snapshots() {
    local dataset="$1"
    echo "[INFO] Pruning snapshots for dataset $dataset..."
    if ! $ZFS list -H -t snapshot -o name "$dataset" >/dev/null 2>&1; then
        echo "[INFO] Dataset $dataset does not exist, skipping pruning"
        return
    fi
    mapfile -t snaps < <($ZFS list -H -t snapshot -o name -s creation "$dataset" | grep "^${dataset}@${SNAP_PREFIX}-")
    echo "[INFO] Found ${#snaps[@]} snapshots with prefix $SNAP_PREFIX"
    if [ "${#snaps[@]}" -le "$RETENTION" ]; then
        echo "[INFO] No pruning needed, keeping all ${#snaps[@]} snapshots"
        return
    fi
    for old_snap in "${snaps[@]:0:${#snaps[@]}-$RETENTION}"; do
        echo "[INFO] Destroying old snapshot $old_snap"
        $ZFS destroy "$old_snap"
    done
    echo "[INFO] Pruning completed for $dataset"
}

# ===== SNAPSHOT CREATION =====
echo "[INFO] Creating snapshot ${SRC_DATASET}@${SNAP_NAME} ..."
$ZFS snapshot -r "${SRC_DATASET}@${SNAP_NAME}"
echo "[INFO] Snapshot created"

echo "[INFO] Listing source snapshots:"
$ZFS list -H -t snapshot -o name -s creation "${SRC_DATASET}" | grep "^${SRC_DATASET}@${SNAP_PREFIX}-" || echo "[INFO] No snapshots found"

echo "[INFO] Listing destination snapshots:"
if $ZFS list -H -t snapshot -o name "${DST_DATASET}" >/dev/null 2>&1; then
    $ZFS list -H -t snapshot -o name -s creation "${DST_DATASET}" | grep "^${DST_DATASET}@${SNAP_PREFIX}-" || echo "[INFO] No snapshots found"
else
    echo "[INFO] Destination dataset ${DST_DATASET} does not exist or has no snapshots yet"
fi

# ===== FIND LAST COMMON SNAPSHOT =====
echo "[INFO] Finding last common snapshot between source and destination"
mapfile -t src_snaps < <($ZFS list -H -t snapshot -o name -s creation "${SRC_DATASET}" | grep "^${SRC_DATASET}@${SNAP_PREFIX}-")
dst_snaps=()
if $ZFS list -H -t snapshot -o name "${DST_DATASET}" >/dev/null 2>&1; then
    mapfile -t dst_snaps < <($ZFS list -H -t snapshot -o name -s creation "${DST_DATASET}" | grep "^${DST_DATASET}@${SNAP_PREFIX}-")
fi

last_common=""
for (( i=${#src_snaps[@]}-1; i>=0; i-- )); do
  snap=${src_snaps[i]#${SRC_DATASET}@}
  for dst_snap_full in "${dst_snaps[@]}"; do
    dst_snap=${dst_snap_full#${DST_DATASET}@}
    if [[ "$snap" == "$dst_snap" ]]; then
      last_common="$snap"
      break 2
    fi
  done
done

if [[ -n "$last_common" ]]; then
  echo "[INFO] Last common snapshot: $last_common"
else
  echo "[INFO] No common snapshot found; first full backup will be sent"
fi

# ===== GENERATE SEND STREAM =====
if [[ -n "$last_common" ]]; then
  echo "[INFO] Generating incremental send stream from $last_common → $SNAP_NAME ..."
  # note: the incremental send is not recursive (-R); fine as long as pve-1 has no child datasets
  $ZFS send -v --large-block --compressed -I "${SRC_DATASET}@${last_common}" "${SRC_DATASET}@${SNAP_NAME}" > "$TMP_STREAM"
else
  echo "[INFO] Generating full recursive send stream ..."
  $ZFS send -v --large-block --compressed -R "${SRC_DATASET}@${SNAP_NAME}" > "$TMP_STREAM"
fi
echo "[INFO] Send stream created at $TMP_STREAM"

# ===== VERIFY & RECEIVE =====
echo "[INFO] Verifying send stream with zstreamdump ..."
if $ZSTREAMDUMP -v < "$TMP_STREAM"; then
  echo "[INFO] Stream verification passed, receiving dataset ..."
  $CAT "$TMP_STREAM" | $ZFS recv -uF "${DST_DATASET}"
else
  send_telegram "❌ ZFS send stream verification FAILED for snapshot ${SNAP_NAME}!"
  rm -f "$TMP_STREAM"
  exit 1
fi
echo "[INFO] Receive completed"

rm -f "$TMP_STREAM"
echo "[INFO] Temporary send stream deleted"

# ===== SNAPSHOT PRUNING =====
prune_snapshots "$SRC_DATASET"
prune_snapshots "$DST_DATASET"

# ===== SET MOUNTPOINT =====
echo "[INFO] Setting mountpoint ${DST_MOUNTPOINT} on ${DST_DATASET}"
$ZFS set mountpoint="${DST_MOUNTPOINT}" "${DST_DATASET}"

echo "[INFO] Backup job completed successfully."
echo "============================================="

It works like a charm.

Do you suggest exporting/importing the zpool at the end/start of the process?
Consider that the USB disk is always attached... but nothing writes to it outside of the replication script.
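
Something like this is what I have in mind (pool name as in the script above):

Code:
# at the start of the job: import the pool without mounting any of its datasets
zpool import -N backup_usb

# ... replication ...

# at the end: flush everything and detach the pool so nothing else can touch it
zpool export backup_usb

Since the disk is permanently attached and nothing else writes to it, I'm not sure the export/import adds much beyond protecting against accidental writes.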
 