Single backup at a time cron script

Pouch6867

May 18, 2024
This post relates to bug report 3086: https://bugzilla.proxmox.com/show_bug.cgi?id=3086

Update: I've updated the code to include an email report that mirrors the basics of what the backup job provides, and verified that both the log and the email function as expected. This should work until sequential backups become a feature of the GUI-generated backup jobs (or you can keep using this for as long as you want, since a cron job gives you a lot more flexibility than PVE's embedded commands). I'll probably tweak it a bit more here and there, and maybe shift my host/PVE backups to this instead of keeping them in their own script.

Not sure how many other people dislike this, but if you configure a backup job in the GUI, it starts vzdump on all applicable nodes, so with 3 nodes you have 3 vzdump tasks running at once. I wanted to limit the resource demand to one backup task at a time, and the only real "option", if you can call it that, is to stagger the backups and hope that one doesn't overlap with another's run. Since the GUI can't do it, I started working on a script to do it via a cron job. You'll see one task in the GUI task list per backup job instead of one per node. You will not see any backup jobs in the GUI since this is run by cron, not PVE.
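
For the cron side, an entry could look roughly like the sketch below. The script path, the cron.d file name, and the 01:30 nightly schedule are placeholders chosen for illustration, so adjust them to your setup.

```shell
#!/bin/bash
# Hypothetical install path for the backup script; use wherever you saved it.
SCRIPT_PATH="/usr/local/sbin/cluster-vzdump.sh"

# Entries under /etc/cron.d need a user field (root here).
# 01:30 every night is an arbitrary example schedule.
CRON_LINE="30 1 * * * root $SCRIPT_PATH"

# Print the line for review; save it as e.g. /etc/cron.d/cluster-vzdump when happy.
printf '%s\n' "$CRON_LINE"
```

Only schedule this on one node. The script itself reaches the other nodes over SSH, so a copy per node would defeat the one-backup-at-a-time goal.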

This script is for those who need a single backup at a time, instead of the one-at-a-time-per-host behavior you get from a backup job configured in the GUI. You'll need to configure cron to call it at the time you want, and it will then back up all your VMs sequentially, one at a time. You can extend the skip check with additional criteria (mine is only configured so that the PBS guest does not back itself up). You can also add SMTP to the end of this to replicate the notifications that the GUI provides (this way you'll have a lot more control over what goes into the notifications).
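
As a rough sketch of what extra skip criteria could look like: the `should_skip` helper, the `SKIP_VMIDS` list, and the `TEST-` name prefix are hypothetical and only here for illustration. Only the "name starts with PBS" rule comes from the script itself.

```shell
#!/bin/bash
# Hypothetical exclusion list; not part of the original script.
SKIP_VMIDS=(105 999)

# should_skip VMID NAME: returns 0 when the guest should be skipped, 1 otherwise.
should_skip() {
    local vmid="$1" name="$2" id

    # Rule 1: guest name starts with PBS (case-insensitive), as in the script
    [[ "${name^^}" == PBS* ]] && return 0

    # Rule 2 (assumed): explicit VMID exclusion list
    for id in "${SKIP_VMIDS[@]}"; do
        [[ "$vmid" == "$id" ]] && return 0
    done

    # Rule 3 (assumed): throwaway guests named TEST-something
    [[ "${name^^}" == TEST-* ]] && return 0

    return 1
}
```

Inside the backup loop the check would then be `if should_skip "$VMID" "$NAME"; then ... continue; fi` in place of the single name test.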

I've added a configuration section and comments to help you configure it for your cluster. One last note: I take no responsibility for any bad juju that happens to your cluster, and you need a basic understanding of scripting and cron. Here are the tasks currently running from this script (timestamps and username removed):

[Screenshot: PVE task list showing the backup tasks started by the script]

Your log will look like this (I pulled it from my log directly and only changed the guest names, host id, and email):

Code:
=== Cluster backup started 2025-12-19
+++ Backing up VMID 100 (Guest Name 100) on node HOSTx
+++ Finished VMID 100 (Guest Name 100)
+++ Backing up VMID 101 (Guest Name 101) on node HOSTx
+++ Finished VMID 101 (Guest Name 101)
+++ Backing up VMID 102 (Guest Name 102) on node HOSTx
+++ Finished VMID 102 (Guest Name 102)
+++ Backing up VMID 103 (Guest Name 103) on node HOSTx
+++ Finished VMID 103 (Guest Name 103)
+++ Backing up VMID 104 (Guest Name 104) on node HOSTx
+++ Finished VMID 104 (Guest Name 104)
--- Skipping VMID 105 (Guest Name that starts with PBS)
+++ Backing up VMID 106 (Guest Name 106) on node HOSTx
+++ Finished VMID 106 (Guest Name 106)
+++ Backing up VMID 107 (Guest Name 107) on node HOSTx
+++ Finished VMID 107 (Guest Name 107)
+++ Backing up VMID 108 (Guest Name 108) on node HOSTx
+++ Finished VMID 108 (Guest Name 108)
+++ Backing up VMID 109 (Guest Name 109) on node HOSTx
+++ Finished VMID 109 (Guest Name 109)
+++ Backing up VMID 110 (Guest Name 110) on node HOSTx
+++ Finished VMID 110 (Guest Name 110)
+++ Backing up VMID 111 (Guest Name 111) on node HOSTx
+++ Finished VMID 111 (Guest Name 111)
+++ Backing up VMID 112 (Guest Name 112) on node HOSTx
+++ Finished VMID 112 (Guest Name 112)
+++ Backing up VMID 113 (Guest Name 113) on node HOSTx
+++ Finished VMID 113 (Guest Name 113)
+++ Backing up VMID 114 (Guest Name 114) on node HOSTx
+++ Finished VMID 114 (Guest Name 114)
+++ Backup report sent to user@domain.tld
=== Cluster backup completed 2025-12-19
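
Since the log is appended to on every run, a quick way to check the most recent run is to count the marker lines after the last "started" line. This is a minimal sketch that assumes the line prefixes shown above (`+++ Finished`, `--- Skipping`) plus the `!!! Backup FAILED` prefix the script writes on errors. The function name `summarize_last_run` is made up for this example.

```shell
#!/bin/bash
# summarize_last_run LOGFILE: print ok/failed/skipped counts for the latest run.
summarize_last_run() {
    awk '
        # Each new run resets the counters, so only the last run is reported
        /^=== Cluster backup started/ { ok = failed = skipped = 0 }
        /^\+\+\+ Finished/            { ok++ }
        /^!!! Backup FAILED/          { failed++ }
        /^--- Skipping/               { skipped++ }
        END { printf "ok=%d failed=%d skipped=%d\n", ok, failed, skipped }
    ' "$1"
}
```

Usage would be something like `summarize_last_run /var/log/cluster-vzdump.log`.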

Updated code 12/19/2025:
Code:
#!/bin/bash
set -euo pipefail

# Configuration (update here)
PBS_STORAGE="pbs"
SSH_USER="user"
SSH_KEY="/root/.ssh/vzdump_cluster"

# Fixed settings
PVE_USER="${SSH_USER}@pam"
LOGFILE="/var/log/cluster-vzdump.log"
LOCKFILE="/var/lock/cluster-vzdump.lock"
TODAY=$(date +%F)

# Email settings
MAIL_TO=$(awk -F: -v u="${SSH_USER}@pam" '$1=="user" && $2==u { print $7; exit }' /etc/pve/user.cfg)
MAIL_FROM=$(awk -F': ' '$1=="email_from"{print $2}' /etc/pve/datacenter.cfg)
MAIL_SUBJECT="Proxmox Backup Report - $TODAY"

# Result storage
RESULTS=()

# Acquire lock
exec 9>"$LOCKFILE" || exit 1
flock -n 9 || exit 0

echo "" >> "$LOGFILE"
echo "=== Cluster backup started $TODAY" >> "$LOGFILE"

# Read VMs into an array
mapfile -t VM_ARRAY < <(
  pvesh get /cluster/resources --type vm --output-format json | jq -c '.[]'
)

# Backup VMs
for vm in "${VM_ARRAY[@]}"; do
    NODE=$(echo "$vm" | jq -r '.node')
    VMID=$(echo "$vm" | jq -r '.vmid')
    NAME=$(echo "$vm" | jq -r '.name')

    # Skip VM
    if [[ "${NAME^^}" == PBS* ]]; then
        echo "--- Skipping VMID $VMID ($NAME)" >> "$LOGFILE"
        RESULTS+=("$VMID|$NAME|$NODE|SKIPPED")
        continue
    fi

    echo "+++ Backing up VMID $VMID ($NAME) on node $NODE" >> "$LOGFILE"

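    # Capture the exit status explicitly; under 'set -e' a failed ssh/vzdump
    # would otherwise abort the script before the FAILED branch below runs.
    EXIT_CODE=0
    ssh -i "$SSH_KEY" \
        -o StrictHostKeyChecking=no \
        -o UserKnownHostsFile=/root/.ssh/known_hosts \
        "$SSH_USER@$NODE" \
        "vzdump $VMID --storage $PBS_STORAGE --mode snapshot --compress zstd --quiet 1 --notes '$NAME'" \
        >> "$LOGFILE" 2>&1 || EXIT_CODE=$?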

    if [[ $EXIT_CODE -eq 0 ]]; then
        echo "+++ Finished VMID $VMID ($NAME)" >> "$LOGFILE"
        RESULTS+=("$VMID|$NAME|$NODE|OK")
    else
        echo "!!! Backup FAILED for VMID $VMID ($NAME)" >> "$LOGFILE"
        RESULTS+=("$VMID|$NAME|$NODE|FAILED")
    fi
done

# Email report
if [[ -n "${MAIL_TO:-}" ]]; then
    # Send the email; sendmail is tested directly in the 'if' so that
    # 'set -e' cannot abort the script before the result is logged
    if /usr/sbin/sendmail -f "$MAIL_FROM" -t <<EOF
From: $MAIL_FROM
To: $MAIL_TO
Subject: $MAIL_SUBJECT

Proxmox Backup Report
Date: $TODAY

$(printf "%-6s %-13s %-5s %-8s\n" "VMID" "NAME" "NODE" "STATUS")
$(printf "%-6s %-13s %-5s %-8s\n" "----" "-------------" "-----" "--------")
$(for r in "${RESULTS[@]}"; do
    IFS="|" read -r VMID NAME NODE STATUS <<< "$r"
    printf "%-6s %-13.13s %-5.5s %-8s\n" "$VMID" "$NAME" "$NODE" "$STATUS"
done)
EOF
    then
        echo "+++ Backup report sent to $MAIL_TO" >> "$LOGFILE"
    else
        echo "!!! Failed to send backup report to $MAIL_TO" >> "$LOGFILE"
    fi
else
    echo "!!! No email defined for $PVE_USER" >> "$LOGFILE"
fi

echo "=== Cluster backup completed $TODAY" >> "$LOGFILE"
 
Thanks, Pouch6867, for sharing your script. Until Proxmox introduces a built-in feature for serialized (sequential) cluster backups, writing our own scripts is a practical way to prevent storage or network congestion.

Inspired by your work, I have expanded the script with several additional features to make it more robust and flexible for different cluster environments:

  • Automatic Dependency Check: Since jq is not installed by default in PVE, the script now checks for it and handles the installation automatically.
  • LXC Support: Added support for Containers (ct) in addition to Virtual Machines (qemu).
  • Randomized Backup Order: It now shuffles the VMID list using shuf to ensure that backups don't always start from the same ID every time.
  • Enhanced Node Resolution: Solved the issue where SSH might fail using only node names. The script now parses /etc/pve/corosync.conf to automatically resolve the correct Cluster IP for each node.
  • Simplified Setup: I've defaulted the user to root for easier initial deployment. Of course, users with higher security requirements can still revert to passwordless SSH with a dedicated user as you did.
Here is the updated script:

Bash:
#!/bin/bash
# ------------------------------------------------------------------------------
# Proxmox Cluster Backup Script
# Optimized for: Proxmox VE 9.1.6
# ------------------------------------------------------------------------------

# Use -e to exit on error, but we handle the critical parts manually
set -e
set -u

# --- Configuration ---
PBS_STORAGE="pbs"
SSH_USER="root"
SSH_KEY="/root/.ssh/id_rsa"
LOGFILE="/var/log/cluster-vzdump.log"
LOCKFILE="/var/lock/cluster-vzdump.lock"

# --- Internal Variables ---
TODAY=$(date +%F)
CURRENT_NODE=$(hostname)
RESULTS=()

# --- Check Dependencies ---
if ! command -v jq &> /dev/null; then
    echo "Dependency 'jq' not found. Installing..."
    export DEBIAN_FRONTEND=noninteractive
    apt-get update -qq && apt-get install -y jq > /dev/null
    echo "'jq' installed."
fi

# --- Acquire Lock ---
exec 9>"$LOCKFILE" || exit 1
if ! flock -n 9; then
    echo "Error: Another backup process is already running."
    exit 0
fi

# --- Trap for Cleanup (Ctrl+C / Abort) ---
cleanup() {
    rm -f "$LOCKFILE"
}
# Automatically delete lock file on normal script exit
trap cleanup EXIT

handle_interrupt() {
    echo -e "\n!!! Process interrupted by user (Ctrl+C). Cleaning up lock file..."
    cleanup
    exit 130
}
# Catch Ctrl+C (INT) and kill commands (TERM)
trap handle_interrupt INT TERM

echo "" >> "$LOGFILE"
echo "=== Cluster backup started at $(date) ===" >> "$LOGFILE"

# --- 1. Fetch Resources ---
echo "Fetching cluster resources from pvesh..."
JSON_DATA=$(pvesh get /cluster/resources --output-format json)

# Convert JSON array to line-delimited JSON objects
mapfile -t RAW_RESOURCES < <(echo "$JSON_DATA" | jq -c '.[] | select(.type=="qemu" or .type=="lxc")' 2>/dev/null || echo "")

if [[ ${#RAW_RESOURCES[@]} -eq 0 || "${RAW_RESOURCES[0]}" == "" ]]; then
    echo "No VMs or Containers found. Nothing to backup."
    exit 0
fi

# --- 2. Randomize Order ---
echo "Randomizing backup order..."
SHUFFLED_DATA=$(printf "%s\n" "${RAW_RESOURCES[@]}" | shuf)
mapfile -t SHUFFLED_RESOURCES <<< "$SHUFFLED_DATA"

TOTAL_COUNT=${#SHUFFLED_RESOURCES[@]}
CURRENT_INDEX=0

echo "Found $TOTAL_COUNT resources to process."
echo "------------------------------------------------------------"

# --- 3. Backup Loop ---
for item in "${SHUFFLED_RESOURCES[@]}"; do
    [[ -z "$item" || "$item" == "null" ]] && continue
    
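    # Plain arithmetic assignment: ((CURRENT_INDEX++)) returns non-zero when the
    # old value is 0, which would make 'set -e' abort on the first iteration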
    CURRENT_INDEX=$((CURRENT_INDEX + 1))

    # Safely parse fields
    set +e
    NODE=$(echo "$item" | jq -r '.node // empty')
    VMID=$(echo "$item" | jq -r '.vmid // empty')
    NAME=$(echo "$item" | jq -r '.name // "Unnamed"')
    TYPE=$(echo "$item" | jq -r '.type // "unknown"')
    set -e

    if [[ -z "$NODE" || -z "$VMID" ]]; then
        continue
    fi

    PROGRESS_STR="[$CURRENT_INDEX/$TOTAL_COUNT]"
    
    # Skip logic: ignore items with name starting with 'PBS'
    if [[ "${NAME^^}" == PBS* ]]; then
        echo "$PROGRESS_STR Skipping $TYPE $VMID ($NAME)..."
        echo "--- Skipping $TYPE $VMID ($NAME)" >> "$LOGFILE"
        RESULTS+=("$VMID|$NAME|$NODE|SKIPPED")
        continue
    fi

    echo "$PROGRESS_STR Current Task: $TYPE $VMID ($NAME) on node $NODE"
    echo "+++ Backing up $TYPE $VMID ($NAME) on node $NODE" >> "$LOGFILE"

    # --- Execute Backup ---
    # Using 'set +e' to ensure loop continues even if one task fails
    set +e
    
    VZDUMP_CMD="vzdump $VMID --storage $PBS_STORAGE --mode snapshot --compress zstd --quiet 1 --notes '$NAME'"
    
    # Check if we are on the target node or need SSH
    if [[ "$NODE" == "$CURRENT_NODE" || "$NODE" == "localhost" ]]; then
        echo "    Action: Running locally..."
        eval "$VZDUMP_CMD" >> "$LOGFILE" 2>&1
    else
        echo "    Action: Running via SSH..."
        
        # --- Automatically resolve IP from Corosync ---
        TARGET_IP="$NODE" # Default to node name in case parsing fails
        if [[ -f /etc/pve/corosync.conf ]]; then
            # Extract 5 lines below the node name and filter for ring0_addr (Cluster IP)
            COROSYNC_IP=$(grep -A 5 "name: $NODE" /etc/pve/corosync.conf | grep "ring0_addr:" | awk '{print $2}' | head -n 1)
            if [[ -n "$COROSYNC_IP" ]]; then
                TARGET_IP="$COROSYNC_IP"
                echo "    Info: Resolved '$NODE' to IP '$TARGET_IP' via corosync.conf"
            fi
        fi

        # -n: Redirects stdin from /dev/null (Prevents SSH from eating loop items)
        # -o BatchMode=yes: Ensures it won't hang waiting for a password
        ssh -n -i "$SSH_KEY" \
            -o StrictHostKeyChecking=no \
            -o ConnectTimeout=15 \
            -o BatchMode=yes \
            -o UserKnownHostsFile=/root/.ssh/known_hosts \
            "$SSH_USER@$TARGET_IP" \
            "$VZDUMP_CMD" \
            >> "$LOGFILE" 2>&1
    fi
    
    EXIT_CODE=$?
    set -e

    if [[ $EXIT_CODE -eq 0 ]]; then
        echo "    Status: [SUCCESS]"
        echo "+++ Finished $VMID ($NAME)" >> "$LOGFILE"
        RESULTS+=("$VMID|$NAME|$NODE|OK")
    else
        echo "    Status: [FAILED] (Exit Code: $EXIT_CODE)"
        echo "    (Check $LOGFILE for details)"
        echo "!!! Backup FAILED for $VMID ($NAME)" >> "$LOGFILE"
        RESULTS+=("$VMID|$NAME|$NODE|FAILED")
    fi

    sleep 3

    echo "------------------------------------------------------------"
done

# --- 4. Generate Email Report ---
MAIL_TO=$(awk -F: -v u="${SSH_USER}@pam" '$1=="user" && $2==u { print $7; exit }' /etc/pve/user.cfg || echo "")
if [[ -n "${MAIL_TO:-}" ]]; then
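    MAIL_FROM=$(awk -F': ' '$1=="email_from"{print $2}' /etc/pve/datacenter.cfg || echo "")
    # awk exits 0 even when nothing matches, so apply the fallback explicitly
    MAIL_FROM="${MAIL_FROM:-root@$(hostname)}"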
    echo "Sending email report to $MAIL_TO..."
    
    REPORT_BODY=$(
        printf "%-8s %-15s %-10s %-8s\n" "VMID" "NAME" "NODE" "STATUS"
        printf "%-8s %-15s %-10s %-8s\n" "--------" "---------------" "----------" "--------"
        for r in "${RESULTS[@]}"; do
            IFS="|" read -r R_VMID R_NAME R_NODE R_STATUS <<< "$r"
            printf "%-8s %-15.15s %-10.10s %-8s\n" "$R_VMID" "$R_NAME" "$R_NODE" "$R_STATUS"
        done
    )

    {
        echo "From: $MAIL_FROM"
        echo "To: $MAIL_TO"
        echo "Subject: Proxmox Cluster Backup Report - $TODAY"
        echo ""
        echo "Proxmox Cluster Backup Report"
        echo "Date: $TODAY"
        echo ""
        echo "$REPORT_BODY"
    } | /usr/sbin/sendmail -f "$MAIL_FROM" -t
fi

echo "Backup process completed. All logs saved to $LOGFILE"
echo "=== Cluster backup completed at $(date) ===" >> "$LOGFILE"
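
One possible refinement for the node resolution: the `grep -A 5 "name: $NODE"` lookup matches by substring, so a node called `pve1` could also match a `pve10` entry or a `cluster_name:` line, depending on the order in the file. A minimal sketch of an exact-match lookup follows. It assumes the usual `node { name: ... ring0_addr: ... }` layout of corosync.conf, and the function name `resolve_node_ip` is hypothetical.

```shell
#!/bin/bash
# resolve_node_ip NODE [CONF]: print the ring0_addr of the node whose name
# matches NODE exactly; prints nothing if the node is not found.
resolve_node_ip() {
    local node="$1" conf="${2:-/etc/pve/corosync.conf}"
    awk -v want="$node" '
        # Start of a node block: reset what we know about it
        $1 == "node" && $2 == "{"      { in_node = 1; name = ""; addr = ""; next }
        in_node && $1 == "name:"       { name = $2 }
        in_node && $1 == "ring0_addr:" { addr = $2 }
        # End of a node block: emit the address if this was the wanted node
        in_node && $1 == "}" {
            if (name == want && addr != "") { print addr; exit }
            in_node = 0
        }
    ' "$conf"
}
```

In the loop this could replace the grep pipeline with `TARGET_IP=$(resolve_node_ip "$NODE")` followed by `TARGET_IP="${TARGET_IP:-$NODE}"`, keeping the node name as the fallback.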

I've successfully tested this on my 3-node PVE 9.1.6 laboratory cluster containing 3 VMs and 1 CT. You can see the script in action here:

https://youtu.be/fd02GpZVfjg

Any thoughts or further improvements are welcome. Thanks again for the inspiration!
 