How to recover from 100% disk use

maomaocake

Member
Feb 13, 2022
Hi guys, I did an oopsie. I set up a new PBS server and forgot to configure GC and a retention policy (I have one on PVE). Now the disk is 100% full. How do I get GC to run now that it's giving me a disk-full error?
Code:
TASK ERROR: update atime failed for chunk/file "/mnt/datastore/backups/.chunks/3cd7/3cd7ece9322ea3a1e2070500555e348682a92e89cf23d6b66ad7e075805b552d" - ENOSPC: No space left on device
Error: task failed (status update atime failed for chunk/file "/mnt/datastore/backups/.chunks/3cd7/3cd7ece9322ea3a1e2070500555e348682a92e89cf23d6b66ad7e075805b552d" - ENOSPC: No space left on device)
 
I've decided to just nuke my current backups and set up a new datastore. If anyone has any tips, post them for the next person. I'd rather lose my current backups than be unable to back up in the future.
 
You don't really need to do that. I don't remember the exact steps, but you can try:

1.- Put the datastore in read-only maintenance mode.
2.- Move some files from the .chunks directory to another drive. Around 200 MB of free space should be enough.
3.- Run GC and hope it's able to delete some chunks. You may need to wait 24h+5m before GC can actually delete anything [1]. You will get errors about missing chunks (the ones you moved in step 2).
4.- Move the chunks back to the same directory you moved them from.
5.- Run a full verify.
6.- Run GC again.
7.- If there is enough free space, remove maintenance mode.
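
The 24h+5m wait in step 3 is about chunk access times: per the steps above, GC only removes chunks that have not been touched for that long. As a rough sketch (the function name `chunk_is_old_enough` and the variable `GC_CUTOFF_SECONDS` are made up for illustration, and GNU `stat`/`date` are assumed), you could check a single chunk file like this:

```shell
#!/usr/bin/env bash
# Sketch: report whether a chunk file's atime is older than the cutoff
# mentioned in step 3 (24h + 5m). The names below are illustrative
# assumptions, not part of PBS itself.

GC_CUTOFF_SECONDS=$(( 24 * 3600 + 5 * 60 ))   # 86700 seconds

chunk_is_old_enough() {
    local file=$1 now atime
    now=$(date +%s)
    atime=$(stat -c %X "$file") || return 2   # %X = last access, epoch seconds
    (( now - atime > GC_CUTOFF_SECONDS ))     # exit status 0 when old enough
}
```

For example, `chunk_is_old_enough "$chunk" && echo "old enough"` with `$chunk` pointing at a file under `.chunks`. This only looks at the timestamp; whether GC actually removes the chunk also depends on whether any backup still references it.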
 
@VictorSTS
Thank you for your simple and effective tip. This is how you revive a dead datastore.
I first tried expanding my ZFS datastore via zpool xyz. Unfortunately, that didn't work for some reason.
The Proxmox team should definitely address the problem of full datastores. This is extremely annoying.
 
Why do you use ZFS at all? I mean, the manual says:
Monitor pool and file system space to make sure that they are not full. Consider using ZFS quotas and reservations to make sure file system space does not exceed 80% pool capacity.
ZFS is kind of an enterprise thing if you read about it. It CAN run on a single drive (I run it that way myself for now), but it's really not intended to.
If you don't use ZFS perks like snapshots, deduplication and sophisticated RAID-Z modes, just use ext4 + mdraid. It's not bad or inferior in any way; it's simply the proper tool for your task.
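
If you do stay on ZFS, the 80% guideline from the manual can be turned into a quota. Here is a minimal sketch that only does the arithmetic and prints the command instead of running it. The function names are invented for this example, and `tank/backups` in the usage note is a placeholder dataset name:

```shell
#!/usr/bin/env bash
# Sketch: compute an 80% quota from a size in bytes and print (not run)
# the matching `zfs set quota=` command. Function names are illustrative.

quota_bytes() {
    local total_bytes=$1 percent=${2:-80}
    echo $(( total_bytes * percent / 100 ))   # integer division, rounds down
}

print_quota_command() {
    local dataset=$1 total_bytes=$2
    echo "zfs set quota=$(quota_bytes "$total_bytes") $dataset"
}
```

Something like `print_quota_command tank/backups "$(zpool list -Hp -o size tank)"` would print a command to review before running it. Note that on RAID-Z pools `zpool list` reports raw capacity including parity, so the used plus available figures from `zfs list` are probably a better starting point there.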

My point is, ZFS is awesome, but it has its own drawbacks, and you should at least be aware of them to decide whether or not to use it.

If ZFS is your tool of choice, why not install something like Zabbix, which uses next to no resources but monitors your system and notifies you of problems?
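
Even without a full monitoring stack, a tiny check run from cron or a systemd timer would catch a filling disk early. This is only a sketch, assuming GNU `df`. The function names and the 80% default are my own choices, and how you deliver the alert (mail, webhook, ...) is left open:

```shell
#!/usr/bin/env bash
# Sketch: return non-zero when a filesystem is fuller than a threshold,
# so a scheduled job can alert on it. Assumes GNU df (--output support).

usage_percent() {
    # Prints the Use% of the filesystem holding $1 as a bare integer.
    df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

check_usage() {
    local path=$1 limit=${2:-80} used
    used=$(usage_percent "$path")
    [[ $used =~ ^[0-9]+$ ]] || return 2      # df failed or gave no number
    if (( used > limit )); then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)" >&2
        return 1
    fi
}
```

Calling `check_usage /mnt/datastore/backups 80` from a scheduled job, and sending a notification when it returns 1, would be the simplest way to wire it up.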
 
I made a script to help others out.
If you end up filling up your disk, run this as a privileged user such as root.

Steps:
1. Set the paths "SRC_DIR", "DEST_DIR" and "BACKUP_FILE".
2. Run the script and select "Move"; you need to free up at least 100 MB.
3. Manually run a garbage collection job in PBS now; it should work (ignore the warnings about missing chunks, they will be restored in step 4).
4. Run the script again and select "Restore" to put the moved files back as they were.
5. Run a full verify in PBS.
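
Before starting the garbage collection in step 3, it can help to confirm that the move in step 2 really freed enough space. A small sketch, assuming GNU `df`: the function names are hypothetical and the 100 MiB default simply mirrors step 2.

```shell
#!/usr/bin/env bash
# Sketch: check that the datastore filesystem has at least N MiB free
# before starting garbage collection. Assumes GNU df.

free_mb() {
    # Available space, in whole MiB, on the filesystem holding $1.
    df --output=avail -B 1M "$1" | tail -n 1 | tr -dc '0-9'
}

enough_room_for_gc() {
    local path=$1 needed=${2:-100} avail
    avail=$(free_mb "$path")
    [[ $avail =~ ^[0-9]+$ ]] || return 2     # df failed or gave no number
    (( avail >= needed ))                    # exit status 0 when enough room
}
```

For example: `enough_room_for_gc /path/to/datastore 100 || echo "move more chunk folders first"`.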

Bash:
#!/usr/bin/env bash

# Set Source and destination directories

SRC_DIR="/path/to/datastore/.chunks/"
DEST_DIR="/path/to/move/to/.chunks/"
BACKUP_FILE="/any/path/logs/move_log.txt"

# ----- DO NOT EDIT BELOW ----- #
clear
function_move() {

    # warn the user that the script will move files from SRC_DIR to DEST_DIR, make it eye catching:
    echo -e "\e[31m----- WARNING: This script will move files -----\e[0m"
    echo -e "from: \e[33m'$SRC_DIR'\e[0m"
    echo -e "to:   \e[32m'$DEST_DIR'\e[0m"
    echo -e "\e[31m----- WARNING: Do Not Run MOVE second time without Running RESTORE first -----\e[0m"

    # Prompt the user to select the number of latest modified folders to move
    echo
    echo "Select the number of latest folders to move:"
    echo "This is done to free up space for garbage cleaning to be able to run"
    echo
    options=("<Manually input>" "5" "10" "25" "50" "100")
    select opt in "${options[@]}"
    do
        case $opt in
            "<Manually input>")
                read -p "Enter the number of items: " NUM_ITEMS
                break
                ;;
            "5")
                NUM_ITEMS=5
                break
                ;;
            "10")
                NUM_ITEMS=10
                break
                ;;
            "25")
                NUM_ITEMS=25
                break
                ;;
            "50")
                NUM_ITEMS=50
                break
                ;;
            "100")
                NUM_ITEMS=100
                break
                ;;
            *) echo "Invalid option $REPLY";;
        esac
    done

    # Find the latest modified folders or files in SRC_DIR
    LATEST_ITEMS=$(ls -t "$SRC_DIR" | head -n "$NUM_ITEMS")

    # Output the result
    # echo "The latest $NUM_ITEMS modified items are:"
    # echo "$LATEST_ITEMS"

    # Show human-readable sizes of the selected items
    echo "Sizes of the selected items:"
    for ITEM in $LATEST_ITEMS;
    do
      du -sh "$SRC_DIR/$ITEM"
    done

    # Show the total combined size
    echo
    echo "Total combined size of the selected items:"
    du -ch $(for ITEM in $LATEST_ITEMS; do echo "$SRC_DIR/$ITEM"; done) | grep total

    # Ask for confirmation to proceed with the move
    while true; do
        read -p "Do you want to proceed with moving these items to $DEST_DIR? (y/n): " CONFIRM
        case $CONFIRM in
            [Yy]* )
                break
                ;;
            [Nn]* )
                read -p "Do you want to retry or exit? (r/e): " RETRY
                case $RETRY in
                    [Rr]* )
                        function_move
                        return
                        ;;
                    [Ee]* )
                        echo "Operation cancelled."
                        exit 1
                        ;;
                    * )
                        echo "Invalid option. Please enter 'r' to retry or 'e' to exit."
                        ;;
                esac
                ;;
            * )
                echo "Invalid option. Please enter 'y' to proceed or 'n' to cancel."
                ;;
        esac
    done

    # Ensure the destination directory exists
    if [ ! -d "$DEST_DIR" ]; then
        mkdir -pv "$DEST_DIR"
    fi
    # Change ownership of the destination directory to 34:34
    chown 34:34 "$DEST_DIR"

    # Backup 'permissions', 'uid', 'gid', 'File path', 'last modification', 'last access' and 'last status change' to a file
    > "$BACKUP_FILE"
    for ITEM in $LATEST_ITEMS;
    do
    find "$SRC_DIR/$ITEM" -exec stat -c "%a %U %G %n %Y %X %Z" {} \; >> "$BACKUP_FILE"
    done

    # Copy the latest folders or files to DEST_DIR
    for ITEM in $LATEST_ITEMS;
    do
    cp -rpv "$SRC_DIR/$ITEM" "$DEST_DIR"
    done

    # Restore permissions, timestamps, and ownership from the backup file
    while IFS=' ' read -r SRC_PERM SRC_USER SRC_GROUP SRC_PATH SRC_MODTIME SRC_ACCESSTIME SRC_CHANGETIME; do
    DEST_PATH="$DEST_DIR/${SRC_PATH#$SRC_DIR/}"
    if [ -e "$DEST_PATH" ]; then
        DEST_PERM=$(stat -c %a "$DEST_PATH")
        if [ "$SRC_PERM" != "$DEST_PERM" ]; then
        chmod "$SRC_PERM" "$DEST_PATH"
        fi
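        # Set the timestamps. ctime cannot be set from user space, and a plain
        # `touch -d` would overwrite both mtime and atime with the ctime value,
        # so only mtime and atime are restored.
        touch -m -d "@$SRC_MODTIME" "$DEST_PATH"
        touch -a -d "@$SRC_ACCESSTIME" "$DEST_PATH"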
        # Restore ownership
        chown "$SRC_USER:$SRC_GROUP" "$DEST_PATH"
    fi
    done < "$BACKUP_FILE"

    # Delete the source items
    for ITEM in $LATEST_ITEMS;
    do
    rm -r "$SRC_DIR/$ITEM"
    done

    echo
    echo "Items copied, permissions, timestamps, and ownership saved, and source items deleted successfully."
    echo
    echo -e "\e[31m----- DO NOT RUN AGAIN UNTIL FILES ARE RESTORED BACK TO '$SRC_DIR' -----\e[0m"
    echo
    echo "Next Steps:"
    echo
    echo " 1). Run a Garbage Collect Job in Proxmox Backup Server to free up space"
    echo " 2). Then Run RESTORE with this script to move the files back to '$SRC_DIR'"
    echo
}

function_restore() {
    # Restore the items from DEST_DIR to SRC_DIR
    echo "Restoring items from $DEST_DIR to $SRC_DIR..."

    # Copy the items back to SRC_DIR
    while IFS=' ' read -r SRC_PERM SRC_USER SRC_GROUP SRC_PATH SRC_MODTIME SRC_ACCESSTIME SRC_CHANGETIME; do
        REL_PATH="${SRC_PATH#$SRC_DIR/}"
        DEST_PATH="$SRC_DIR/$REL_PATH"
        DEST_DIR_PATH=$(dirname "$DEST_PATH")

        # Copy the item back
        cp -rpv "$DEST_DIR/$REL_PATH" "$DEST_PATH"

        # Restore permissions, timestamps, and ownership
        if [ -e "$DEST_PATH" ]; then
            DEST_PERM=$(stat -c %a "$DEST_PATH")
            if [ "$SRC_PERM" != "$DEST_PERM" ]; then
                chmod "$SRC_PERM" "$DEST_PATH"
            fi
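            # Set the timestamps. ctime cannot be set from user space, and a plain
            # `touch -d` would overwrite both mtime and atime with the ctime value,
            # so only mtime and atime are restored.
            touch -m -d "@$SRC_MODTIME" "$DEST_PATH"
            touch -a -d "@$SRC_ACCESSTIME" "$DEST_PATH"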
            # Restore ownership
            chown "$SRC_USER:$SRC_GROUP" "$DEST_PATH"
        fi
    done < "$BACKUP_FILE"

    # Delete the leftover files and folders in DEST_DIR
    echo "Deleting leftover files and folders in $DEST_DIR..."
    while IFS=' ' read -r _ _ _ SRC_PATH _ _ _; do
        REL_PATH="${SRC_PATH#$SRC_DIR/}"
        DEST_PATH="$DEST_DIR/$REL_PATH"
        if [ -e "$DEST_PATH" ]; then
            rm -r "$DEST_PATH"
        fi
    done < "$BACKUP_FILE"

    echo
    echo "Items restored, permissions, timestamps, and ownership checked, and leftover files deleted successfully."
    echo
}

echo
echo -e "\e[31m----- WARNING: Do Not Run MOVE second time without Running RESTORE first -----\e[0m"
# Start the function_move function or restore function based on user input
echo "Do you want to move items or restore items?"
echo
options=("Move" "Restore")
select opt in "${options[@]}"
do
    case $opt in
        "Move")
            function_move
            break
            ;;
        "Restore")
            function_restore
            break
            ;;
        *) echo "Invalid option $REPLY";;
    esac
done
 
Nice! I'm sure I'm going to use this. I've already had to deal with this twice, and I only started using Proxmox this year.
 
I did this to a system where the main storage also hosted the datastore.
Here's how you can prevent it:
set a reservation on the dataset so it can't fill up again.

Code:
zfs set reservation=10g rpool/ROOT
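
Dataset names differ between installations, so `rpool/ROOT` may not exist on yours. As a hedged sketch, a small wrapper can check for the dataset first and only then set the reservation. The function name `set_reservation` is made up for this example; it relies only on the standard `zfs list` and `zfs set` subcommands:

```shell
#!/usr/bin/env bash
# Sketch: set a reservation only if the dataset exists. `rpool/ROOT` is the
# name from the post above; your pool may use a different layout.

set_reservation() {
    local dataset=$1 size=$2
    if ! zfs list -H -o name "$dataset" >/dev/null 2>&1; then
        echo "dataset '$dataset' not found; run 'zfs list' to see valid names" >&2
        return 1
    fi
    zfs set reservation="$size" "$dataset"
}
```

Run it as root, for example `set_reservation rpool/ROOT 10g`.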
I'm running into this for the second time because I set up prune and GC incorrectly. To prevent it in the future I'd like to run this command, but where? In the PBS shell, or somewhere else?

I ran this in the PBS shell:
zfs set reservation=2g rpool/ROOT
and this came back:

cannot open 'rpool/ROOT': dataset does not exist
 
So that's what it tells me:

NAME            USED  AVAIL  REFER  MOUNTPOINT
Backup_Storage  196G  28.4G   196G  /mnt/datastore/Backup_Storage

NAME            SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP    HEALTH  ALTROOT
Backup_Storage  232G   196G  35.6G        -         -   46%  84%  1.00x  DEGRADED  -

Filesystem            Size  Used Avail Use% Mounted on
udev                  1.9G     0  1.9G   0% /dev
tmpfs                 391M  676K  391M   1% /run
/dev/mapper/pbs-root  8.3G  2.4G  5.5G  30% /
tmpfs                 2.0G     0  2.0G   0% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
Backup_Storage        225G  197G   29G  88% /mnt/datastore/Backup_Storage
tmpfs                 391M     0  391M   0% /run/user/0
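
Going by this output, there is no `rpool` on this machine: the root filesystem sits on `/dev/mapper/pbs-root`, and the only ZFS dataset listed is `Backup_Storage`. That would explain the "dataset does not exist" error. As a small sketch (the helper name `dataset_for_mountpoint` is invented here), you can look up which dataset backs a given mountpoint from the tab-separated output of `zfs list -H -o name,mountpoint`:

```shell
#!/usr/bin/env bash
# Sketch: find the ZFS dataset mounted at a given path.

dataset_for_mountpoint() {
    # Reads "name<TAB>mountpoint" lines on stdin (the format produced by
    # `zfs list -H -o name,mountpoint`) and prints the dataset mounted at $1.
    # Exits non-zero when no dataset matches.
    awk -F '\t' -v mp="$1" '$2 == mp { print $1; found = 1 } END { exit !found }'
}
```

Usage would be `zfs list -H -o name,mountpoint | dataset_for_mountpoint /mnt/datastore/Backup_Storage`, and the printed name is what goes in place of `rpool/ROOT`. Keep in mind that the original tip was meant to protect a root filesystem sharing a pool with the datastore; here the root is on a separate volume, so it is an open question whether a reservation helps in this setup.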