Backup to external Disks - multiple Issues

shift.cx

Jun 20, 2024
I wanted to create a thread for others to find before I create a ticket, as I think there should be better support inside the software for managing rotating drives.

Currently, I'm struggling to find a proper way to rotate four USB HDDs (one drive per week), specifically:
  • Spinning down the disks when they aren't being used.
  • Automatically mounting the disks whenever I physically plug them in.
  • Understanding why the license model is per server instead of based on the entire Proxmox environment. I need to buy another PBS license just to avoid running into the basement to swap HDDs. It would be more convenient to connect them to a PC near my office, but PBS feels expensive compared to PVE for the additional cost of handling these scenarios. It seems like the lack of certain features forces the use of multiple PBS instances, which complicates the setup with custom scripts that could potentially cause issues.
  • Considering if I need to invest in new hard disks because the data compression only happens after the backup (and sync), leading to the disks reaching 100% capacity before all servers finish backing up.

Our old setup with Veeam worked like this, and I'm trying to replicate it with PBS and PVE:
  • The Veeam Backup Server triggered a Backup Sync Job on a Windows client. The Windows PC existed solely to manage USB disks, with no extra licensing costs.
  • On Monday at 12:30, the Backup Sync started copying files to our external disk.
  • By Friday at 12:00, the job stopped using the disk. At 12:30, the disk was unmounted and ready to be taken home.
  • Another HDD could be plugged into the Windows PC to prepare for the next backup cycle starting on Monday at 12:30.

With PBS:
  • It seems like I need a second PBS server with a license to add my existing PBS as a "Remote."
  • I'm not sure if I should create separate ZFS datasets for each disk? Is that the right way?
  • Running a Sync Job for each datastore on Monday fills the disks completely because compression doesn't happen until later. This means the disks reach full capacity, and I can't run compression or deduplication afterward. With the current capacity limits, I wonder how to handle backups efficiently without needing to buy much larger disks or a more complex setup. Yes, I could buy 20 TB drives now, but sooner or later even those would fill up within a single backup cycle.
I also need to find a way to properly eject the disks. Currently, I put the datastore in maintenance mode (offline), run sync, unmount the ZFS, but the disk doesn't power down even after setting these parameters with sdparm:

sdparm -l --save --set SZCT=6000 /dev/sdc
sdparm -l --save --set STANDBY_Z=1 /dev/sdc

I end up rebooting the PBS server because I can't seem to get the datastore back online after plugging in the disk for next week.


Is there a recommended way to handle backups on rotating hard drives with PBS?

I apologize if my post comes across as critical of the product. I genuinely want PBS to work as seamlessly as our PVE cluster does.
I'm feeling quite frustrated with the current setup, and I haven't even addressed the need to set up a third PBS server due to the absence of a feature like a Hardened Repository, similar to what Veeam offers.
I already had to find workarounds for the lack of Network File Backup for my previous Post (https://forum.proxmox.com/threads/backup-files-from-smb-share-to-pbs.151766/#post-687412)
 
Hi,

I think you are trying to address multiple points here, but I'll add some information that might help in this discussion. First, removable datastores are planned and being worked on [1]. This would be needed to handle switching out USB drives like you describe, so maybe keep an eye on the corresponding Bugzilla issue.

> Understanding why the license model is per server instead of based on the entire Proxmox environment. I need to buy another PBS license just to avoid running into the basement to swap HDDs. It would be more convenient to connect them to a PC near my office, but PBS feels expensive compared to PVE for the additional cost of handling these scenarios. It seems like the lack of certain features forces the use of multiple PBS instances, which complicates the setup with custom scripts that could potentially cause issues.
Not sure what exactly you mean here. If you want to run several instances of PBS without licensing costs, you can, as it is open source. Then you can use sync jobs to sync datastores between both locations. If you need enterprise-level support for all of them, then yes, you'd need to pay additional subscription fees.

However, you could also expose your USB drive from another host via NFS/CIFS etc. and then mount that on the PBS host. Once you have added that as a directory-backed datastore, you can use PBS' ability to sync content between two datastores locally to move the data across.
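A rough sketch of the steps just described, for anyone who wants to try them. Everything named here is a placeholder and not from this thread: the host `office-pc`, the export `/srv/usb-backup`, the mount point `/mnt/usb-nfs`, and the datastore names `usb-nfs` and `main`. The `sync-job create` flags are written as I understand them from the PBS documentation (a sync job without `--remote` is treated as a local sync); please check `proxmox-backup-manager sync-job create --help` on your version before relying on them. The `run` helper only exists so the commands can be previewed with `DRY_RUN=1`.

```shell
#!/usr/bin/env bash
# Sketch only: mount an NFS export and use it as a second, directory-backed
# datastore that is filled by a local sync job. All names are hypothetical.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_nfs_datastore EXPORT MOUNTPOINT TARGET_STORE SOURCE_STORE
setup_nfs_datastore() {
    local export_src=$1 mountpoint=$2 target=$3 source=$4
    run mkdir -p "$mountpoint"
    run mount -t nfs "$export_src" "$mountpoint"
    # Register the mounted directory as a datastore.
    run proxmox-backup-manager datastore create "$target" "$mountpoint"
    # Assumption: omitting --remote makes this a local sync between datastores.
    run proxmox-backup-manager sync-job create "sync-to-$target" \
        --store "$target" --remote-store "$source"
}
```

To preview without touching the system: `DRY_RUN=1 setup_nfs_datastore office-pc:/srv/usb-backup /mnt/usb-nfs usb-nfs main`. Note that the export has to let the PBS service user write to it, otherwise creating the datastore will likely fail.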

> Running a Sync Job for each datastore on Monday fills the disks completely because compression doesn't happen until later. This means the disks reach full capacity, and I can't run compression or deduplication afterward.
I am not sure what you mean by that. Chunks are compressed before they are written, and duplicate chunks are only ever written once. So what do you mean by “compression doesn't happen until later”?


[1]: https://bugzilla.proxmox.com/show_bug.cgi?id=3156
 
1. You do not need a second PBS (even if it is nice to have;) ). PBS now supports a local sync.
2. You also do not need ZFS for PBS on your external storage disks, any Linux FS will do.

This is how I did it:

/usr/local/bin/backup-replicate-pbs.sh

#!/bin/bash
# Replicate the PBS datastore to whichever external ZFS disk is attached.
export PATH=$PATH:/usr/sbin:/usr/local/sbin
# Let "sync-job run | tee" report the sync job's exit status, not tee's.
set -o pipefail

# Look for an importable pool whose name contains "extern".
ZP=$(zpool import | awk '/pool:.*extern/ {print $2}')
if [[ -z "$ZP" ]]; then
    # Nothing to import; maybe the pool is already imported.
    ZP=$(zpool list | awk '$1 ~ /extern/ {print $1}')
    if [[ -z "$ZP" ]]; then
        echo "ERROR: No zpool to import and no zpool imported" >&2
        exit 1
    fi
else
    zpool import "$ZP"
    # Encrypted pool: load the key, then mount the dataset.
    systemctl restart "zfs-load-key@$ZP"
    zfs mount "$ZP/backup-extern"
fi
# Name of the ONLINE vdev (the physical disk) backing the pool.
DEVICE=$(zpool list -v "$ZP" | awk '$10 == "ONLINE" && $9 == "-" {print $1}')
echo "Disk $DEVICE is mounted: $(mount | grep backup-extern)"

# Bring the datastore back online, then sync.
proxmox-backup-manager datastore update extern --delete maintenance-mode
echo "Starting sync to external disk"
# Assumes exactly one sync job is configured.
SYNCID=$(proxmox-backup-manager sync-job list --output-format json | jq -r '.[] | .id')
time proxmox-backup-manager sync-job run "$SYNCID" | tee "/var/log/backup-sync_$(date -I).log" \
    || { echo -e "\e[0;31mBackup-sync failed\e[0m" >&2; exit 1; }
# Record when this disk was last synced.
date -Iseconds > "/backup/pbsreplicationstatus/$DEVICE"

# Verify snapshots, skipping those verified within the last 30 days.
proxmox-backup-manager verify extern --ignore-verified true --outdated-after 30

# Prune and garbage-collect the external datastore.
PRUNEID=$(proxmox-backup-manager prune-job list | awk '$5 == "extern" {print $2}')
time proxmox-backup-manager prune-job run "$PRUNEID"

time proxmox-backup-manager garbage-collection start extern

# Take the datastore offline so PBS releases the disk.
proxmox-backup-manager datastore update extern --maintenance-mode type=offline

# If something still holds the mount point, poll up to 10 times (2 s apart)
# before exporting the pool.
if fuser -M /backup-extern 2>&1; then
    tries=10
    delay=2
    while [ $tries -gt 0 ]; do
        tries=$((tries-1))
        sleep $delay
        if [ 2 -ge "$(fuser -Mv /backup-extern 2>&1 | wc -l)" ]; then
            break
        fi
    done
fi
zpool export "$ZP"

# Print the last sync time per disk, timestamp first.
echo "Known replications:"
(cd /backup/pbsreplicationstatus/ && grep -H . *) | sed 's/^\(.*\).*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\)/\2\t\1/; s/:+..:..$//'

I use encrypted ZFS, hence the key parts; you can ignore them.
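A note on error handling around `sync-job run ... | tee`: without `pipefail`, bash reports the exit status of the last command in a pipeline, which here is `tee`, so a failed sync can go unnoticed. Also, an `exit` inside `( ... )` only leaves that subshell, not the script. One way to keep the log and still see the real status is a small wrapper based on `PIPESTATUS`. The name `run_logged` is made up for this sketch.

```shell
#!/usr/bin/env bash
# run_logged LOGFILE CMD...
# Run CMD, copy its output (stdout and stderr) to the terminal and to LOGFILE,
# and return CMD's own exit status instead of tee's.
run_logged() {
    local logfile=$1
    shift
    "$@" 2>&1 | tee "$logfile"
    # PIPESTATUS[0] is the status of the first command in the pipeline above.
    return "${PIPESTATUS[0]}"
}
```

In the script this could be used as `run_logged "/var/log/backup-sync_$(date -I).log" proxmox-backup-manager sync-job run "$SYNCID" || { echo "Backup-sync failed" >&2; exit 1; }`. The braces matter: they run in the current shell, so `exit 1` really ends the script.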
 
> First, removable datastores are planned and being worked on [1]. This would be needed to handle switching out USB drives like you describe, so maybe keep an eye on the corresponding Bugzilla issue.
Thanks for the info, will check the thread!

> Not sure what exactly you mean here. If you want to run several instances of PBS without licensing costs, you can, as it is open source. Then you can use sync jobs to sync datastores between both locations. If you need enterprise-level support for all of them, then yes, you'd need to pay additional subscription fees.
Well, technically I can run another PBS instance unlicensed, but would you recommend that my company run one licensed PBS that gets updates from the enterprise repository in parallel with another PBS running on the no-subscription repository?

> However, you could also expose your USB drive from another host via NFS/CIFS etc. and then mount that on the PBS host. Once you have added that as a directory-backed datastore, you can use PBS' ability to sync content between two datastores locally to move the data across.
That was the original plan, but when I had the choice between installing another Ubuntu server or a second PBS instance, PBS sounded smarter and easier in my head. I guess I will re-evaluate.

> I am not sure what you mean by that. Chunks are compressed before they are written, and duplicate chunks are only ever written once. So what do you mean by “compression doesn't happen until later”?
From what I understand, I have to run a garbage collection. This only works if I have an existing backup and doesn't happen while the backup is running, so initially I need a bigger set of HDDs?
Or am I misunderstanding the "Deduplication Factor" in the Summary tab? It showed 1.0 until I set a daily garbage collection schedule, which changed the factor to 15.65.
 
> 1. You do not need a second PBS (even if it is nice to have;) ). PBS now supports a local sync.
> 2. You also do not need ZFS for PBS on your external storage disks, any Linux FS will do.

Thanks. Is it correct that you only have a single target? I rotate four disks.
 
> Well, technically I can run another PBS instance unlicensed, but would you recommend that my company run one licensed PBS that gets updates from the enterprise repository in parallel with another PBS running on the no-subscription repository?
Generally, we only recommend running products on the enterprise repository in production environments. That being said, the sync protocol we use is fairly stable and shouldn't change between minor versions. Also, in that case your second PBS would only be used for a second copy, so the risks should be fairly minimal.

> That was the original plan, but when I had the choice between installing another Ubuntu server or a second PBS instance, PBS sounded smarter and easier in my head. I guess I will re-evaluate.
It depends: if you are already used to maintaining NFS/CIFS shares, this shouldn't be too much trouble. But yes, PBS' sync capability might be easier to use if you aren't.

> From what I understand, I have to run a garbage collection. This only works if I have an existing backup and doesn't happen while the backup is running, so initially I need a bigger set of HDDs?
> Or am I misunderstanding the "Deduplication Factor" in the Summary tab? It showed 1.0 until I set a daily garbage collection schedule, which changed the factor to 15.65.
The garbage collector will just remove unused chunks. The way PBS works is as follows:
  • A backup snapshot is split up into chunks (typically they are 4 MiB in size).
  • A SHA-256 hash sum is calculated based on the chunk. You don't need to care about what this means exactly; the short explanation is that this creates a unique ID of the chunk based on its contents. So if two chunks have the same ID, they must also have the same content.
  • An index of said IDs is created to keep track of which chunks belong to a given snapshot.
  • The chunk is compressed and possibly encrypted.
  • The ID of a chunk is used to check if the datastore already has a chunk with the same ID. If it does, the chunk does not need to be stored, as its contents are already in the store. If it doesn't, a new chunk is stored.
The garbage collector basically just checks the indices of all snapshots to determine whether a chunk is still in use. If it isn't, the chunk is removed from the datastore. This is necessary because removing a snapshot only removes its index, not the associated chunks, as they could still be used by other snapshots.
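To make the steps above concrete, here is a toy content-addressed store in bash. It is only an illustration under simplifying assumptions and not how PBS lays out or processes data on disk: the directory layout (`chunks/`, `index/`) and the function names are invented, `gzip` stands in for the real compression, and the garbage collection here is a naive scan over all indices.

```shell
#!/usr/bin/env bash
# Toy chunk store (hypothetical layout, not the PBS on-disk format):
#   $STORE/chunks/<sha256>  compressed chunk data, one file per unique chunk
#   $STORE/index/<snapshot> list of chunk IDs, one per line

# store_chunk STORE SNAPSHOT FILE
# Hash the chunk, record its ID in the snapshot index, and write the
# compressed chunk only if no chunk with that ID is stored yet.
store_chunk() {
    local store=$1 snapshot=$2 file=$3 id
    mkdir -p "$store/chunks" "$store/index"
    id=$(sha256sum < "$file" | cut -d' ' -f1)
    echo "$id" >> "$store/index/$snapshot"
    if [[ ! -e "$store/chunks/$id" ]]; then
        gzip -c < "$file" > "$store/chunks/$id"
    fi
}

# remove_snapshot STORE SNAPSHOT
# Removing a snapshot only removes its index; the chunks stay behind.
remove_snapshot() {
    rm -f "$1/index/$2"
}

# garbage_collect STORE
# Delete every chunk whose ID appears in none of the remaining indices.
garbage_collect() {
    local store=$1 chunk id
    for chunk in "$store"/chunks/*; do
        [[ -e "$chunk" ]] || continue
        id=${chunk##*/}
        if ! grep -qxF "$id" "$store"/index/* 2>/dev/null; then
            rm -f "$chunk"
        fi
    done
}
```

Storing the same content twice, even from different snapshots, leaves a single file in `chunks/`. That is why deduplication already takes effect while the backup or sync is written, and garbage collection only frees space after snapshots have been removed.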

As for the de-duplication factor in the GUI, I am currently uncertain when exactly this is updated. It may be that this gets updated only after garbage collection tasks, as those are the only tasks that have an overview of how often chunks are re-used between all snapshots. However, that is cosmetic, as de-duplication and compression happen when a backup is created, not when the next garbage collection task is run.

Hope that makes sense.
 
> Thanks. Is it correct that you only have a single target? I rotate four disks.
No, in this case I have three disks with the zpools zp_extern1 to zp_extern3.
They all have a ZFS dataset backup-extern with the mountpoint /backup-extern.

root@srv01:~# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
zp_extern3                4.13T  13.9T   192K  none
zp_extern3/backup-extern  4.13T  13.9T  4.13T  /backup-extern

root@srv01:~# zfs get mountpoint zp_extern3/backup-extern
NAME                      PROPERTY    VALUE           SOURCE
zp_extern3/backup-extern  mountpoint  /backup-extern  local
 
