First steps with PBS, pleasant surprise

IsThisThingOn

After reading a lot about PBS, I decided to do some real-world tests and installed PBS on an old Dell Optiplex 3020 with a 60GB SSD as the boot disk and an old 8TB HDD.

The first hurdle I ran into was that somehow my local PVE was not able to connect to my remote PBS over its static IPv6 address 2001::
After a lot of tinkering with permissions and firewall rules, I realized that the problem was my PBS not having IPv6 at all :) I totally forgot that Proxmox is the only thing in my network not using SLAAC.
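In case anyone trips over the same thing: on a Debian-based PBS box you can let the NIC pick up SLAAC next to a static IPv4 address with a second inet6 stanza in /etc/network/interfaces. A minimal sketch; the interface name and IPv4 addresses are placeholders for whatever your box actually uses:

# /etc/network/interfaces (sketch, placeholder names and addresses)
auto ens18
iface ens18 inet static
        address 192.0.2.10/24
        gateway 192.0.2.1

# second stanza for the same NIC: accept router advertisements / SLAAC
iface ens18 inet6 auto

Then restart networking or reboot the box and check that the interface picked up a global IPv6 address.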

I was a little bit surprised to find out that you can't (or at least I think you can't) use the boot disk as a backup destination.

Later I want to store backups from several different small clients. I was really impressed with how easily you can create API tokens and separate datastores, and protect them from ransomware by only allowing backups. But what surprised me even more was the performance!
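For reference, that backup-only setup can also be scripted on the PBS side. A rough sketch with made-up user, token, and datastore names, using the DatastoreBackup role so the token can write backups but not prune or delete them:

proxmox-backup-manager user create backupuser@pbs
proxmox-backup-manager user generate-token backupuser@pbs client1
proxmox-backup-manager acl update /datastore/client1store DatastoreBackup --auth-id 'backupuser@pbs!client1'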

The second run took only 25 seconds for a 50GB Linux VM!

The manual makes it look like you can only really use it with SSDs, but I get perfectly fine performance with a single HDD and ext4. Impressive.
But of course, that is not a long-term observation.

I was wondering how you guys experience performance with PBS and what backup configurations you use. Here are some of my thoughts and questions that came up:

1. I currently only do backups on the weekends. The reasoning was that the VMs themselves are protected against hardware failures with ZFS. Actually needing these backups would be a worst-case scenario like a ransomware attack on a VM or the building burning down. Most of the VMs don't hold important data, and where they do, it also gets saved somewhere else. But maybe I should still switch to daily backups. Since I only back up on the weekend, I thought about a crontab job to shut PBS down and a BIOS wake timer to boot it again (see the sketch after this list).

2. I always use "stop" mode to run backups. That way the VMs also get a reboot, which isn't bad for updates anyway (see the vzdump sketch after this list).

3. I am thinking about running PBS on non-ECC hardware with non-redundant storage. The chance of a VM failing and the PBS HDD failing at the same time I consider to be very low. Not zero, but low.

4. Instead of a non-redundant HDD, I could also use a ZFS mirror. But I don't know how ZFS performs in the long run on ordinary HDDs, especially with regard to fragmentation.
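For point 1, something like this is what I have in mind for the weekend shutdown; the schedule is just an example, and the wake-up would have to come from the BIOS RTC power-on alarm:

# /etc/cron.d/pbs-weekend-off (example: power PBS off Sunday 23:30)
30 23 * * 0   root   /usr/sbin/shutdown -h now

And for point 2, the stop-mode backup kicked off from the PVE side, with a made-up VMID and storage name:

vzdump 101 --mode stop --storage pbs-remote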
 
Heh. Welcome to the party, bud. It's about time.

You got right into all the thorny questions. Let me address 2 subjects.

Fleecing (local write cache during backups) - turn this on.
You say that you stop your VMs to run a backup. That would not be OK for our customers, so we do the backups hot.
PBS used to have a really terrible problem with (sort of) locking disk writes during a backup, such that the VM would freeze.
If it was backing up a particular sector, and that sector was being written to, the write had to wait until the original data had been sent to the backup system before the sector could be overwritten ... so the VM waits until a bit of data gets written to some lousy backup box on the other side of the country.
On the third tab of a backup job, you'll find a checkbox marked 'Fleecing'. Enable fleecing to reduce the impact on a VM while it is being backed up.
Instead of waiting for writes to reach the remote storage, it caches the data locally in a 'fleecing' file when needed. Hopefully that file will be on your fastest storage.
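If you prefer the CLI over the GUI checkbox, recent PVE versions also accept a fleecing option on vzdump; the VMID and storage names here are placeholders:

vzdump 101 --storage pbs-remote --fleecing enabled=1,storage=local-zfs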

ZFS special vdev aka hybrid drive - consider this if you are using spinners.
Via some ZFS magic, you can speed up all the metadata work on your ZFS pool by adding an SSD to hold that data.
https://klarasystems.com/articles/openzfs-understanding-zfs-vdev-types/
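As a rough sketch (pool and device names made up), adding a mirrored special vdev and optionally letting small records land on the SSDs as well:

zpool add tank special mirror /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2
# optional: also store small records (here <=64K) on the special vdev
zfs set special_small_blocks=64K tank

Keep in mind that once data lands on the special vdev it is as critical as the rest of the pool, hence the mirror.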
 
I was a little bit surprised to find out that you can't (or at least I think you can't) use the boot disk as a backup destination.
Guess I was wrong. I can create a datastore with /rpool/testdatastore1 as the path, and also with /var/test2 as the path. For some reason, the latter uses 1.2GB during creation, while the rpool one only needs 25MB.
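For reference, the same thing can be done on the CLI; a sketch with my test names:

proxmox-backup-manager datastore create testdatastore1 /rpool/testdatastore1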
 
You want to put the datastore in its own ZFS dataset.
(Of course I got this wrong and then people actually read what I wrote.)
zfs create rpool/testdatastore1

That will create the mount
/rpool/testdatastore1

That way you can apply settings to it. It's its own file system.

And if you put a datastore on the same storage as root, you risk crashing the entire box if it fills up.
Do this:
zfs set reservation=40G rpool/ROOT
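Once the datastore has its own dataset, you can also tune it independently of root; a sketch, the property values are just examples:

zfs set compression=zstd rpool/testdatastore1
zfs get reservation,compression rpool/ROOT rpool/testdatastore1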
 
The second run took only 25 seconds for a 50GB Linux VM!
That's because only changed data is written to the backup datastore.
It is even faster with a snapshot backup (hot backup), where the QEMU dirty bitmap allows skipping the read of the whole source VM disk.

fine performance with one single HDD and ext4
A GC job is required to delete older backups.
A verify job is required too, as existing data / chunks are never written again.
Both jobs take time because the data is chunked into many, many files which are not sequential.
The total number of chunk files is tied to the used space and the backup history.
A restore job will take more time than the backups after the first one.
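If you want to kick those off by hand instead of on a schedule, something like this should work (datastore name is just an example):

proxmox-backup-manager garbage-collection start testdatastore1
proxmox-backup-manager verify testdatastore1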
 
This is a zfs dataset!
I assume the 128k default?
AFAIK PBS uses dynamic chunk sizes, or you can set a fixed 4MB chunk size.
While I don't know how big these dynamic chunks are, my guess would be around 4MB as well.
So it is probably a good idea to set the recordsize to 16MB?
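If anyone wants to experiment: changing the recordsize only affects newly written chunks, and going above 1M may require the large_blocks feature and a raised zfs_max_recordsize depending on the OpenZFS version. A sketch with my test dataset:

zfs set recordsize=1M rpool/testdatastore1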

Both Jobs take time as data is chunked into many many files which are not sequential.
I also ran them, and both finished in under one minute. But yeah, my chunk fragmentation is probably still minimal and not realistic.

That is something that worries me a little bit about going with ZFS (and HDDs).
If I get ZFS free-space fragmentation combined with chunk fragmentation, the results could be dramatic.
If it weren't for ZFS fragmentation, with a special vdev the HDD would only have to read 4MB chunks randomly, which will probably perform pretty well.
That is probably why a single HDD with ext4 also performs pretty well, but there I can't offload the metadata and I suffer from bad metadata performance.

I think it comes down to how "bad" the PBS workload is for ZFS.
My guess is not that bad, since it is only incremental (so not much new data gets written), and even if fragmentation turns bad later on, I could offload the backups to another disk, delete everything in the pool, and start fresh without fragmentation.
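The offload itself could be a plain zfs send/receive to a pool on another disk before recreating the original pool; a rough sketch with made-up pool names:

zfs snapshot rpool/testdatastore1@offload
zfs send rpool/testdatastore1@offload | zfs receive otherpool/testdatastore1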