PBS testimonial (hardware requirements are a joke)

Nov 26, 2021
Clickbait headline, but I think it is true :)
Maybe others can chime in with their setups and why they need beefy hardware.
Here is my PBS testimonial.

Reading the documentation I was a little bit scared. Especially the storage part seemed brutal.
Turns out it is not that bad and I misunderstood how PBS works.
I wanted to back up 3 locations, each hosting roughly 5-10 VMs.
A mixture of Windows and Linux VMs.

Since some of the PVE hosts belong to my clients, I wanted to do it right.
AMD Ryzen 5 8500G, two 120GB SSDs for boot and two 2TB NVMe drives for the datastore.
Changed the recordsize to 4M for the datastore (which I think should be the default for PBS).
For a long time I was on the fence about whether I should just get two HDDs and combine them with L2ARC, but then decided that if I need more than 2TB in the future, I can still do that and use the NVMe drives as a special vdev (svdev) or L2ARC.

Turns out, the storage performance is not that important. I thought that PVE would create a chunk checksum, send that to PBS, and PBS would then check some kind of metadata table or DB on disk to see if the checksum is already there. I think what happens instead is that PVE gets a hash table at the beginning of the backup, checks each chunk against that local table, and only sends the chunks that are not in it. Thank god, otherwise backups over high-latency VDSL would probably become a problem.
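A minimal sketch of the client-side check described above, assuming fixed-size 4 MiB chunks and SHA-256 digests. The function names (`split_chunks`, `plan_upload`) are made up for illustration and this is not the actual PBS client code or protocol; it only shows why a locally held set of known digests avoids one round trip per chunk.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumption: 4 MiB fixed-size chunks


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield fixed-size chunks of data (the last one may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def plan_upload(data: bytes, known_digests: set[str], size: int = CHUNK_SIZE):
    """Return (index, to_upload).

    index     : ordered list of chunk digests describing the whole image
    to_upload : digest -> chunk for chunks missing from known_digests
    """
    index = []
    to_upload = {}
    for chunk in split_chunks(data, size):
        digest = hashlib.sha256(chunk).hexdigest()
        index.append(digest)
        # Purely local lookup: no network traffic for chunks already known.
        if digest not in known_digests and digest not in to_upload:
            to_upload[digest] = chunk
    return index, to_upload


if __name__ == "__main__":
    # Tiny chunk size so the effect is visible with a few bytes.
    first = b"aaaabbbbcccc"
    index1, upload1 = plan_upload(first, set(), size=4)
    print(len(upload1))  # all three chunks are new

    second = b"aaaaXXXXcccc"  # only the middle chunk changed
    index2, upload2 = plan_upload(second, set(index1), size=4)
    print(len(upload2))  # only the changed chunk needs to be sent
```

With this model, the number of chunks crossing the WAN depends only on how much data changed, not on link latency per chunk.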

So in reality, the bottleneck is not my storage, but the 1GBit/s WAN of the PBS and the 40Mbit/s upload of one PVE location.
Garbage collection and prune jobs both finish in under 10 seconds.
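To put the WAN bottleneck into numbers, here is a rough back-of-the-envelope helper. The 10 GB of changed data is a made-up example value, and the calculation ignores protocol overhead and compression, so treat the results as lower bounds.

```python
def upload_seconds(size_bytes: float, link_mbit_per_s: float) -> float:
    """Ideal transfer time in seconds, ignoring overhead and compression."""
    bits = size_bytes * 8
    return bits / (link_mbit_per_s * 1_000_000)


if __name__ == "__main__":
    changed = 10 * 1000**3  # hypothetical: 10 GB of new chunks per backup run
    print(upload_seconds(changed, 40) / 60)    # minutes over a 40 Mbit/s uplink
    print(upload_seconds(changed, 1000) / 60)  # minutes over a 1 Gbit/s link
```

At 40 Mbit/s the link delivers about 5 MB/s, which is well below what even a single HDD can write sequentially.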

Maybe the random 4MB chunk reads during a restore benefit a little bit from having SSDs, but probably even that would have been fine with HDDs.

If you are going to use ACME, set it up first! I did it afterwards, only to realize that the fingerprint changes when ACME renews the cert.
It is also easier to just get a real cert; then you don't have to use fingerprints on PVE and can leave that field empty.

Some things about permissions I love, others I don't really like.

Love:
You can give PVE only the DatastoreBackup permission and get a ransomware-protected backup system by doing so.

Don't like:
I struggled with API tokens and users.
I want to create an API token for PVE1 that has permission to access /datastore/PVE1namespace.
So I naturally assumed that just creating a token with permission on /datastore/PVE1namespace would be enough.
Instead I have to create a token tied to a user.
Why is it combined with a user?
So I have to create the user PVE1user and then create the token with permission for /datastore/PVE1namespace.
And that is not enough either: I have to give PVE1user access to /datastore/ so PVE can read it. Which seems rather stupid, since I explicitly told PVE to use PVE1namespace.
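As far as I understand the PBS documentation, a token's effective permissions on a path are its own ACL entries intersected with those of the user it belongs to, which would explain why the user needs the permission as well. Below is a deliberately simplified model of that rule. It uses exact path matches only (no propagation), and the path, user and privilege strings are illustrative placeholders, not a dump of a real configuration.

```python
def privileges_on(acl: dict[str, set[str]], path: str) -> set[str]:
    """Privileges granted directly on path (simplified: no propagation)."""
    return set(acl.get(path, set()))


def effective(user_acl: dict[str, set[str]],
              token_acl: dict[str, set[str]],
              path: str) -> set[str]:
    """A token can never do more than its user: intersect both sets."""
    return privileges_on(user_acl, path) & privileges_on(token_acl, path)


if __name__ == "__main__":
    path = "/datastore/store1/PVE1namespace"  # hypothetical ACL path

    token_acl = {path: {"Datastore.Backup"}}

    # Token has the privilege, but the owning user has none: nothing is allowed.
    print(effective({}, token_acl, path))

    # Once the user is granted it too, the token can back up.
    user_acl = {path: {"Datastore.Backup", "Datastore.Audit"}}
    print(effective(user_acl, token_acl, path))
```

In this model the user acts as an upper bound, so a broadly privileged user can hand out narrowly scoped tokens without the token ever exceeding what its ACL lists.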

Deduplication is pretty poor. Not because of PBS of course.
Looks like Windows just shuffles and changes too much data, even during normal usage.
I currently have 18 backups of each VM and my deduplication factor is only 11x.

All in all, I am very happy with PBS! :)

Unlike PVE, for PBS pricing is a little bit too steep for me. But thankfully that isn't a problem since you let me use the community edition.
Keep up the good work!
 
Deduplication is pretty poor.
Obviously that depends on the modified data per interval and the "prune"-settings. And it needs a number of backups present to work at all :-)

Both my day job's cluster and my homelab show a factor of around 30, with basically "everything daily". If I ran hourly backups, the dedup factor would probably rise.
 
Hi,

For deduplication, YMMV. For example, on our PBS it's pretty good: more than 90% of the VMs are Linux (more than 60% of these Linux machines are Debian), with retention from 14 to 365 days.

2025-12-10T07:57:06+01:00: Original data usage: 10.225 PiB
2025-12-10T07:57:06+01:00: On-Disk usage: 14.638 TiB (0.14%)
2025-12-10T07:57:06+01:00: On-Disk chunks: 15675165
2025-12-10T07:57:06+01:00: Deduplication factor: 715.29
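The numbers in that log can be cross-checked with simple arithmetic: the deduplication factor is original data usage divided by on-disk usage. A small sketch using the values from the log above (the constant and function names are just for illustration):

```python
TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def dedup_factor(original_tib: float, on_disk_tib: float) -> float:
    """Ratio of logical (original) data to physically stored data."""
    return original_tib / on_disk_tib


original_tib = 10.225 * TIB_PER_PIB  # 10.225 PiB from the log
on_disk_tib = 14.638                 # 14.638 TiB from the log

factor = dedup_factor(original_tib, on_disk_tib)
print(f"{factor:.2f}")                      # 715.29
print(f"{on_disk_tib / original_tib:.2%}")  # 0.14%
```

Both results match the reported "Deduplication factor" and the percentage shown next to the on-disk usage.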

We have critical VMs whose backups run every 2 hours, and the others run at least once per day (VM => 93 Groups, 78200 Snapshots).

Best regards,
 
Wait, is that the price to build a PC in the US?
In Sweden you need to pay like 2x or even 3x that amount.
Taxes are about 25%, then there is a lot of other stuff like the RAM and SSD price increases.
 
If it is not in a rack it's a toy IMHO :)

We have found that a Dell PowerVault MD1220 stuffed full of 900GB 10k RPM spinners has more than adequate performance for ~30 VMs. We have several shelves hanging off the back of the server on dedicated redundant 6Gbps SAS connections, with one zpool per shelf using dRAID (8D+2P+1S), and the shelves dedicated to particular "clients", where a "client" is a department/faculty etc. We get about 17TiB usable from a shelf with 900GB drives. Verification takes a while, but over a 24h cycle the machine sits idle for nearly 50% of the time.
 
Since I got asked about my PBS by @news, here is some more detailed information:
Tip: Don't use newer Intenso High SSDs for anything important. I have a 100% failure rate after 2 years. First they get slow as hell, then bad sectors show up long before the TBW is reached, and the SSD obviously doesn't do proper maintenance and only finds errors when doing a full read of all the sectors, e.g. with dd, or with ddrescue to get the data somewhere safer. My four-year-old 480GB is still fast and has no errors, but every Intenso High we know of that was bought three years ago or less has problems, every model from 120GB to 960GB.
Seems they enshittify their SSD lines every now and then, using worse chips. Heard they use QLC now, though the datasheet still says TLC. (And I'm pretty sure that a few years ago, their Top line was TLC, and the High line MLC.)
 
Maybe the random 4mb chunk reads when doing a restore profit a little bit from having SSDs, but probably even that would have been fine with HDDs.
In my experience, HDD-only storage for the PBS datastore really is a no-go. As soon as you're on SSD, performance becomes workable. I have been using mirrored consumer SATA SSDs (860 EVO, with DRAM cache) and this proved to be vastly faster than HDD-only. A pool of HDDs combined with an NVMe svdev works great as well for larger datastores. From there, you can scale up to your requirements for operation speed and reliability.
 