PBS storage space consumed

jslanier

Member
Jan 19, 2019
40
0
11
39
Hey everyone,
I am looking at moving our higher ed org away from VMware and completely over to Proxmox/Ceph/PBS/Cloud (for off-site backups). My question is about how PBS works. Is there just an initial full backup followed by an infinite amount of incrementals? Basically, if I have roughly 16TB of VMs running, and I back them up for 10 years, keeping 2 weeks of daily, 12 months of monthly, and 10 yearly backups, how much storage am I going to be consuming? Does PBS have to take periodic full backups, or can it just continue to take incremental backups indefinitely?

We are currently using VMware/Veeam/Clumio (cloud with 10 year retention), and I cannot wait to get out of the VMware business.
 

Dunuin

Famous Member
Jun 30, 2020
5,747
1,312
144
Germany
jslanier said: Is there just an initial full backup followed by an infinite amount of incrementals?
All PBS backups are full backups, but because of deduplication nothing needs to be stored twice, so only the differences need additional space (much like with incremental backups).
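To illustrate the idea, here is a toy sketch of a content-addressed chunk store in Python. It is not PBS's actual code: the 4-byte chunk size, the in-memory dict, and the function names `backup` and `restore` are assumptions made purely for illustration. Each backup is a complete index of chunk digests, but a chunk that already exists in the store is not written a second time.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is readable


def backup(image: bytes, store: dict) -> list:
    """Store a full backup and return its index (ordered list of chunk digests)."""
    index = []
    for offset in range(0, len(image), CHUNK_SIZE):
        chunk = image[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        # A chunk that is already in the store is not stored again.
        store.setdefault(digest, chunk)
        index.append(digest)
    return index


def restore(index: list, store: dict) -> bytes:
    """Rebuild the complete image from one index alone; no backup chain is needed."""
    return b"".join(store[digest] for digest in index)


store = {}
day1 = b"AAAABBBBCCCCDDDD"
day2 = b"AAAABBBBXXXXDDDD"  # only the third chunk changed

idx1 = backup(day1, store)
idx2 = backup(day2, store)

print(len(store))  # 5 unique chunks stored for two backups of 4 chunks each
```

Both `idx1` and `idx2` restore to a complete image on their own, which is why every backup counts as a full backup even though the second one only added one new chunk.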
jslanier said: Basically, if I have roughly 16TB of VMs running, and I back them up for 10 years, keeping 2 weeks of daily, 12 months of monthly backups and 10 yearly backups, how much storage am I going to be consuming? Does PBS have to take periodic full backups or can it just continue to take incremental backups indefinitely?
How much space your backups will consume is hard to guess. Let's say your first full backup is 15% compressible and 10% deduplicatable, with 25% zeros. Subtracting those fractions from the raw 16TB leaves 50%, so it will consume about 8TB of storage after deduplication and zstd compression. Then let's say your VM data changes by 100GB per day (and this is still 15% compressible and 10% deduplicatable), so each of the following 13 daily full backups will add about 75GB. After two weeks with 14 full backups, your PBS storage will consume 1x 8TB + 13x 75GB = 8.975TB.

So if you want to estimate how much storage you will need, you first need to find out how compressible and deduplicatable your VMs are, how much of the VM disks is empty (zeros), and how much the VM data changes over time.
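The back-of-the-envelope estimate can be written as a small Python sketch so you can plug in your own measurements. It assumes the same simple model as the numbers in this post, where the compressible, deduplicatable, and zero fractions are just subtracted from the raw size. The function and parameter names are made up for illustration.

```python
def stored_size_gb(raw_gb, compressible=0.15, dedup=0.10, zeros=0.0):
    """Rough on-disk size after subtracting each savings fraction (additive model)."""
    return raw_gb * (1.0 - compressible - dedup - zeros)


def datastore_estimate_gb(vm_gb, daily_change_gb, snapshots,
                          compressible=0.15, dedup=0.10, zeros=0.25):
    """First backup stores the whole image; each later one stores only changed data."""
    first = stored_size_gb(vm_gb, compressible, dedup, zeros)
    later = stored_size_gb(daily_change_gb, compressible, dedup)
    return first + (snapshots - 1) * later


if __name__ == "__main__":
    # 16 TB of VMs, 100 GB of changes per day, 14 daily backups kept.
    total = datastore_estimate_gb(vm_gb=16_000, daily_change_gb=100, snapshots=14)
    print(f"{total / 1000:.3f} TB")  # prints 8.975 TB
```

The monthly and yearly backups would add more on top, depending on how much data has changed between them, which this sketch does not try to guess.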
 

jslanier

Dunuin said: So after two weeks with 14 full backups your PBS storage will consume 1x 16TB + 13x 75GB = 16.975 TB.
Thanks for your reply. With the estimated math you did, wouldn't the two weeks of backup data be 1x 8TB + 13x 75GB = 8.975TB?
 

Dunuin

Yup, you're right. It would be 8.975TB. I edited it.

Also keep in mind that PBS was designed with local SSDs in mind. Because of the deduplication, everything is stored as small chunks of 4 KB to 4 MB. So 16TB of data will result in at least 4 million small files, and each garbage collection, prune, or re-verify task will need to hash, read, or touch all of those 4 million files again and again. Because of all those IOPS, you really want to use local SSDs as your PBS datastore if it is used in production where downtime is a problem. With HDDs or a SAN, these maintenance tasks may run for hours (or, in the case of verify tasks, even days).
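The "at least 4 million" figure comes from dividing the data size by the largest chunk size. A quick sketch in Python, using decimal units and a hypothetical helper name `chunk_count`:

```python
TB = 10**12
MB = 10**6


def chunk_count(data_bytes: int, chunk_bytes: int) -> int:
    """Number of chunk files needed if every chunk has the given size (rounded up)."""
    return -(-data_bytes // chunk_bytes)  # integer ceiling division


print(chunk_count(16 * TB, 4 * MB))  # 4000000
```

Smaller chunks mean proportionally more files, which is why 4 million is a lower bound for that amount of stored data.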
 

jslanier

Damn. That sucks. I have a Compellent SAN with mixed SSD/HDD and a couple Synology SANs as well that are only HDD. I guess I will have to try this out. I was going to do MPIO iSCSI to the Compellent and point to several 4TB volumes to do a RAIDZ1. I guess I could have a 1TB SSD only volume from the Compellent to be a cache disk too. Any recommendations based on my hardware?
 

Dunuin

Using SSDs as special devices to move metadata from the HDDs to the SSDs, or using SSDs as a metadata-only L2ARC cache, should speed up GC and prune tasks, because which chunks get deleted is determined by their atime, which is metadata. It won't speed up verify tasks, because a verify task needs to read and hash the chunk data itself. But if you use ZFS, which already has bit rot protection, it is debatable whether you need to enable verify tasks at all, because ZFS is already making sure that data won't corrupt over time.
And if you care about fast restores to keep your downtime short, SSD caching also won't help much.
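To show why GC only touches metadata, here is a toy mark-and-sweep sketch in Python. It is an illustration of the atime idea and not PBS's real implementation. The in-memory dict standing in for file atimes, the `grace_seconds` parameter, and the function name are all assumptions.

```python
def garbage_collect(chunk_atime: dict, indexes: list, now: float,
                    grace_seconds: float) -> list:
    """Toy GC: touch referenced chunks, then drop chunks whose atime is too old."""
    # Mark phase: update the atime of every chunk still referenced by a backup index.
    for index in indexes:
        for digest in index:
            chunk_atime[digest] = now

    # Sweep phase: anything not touched recently is unreferenced and can be removed.
    cutoff = now - grace_seconds
    removed = [digest for digest, atime in chunk_atime.items() if atime < cutoff]
    for digest in removed:
        del chunk_atime[digest]
    return removed


# Three chunks on disk, but the remaining backup only references "a" and "b".
atimes = {"a": 0.0, "b": 0.0, "c": 0.0}
removed = garbage_collect(atimes, indexes=[["a", "b"]], now=100_000.0,
                          grace_seconds=86_400.0)
print(removed)  # ['c']
```

Nothing in this loop reads chunk contents, so faster metadata access helps. A verify pass would have to read and hash every chunk's data instead.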
 
