Backup Size Reporting

infinityM
Hey Guys,

Ok so I have 1 EXTREMELY SMALL but EXTREMELY frustrating problem with Proxmox Backup Server...

First off... 1 million bonus points to the Proxmox team for building the software and getting it to run so smoothly!

I need to determine how I'm going to structure our backups... But I can't see the size differences between the revisions...
It's only showing the actual restore size of the backup, but I need to see the backup's file size...
Is there any way to get it to show that too?
 
Hi,

I need to determine how I'm going to structure our backups... But I can't see the size differences between the revisions...

We do not really have that info at hand, and computing it would create quite some IO and be relatively slow.
Note here also that the difference does not tell you anything about the actual deduplication ratio on a datastore, as that is not limited to groups.

You get the total deduplication ratio information on garbage collection runs, as we need to check all indexes and referenced chunks there anyway, so we get it for free.

What would you actually do with such information? As said, it does not give you a full picture and can change arbitrarily depending on VM/CT IO behavior.

Does the "datastore full estimation" on the summary page helps here? It uses the derivative of space usage to project a rough estimation.
 
But I need to see the backup's file size...

In deduplicated storage there's not really such a thing. You could have thousands of identical backups which effectively use the space of one; is every backup's "real" size now 1/1000th? And if you then prune away half of them, did each backup's "real" size increase to 1/500th? Such comparisons without a specific context just do not make sense. You cannot pick out one snapshot, calculate the size of the referenced chunks, and use that for planning; that defeats the benefits of deduplication.
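To put rough numbers on that (plain arithmetic in Python, purely to illustrate the point above):

Code:
# 100 GiB of chunks on disk, shared by every snapshot in the example.
shared_chunk_bytes = 100 * 1024**3

for snapshots in (1000, 500):
    per_snapshot = shared_chunk_bytes / snapshots
    print(f"{snapshots} snapshots -> {per_snapshot / 1024**2:.1f} MiB 'real' size each")

# 1000 snapshots -> 102.4 MiB each; prune half and each survivor's
# amortized size doubles to 204.8 MiB, although not a byte on disk changed.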

We suggest avoiding the metrics and space usage planning used with "plain" non-deduplicated backup strategies. Rather, use a generous prune setting, keeping much more than you normally would, and watch how real space usage and the days until the store is estimated to be full develop. You can always tighten the prune settings later, once you observe higher usage than planned.
 
Hey Guys,

The reason behind it is simple...
I'd like to be able to see how much disk space is used per server to keep x number of retentions. Specifically servers that don't change much, but do take up a lot of space...

It's more for planning purposes than anything else...
 
As said, use the "Estimated Full" value on the Overview and the statistics on the datastore to get the disk usage growth rate and the current physical disk usage.

There's no such thing as a single image file size; the whole point of deduplication is sharing data as much as possible, also between backup groups on the same datastore.

I already gave you tips on how to plan those things with PBS, for example:

We suggest avoiding the metrics and space usage planning used with "plain" non-deduplicated backup strategies. Rather, use a generous prune setting, keeping much more than you normally would, and watch how real space usage and the days until the store is estimated to be full develop. You can always tighten the prune settings later, once you observe higher usage than planned.
 
I totally agree with thomas that chunk-based deduplication takes away the need for image-file-size backup planning.
However, it might be useful for the OP to have a look at what the verify job does, since it tells more about the size of the snapshot backup job.
As far as I remember, it reports the backup job's chunk numbers, duplicates, size, etc.
..I could be wrong..
 
Verify does not tell you that, IIRC, but Garbage Collection (GC) does, as stated in this thread already :)

You get the total deduplication ratio information on garbage collection runs, as we need to check all indexes and referenced chunks there anyway, so we get it for free.

For example, you get an output like the following at the end of a GC task:

Code:
Original data usage: 139.00 GiB
On-Disk usage: 7.02 GiB (5.05%)
On-Disk chunks: 3413
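Those two figures already give the deduplication ratio directly; simple arithmetic on the example output above:

Code:
original_gib = 139.00  # "Original data usage" from the GC log
on_disk_gib = 7.02     # "On-Disk usage"

print(f"on-disk share: {on_disk_gib / original_gib * 100:.2f}%")  # 5.05%, as reported
print(f"dedup factor:  {original_gib / on_disk_gib:.1f}x")        # ~19.8x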
 
Verify does not tell you that, IIRC, but Garbage Collection (GC) does, as stated in this thread already :)

yes, and that is superb info.
But I meant, since the OP talked about the size of the single backup jobs and how much space they take, to see the difference from one backup to another for the same guest:
"I need to determine how I'm going to structure our backups... But I can't see the size differences between the revisions...
It's only showing the actual restore size of the backup, but I need to see the backup's file size..."

I thought the verify JSON reported that when verifying that backup job, how many chunks were uploaded, doesn't it?
"chunk_upload_stats": {
"compressed_size": 3969092,
"count": 17,
"duplicates": 1,
"size": 71303857
}
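Assuming those fields mean what their names suggest (an assumption on my part: "count" = chunks sent, "duplicates" = chunks the server already had, "compressed_size" = bytes actually written), you could derive a per-run "new data" figure from it, e.g.:

Code:
import json

stats = json.loads("""{
    "compressed_size": 3969092,
    "count": 17,
    "duplicates": 1,
    "size": 71303857
}""")

new_chunks = stats["count"] - stats["duplicates"]
print(f"new chunks stored: {new_chunks}")                                           # 16
print(f"bytes written (compressed): {stats['compressed_size'] / 1024**2:.1f} MiB")  # ~3.8 MiB
print(f"logical size uploaded:      {stats['size'] / 1024**2:.1f} MiB")             # ~68.0 MiB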
 
I think I saw, during a PVE-VM-to-PBS backup, something like how many MB/GB were transferred and how much data was reused (deduped). I'm not quite sure those numbers were 100% accurate, and they will necessarily become less and less accurate once GC kicks in, but could it be useful to store this in order to approximate how much new storage is used at each backup run?
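Something along those lines could be done with a bit of bookkeeping, for example (a hypothetical sketch; the file name and figures are made up, and as said the numbers drift once GC reclaims shared chunks, so treat it as a rough upper bound):

Code:
import csv, datetime

LOG = "backup-growth-vm101.csv"  # hypothetical per-guest log file

def record_run(bytes_written):
    # Append today's "new data written" figure after each backup run.
    with open(LOG, "a", newline="") as f:
        csv.writer(f).writerow([datetime.date.today().isoformat(), bytes_written])

def average_growth_per_run():
    with open(LOG) as f:
        rows = list(csv.reader(f))
    return sum(int(written) for _, written in rows) / len(rows)

record_run(3_969_092)  # e.g. the compressed_size from the stats quoted above
print(f"avg new data per run: {average_growth_per_run() / 1024**2:.1f} MiB")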
 