Backup Size Reporting

infinityM
Hey Guys,

Ok so I have 1 EXTREMELY SMALL but EXTREMELY frustrating problem with Proxmox Backup Server...

First off... 1 million bonus points to the Proxmox team for building the software and getting it to run so smoothly!

I need to determine how I'm going to structure our backups... But I can't see the size differences between the revisions...
It's only showing the actual restore size of the backup, but I need to see the backup's file size...
Is there any way to get it to show that too?
 
Hi,

I need to determine how I'm going to structure our backups... But I can't see the size differences between the revisions...

We do not really have that info at hand, and computing it would create quite some IO and be relatively slow.
Note here also that the difference does not tell you anything about the actual deduplication ratio on a datastore, as that is not limited to groups.

You get the total deduplication ratio information on garbage collection runs, as we need to check all indexes and referenced chunks there anyway, so we get it for free.

What would you actually do with such information? As said, it does not give you a full picture and can change arbitrarily depending on VM/CT IO behavior.

Does the "datastore full estimation" on the summary page helps here? It uses the derivative of space usage to project a rough estimation.
 
But I need to see the backup's file size...

In deduplicated storage there's not really such a thing. You could have thousands of identical backups which effectively use the space of one; is every backup's "real" size now 1/1000th? And if you then prune away half of them, did each backup's "real" size increase to 1/500th? Such comparisons without a specific context just do not make sense. You cannot pick out one snapshot, calculate the size of the referenced chunks, and use that for planning; that defeats the benefits of deduplication.
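To put rough numbers on that (plain arithmetic in Python, purely to illustrate the point above):

Code:
# 100 GiB of chunks on disk, shared by every snapshot in the example.
shared_chunk_bytes = 100 * 1024**3

for snapshots in (1000, 500):
    per_snapshot = shared_chunk_bytes / snapshots
    print(f"{snapshots} snapshots -> {per_snapshot / 1024**2:.1f} MiB 'real' size each")

# 1000 snapshots -> 102.4 MiB each; prune half and each survivor's
# amortized size doubles to 204.8 MiB, although not a byte on disk changed.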

We suggest avoiding the metrics and space usage planning used with "plain" non-deduplicated backup strategies. Rather, use a generous prune setting, keeping much more than you normally would, and watch how real space usage and the days until the store is estimated to be full develop. You can always tighten the prune settings later, once you observe higher usage than planned.
 
Hey Guys,

The reason behind it is simple...
I'd like to be able to see how much disk space is used per server to keep x number of retentions. Specifically servers that don't change much, but do take up a lot of space...

It's more for planning purposes than anything else...
 
As said, use the "Estimated Full" value on the Overview and the statistics on the datastore to get the disk usage growth rate and the current physical disk usage.

There's no such thing as a single image file size; the whole point of deduplication is sharing data as much as possible, also between backup groups on the same datastore.

I already gave you tips on how to plan those things with PBS, for example:

We suggest avoiding the metrics and space usage planning used with "plain" non-deduplicated backup strategies. Rather, use a generous prune setting, keeping much more than you normally would, and watch how real space usage and the days until the store is estimated to be full develop. You can always tighten the prune settings later, once you observe higher usage than planned.
 
I totally agree with thomas that chunk-based deduplication takes away the need for image-file-size backup planning.
However, it might be useful for the OP to have a look at what the verify job does, since it tells more about the size of the snapshot backup job.
As far as I remember, it reports the backup job's chunk numbers, duplicates, size, etc.
..I could be wrong..
 
Verify does not tell you that, IIRC, but Garbage Collection (GC) does, as stated in this thread already :)

You get the total deduplication ratio information on garbage collection runs, as we need to check all indexes and referenced chunks there anyway, so we get it for free.

For example, you get an output like the following at the end of a GC task:

Code:
Original data usage: 139.00 GiB
On-Disk usage: 7.02 GiB (5.05%)
On-Disk chunks: 3413
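Those two figures already give the deduplication ratio directly; simple arithmetic on the example output above:

Code:
original_gib = 139.00  # "Original data usage" from the GC log
on_disk_gib = 7.02     # "On-Disk usage"

print(f"on-disk share: {on_disk_gib / original_gib * 100:.2f}%")  # 5.05%, as reported
print(f"dedup factor:  {original_gib / on_disk_gib:.1f}x")        # ~19.8x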
 
Verify does not tell you that, IIRC, but Garbage Collection (GC) does, as stated in this thread already :)

yes, and that is superb info.
But I meant, since the OP talked about the size of the single backup jobs and how much space they take, to see the difference from one backup to another for the same guest:
"I need to determine how I'm going to structure our backups... But I can't see the size differences between the revisions...
It's only showing the actual restore size of the backup, but I need to see the backup's file size..."

I thought the verify JSON reported that when verifying that backup job, how many chunks were uploaded, doesn't it?
"chunk_upload_stats": {
"compressed_size": 3969092,
"count": 17,
"duplicates": 1,
"size": 71303857
}
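Assuming those fields mean what their names suggest (an assumption on my part: "count" = chunks sent, "duplicates" = chunks the server already had, "compressed_size" = bytes actually written), you could derive a per-run "new data" figure from it, e.g.:

Code:
import json

stats = json.loads("""{
    "compressed_size": 3969092,
    "count": 17,
    "duplicates": 1,
    "size": 71303857
}""")

new_chunks = stats["count"] - stats["duplicates"]
print(f"new chunks stored: {new_chunks}")                                           # 16
print(f"bytes written (compressed): {stats['compressed_size'] / 1024**2:.1f} MiB")  # ~3.8 MiB
print(f"logical size uploaded:      {stats['size'] / 1024**2:.1f} MiB")             # ~68.0 MiB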
 
I think I saw, during a PVE-VM-to-PBS backup, something like how many MB/GB were transferred and how much data was reused (deduped). I'm not quite sure those numbers were 100% accurate, and they will necessarily become less and less accurate once GC kicks in, but could it be useful to store this in order to approximate how much new storage is used at each backup run?
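Something along those lines could be done with a bit of bookkeeping, for example (a hypothetical sketch; the file name and figures are made up, and as said the numbers drift once GC reclaims shared chunks, so treat it as a rough upper bound):

Code:
import csv, datetime

LOG = "backup-growth-vm101.csv"  # hypothetical per-guest log file

def record_run(bytes_written):
    # Append today's "new data written" figure after each backup run.
    with open(LOG, "a", newline="") as f:
        csv.writer(f).writerow([datetime.date.today().isoformat(), bytes_written])

def average_growth_per_run():
    with open(LOG) as f:
        rows = list(csv.reader(f))
    return sum(int(written) for _, written in rows) / len(rows)

record_run(3_969_092)  # e.g. the compressed_size from the stats quoted above
print(f"avg new data per run: {average_growth_per_run() / 1024**2:.1f} MiB")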
 