how to determine VM backup size/growth?

RolandK

how do i determine which VM contributes most to backup growth, i.e. which one has a high daily delta/increment?

the reason is that i want to move that VM into a different backup job/datastore with different gc/prune settings.

i have a datastore that is growing fast, but i don't want to browse through every VM's backup job to see what's going on there.
 
mhmm.. there is no built-in way to see this aside from looking into the task logs.
this metric is not trivial to calculate, since it can be *very* misleading. for example:

when looking at the actual chunks saved:
for two vms that are nearly identical, only the one backed up first would actually save the new chunks; the second one would simply reuse them (and reversing the backup order would reverse which vm shows the increase)

when looking at the dirty-bitmap size:
a vm may have a large amount of data to back up each time, but all of that data may already be on the server, in which case it does not actually increase the storage used...

so as you see, it is hard to give good numbers for this. in general i would simply monitor the datastore usage and act accordingly
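
the first case above (new chunks being credited to whichever vm happens to run first) can be illustrated with a tiny simulation. this is only a sketch under assumptions: the chunk names and the `new_chunks_per_backup` helper are made up for illustration and have nothing to do with how PBS stores or reports chunks internally.

```python
def new_chunks_per_backup(backups):
    """Return {name: number of chunks not yet in the store}, in run order."""
    store = set()   # chunk digests already present on the simulated datastore
    added = {}
    for name, chunks in backups:
        fresh = set(chunks) - store   # chunks this backup actually has to upload
        added[name] = len(fresh)
        store |= fresh
    return added


# two nearly identical vms: 100 shared chunks plus one private chunk each
# (made-up data, purely illustrative)
shared = {f"c{i}" for i in range(100)}
vm_a = shared | {"a-only"}
vm_b = shared | {"b-only"}

a_first = new_chunks_per_backup([("vm-a", vm_a), ("vm-b", vm_b)])
b_first = new_chunks_per_backup([("vm-b", vm_b), ("vm-a", vm_a)])

print(a_first)  # {'vm-a': 101, 'vm-b': 1}
print(b_first)  # {'vm-b': 101, 'vm-a': 1}
```

the same two vms look "heavy" or "light" depending only on which one is backed up first, which is why a per-vm "new chunks" number can mislead.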
 
hello dominik,

thanks for explaining.

yes, this is complicated, but only monitoring datastore usage is a bit unsatisfying.

if you have tons of VMs being backed up to a datastore, you will never know (without complex inspection on the PVE side) which one is the "evil one" causing massive datastore growth (for example because somebody is saving a big compressed database dump file inside a VM every day...)
you will spend hours on this in a more complex environment...

do you know borg-backup?

i think it should work in a similar manner regarding deduplication/block-storage, and borg has a feature that reports the "unique chunks" for a backup.

have a look at:

https://borgbackup.readthedocs.io/en/stable/usage/info.html

by counting the chunks and unique chunks (existing ones and added ones) for every backup in a backup group, and perhaps visualizing the unique chunk count graphically, i think you could at least get an impression of which VM weighs heavily in backup growth - and that's what counts for backup maintenance...

for borg, i have a script which creates a report for each backed-up system, and it's at least a good indicator of the backup traffic/volume for each client

Code:
db-server:
archive-name                           orig-size    compr-size        unique        added      deleted unique-chunks
archive-2020-04-10_1435                 62.92 GB      60.18 GB       6.24 GB     41.96 GB          0 B       166102
archive-2020-04-13_1154                 65.80 GB      63.05 GB       5.92 MB      7.98 GB          0 B       172110
archive-2020-04-14_1059                 66.77 GB      64.01 GB       5.90 MB    527.31 MB          0 B       172925
archive-2020-04-15_0857                 67.73 GB      64.97 GB       5.97 MB    485.70 MB          0 B       173740
archive-2020-04-16_0903                 68.86 GB      66.10 GB       3.21 GB      7.02 GB          0 B       179530
archive-2020-04-17_0920                 69.52 GB      66.74 GB       3.19 GB      3.74 GB          0 B       183944
archive-2020-04-18_0747                 70.66 GB      67.88 GB       3.17 GB      3.91 GB          0 B       188746
archive-2020-04-19_0453                 71.22 GB      68.42 GB       6.69 GB      6.69 GB          0 B       194126
----------------------------------------------------------------------------
All archives:                          544.03 GB     521.45 GB      72.31 GB
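
a report like the one above can be summed per system to rank clients by how much they add. the following is only a sketch under assumptions: it assumes the exact column layout shown (archive name, five "value unit" pairs, then the chunk count), decimal units, and a "name:" line starting each system's section. the `web-server` rows are invented for illustration; `added_bytes_per_system` is a hypothetical helper, not part of borg.

```python
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def added_bytes_per_system(report):
    """Sum the 'added' column per system section of the text report."""
    totals = {}
    system = None
    for line in report.splitlines():
        tokens = line.split()
        if len(tokens) == 1 and tokens[0].endswith(":"):
            # section header such as "db-server:"
            system = tokens[0][:-1]
            totals[system] = 0.0
        elif len(tokens) == 12 and system is not None:
            # tokens: name, (value, unit) for orig, compr, unique, added,
            # deleted, then the unique-chunk count; 'added' is tokens[7:9]
            totals[system] += float(tokens[7]) * UNITS[tokens[8]]
    return totals


report = """\
db-server:
archive-name                           orig-size    compr-size        unique        added      deleted unique-chunks
archive-2020-04-14_1059                 66.77 GB      64.01 GB       5.90 MB    527.31 MB          0 B       172925
archive-2020-04-15_0857                 67.73 GB      64.97 GB       5.97 MB    485.70 MB          0 B       173740
web-server:
archive-2020-04-14_1100                 10.00 GB       9.00 GB       1.00 MB     20.00 MB          0 B        30000
"""

totals = added_bytes_per_system(report)
ranking = sorted(totals, key=totals.get, reverse=True)
for name in ranking:
    print(f"{name}: {totals[name] / 10**6:.2f} MB added")
```

header, separator and "All archives:" lines are skipped simply because they do not have twelve whitespace-separated fields.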
 
do you know borg-backup?
sure

i think it should work in a similar manner regarding deduplication/block-storage and borg does have a feature to tell about the "unique chunks" for a backup.
the problem is that this is an extremely expensive operation. you need a map of chunks for every snapshot, remove all chunks that are referenced more than once, then go to the remaining chunks and sum up their sizes
this is not something that we can store anywhere, since it changes on every backup for potentially every other existing snapshot, and calculating it on demand would probably be about as cpu/disk-intensive as a garbage collection, but with an even bigger memory footprint (keeping all mappings in memory etc.)

for borg this might work ok, because they only expect a single source to write to a datastore, but on pbs, you have potentially hundreds of thousands of snapshots...
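
the calculation described above can be sketched as follows. this is a minimal illustration under assumptions, not PBS code: snapshots are modelled as plain lists of chunk ids, the sizes are made up, and `unique_bytes_per_snapshot` is a hypothetical helper. note that the reference counter has to cover every chunk of every snapshot at once, which is where the memory cost comes from.

```python
from collections import Counter


def unique_bytes_per_snapshot(snapshots, chunk_size):
    """Bytes per snapshot that are referenced by no other snapshot."""
    refs = Counter()
    for chunks in snapshots.values():
        refs.update(set(chunks))   # count each chunk once per snapshot
    return {
        name: sum(chunk_size[c] for c in set(chunks) if refs[c] == 1)
        for name, chunks in snapshots.items()
    }


# made-up chunk sizes (in MiB) and snapshot contents
chunk_size = {"a": 4, "b": 4, "c": 2, "d": 3, "e": 1}
snapshots = {
    "vm/100/day1": ["a", "b", "c"],
    "vm/100/day2": ["a", "b", "d"],
    "vm/101/day1": ["a", "e"],
}

before = unique_bytes_per_snapshot(snapshots, chunk_size)
print(before)  # {'vm/100/day1': 2, 'vm/100/day2': 3, 'vm/101/day1': 1}

# one new backup that reuses chunk "c" changes the result for an OLD snapshot
snapshots["vm/101/day2"] = ["a", "c"]
after = unique_bytes_per_snapshot(snapshots, chunk_size)
print(after["vm/100/day1"])  # 0
```

the last step shows why the result cannot simply be cached: a new backup of a different vm changes the "unique" size of an existing snapshot, so everything has to be recounted.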
 
ok, thanks for pointing it out.

i didn't expect it to be that complex, so we will need to find a way to get such information from the source, i.e. the pve systems...
 
