How to get the exact backup size in Proxmox Backup

parker0909

Member
Aug 5, 2019
Hi All,

We have a question: how can we find the exact backup size for a Proxmox VM? I tried to find this information in Proxmox VE, but the size shown seems incorrect; it is larger than the VM's disk size. The size also does not seem to be shown in Proxmox Backup Server. Does anyone have an idea how to find out how much disk space a particular VM is using? Thank you.

Parker
 

dcsapak

Proxmox Staff Member
Staff member
Feb 1, 2016
It is not possible to get the 'exact' backup size for a backup in PBS, due to the deduplication.
The backup is split into chunks that are deduplicated across the whole datastore (so other backups can reuse them); thus a chunk does not really belong to a single backup, but possibly to many.
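Just to illustrate (with made-up digest lists, this is not PBS code): two backups can reference the same chunk, so the bytes on disk cannot be attributed to either backup alone.

```shell
#!/bin/bash
# Toy example: two backups' digest lists share "chunk2", so the datastore
# stores 3 unique chunks while each backup references 2. There is no
# non-arbitrary way to split the shared chunk between the two backups.
backup_a='chunk1
chunk2'
backup_b='chunk2
chunk3'
refs=$(printf '%s\n%s\n' "$backup_a" "$backup_b" | wc -l)
unique=$(printf '%s\n%s\n' "$backup_a" "$backup_b" | sort -u | wc -l)
echo "$refs chunk references, $unique unique chunks stored"
```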
 

oversite

Active Member
Jul 13, 2011
As Dominik says, due to deduplication it's a bit difficult. However, depending on what you need to know, I find it very useful to know how much data was backed up with each snapshot. I wish the amount of data transferred during a backup were easily available in both the PVE and PBS GUIs afterwards.
When you run a backup, it ends by telling you "INFO: transferred 1.07 GiB in 5 seconds (218.4 MiB/s)". This number is what your backup transferred to PBS, and it gives a hint of the size added to the backup storage. It is also available on the PBS side for each snapshot, in the index.json.blob file you see under each snapshot. Again, this is not the size of that snapshot, since it depends on all the other snapshots of the VM, but it is a good hint, depending on why you want to know the size of the backup.
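If you keep the task logs, that transferred figure can be pulled out with a one-liner. The log line format here is just the one quoted above, applied to a sample string; in practice you would grep it out of the task log on your setup.

```shell
#!/bin/sh
# Extract the amount from a vzdump/PBS "transferred" log line.
line='INFO: transferred 1.07 GiB in 5 seconds (218.4 MiB/s)'
size=$(printf '%s\n' "$line" | sed -n 's/^INFO: transferred \([0-9.]* [A-Za-z]*\) in .*/\1/p')
echo "$size"   # 1.07 GiB
```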
 

stephen322

New Member
Aug 19, 2021
Some information is better than none, no?
I would find 2 size fields very useful:
  • Backup Size - Size of the backup ignoring dedup: sum of all the chunks in use by the backup. This would be a more appropriate size to be returned to Proxmox VE, rather than a useless Virtual HD size definition.
  • Delta Size - Size of the delta at the time of the backup: new chunks written.
I would think both should be easily calculated during the backup, unless I'm missing something...
 

SINOS

New Member
Jul 19, 2021
21
6
3
21
Backup Size - Size of the backup ignoring dedup: sum of all the chunks in use by the backup. This would be a more appropriate size to be returned to Proxmox VE, rather than a useless Virtual HD size definition.
I agree that the VHD size is not really useful, but this metric is hard to gather due to the deduplication. PVE / proxmox-backup-client would need to measure how much (non-zero) data was actually read and provide this metric to PBS, because PBS is unable(?) to calculate this on its own, as the deduplicated pool is shared across backups.
And what about dirty-bitmaps? That'd ruin this metric instantly, because the backup client does not need to read everything.
Maybe some tricky math can still lead to a useful value, but still... it makes things even more complicated.

Delta Size - Size of the delta at the time of the backup: new chunks written.
Assuming there is an existing backup, this would be rather easy to calculate then.
But counting the number of new chunks written by PBS would be yet another metric, because PBS only writes a new chunk if the pool does not already contain it, which means no other backup has already created that chunk.
That also makes this metric kind of useless, because other backups could also reference the same chunks in the future and therefore report far fewer "new chunks written" in their backup jobs, which makes the "new chunks written" value heavily misleading information, imho.


I'd prefer something like the cumulative size of all chunks referenced by a backup as an additional "size" metric. This could provide a potentially more accurate number, and on top of that, PBS can calculate *and update* this value by itself and does not rely on valid input data from backup clients to maintain the metric.
Something like the "referenced" value that ZFS provides.
[Screenshot: ZFS dataset listing showing the "referenced" property; image from https://docs.oracle.com/cd/E19253-01/819-5461/gazss/index.html]

@dcsapak What do you think about this? Is something like that practicable?

On a side note:
When calculating the referenced size of a backup, it would be interesting to have ZFS compression ignored (or additionally shown), because a high compression ratio could also make the backup look much smaller than it actually is.
For example, I have a recordsize of 4M, and using zstd I get typical compression ratios between 2.0x and 3.0x.
IIRC there was some way to read the uncompressed size of a file, ignoring ZFS' transparent compression and showing the original file size, and some way to read the file size that ZFS reports, which is smaller if the file is compressed.

The same applies to PBS' built-in compression; that would also change the displayed size.
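The on-disk vs. logical distinction mentioned above can be seen with plain GNU coreutils. ZFS compression isn't assumed here, so a sparse file stands in for a compressed one; on ZFS the same `du` pair shows the compressed vs. uncompressed view of a file.

```shell
#!/bin/sh
# `du` reports blocks actually allocated; `du --apparent-size` (like `ls -l`)
# reports the logical size. A sparse file stands in for ZFS compression here.
f=$(mktemp)
truncate -s 10M "$f"                     # 10 MiB logical, almost nothing on disk
ondisk=$(du -k "$f" | cut -f1)
logical=$(du -k --apparent-size "$f" | cut -f1)
echo "on-disk: ${ondisk} KiB, logical: ${logical} KiB"
rm -f "$f"
```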

I think a "referenced" size would be useful for estimating how long a restore of the VM will take.
If I have a 2 TB VHD, will it take a minute to restore because it is 99% zeroes (which restore extremely fast on compressing storage like ZFS or Ceph), or will it take hours because there are actually tons of data inside?
 

stephen322

New Member
Aug 19, 2021
I agree that the VHD size is not really useful, but this metric is hard to gather due to the deduplication.
"Backup Size - Size of the backup *ignoring dedup*: sum of all the chunks in use by the backup."
I'm not concerned with dedup. Like everyone says, it's difficult and expensive to calculate, so leave dedup to the GC-calculated stats for now. I want to know how much disk space the single backup would consume if it were the only backup in existence.

And what about dirty-bitmaps? That'd ruin this metric instantly, because the backup client does not need to read everything.
Good point. Assuming it can't be counted live during the backup, the worst case is a relatively inexpensive filesystem-metadata calculation over the chunks. If that is still too expensive during backup, leave the field blank and add a refresh button to it. Or, at the very least, calculate the field during a verify or GC.

I'd prefer something like the cumulative size of all chunks referenced by a backup as an additional "size" metric. This could provide a potentially more accurate number, and on top of that, PBS can calculate *and update* this value by itself and does not rely on valid input data from backup clients to maintain the metric.
I'm pretty sure we're now talking about the same thing... Not sure what there is to update, except maybe the backup group size?


Re: Delta Size
Assuming there is an existing backup, this would be rather easy to calculate then.
Looks like it's already done! I just checked index.json.blob -- just slap the contents of "chunk_upload_stats" - "compressed_size" into a Delta Size column and call it a day.
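If you want that number without opening the file by hand, and assuming you have decoded index.json.blob to plain JSON first (newer PBS versions ship a proxmox-backup-debug inspection tool; treat its availability as an assumption for your version), a jq one-liner does it. Shown here on a synthetic fragment; only the "chunk_upload_stats" / "compressed_size" field names come from this thread.

```shell
#!/bin/sh
# Pull the per-snapshot upload delta out of a decoded index.json.
# The JSON fragment below is synthetic, for illustration only.
json='{"chunk_upload_stats":{"compressed_size":123456789}}'
delta=$(printf '%s' "$json" | jq -r '.chunk_upload_stats.compressed_size')
echo "delta (compressed bytes): $delta"   # 123456789
```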

That also makes this metric kind of useless, because other backups could also reference the same chunks in the future and therefore report far fewer "new chunks written" in their backup jobs, which makes the "new chunks written" value heavily misleading information, imho.
There are plenty of use cases where you know you're not dealing with dedup-able data. In general it is still very useful information:
  • It shows how much disk space was used for this particular backup run.
  • Over a few backups, it shows a general growth/change rate for the VM at a glance, even in the presence of far-away dedup data.
  • It helps find a particular backup when you know there were a lot of changes.
It's not misleading; just document what the number represents -- everyone knows about dedup and can take it into consideration.
 

gerco

New Member
Sep 24, 2021
Here is a script that got me the information I was looking for. It shows the cumulative on-disk size of all chunks associated with a .fidx file. It doesn't support .didx files, so it's useless for containers, but similar principles would apply should someone choose to do the work for those.

Bash:
datastore=/bulkpool/backups
backup=vm/100/2021-09-19T06:00:02Z

cd "$datastore/$backup"

# Dump the 32-byte chunk digests stored after the 4096-byte .fidx header,
# then sum the on-disk size of every referenced chunk file.
for x in $(xxd -s +4096 -p -c 32 *.img.fidx)
do
  ls -l "$datastore/.chunks/${x:0:4}/$x" | awk '{print $5}'
done | paste -sd+ | bc

Essentially, it gets all the chunk hashes from the .fidx, then runs "ls" on each of those hashes in the .chunks directory and adds up the sizes. It's awful and terribly inefficient, but it got me what I wanted to know. Hopefully it will also work for someone else.
 

dcsapak

Proxmox Staff Member
Staff member
Feb 1, 2016
Here is a script that got me the information I was looking for. It shows the cumulative on-disk size of all chunks associated with a .fidx file.
Just for your information: the only difference between the result of your script and the 'full size' we show in the web UI is the compression of the chunks, since you count duplicate chunks multiple times, AFAICT.
You'd first have to pipe the chunk list through 'sort -u' or something like that to count each chunk only once.
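For anyone wanting to try that, here is a sketch of the script above with the digest list piped through `sort -u`. Since a real datastore is not at hand here, the example builds a tiny fake one (assuming, as the script above does, that a .fidx is a 4096-byte header followed by raw 32-byte digests); all paths, digests, and sizes are made up for illustration.

```shell
#!/bin/bash
# Sketch: dcsapak's suggestion applied -- count each chunk only once.
# A fake datastore is built so the effect of `sort -u` can be shown;
# point $ds at a real datastore snapshot directory instead.
set -e
ds=$(mktemp -d)

d1=$(printf 'a%.0s' $(seq 64))                       # fake digest, 64 hex chars
d2=$(printf 'b%.0s' $(seq 64))
mkdir -p "$ds/.chunks/${d1:0:4}" "$ds/.chunks/${d2:0:4}"
head -c 100 /dev/zero > "$ds/.chunks/${d1:0:4}/$d1"  # 100-byte chunk
head -c 250 /dev/zero > "$ds/.chunks/${d2:0:4}/$d2"  # 250-byte chunk

# fake index: 4096-byte header + digests d1, d2, d1 (d1 referenced twice)
{ head -c 4096 /dev/zero; printf '%s%s%s' "$d1" "$d2" "$d1" | xxd -r -p; } \
  > "$ds/disk.img.fidx"

sum() { awk '{s+=$1} END{print s}'; }
naive=$(for x in $(xxd -s 4096 -p -c 32 "$ds/disk.img.fidx"); do
          wc -c < "$ds/.chunks/${x:0:4}/$x"; done | sum)
dedup=$(for x in $(xxd -s 4096 -p -c 32 "$ds/disk.img.fidx" | sort -u); do
          wc -c < "$ds/.chunks/${x:0:4}/$x"; done | sum)
echo "counting duplicates: $naive bytes"   # 100 + 250 + 100
echo "unique chunks only:  $dedup bytes"   # 100 + 250
```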
 

gerco

New Member
Sep 24, 2021
2
0
1
41
Since I was restoring a backup over a WAN link and I wanted to know how long it was going to take, the compressed size of the chunks is exactly what I wanted to know. The UI just says “32GB” for the backup size, but the amount of data to transfer was only “6GB”.

Good point about the duplicate chunks; that probably overcounted some empty chunks and maybe a few non-empty ones. It was close enough to be useful, though.
 
