Hi, can you please expand on this point? Before proceeding down the wrong path, I guess I need to understand why that would not be the case. I understand the garbage collector probably has to run for the space to be "freed", but after that, shouldn't we reach the expected space occupation?
Because I was not talking about the "unique" space, but about the simpler sum of all chunks referenced by a snapshot. That sum also counts chunks used by other backups, so the per-snapshot numbers add up to more than the actual datastore usage...
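To make that concrete, here is a toy sketch with made-up chunk names and a fixed 4 MiB chunk size (purely illustrative, not how PBS stores anything): summing each snapshot's referenced chunks double-counts shared chunks, and only the chunks unique to a snapshot are freed when it is deleted.

```python
# Toy model of two snapshots referencing deduplicated chunks by digest;
# the chunk names and the fixed 4 MiB size are made up for illustration.
CHUNK = 4 * 1024 * 1024  # 4 MiB

snap_a = {"c1", "c2", "c3"}
snap_b = {"c2", "c3", "c4"}

# "referenced space" per snapshot counts the shared chunks (c2, c3) twice
referenced_a = len(snap_a) * CHUNK  # 12 MiB
referenced_b = len(snap_b) * CHUNK  # 12 MiB

# but the datastore only stores each chunk once
stored = len(snap_a | snap_b) * CHUNK  # 16 MiB, not 24 MiB

# "unique space" of A: what deleting A (plus GC) would actually free
unique_a = len(snap_a - snap_b) * CHUNK  # 4 MiB (just c1)

print(referenced_a, referenced_b, stored, unique_a)
```

So even after garbage collection has run, deleting a snapshot only frees its unique chunks, which is why the summed referenced sizes never match the on-disk usage.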
We did consider reference counting the chunks, but that has a different set of problems:
* We could have one big file with all chunks + refcounts: updating this would be very costly, as it would have to be done on every backup/prune/etc. for every chunk. Since that operation has to be locked, the different jobs block each other.
* We could have an extra file for each chunk with its refcount, but that also has multiple problems:
- We double the number of files in the datastore. For big datastores the number of chunk files is already very high, and adding another per chunk would double it (e.g. a 100TiB datastore with 26 million chunks would jump to 52 million files). On slower storages (especially high-latency ones like HDDs) this makes such datastores unusable.
- Each update of the refcount needs to lock the chunk's count file, so different jobs will block each other too (see the sketch after this list).
- Pruning would take much longer, since it would now have to update the refcount for each chunk instead of simply deleting the index files.
- We still have no cheap way to get the unique space, since we would now have to iterate over all chunks, check which have a refcount of 1, and sum those up. So big backups still need to lock->open->read->close many files, which is costly.
- It is less robust: if I delete a snapshot manually in the datastore, the refcounts in the chunk files are not updated.
Holding the counts in memory is not really an option either: 1. it would blow up the memory usage of the daemon quite significantly, and 2. we would have to persist them to disk anyway (for restarts, etc.), and there can also be multiple daemons of different versions running (when a task is ongoing but an update triggers a reload; in that case the old daemon lives on until the task is finished).
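For illustration, a minimal sketch of what the per-chunk refcount variant would mean for every single chunk touched by a backup or prune; the `.refcounts` path layout is invented here and has nothing to do with the actual PBS on-disk format:

```python
import fcntl
from pathlib import Path

# Illustrative only: one tiny refcount file per chunk, updated under an
# exclusive lock. A backup or prune would have to do this once for every
# chunk it references, which is exactly the open/lock/read/write cost
# (and cross-job blocking) described above.
def bump_refcount(datastore: Path, digest: str, delta: int) -> int:
    path = datastore / ".refcounts" / digest
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(exist_ok=True)
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # concurrent jobs serialize here
        raw = f.read().strip()
        count = (int(raw) if raw else 0) + delta
        f.seek(0)
        f.truncate()
        f.write(str(count))
        fcntl.flock(f, fcntl.LOCK_UN)
    return count
```

Multiply that by the ~26 million chunks of a 100TiB datastore and the latency of an HDD-backed filesystem, and it becomes clear why this does not scale.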
Maybe I'm overlooking some way, but we do often discuss such things internally in detail.
- Understand how much space can be recovered by deleting what, when needed (unique space on the DS)
I get it, but as I said, it's not easy.
- Understand/monitor the growth trend of backups for each VM and for the overall DS, i.e. observe whether today's backup required an additional 10 GB or the usual ~1 GB average of "whatever schedule you like for your recurring backups". (I'm guessing the dirty-bitmap size would do it, if stored along with the backup metadata.)
That metric can be found in the overall datastore usage; the client task log should also contain the amount of data uploaded to the server.
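If one wanted to watch that trend automatically, a trivial sketch could look like the following; the numbers are made up, and in practice they would have to be collected from the datastore usage history or the client task logs mentioned above:

```python
# Toy sketch: flag an unusual backup given (date, uploaded_bytes) pairs
# collected from client task logs; all numbers here are invented.
history = [
    ("2023-01-01", 1_000_000_000),
    ("2023-01-02", 1_100_000_000),
    ("2023-01-03", 10_500_000_000),  # spike worth investigating
]

avg = sum(size for _, size in history[:-1]) / (len(history) - 1)
date, latest = history[-1]
if latest > 2 * avg:
    print(f"{date}: uploaded {latest / 1e9:.1f} GB, "
          f"well above the {avg / 1e9:.1f} GB average")
```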
- Off topic, but not too much: understand which VM is which! No kidding, please make PBS read the VM conf file and extract/display the VM name along with the ID!
Could be done with comments + vzdump hooks.
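Purely as a sketch of that idea, a hook script could push the guest name into the snapshot notes after each backup. The argument order (phase, mode, vmid) and the HOSTNAME environment variable follow the example hook script shipped with vzdump; the `proxmox-backup-client snapshot notes update` subcommand, the snapshot path, and the repository setup are assumptions to verify against your installed versions:

```python
#!/usr/bin/env python3
# Hypothetical vzdump hook sketch, not a drop-in script: after a backup
# finishes, write the guest name into the PBS snapshot notes so the UI
# shows a name next to the numeric VMID.
import os
import subprocess
import sys

phase = sys.argv[1]  # vzdump passes: phase, mode, vmid

if phase == "backup-end":
    vmid = sys.argv[3]
    guest_name = os.environ.get("HOSTNAME", "")
    # The concrete snapshot path (group + timestamp) is elided; a real
    # hook would look it up, e.g. via `snapshot list`. PBS_REPOSITORY
    # and credentials must be set in the environment for the call.
    snapshot = f"vm/{vmid}/<timestamp>"
    if guest_name:
        subprocess.run(
            ["proxmox-backup-client", "snapshot", "notes", "update",
             snapshot, guest_name],
            check=False,  # a failed note update must not fail the backup
        )
```

The hook would be wired up via the script option in /etc/vzdump.conf or a per-job hookscript (see the vzdump man page for the exact phases and variables).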