Hi,
So, these new config flags exist since version 3.x
Allow me to be more precise here, so other users do not wonder why they do not see this mode: the change-detection-mode was introduced with proxmox-backup-client in version 3.2.5 and is backwards compatible with PBS hosts running a previous 3.x version, although some features are not available for snapshots created with the new mode on older server versions (e.g. browsing the contents via the WebUI).
I just tested it with a big LXC container (backup via the Proxmox GUI) and the difference after the initial backup with metadata mode was night and day.
That LXC normally takes about 20 minutes to back up and now it's done in 14 seconds.
Glad to see that you get such a nice performance improvement! If you are willing, please share the task log for one of your backups with the new mode; this should give us some initial feedback regarding size, reused vs. re-encoded files, possible paddings, etc.
The metadata mode is obviously extremely fast, but the question for me is: why even offer the data mode then?
There are mainly two reasons to provide this mode as well:
- Hosts/CTs with frequent data changes will not see an improvement with the metadata change detection mode; in the worst case it might even perform worse than the current default mode. This is because the metadata mode uses the previous snapshot's metadata archive as a reference to decide whether a file can be reused or whether its contents have to be re-chunked (because its metadata changed). So if most files change frequently, these lookups only add unwanted overhead. The data mode lets you gain the benefits of the improved compressibility of the metadata archive and of no longer requiring the additional catalog, while re-chunking all files without the reference lookup overhead.
- The metadata mode might introduce some wasted space (paddings) because it reuses chunks that were already created and uploaded to the server. A chunk boundary is not guaranteed to be aligned with a file boundary, so a chunk that is reindexed may also contain content from a file that did change and is therefore re-chunked. Such a chunk then contains data that is not relevant for this snapshot (only for the previous one). Paddings are kept below a threshold internally, but can still take up unwanted space. The data mode therefore allows you to recreate a snapshot without paddings at any point in time.
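To make the two points above a bit more concrete, here is a toy model in Python. It is a simplified sketch under stated assumptions, not the actual proxmox-backup-client implementation: the names FileMeta, plan_files and padding_bytes are hypothetical, metadata is reduced to size, mtime and mode, and the previous payload stream is represented as plain byte ranges. It shows the per-file reuse decision against the previous metadata, and how padding arises when reused chunks straddle a changed file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    # Hypothetical subset of the metadata compared between snapshots.
    size: int
    mtime_ns: int
    mode: int


def plan_files(
    current: dict[str, FileMeta], previous: dict[str, FileMeta]
) -> tuple[list[str], list[str]]:
    """Split files into (reusable, re-encode) by comparing with the previous metadata."""
    reuse: list[str] = []
    reencode: list[str] = []
    for path, meta in sorted(current.items()):
        if previous.get(path) == meta:
            # Unchanged metadata: reference the already uploaded chunks.
            reuse.append(path)
        else:
            # New or modified file: its contents must be read and chunked again.
            reencode.append(path)
    return reuse, reencode


def padding_bytes(
    file_ranges: dict[str, tuple[int, int]],
    chunks: list[tuple[int, int]],
    reused: list[str],
) -> int:
    """Bytes inside re-referenced chunks that belong to no reused file.

    file_ranges: path -> (start, end) in the previous payload stream.
    chunks: (start, end) chunk boundaries of that same stream.
    """
    needed = [file_ranges[p] for p in reused]
    # Every chunk overlapping a reused file has to be referenced as a whole.
    referenced = [
        (cs, ce) for cs, ce in chunks if any(cs < e and s < ce for s, e in needed)
    ]
    total = sum(ce - cs for cs, ce in referenced)
    useful = sum(
        max(0, min(ce, e) - max(cs, s)) for cs, ce in referenced for s, e in needed
    )
    return total - useful


if __name__ == "__main__":
    prev = {"a": FileMeta(100, 1, 0o644), "b": FileMeta(150, 1, 0o644), "c": FileMeta(150, 1, 0o644)}
    cur = {"a": FileMeta(100, 1, 0o644), "b": FileMeta(150, 2, 0o644), "c": FileMeta(150, 1, 0o644)}
    reuse, reencode = plan_files(cur, prev)
    print(reuse, reencode)  # ['a', 'c'] ['b']
    ranges = {"a": (0, 100), "b": (100, 250), "c": (250, 400)}
    chunks = [(0, 120), (120, 300), (300, 400)]
    print(padding_bytes(ranges, chunks, reuse))  # 150
```

In the example only b changed, yet all three old chunks are still referenced for a and c, so the 150 bytes of the old b end up as padding. If nearly every file changed, plan_files would do its lookups for nothing, which is the overhead the data mode avoids.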
Please note that backup snapshots are still self-contained (for all 3 modes); the metadata mode only needs the previous snapshot's metadata archive during snapshot creation, not for restore, etc.
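Why a restore does not need the previous snapshot can be illustrated with another toy model. Again, this is a hypothetical sketch and not the real pxar/PBS on-disk format: chunks live in a shared content-addressed store keyed by their SHA-256 digest, and each snapshot carries its own chunk index plus file offsets. The names put_chunk, Snapshot and restore_file are made up for illustration. Reused chunks, padding included, are simply referenced again by digest, and restoring only reads the new snapshot's own index.

```python
import hashlib
from dataclasses import dataclass


def put_chunk(store: dict[str, bytes], data: bytes) -> str:
    """Insert a chunk into a content-addressed store and return its digest."""
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data  # identical data maps to the same key, so it is stored once
    return digest


@dataclass
class Snapshot:
    chunk_index: list[str]  # digests forming this snapshot's payload stream, in order
    files: dict[str, tuple[int, int]]  # path -> (offset, length) within that stream


def restore_file(store: dict[str, bytes], snap: Snapshot, path: str) -> bytes:
    # Only this snapshot's own index and the shared chunk store are consulted.
    payload = b"".join(store[d] for d in snap.chunk_index)
    offset, length = snap.files[path]
    return payload[offset:offset + length]


if __name__ == "__main__":
    store: dict[str, bytes] = {}
    # Previous snapshot: a (100 B), b (150 B), c (150 B) split into three chunks.
    old = [
        put_chunk(store, b"A" * 100 + b"B" * 20),
        put_chunk(store, b"B" * 130 + b"C" * 50),
        put_chunk(store, b"C" * 100),
    ]
    # New snapshot: a and c reuse the old chunks (the old "B" bytes become padding),
    # while the modified b is re-chunked and appended as a new chunk.
    new = Snapshot(
        old + [put_chunk(store, b"b" * 150)],
        {"a": (0, 100), "c": (250, 150), "b": (400, 150)},
    )
    assert restore_file(store, new, "a") == b"A" * 100
    assert restore_file(store, new, "c") == b"C" * 150
    assert restore_file(store, new, "b") == b"b" * 150
```

The padding bytes are still stored, but the file offsets simply skip over them, so nothing from the previous snapshot's metadata is needed at restore time.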
Is there anything to worry about in terms of safety? Is the chance of incomplete backups higher with the metadata mode?
This feature is currently flagged as experimental, not because it does not work as advertised, but because there might be edge cases regarding performance and usability that we are not yet aware of and will only find with wider adoption. And backups are a critical part of your infrastructure, which you want to be able to fully rely on.
This is also why any form of testing and feedback is highly appreciated!
It is however strongly suggested to treat this as experimental for the time being; I recommend also keeping backups made with the current default mode at hand.
I really couldn't find much info other than the official docs; I guess not many people have noticed it so far. Would be interested if someone has more insights on this.
The documentation is currently still limited but will be improved upon, also based on feedback, which might show what needs a more in-depth explanation.
I hope this answers your questions, and I am happy about further feedback!
Edit: fixed some typos