Question: file-level backup

wire2hire

Well-Known Member
Oct 20, 2020
Hey,

I make a backup of a CIFS share (mounted on the PBS host, ~1 TB with many small files) with the metadata option. The first run took a good 7 hours, but that's OK. Now, however, the incremental backups also take very long.

First I had to limit the number of open file handles, otherwise I get a copy error. But it still takes hours, even though only one file changed or 10 files were added. I thought that with the metadata option it compares whether the files in the list are the same and then copies only those that are not. Or have I chosen the wrong option?

With other tools, for example Macrium, an incremental takes 10 minutes; PBS takes 5 hours.

I see two tasks when I start the incremental. One of them reads objects, where all chunks are read again. But why, when it has written the metadata checksums?

Greets
 
Hi,
Hey,

I make a backup of a CIFS share (mounted on the PBS host, ~1 TB with many small files) with the metadata option. The first run took a good 7 hours, but that's OK. Now, however, the incremental backups also take very long.
that is the worst case you can get. You will be severely limited by the I/O needed to simply walk the file system tree and check the metadata for all of the files. Since this is a network share, there will be increased latency as compared to a local filesystem.

First I had to limit the number of open file handles, otherwise I get a copy error. But it still takes hours, even though only one file changed or 10 files were added. I thought that with the metadata option it compares whether the files in the list are the same and then copies only those that are not. Or have I chosen the wrong option?
You are using it in a rather non-ideal case: the lookahead cache is limited because of the open-file-handle limitations of your CIFS share, and the additional latency of the network share is not ideal either. So your limitation here is mostly the fetching of the file metadata to see whether something has changed. Some users reported running the client inside WSL as an alternative, which might give you better performance than using the CIFS share, but there is currently no native Windows client implementation.

With other tools, for example Macrium, an incremental takes 10 minutes; PBS takes 5 hours.
So you do see a speedup, just not what you expected. But I suspect the tool you mentioned is natively running on Windows and not over the network share?
In that case you are comparing two completely different situations, which would explain why it performs so much better, although I do not know the tool and how it detects unchanged files.

I see two tasks when I start the incremental. One of them reads objects, where all chunks are read again. But why, when it has written the metadata checksums?
No, only the metadata file is read, not the full content. In order to detect which files have changed since the last backup, the metadata has to be compared; therefore the client fetches it from the previous snapshot. If you have many small files, the metadata is rather large in relation to the actual file payloads, so the gains of the metadata mode are reduced.
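The idea of comparing saved metadata against the current tree can be illustrated with a small shell sketch. This is only an illustration of the principle, not the actual client logic: the real client compares pxar metadata entries from the previous snapshot, not `find` output, and the paths here are placeholders.

```shell
# Sketch only (NOT the real client logic): detect new or changed files
# by comparing a saved listing of path + size + mtime against the
# current state of the tree, similar in spirit to what the metadata
# mode does with the previous snapshot's mpxar archive.
# $1 = source directory, $2 = state file holding the previous listing.
detect_changes() {
    src=$1
    state=$2
    # record path, size and modification time for every regular file
    find "$src" -type f -printf '%p %s %T@\n' | sort > "$state.new"
    if [ -f "$state" ]; then
        # lines present only in the new listing = new or changed files
        comm -13 "$state" "$state.new"
    fi
    mv "$state.new" "$state"
}
```

Unchanged files never have their contents read; only their metadata is listed and compared, which is why the remaining cost on a CIFS share is the latency of walking the tree itself.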

The question is: are you locked in on Windows for the server side? Maybe providing the CIFS share via either a Linux-based host (where you can use the native client) or a VM on top of Proxmox VE (where you can use dirty-bitmap tracking) would be the better option if you would like to use PBS.
 
Yes, it was on Windows, but also with a mounted network share (Macrium), so it was the same situation.

No, only the metadata file is read, not the full content. In order to detect which files have changed since the last backup, the metadata has to be compared; therefore the client fetches it from the previous snapshot. If you have many small files, the metadata is rather large in relation to the actual file payloads, so the gains of the metadata mode are reduced.

But in the task I see a read of all chunks, not only of the metadata. Or have I made a mistake? I always pass the metadata option to the CLI command. I can see the mpxar file (350 MB).

1. Make a full backup with the metadata option.
2. Make an incremental backup with the metadata option.

3. Does it only read the metadata file and compare it with the file list on the share? (That is how I know it from other tools.)
4. Back up only the files that are new or changed?

In the object read log, I see:

register chunks in share.mpxar.didx as downloadable
get chunk...
download chunk.
get chunk..

My backups are encrypted; could that be the problem?

5. The share is on a TrueNAS box; would it be better to install the Proxmox Backup Client there?
 
Yes, it was on Windows, but also with a mounted network share (Macrium), so it was the same situation.
So you are saying it is running on the same host as the PBS? Or natively on Windows on a Windows share?

No, only the metadata file is read, not the full content. In order to detect which files have changed since the last backup, the metadata has to be compared; therefore the client fetches it from the previous snapshot. If you have many small files, the metadata is rather large in relation to the actual file payloads, so the gains of the metadata mode are reduced.

But in the task I see a read of all chunks, not only of the metadata. Or have I made a mistake? I always pass the metadata option to the CLI command. I can see the mpxar file (350 MB).
Yes, you will have two files: mpxar contains metadata only, ppxar contains file payloads only. Both are, however, stored as chunked-up index files; for PBS it does not matter.
1. Make a full backup with the metadata option.
2. Make an incremental backup with the metadata option.

3. Does it only read the metadata file and compare it with the file list on the share? (That is how I know it from other tools.)
Yes. What happens is that once you start the PBS client with change-detection mode metadata, it tries to fetch the previous snapshot of the backup group you are backing up to; if that exists and contains the previous metadata archive, it is used as a reference to detect unchanged files for the new run.

The client then scans all your files on your source and compares it to the metadata from the previous backup.
4. Back up only the files that are new or changed?
Yes, and the new metadata stream will be chunked and uploaded (on demand, skipping the upload if nothing changed for the metadata archive), while for the payload stream there is no need to re-read and re-process unchanged files.
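For reference, the two runs from steps 1 and 2 use the same invocation; only the presence of a previous snapshot makes the second run incremental. A sketch of the command, where the repository, mount point, and key file are placeholders for your actual setup, and the `--change-detection-mode` flag requires a sufficiently recent client:

```shell
# Hypothetical repository and paths; adjust to your environment.
# The first run produces a full backup; subsequent runs with the same
# command reuse the previous snapshot's metadata archive automatically.
proxmox-backup-client backup share.pxar:/mnt/cifs-share \
    --repository 'backup@pbs@pbs.example.com:datastore1' \
    --change-detection-mode metadata \
    --keyfile /root/backup-key.json
```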

In the object read log, I see:

register chunks in share.mpxar.didx as downloadable
get chunk...
download chunk.
get chunk..
These are the metadata chunks of the previous backup being downloaded. Since your metadata is rather large, I expect there is a bit of room for improvement here, since the client only caches some of the chunks already downloaded. In your case it might make sense to increase the cache and reduce the re-download overhead if the metadata is located in vastly different chunks.

Here I do see potential for improvement: Please do open an enhancement request for this at https://bugzilla.proxmox.com. Making the cache size configurable on the client side or scale it according to the mpxar archive size is something which we can look into.

My backups are encrypted; could that be the problem?
This will cause some additional overhead, yes. But I do not suspect it to be the limiting factor.

5. The share is on a TrueNAS box; would it be better to install the Proxmox Backup Client there?
Running the client on the native filesystem should give you performance improvements, yes.
 
Out of curiosity, to get some idea of possible bottlenecks: how long does a time ls -lahR . on the CIFS mount point on the PBS host take?
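That is, something like the following sketch (the mount point is a placeholder). The wall-clock time of a bare recursive listing approximates the pure tree-walk and metadata I/O cost that the client has to pay even when no file content has changed, with PBS entirely out of the picture:

```shell
# Rough bottleneck check: time a bare recursive listing of the share.
# Measures only directory walking + metadata fetches over CIFS.
# Usage: walk_share /mnt/cifs-share   (path is a placeholder)
walk_share() {
    time ls -lahR "$1" > /dev/null
}
```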
 
The Macrium backup ran natively on a Windows server, with a mounted file share.

When the backup was done, I tried the CLI command. The incremental now needs the same time as the full backup did before.
 
Code:
Change detection summary:
- 2307735 total files (0 hardlinks)
- 2285685 unchanged, reusable files with 1.302 TiB data
- 22050 changed or non-reusable files with 3.619 GiB data
- 163.842 MiB padding in 93 partially reused chunks
share.ppxar: reused 1.302 TiB from previous snapshot for unchanged files (511119 chunks)
share.ppxar: had to backup 763.744 MiB of 1.306 TiB (compressed 373.843 MiB) in 20523.81 s (average 38.105 KiB/s)
share.ppxar: backup was done incrementally, reused 1.305 TiB (99.9%)
share.mpxar: had to backup 350.247 MiB of 350.247 MiB (compressed 47.013 MiB) in 20524.12 s (average 17.475 KiB/s)
Duration: 20527.37s


The CLI command runs:

real 48m56.535s
user 0m21.198s
sys 1m17.160s
 
Okay, so we can take this as an estimate for walking the filesystem contents. Can you also post the reader task log for the last backup run, obtained from the task log on PBS? I would be interested to see how many chunks were actually downloaded.
 
Do you need the full log? It's 90 MB. Or just specific lines?
No, in that case it is rather clear what is going on, and it also points towards where the bottleneck most likely resides. Parts of the metadata archive probably get re-downloaded over and over again because they get evicted from the cache; this will definitely need improvement! Could you nevertheless run a wc -l <tasklog> on the reader task log from the PBS, to get an idea of how often this happens in your case?
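The chunk fetches can also be counted directly from log lines like those quoted earlier in the thread. A quick sketch, where the log file path is a placeholder and the pattern matches the excerpts shown above:

```shell
# Count how often chunks were fetched according to a reader task log.
# The pattern matches lines like "download chunk." from the excerpts
# quoted earlier in this thread.
# Usage: count_chunk_fetches reader-task.log   (path is a placeholder)
count_chunk_fetches() {
    grep -c 'download chunk' "$1"
}
```

A count far above the number of chunks in the mpxar index would confirm that the same metadata chunks are being re-downloaded after cache eviction.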