Backup files seem to be updated all the time.

tdl

New Member
Sep 14, 2023
Hi

I am using Proxmox Backup Server 3.0-2 and I am backing up VMs, with encryption, to a remote NFS server. This remote server is in turn backed up to tape (using Bareos).

I did a full backup of the PBS datastore folder, then noticed that every incremental includes all the files over and over again (so, bottom line, every incremental is the same as a full backup), while I expected an incremental to only back up new chunks (and of course potentially some other metadata files).

I looked at the MD5 sums of the files and most of them are unaltered. However, a few of them had a different MD5 the next day, which I did not expect, since in my understanding chunks should be "immutable" (their name is the hash of their content).

I also see that the stat command on the chunks shows that the change time is frequently updated (which triggers the incremental backup to back up the file again).

Here is one example (file created on 28 Aug 2023, and changed just now):

Code:
Access: 2023-09-14 21:48:49.180762573 +0200
Modify: 2023-08-28 10:33:38.893704229 +0200
Change: 2023-09-14 21:48:49.180762573 +0200
Birth: 2023-08-28 10:33:38.889704164 +0200


Does the verify job (incorrectly) update the change time?

What can explain the MD5 of chunks changing over time?

Side note: the Verify job is "green" and the GC job runs during the night.

Thanks!
Thierry
 
Hi,
when backing up to PBS, it is always done incrementally: we build the image/pxar archive client-side and then only send the new chunks to the PBS instance.
The verification job should only change the access time, as it only hashes the content and compares it. The GC, however, updates both the access time and the change time in its first phase (we use utimensat to refresh the access time, and any such metadata update also bumps the change time).

Check here for more information about the GC: https://pbs.proxmox.com/docs/maintenance.html#garbage-collection
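You can reproduce this side effect on any file; a quick sketch (the path is just an example):

Code:
# updating only the access time still bumps the change time (ctime),
# which is what the first GC phase does to every referenced chunk
touch /tmp/demo-chunk
stat -c 'Change: %z' /tmp/demo-chunk
sleep 1
touch -a /tmp/demo-chunk                 # atime-only update, like GC's utimensat call
stat -c 'Change: %z' /tmp/demo-chunk     # the ctime has advanced as well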
 
Hi Gabriel,
Thanks for your answer. So running GC less frequently will reduce how often the issue occurs. I could also configure my backup to rely only on the creation time and size for the chunks; however, the MD5 changes make me wary.
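If I read the Bareos docs correctly, the mtimeonly option in the FileSet Options bases the Incremental decision on st_mtime only; something like this sketch (the names here are mine, to be verified against the Bareos documentation):

Code:
# hypothetical FileSet sketch: base Incremental file selection on mtime only,
# ignoring the ctime updates caused by the PBS GC (double-check "MtimeOnly"
# against your Bareos version's documentation)
FileSet {
  Name = "pbs-datastore"
  Include {
    Options {
      Signature = MD5
      MtimeOnly = yes
    }
    File = "<my PBS folder>"
  }
}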
What can explain the change of the MD5 of some files?
Thanks!
Thierry
 
Hi,
can you show me how you got the md5sum of a chunk?

Edit: PBS definitely does not change the chunk content, so the md5sum of the content doesn't change. It could be that Bareos includes the metadata alongside the content in its md5sum calculation, which obviously changes because of the access-time and change-time updates.
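You can check that quickly yourself; the content digest is unaffected by a pure timestamp update:

Code:
# the file content, and thus its md5sum, does not change when only
# the inode metadata (atime/ctime) is updated
echo 'data' > /tmp/demo-chunk
md5sum /tmp/demo-chunk
touch -a /tmp/demo-chunk     # bumps atime and, as a side effect, ctime
md5sum /tmp/demo-chunk       # same digest as before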
 
Hi, thanks for your answer.

Here is the procedure I used to create the MD5 sums:
Code:
cd <my PBS folder>/.chunks
find . -type f -exec md5sum {} \;  > ~/myChecksums.md5

This is what I do to validate the files:
Code:
cd <my PBS folder>/.chunks
md5sum --ignore-missing  --quiet -c ~/myChecksums.md5

The md5 file was created on Sep 11. Here is the relevant stat output:
Code:
Modify: 2023-09-11 14:18:15.516003783 +0200
Change: 2023-09-11 14:18:15.516003783 +0200
 Birth: 2023-09-11 10:46:44.331264838 +0200

Below you'll find the key elements of the stat output for some files whose MD5 no longer matches today (for some of them I added the size).
Very surprisingly, the birth date is AFTER the time I created the md5sum file!
So either I am very unlucky and have a lot of SHA256 collisions ;-), or some files are really recreated/altered.
Could it be because the backup is encrypted?

Code:
  File: ./.chunks/0049/0049699087a94fd6fb2f89fa84715e84d2c19cea369cf5e9e8da052a5b170264 (size 484 399)
Access: 2023-09-13 21:33:57.164799025 +0200
Modify: 2023-09-12 08:28:05.338325030 +0200
Change: 2023-09-13 21:33:57.164799025 +0200
 Birth: 2023-09-12 08:28:05.334324963 +0200

  File: ./.chunks/0080/0080f4f9c1fcec83d5169c8a12aa266a6f01f78a099625a210e3a23b688dcde9 (size 463 472)
Access: 2023-09-15 14:27:28.058020699 +0200
Modify: 2023-09-15 14:27:28.062020764 +0200
Change: 2023-09-15 14:27:28.062020764 +0200
 Birth: 2023-09-15 14:27:28.058020699 +0200

  File: ./.chunks/00ad/00adcf61c369667927b815bdb3785d239b11dc4c08586129a36453e81a631882 (size 505 549)
Access: 2023-09-15 09:27:28.642125758 +0200
Modify: 2023-09-15 09:27:28.646125821 +0200
Change: 2023-09-15 09:27:28.646125821 +0200
 Birth: 2023-09-15 09:27:28.642125758 +0200

  File: ./.chunks/00e6/00e6cf6348d6245425ea40895ab03194d234568dd7448b9747c50192c2ba2703 (size 470 614)
Access: 2023-09-15 11:27:18.699088813 +0200
Modify: 2023-09-15 11:27:18.703088876 +0200
Change: 2023-09-15 11:27:18.707088939 +0200
 Birth: 2023-09-15 11:27:18.699088813 +0200

  File: ./.chunks/012f/012fb986ee4256928ca08ac85110ff6143318b44e24e204a098577823a250ca6 (size 382 776)
Access: 2023-09-14 21:35:51.108349933 +0200
Modify: 2023-09-13 08:29:28.945289194 +0200
Change: 2023-09-14 21:35:51.108349933 +0200
 Birth: 2023-09-13 08:29:28.941289126 +0200

  File: ./.chunks/0159/01594f9d851ad14afb640d0555ec5c0f1932a4ceb483d8cf71a8131256f0891a
Access: 2023-09-15 09:27:21.642013942 +0200
Modify: 2023-09-15 09:27:21.642013942 +0200
Change: 2023-09-15 09:27:21.646014006 +0200
 Birth: 2023-09-15 09:27:21.638013879 +0200

  File: ./.chunks/01bb/01bb4bd2feab77dbd352d892c3565806d72c257fb7e564681fea184b6f4bf2a7
Access: 2023-09-15 08:27:55.488852062 +0200
Modify: 2023-09-15 08:27:55.492852127 +0200
Change: 2023-09-15 08:27:55.492852127 +0200
 Birth: 2023-09-15 08:27:55.488852062 +0200
 
A chunk is only modified if:
* the previous chunk has a size of 0 (which can happen if PBS crashes); then, obviously, a backup writes data into that chunk
* the GC removes a chunk (or the verify job marks a chunk as bad and the GC throws it away) and then a new backup or sync with the same content is done. The same content has the same digest and thus the same chunk name.

As you have encrypted backups, we don't compare sizes and prefer the smaller chunk, because we can't check whether they actually contain the same data.
If you are interested, the relevant code is here: https://git.proxmox.com/?p=proxmox-...c73230ffd610d8510a91ef0657a89e49;hb=HEAD#l442
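As a small illustration of why a chunk that was garbage-collected and later re-uploaded reappears under the exact same path (for unencrypted chunks the name is the SHA-256 of the data; for encrypted ones the digest is computed client-side from plaintext + key, but the naming layout is the same):

Code:
# chunks live under .chunks/<first 4 hex digits>/<full digest>, so identical
# content re-uploaded after a GC removal lands at the same path again,
# just as a new file with a fresh inode and birth time
digest=$(printf 'same content' | sha256sum | cut -d' ' -f1)
echo ".chunks/${digest:0:4}/${digest}"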
 
Thanks a lot for your clear explanations!

I stopped the MD5 computation after folder 7a3e (so not quite half of all the chunks; about 275 500 files).
I ran the validation of those MD5 sums: 466 of them do not match (ignoring the files that have been removed)!

So we agree that the chance of so many chunks being "garbage collected" and then recreated just a few days later with the same SHA256 but different content should be zero.

I will set up fsnotify on the .chunks folder to trace what is deleted, recreated, or modified (and maybe see which process is doing it; I believe it will be NFS though ;-) ).
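Something like this is what I have in mind (using inotify-tools; though I suspect inotify will not see changes made on the NFS server side):

Code:
# watch the chunk store recursively for deletions, creations, writes and
# metadata changes; requires inotify-tools. Caveat: inotify normally does
# not report changes made on the NFS server or by other NFS clients.
inotifywait -m -r -e delete,create,modify,attrib \
    --timefmt '%F %T' --format '%T %e %w%f' \
    "<my PBS folder>/.chunks" >> ~/chunk-events.log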

I will also dig into the code when I have some spare time.

Any other ideas on what I could do to validate this (maybe I am missing something?) or on what could cause it? (Note: every nightly verify job ends successfully, but I do not revalidate backups that were validated less than 30 days ago.)
 
So we agree that the chance of so many chunks being "garbage collected" and then recreated just a few days later with the same SHA256 but different content should be zero.
Yes.
I will set up fsnotify on the .chunks folder to trace what is deleted, recreated, or modified (and maybe see which process is doing it; I believe it will be NFS though ;-) ).
That's a good idea.

Any other ideas on what I could do to validate this (maybe I am missing something?) or on what could cause it? (Note: every nightly verify job ends successfully, but I do not revalidate backups that were validated less than 30 days ago.)
Note that when using encryption, the verification job only checks the chunk CRC. We use the plaintext + encryption key to generate the digest, and since the server doesn't have the key, it cannot recompute the digest to compare the data (https://pbs.proxmox.com/docs/technical-overview.html#verification-of-encrypted-chunks). The CRC is computed over everything (the encrypted data) except the chunk header.
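As a rough illustration, assuming the plain blob layout from https://pbs.proxmox.com/docs/file-formats.html (an 8-byte magic followed by a 4-byte CRC-32, then the data; encrypted blobs carry extra IV/tag fields in the header, so adjust the offset accordingly), you could compare the stored and recomputed CRC like this:

Code:
# sketch only, not the actual verification code; chunk path taken from above
chunk="<my PBS folder>/.chunks/0049/0049699087a94fd6fb2f89fa84715e84d2c19cea369cf5e9e8da052a5b170264"
# stored CRC: bytes 9..12 of the header, read as a 4-byte unsigned integer
# (od prints host byte order, i.e. little-endian on x86)
od -An -tu4 -j8 -N4 "$chunk"
# recomputed CRC-32 over everything after the 12-byte header
tail -c +13 "$chunk" | python3 -c 'import sys, zlib; print(zlib.crc32(sys.stdin.buffer.read()))'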
 
