may "zstd --rsyncable" be a handbrake for vzdump full backups ?

I'm pretty sure compressing without --rsyncable will effectively disable deduplication: a single bit changed early in the input stream will most likely change all of the following compressed data of that stream. Deduplication over many backups and across all backups is in the long run usually the most efficient compression algorithm - better than any other compression one would apply on the input data. So switching off compression is probably better (faster and more space efficient) than just disabling --rsyncable...
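
To illustrate the effect, here is a minimal sketch in Python (zlib from the standard library stands in for zstd without --rsyncable; the payload and the 64 KiB "dedup chunk" size are made up for the example). It flips a single byte near the start of the input and then counts how many fixed-size chunks of the compressed stream still match:

```python
# Minimal sketch (not vzdump code): one byte changed early in the input changes
# practically all of the compressed stream that follows, so a chunk-based dedup
# store sees the whole compressed backup as new data.
import zlib

CHUNK = 64 * 1024  # pretend dedup chunk size

payload_a = b"".join(b"vm disk block %08d: mostly static content\n" % i
                     for i in range(500_000))
payload_b = bytearray(payload_a)
payload_b[100] ^= 0xFF  # flip a single byte near the beginning

comp_a = zlib.compress(payload_a, 6)
comp_b = zlib.compress(bytes(payload_b), 6)

chunks_a = [comp_a[i:i + CHUNK] for i in range(0, len(comp_a), CHUNK)]
chunks_b = [comp_b[i:i + CHUNK] for i in range(0, len(comp_b), CHUNK)]
identical = sum(1 for a, b in zip(chunks_a, chunks_b) if a == b)

print(f"identical compressed chunks: {identical} of {min(len(chunks_a), len(chunks_b))}")
# Typically prints 0 or close to it; without compression, only the one chunk
# containing the flipped byte would differ.
```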
 
does anybody have real world data on compressed vzdump deduplication ratio?
If you do daily (or even hourly...) backups of mostly static data, it can be arbitrarily high. While doing garbage collection, all identical chunks of all backups in the system will be merged together and stored only once in the whole system.

So it depends on your data input. I have 200:1 for longer running daily backups with mostly static file share data.

It is important that you set up verification and garbage collection jobs that run from time to time (via the web GUI).
Otherwise each backup run will consume 100% of its data volume...
 
While doing garbage collection, all identical chunks of all backups in the system will be merged together and stored only once in the whole system.
Please note that garbage collection only removes unused chunks. The sharing of identical chunks already happens when making the backups: the checksum of the data is identical and therefore the filename of the chunk is identical and therefore the data does not even have to be sent to PBS.
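
For illustration only, a toy version of that idea in Python (this is not the actual PBS implementation; the store path, chunk size and function names are made up): a chunk's SHA-256 digest doubles as its filename, so an identical chunk from any backup maps to the same file and only has to be stored - or sent - once:

```python
# Toy content-addressed chunk store (illustration, not PBS code).
import hashlib
from pathlib import Path

CHUNK_SIZE = 4 * 1024 * 1024           # example chunk size
STORE = Path("/tmp/toy-chunk-store")   # made-up path

def store_chunk(chunk: bytes) -> str:
    """Store one chunk; skip the write if an identical chunk already exists."""
    digest = hashlib.sha256(chunk).hexdigest()
    target = STORE / digest[:4] / digest   # fan out into subdirectories
    if not target.exists():                # same data => same name => already there
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(chunk)
    return digest

def backup_file(path: Path) -> list[str]:
    """A 'backup' is just the ordered list of chunk digests (an index)."""
    digests = []
    with path.open("rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digests.append(store_chunk(chunk))
    return digests
```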
 
Please note that garbage collection only removes unused chunks. The sharing of identical chunks already happens when making the backups: the checksum of the data is identical and therefore the filename of the chunk is identical and therefore the data does not even have to be sent to PBS.
Oh, even better. Did not know that!

One question: Are all new chunks first stored here in /var/log/proxmox-backup/tasks/... during the upload, before they are moved to the actual backup volume?

Using the fatrace tool I monitored all file system activity on the root drive, and while running a backup I observed a ton of high-frequency I/O in this directory: /var/log/proxmox-backup/tasks/

I have a slow mechanical root drive and I'm under the impression that the root drive is the bottleneck when creating new backups.

During backups there is a lot of I/O (root disk fully saturated) but only very little apparent temp data - on my setup the whole directory is only a few dozen megabytes.

Maybe this could be put on a RAM disk as tasks would die on a reboot anyway?
Or does this have to be persistently stored - probably on some small Optane disk?
 
errr - this is about zstd compression and --rsyncable for ordinary vzdump backup here and NOT related to pbs at all.

i adjusted the subject of this post to better make that clear.
 
errr - this is about zstd compression and --rsyncable for ordinary vzdump backup here and NOT related to pbs at all.

i adjusted the subject of this post to better make that clear.
Did not know this thread was so narrowly focused on vzdump only. Thanks for clarifying.
You yourself asked for real-world deduplication ratios. As far as I know, vzdump does not support deduplication.
Would recommend switching to pbs. But I guess this is off-topic as well...
 
i'm using pbs intensively.

but did you ever hear about the 3-2-1 rule for backup?

would you really bank on a single, incremental-forever backup solution if you run a million-dollar business on top of it?
 
i'm using pbs intensively.

Ok, then I guess you know it all.
Sorry for responding to your questions.

but did you ever hear about the 3-2-1 rule for backup?

Sure. But I think this is off topic.

would you really bank on a single, incremental-forever backup solution if you run a million-dollar business on top of it?
yes, I do. And so do many others.

The fact that you CAN do incremental backups does not mean you HAVE to do incremental backups all the time.
You can do hourly incremental backups and daily full backups - whatever your requirements are.
And then you can use pbs and vzdump in parallel, just to be sure... whatever you think is appropriate.

But I'm sure you know this just as well. I'll move on.
 
>You can do hourly incremental backups and daily full backups - whatever your requirements are

you can't do/force full backups in pbs without a vm reboot - can/do you?
 
>You can do hourly incremental backups and daily full backups - whatever your requirements are

you can't do/force full backups in pbs without a vm reboot - can/do you?
Every backup to PBS is a full backup (and deduplicated, so you do need additional remote copies to be 3-2-1-safe). You can back up while the VM is running, but it very much depends on the QEMU Guest Agent. It does a "filesystem freeze", but maybe you have to do more specific things for specific software running inside the VM.

EDIT: QEMU keeps track of changed blocks on the virtual disks while the VM is running and skips reading & compressing unchanged blocks, as they would be deduplicated anyway, but it's still a full backup.
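
A rough sketch of that idea (a toy model, not QEMU's actual dirty-bitmap API; all names here are made up): only blocks marked dirty since the last backup are read and uploaded, but the new index still references every block, so the result can be restored on its own - a full backup:

```python
# Toy model of dirty-block tracking (illustration only, not QEMU/PBS code).
from dataclasses import dataclass, field

@dataclass
class DiskBackupState:
    prev_index: list[str]                          # chunk digest per block, last backup
    dirty: set[int] = field(default_factory=set)   # block numbers written since then

def make_full_backup(state: DiskBackupState, read_block, upload_block) -> list[str]:
    """Build a complete index; only dirty blocks are actually read and uploaded."""
    new_index = []
    for blk_no, old_digest in enumerate(state.prev_index):
        if blk_no in state.dirty:
            data = read_block(blk_no)              # touch the disk only for changed blocks
            new_index.append(upload_block(data))
        else:
            new_index.append(old_digest)           # unchanged: reuse the known digest
    state.dirty.clear()
    return new_index                               # restorable on its own => full backup
```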
 
Sure, it's not a completely new copy, and that's why I included the "deduplicated, so you do need additional remote copies to be 3-2-1-safe" remark. It's also not an incremental backup, as it does not depend on earlier backups (to restore it).
Indeed you need something like vzdump to create a whole new (unshared) backup, which also happens to be a single file that you can copy, archive and take with you.
 
I would consider deduplicated backups to be 'real' full backups.

I do not see any relevant technical difference as long as the deduplication software has no bugs.
But even without deduplication there could be software bugs that hinder successful recovery...
So relying on a different backup software system could be well justified.

I guess it's more a philosophical than a technically founded discussion what is considered a real "full backup" and what is not.
 
I guess it's more a philosophical than a technically founded discussion what is considered a real "full backup" and what is not.

maybe - but what you call a full backup in pbs, veeam calls a "synthetic full backup" (veeam was one of the first to introduce this)

https://helpcenter.veeam.com/docs/backup/vsphere/synthetic_full_hiw.html?ver=120

others also call it that:

https://www.backblaze.com/blog/what...ntal-differential-and-synthetic-full-backups/ ( -> Synthetic Full Backup Pros and Cons )
https://www.nakivo.com/blog/what-is-synthetic-backup/
https://www.msp360.com/resources/blog/synthetic-full-backup-explained/
https://iosafe.com/blog/active-full-backup-vs-synthetic-full-backup-for-virtual-machines/
 
If you consider a "full backup" as single failure domain in that one single system failure - either Software or Hardware - could damage access to some or probably all backups in this system - then we probably agree, that one pbs server instance should be considered as ONE SINGLE domain of potential fatal failure.

Having a second pbs server could protect against hardware failures - but probably not against systematic software failures in the deduplication code (or anywhere else), as both identically built backup systems could suffer from exactly the same software bug at exactly the same time. So the second pbs probably does not add any additional security to the data.

One note - even more on the topic of this thread: if you do not trust some deduplication algorithm, then you might not want to trust any compression algorithm like zstd either: there is no guarantee that every random input that gets compressed by this algorithm will be decompressed to exactly the same original data... it could be that zstd causes some specific error in all backups that renders the whole backup useless...
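
If that worry is a real concern, the usual mitigation is a verify-after-compress round trip; a minimal sketch in Python (zlib stands in for zstd here, and compress_verified is just a made-up helper name):

```python
# Sketch: decompress what was just compressed and compare checksums before
# trusting the archive.
import hashlib
import zlib

def compress_verified(data: bytes, level: int = 6) -> bytes:
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    if hashlib.sha256(restored).digest() != hashlib.sha256(data).digest():
        raise RuntimeError("round-trip mismatch: do not trust this archive")
    return compressed
```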
 
does anybody have real world data on compressed vzdump deduplication ratio?
Yes, I did experiments back in 2017 and it was useless for deduplication. If only one block changed (was added or removed), the whole file from that point on changed and had to be retransmitted. In other cases, rsync ran for days to find some parts that should not have been retransmitted.

I shared my views and a possible solution to this in my ProxTalks 2017 presentation (German).
 
