Backup Optimization and Deduplication

LnxBil

Feb 21, 2015
Saarland, Germany
Hi everyone,

I'd like to know how you handle backup optimization and deduplication. Is there some golden rule for that?

Currently, I optimize my VMs once a month by cleaning temporary files off the hard disk and zeroing the free space in the filesystem to minimize the disk footprint. Some Linux VMs also have a virtio-scsi adapter so they can use fstrim/discard, but I use clustered LVM, so I don't get any real space savings from this yet. I plan to use GFS as a cluster filesystem, but I haven't had time to try it in my test cluster environment.
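
Zeroing free space helps because blocks that contain only zeros compress to almost nothing and are all identical to each other. A minimal Python sketch with the standard zlib module shows the difference between a 4 KiB block of zeros and a 4 KiB block of random bytes, which stands in for leftover contents of deleted files. The 4 KiB block size and the use of zlib are assumptions for illustration only; they are not what Proxmox uses internally.

```python
import os
import zlib

BLOCK_SIZE = 4096  # assumed 4 KiB block, matching the 4K block size discussed here


def compressed_size(block: bytes) -> int:
    """Return the size of the block after zlib compression at the default level."""
    return len(zlib.compress(block))


zero_block = bytes(BLOCK_SIZE)        # what a zeroed free-space block looks like
stale_block = os.urandom(BLOCK_SIZE)  # stand-in for leftover, incompressible data

print("zeroed block:", compressed_size(zero_block), "bytes")
print("stale block: ", compressed_size(stale_block), "bytes")
```

The zeroed block shrinks to a few dozen bytes, while the random block does not shrink at all.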

I plan to back up my machines without compression and write the backups over the network to a volume with built-in deduplication (ZFS or OpenDedup). Any suggestions on software (e.g. FreeBSD ZFS vs. Linux ZFS, etc.)? I know that, depending on the software used, deduplication needs at least 4 GB of RAM per 1 TB of backup storage with a 4K block size.
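
RAM rules of thumb for deduplication come from counting dedup table entries, roughly one per unique block. The Python sketch below is a rough estimator under two stated assumptions: about 320 bytes of RAM per entry (a commonly quoted figure, not an exact value for any particular ZFS version) and, as a worst case, that every block is unique. The function name `ddt_ram_bytes` is hypothetical.

```python
def ddt_ram_bytes(data_bytes: int, block_size: int, entry_bytes: int = 320) -> int:
    """Rough dedup-table RAM estimate: one table entry per block.

    Worst case: assumes every block is unique. entry_bytes=320 is an
    assumed rule-of-thumb size per entry, not an exact ZFS figure.
    """
    blocks = data_bytes // block_size
    return blocks * entry_bytes


TIB = 1024 ** 4
GIB = 1024 ** 3

for block_size in (4 * 1024, 128 * 1024):
    ram = ddt_ram_bytes(TIB, block_size)
    print(f"1 TiB at {block_size // 1024}K blocks: ~{ram / GIB:.1f} GiB of dedup table")
```

Under these assumptions, the result depends heavily on block size, which is one reason published rules of thumb vary so widely.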

Best,
LnxBil
 
For the best ZFS experience, I would recommend OmniOS or SmartOS. The figure of 4 GB of RAM per TB of storage is highly debated, but a general rule of thumb is: the more RAM, the merrier. Compared with LZ4, disabling compression will only degrade performance; this has been documented in many tests. Deduplication is only worthwhile if your data is largely identical and consists mostly of text. That is not the case for virtualization storage, so avoid deduplication there, since it requires a lot of RAM.

Read more about dedup here:
http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-size-zfs-dedup-1354231.html
http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe
 
Thank you, mir, for your explanations. I'll try OmniOS and SmartOS.

Just to be clear: I want to use deduplication only for backups (not on my virtualization hardware). Proxmox backups are full backups by default, so they should be mostly identical at the 4K block level, as long as they are not compressed (and mine won't be).
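
One way to check how well such backups would deduplicate before building a dedup volume is to split them into fixed-size blocks, hash each block, and divide the total block count by the unique block count. The Python sketch below assumes fixed 4 KiB blocks and SHA-256 as the block identity. This approximates how a block-level dedup store works but is not the exact behavior of ZFS or OpenDedup. The function name `dedup_ratio` and the sample data are hypothetical.

```python
import hashlib
import os


def dedup_ratio(data: bytes, block_size: int = 4096) -> float:
    """Estimate a fixed-block dedup ratio: total blocks / unique blocks.

    Each block is identified by its SHA-256 digest, approximating how a
    block-level dedup store decides that two blocks are identical.
    """
    digests = [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]
    if not digests:
        return 1.0
    return len(digests) / len(set(digests))


# Two hypothetical uncompressed full backups of the same disk image,
# where only the first 4 KiB block changed between runs.
image = os.urandom(64 * 4096)
backup_1 = image
backup_2 = os.urandom(4096) + image[4096:]

print(f"ratio: {dedup_ratio(backup_1 + backup_2):.2f}")
```

Here 128 blocks reduce to 65 unique ones, for a ratio just under 2. If a few bytes were inserted instead of overwritten, every following block would shift out of alignment and the ratio would fall toward 1.0. This is one reason compressed or repacked archives deduplicate poorly at a fixed block size.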
 
For the record:
Please be aware that vma files, and especially vma.lzo files, are not well suited to deduplication. I tried it and only achieved a deduplication ratio of 1.1 on 3 TB of images (one backup per week). It only worked reasonably well once I unpacked the vma files (vma extract). If you use clever snapshotting together with rsync --no-whole-file --inplace, you can optimize further and don't even need deduplication to get blocks shared between backups.
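
rsync's --inplace option writes updated data directly into the existing destination file instead of building a new copy, and --no-whole-file keeps the delta-transfer algorithm enabled even when both paths are local. If you take a copy-on-write snapshot (e.g. ZFS) between runs, only the rewritten blocks need new space, and unchanged blocks stay shared with the snapshot. The Python sketch below models that effect in simplified form. It is not rsync's algorithm: it assumes updates land on block-aligned positions and counts the blocks that would differ. `changed_blocks` is a hypothetical name, and `block_size` should match the filesystem's record size.

```python
def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> int:
    """Count block-aligned positions whose contents differ between two versions.

    Simplified model: if a file is updated in place and only differing blocks
    are rewritten, a copy-on-write snapshot of the old version keeps only
    these blocks as extra space.
    """
    length = max(len(old), len(new))
    count = 0
    for offset in range(0, length, block_size):
        if old[offset:offset + block_size] != new[offset:offset + block_size]:
            count += 1
    return count


old = bytes(100 * 4096)  # hypothetical 400 KiB extracted image, all zeros
new = bytearray(old)
new[5000] = 1            # falls in block 1
new[300000] = 1          # falls in block 73
print(changed_blocks(old, bytes(new)), "of", len(old) // 4096, "blocks changed")
```

In this model, only 2 of 100 blocks consume new space after the update. A full copy would consume all 100.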

Be aware that for the best deduplication ratios, you need to change the recordsize to 4K (assuming your VMs also use 4K block sizes and alignment). This multiplies the RAM usage by 32 compared with ordinary deduplication at the default 128K recordsize. Really, this is huge!
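
The factor of 32 follows from counting dedup table entries. This assumes one entry per block (all blocks unique) and a constant RAM cost per entry. Under those assumptions, the entry count N for a fixed amount of data S scales inversely with the recordsize r:

```latex
N(r) = \frac{S}{r}, \qquad
\frac{N(4\,\mathrm{K})}{N(128\,\mathrm{K})}
  = \frac{S / 4\,\mathrm{K}}{S / 128\,\mathrm{K}}
  = \frac{128\,\mathrm{K}}{4\,\mathrm{K}} = 32
```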
 
