[SOLVED] Deduplication on a dataset - does anyone have this feature active?

fireon

Hello,

I read about deduplication. I use Proxmox 4 with ZFS. Really nice feature. Is it also a good idea to activate this feature on ZFS on Linux? I would activate it only on the home dataset; all data there is shared via NFS/Samba.

Code:
v-machines/home  dedup                 off                    default
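For reference, checking and (if you decide to) enabling dedup on that dataset would just be the standard zfs get/set calls, roughly like this (dataset name taken from the output above):

Code:
# show the current dedup setting on the dataset
zfs get dedup v-machines/home
# enable deduplication for new writes on this dataset only
# (existing data is not deduplicated retroactively)
zfs set dedup=on v-machines/home
# turn it off again; already deduplicated blocks stay shared until rewritten
zfs set dedup=off v-machines/home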

Thanks and
Best Regards
 
Hi,

dedup is very expensive (you need a lot of cache) and it matches on 128 KB blocks,
so it is only interesting if you have the same OS image very often and an SSD which provides L2ARC.
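
For context, that 128 KB figure is the default "recordsize" of the dataset, which you can check with the usual property lookup (a sketch, using the dataset from the first post):

Code:
# recordsize is an upper bound per file; smaller files are stored in smaller blocks
zfs get recordsize v-machines/home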
 
No, the dedup table (DDT) has an entry for each block, since ZFS uses a variable block size up to "recordsize". If all blocks were 128k, that would be nice.

Smaller blocks are worse, because there are more of them, which means more entries in the DDT.

Dedup is useless anyway, because storage is cheaper than RAM.

@fireon: Rule of thumb: you do not want dedup if you need to ask about it :)
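
As a rough sketch of how to judge this in advance (assuming the pool already contains representative data): zdb can simulate dedup on the existing blocks and print the DDT histogram, and a common rule of thumb is on the order of 320 bytes of RAM per unique block (per DDT entry):

Code:
# simulate deduplication without enabling it (read-heavy, can take a long time)
zdb -S v-machines
# back-of-the-envelope RAM estimate, assuming ~320 bytes per DDT entry:
#   1 TB of data at 128 KB average block size -> ~8 million unique blocks
#   8,000,000 * 320 bytes                     -> ~2.5 GB of RAM for the DDT alone
# with 4 KB blocks the entry count (and the RAM need) grows by ~32x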
 
Hi Fireon,

Only my very personal impression on ZFS:

I really like the idea, the possibilities and the cross-platform support. Cache tiering is also a great tool, and I like the internal volume manager and of course the inline compression.

I evaluated this over the past two months and I could not get a stable environment for it. I started out with virtual machines and a reasonably low amount of data, and it works great for extracted backups. I moved to commodity hardware, which crashed randomly on OmniOS, FreeBSD and Proxmox itself after roughly 100 GB of data (the machine hung). I suspected a RAM shortage and upgraded to one of our production servers. I evacuated the machine (Proxmox 3.4) of all running VMs, plugged in an additional FC card, connected a 12-disk shelf of 450 GB SAS 15k drives and built a raidz2 pool. The machine has 128 GB RAM, 3 TB of internal SSD storage (128 GB used for L2ARC) and 24 cores. I worked up to roughly 100 GB of logical data (13 GB of physical data) with a 4K record size for extracted vma backups (plain vma and compressed vma do not deduplicate well). After a while, the node got fenced, and according to arcstat only 59 GB of ARC was in use and the L2ARC held only 4 GB. The crash itself was in a kernel function

I personally will not use deduplication for our backups due to this very bad test behaviour. Despite the big machine, I was not able to push more than roughly 100 MB/s to disk, and I had a constant load of 30 (spiking to over 70). I cannot advise using this on normal VM data. I think it is not ready for "small scale" production yet.
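
For anyone who wants to watch the same counters on a ZFS-on-Linux/Proxmox node, the ARC and L2ARC sizes mentioned above can be read like this (a sketch; arcstat ships with the ZFS userland tools, and the /proc kstat file is always present on ZFS on Linux):

Code:
# ARC size, target size and hit rates, refreshed every second
arcstat 1
# raw counters in bytes (ARC size and L2ARC size)
grep -E '^(size|l2_size) ' /proc/spl/kstat/zfs/arcstats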
 
We had similar problems while testing with FreeNAS, NAS4Free and Proxmox. In every test the problem was the SATA controller. You MUST have a REAL SATA controller; this is the most important thing when working with ZFS. We always buy the one from IBM and flash it with the right firmware. It has never crashed while copying.
IBM ServeRAID M1015: http://www.heise.de/preisvergleich/ibm-serveraid-m1015-90y4556-a815389.html
I will run through some test scenarios before I activate it in production.
 
1. You are talking about SAS, not SATA.
2. Unless you find your controller in a dumpster, any controller from the last 7-8 years should do. Even stupid SATA port multipliers work, although they are not recommended.

The controllers have nothing to do with deduplication. The story is simple: does the DDT fit in memory? It works. Doesn't it fit? Disable dedup or upgrade the memory.
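
For a pool that already has dedup enabled, one way to check whether the DDT fits (a sketch): zpool status -D prints the total number of DDT entries and the per-entry in-core size, and multiplying the two gives the RAM the table needs:

Code:
# prints a line like "dedup: DDT entries N, size X on disk, Y in core"
zpool status -D v-machines
# estimated RAM usage of the DDT = entries * in-core bytes per entry
#   e.g. 5,000,000 entries * 320 bytes ≈ 1.6 GB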
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!