ZFS Deduplication

bitblue.lab

Member
Oct 7, 2015
Has anyone used deduplication in ZFS pools? Will it work on data that has already been written?

I have read a lot about the RAM requirements, but I have enough, and I also have an SSD cache drive with 80k read IOPS which will also be used for deduplication.
 
No, ZFS deduplication is not retroactive; it only applies to data written after it is enabled. Are you sure you have enough RAM? At least with FreeBSD's ZFS implementation, the recommendation is a minimum of 5 GB of RAM per TB of storage, on top of any other RAM requirements (e.g., for the ARC, or for actually running your VMs).
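As a rough illustration of that rule of thumb (the 5 GB/TB figure is only a guideline, and the pool name below is just an example):

# For a 2x2TB mirror (~2 TB usable): 2 TB x 5 GB/TB = ~10 GB of RAM
# just for the dedup table, before counting the ARC for normal caching
# or the RAM needed to actually run the VMs.
zpool list -o name,size,alloc tank   # check the real pool size first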
 
Sometimes, when the server is rented... RAM or L2ARC can cost less than renting another server... but if it doesn't work on data that has already been written, then I would not benefit from it.
 
Will deduplication work if I back up my VMs to NFS storage, then enable deduplication on my pool and restore them into the pool?
 
Once you've enabled deduplication, it will be effective whenever you write data to the pool (which, because ZFS is copy-on-write, includes any time you change data on the pool).
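For example (the dataset name is only a placeholder), dedup is a per-dataset property and only affects blocks written after it is turned on:

zfs set dedup=on rpool/data            # new writes to this dataset get deduplicated
zfs get dedup,compression rpool/data   # confirm the current settings
# Blocks written before this point stay un-deduplicated until they are
# rewritten (e.g. restored from a backup).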
 

What if I back up the VMs to NFS, enable deduplication, and then restore them? Would that work? At the moment a lot of duplicated data has already been written and won't all be overwritten, so I think a backup and restore would let me save more space.
 
Yes. All your data writes will then be marked as deduplication-enabled. Have you tested your filesystem or volume with "zdb -S"? Deduplication can help save space when you make backups, e.g. with rsync into different directories ( /path/filesystem/backup-1 /path/filesystem/backup-2 ... etc ). Otherwise it is better to use compression.
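If you haven't run it yet, something like this (pool name assumed) simulates the dedup table on the existing data without enabling dedup, and prints an estimated ratio at the end:

zdb -S rpool   # dedup simulation; can take a long time and use a lot of RAM
# The last line of the summary looks roughly like:
#   dedup = 4.33, compress = 1.00, copies = 1.00, dedup * compress / copies = 4.33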
 
I have tested the simulation commands and the deduplication ratio shows as 4.33, which is a big saving. One thing that I don't understand is how it is going to save space when the VMs were created on Proxmox storage that was not enabled for thin provisioning.
 
Deduplication is a block-level feature. It knows nothing about file systems or images.
 
If the checksums of two or more blocks are identical, only one block is physically stored while the rest are replaced with references. Since ZFS is copy-on-write, there is no maintenance overhead with this: a write either results in a new block or a reference to an existing block.
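A quick way to see this block-level behaviour on a scratch dataset (names and paths are only examples):

zfs create -o dedup=on tank/dedup-test
cp big-image.raw /tank/dedup-test/copy1
cp big-image.raw /tank/dedup-test/copy2
zpool list tank   # the DEDUP column should climb towards 2.00x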
 
I'm going to have to encourage you NOT to enable deduplication; it can cripple your system with the overhead it takes. If you don't follow this advice, you had better have boatloads of RAM. Even on our production NAS at work, where we have hundreds of gigabytes of RAM, we do NOT run deduplication ... the gains do not outweigh the drawbacks in performance. It is much cheaper to add more storage than to buy the amount of RAM it takes to enable this feature and have it run sanely. Opting for compression is usually a much better option; it has very little in the way of a performance hit.
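If the goal is mainly to save space, compression is the much cheaper knob; a minimal sketch (dataset name is an example):

zfs set compression=lz4 rpool/data             # cheap on CPU, applies to new writes
zfs get compression,compressratio rpool/data   # compressratio shows the space actually saved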
 

I have to test how it is going to react. At the moment I have more than 8 GB of RAM for the 2x2TB mirrored drives, and also SSD drives as an L2ARC read cache if more than the RAM is needed.

One thing that I don't understand: my VMs are not in a thin-provisioned storage container... so how is dedup going to reduce their size, when these virtual disks get a fixed number of GB when they are created (restored from backup, etc.)?
 
Forget about thin provisioning. It has nothing to do with dedup. You write a block with hash H1. If you later write another block with the same hash (restore, new VM, whatever), ZFS will simply store a reference to the first block instead of writing the full block again.
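As a side note on the hash matching: if you are worried about two different blocks ending up with the same hash, ZFS can be told to byte-compare candidates before linking them (dataset name is an example):

zfs set dedup=sha256,verify tank/vmdata   # dedupe on SHA-256, but verify matches byte-for-byte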

So you want to have 8GB of RAM for: ARC, L2ARC headers AND DDT (dedup table that keeps all the hashes of all the written blocks in the pool)?

Please, use the above advice: DON'T!
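For anyone who enables it anyway, the in-core size of the DDT can be checked on a live pool with something like this (pool name assumed):

zpool status -D rpool
# The dedup section at the bottom reports the number of DDT entries and their
# size on disk and in core; that in-core size is RAM competing with the ARC.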
 
Okay guys, I am not going to test dedup on my production server, but I will play with it on a test server. My production server has 64 GB DDR4 RAM, an Intel Xeon E5-1650v3, 2x2TB drives mirrored, and 2x300 GB SSD drives: a 50 GB partition on one SSD is used for the ZIL, and a 150 GB partition on the other SSD for the read cache.
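For reference, that kind of layout is normally attached to the pool roughly like this (device paths are placeholders):

zpool add rpool log /dev/disk/by-id/ssd1-part2     # ~50 GB partition as SLOG (ZIL)
zpool add rpool cache /dev/disk/by-id/ssd2-part2   # ~150 GB partition as L2ARC read cache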

I sell VPSes to friends who develop websites and similar, so the dedup ratio was 4.46% without compression on, and I thought I would save lots of space on the Windows and CentOS VMs and also get better I/O with dedup, because identical operating systems read/write the same disk sectors.
 
@sigxcpu: Hopefully he meant 4.46 (as a factor, not as a percentage). It also depends on the alignment of the virtual machines.

@bitblue: Still working on ZFS... you're like me, you can't be stopped by the fascinating world of ZFS.

I have to say that I spent a lot of time trying out all the high-end features of ZFS, but not all of them are usable for me at the moment. I will stick to non-deduplicated, compressed disks.

Yet I have to say: don't use dedup. I tried it on a 24-core, 128 GB RAM machine with two 12x SAS 15k shelves over two 4 Gbit connections, and it repeatedly crashed my node. It did not matter how full my disks were or which OS I tried: Illumos, FreeBSD and Proxmox (v3 and v4) all crashed (on this machine and on less powerful ones). I never had problems when I did not enable dedup.

I also confirmed that L2ARC on SSD is slow if you do not have enough RAM, because the amount of ARC used to manage the L2ARC headers is non-negligible, which leaves less ARC for actual caching. I had worse throughput than without L2ARC, and the L2ARC was also filling very slowly.
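On ZFS on Linux (which Proxmox uses), the RAM eaten by the L2ARC headers can be read straight from the ARC kstats, for example:

grep -E '^(size|l2_hdr_size|l2_size)' /proc/spl/kstat/zfs/arcstats
# l2_hdr_size is RAM spent only on tracking what sits in the L2ARC;
# with little RAM this directly shrinks the useful ARC.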

Another thing is compression. It is amazing that you can speed up throughput by enabling LZ4 compression. It seems odd at first, but it is really measurable. For compressible data, the CPU time spent compressing is less than the time it would take to write the uncompressed data to disk. This can also be seen if you back up a VM from Proxmox with LZO, GZIP or no compression; LZO is almost always the fastest. Parallel compression would yield even better results.
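The Proxmox backup observation is easy to reproduce with vzdump's compression modes (the VM ID and storage name are placeholders):

vzdump 100 --compress lzo --storage local    # usually the fastest option
vzdump 100 --compress gzip --storage local   # smaller archive, but slower single-threaded gzip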
 
The reasons for the increased performance with LZ4:
1) Writes are done in chunks of the same size as the disk block size, so no I/O is wasted
2) Compression reduces the amount of data that has to be written to disk, so less I/O is used
 
