ZFS Deduplication

bitblue.lab

Member
Oct 7, 2015
Has anyone used deduplication in ZFS pools? Will it work on data that has already been written?

I have read a lot about the RAM requirements, but I have enough, and I also have an SSD cache drive with 80k read IOPS which will also be used for deduplication.
 
No, ZFS deduplication is not retroactive; it only applies to data written after it is enabled. Are you sure you have enough RAM? At least with FreeBSD's ZFS implementation, the recommendation is a minimum of 5 GB of RAM per TB of storage, on top of any other RAM requirements (e.g., for the ARC, or for actually running your VMs).
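As a rough illustration of that rule of thumb (the 5 GB/TB figure is only a guideline, and the pool name below is just an example):

# For a 2x2TB mirror (~2 TB usable): 2 TB x 5 GB/TB = ~10 GB of RAM
# just for the dedup table, before counting the ARC for normal caching
# or the RAM needed to actually run the VMs.
zpool list -o name,size,alloc tank   # check the real pool size first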
 
Sometimes, when the server is rented... RAM or L2ARC can cost less than renting another server... but if it doesn't work on data that has already been written, then I would not benefit from it.
 
Will deduplication work if I back up my VMs to NFS storage, then enable deduplication on my pool and restore them into the pool?
 
Once you've enabled deduplication, it will be effective whenever you write data to the pool (which, because ZFS is copy-on-write, includes any time you change data on the pool).
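For example (the dataset name is only a placeholder), dedup is a per-dataset property and only affects blocks written after it is turned on:

zfs set dedup=on rpool/data            # new writes to this dataset get deduplicated
zfs get dedup,compression rpool/data   # confirm the current settings
# Blocks written before this point stay un-deduplicated until they are
# rewritten (e.g. restored from a backup).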
 

What if I back up the VMs to NFS, enable deduplication, and then restore them? Would that work? At the moment a lot of duplicated data has already been written and won't all be overwritten, so I think a backup and restore would let me save more space.
 
Yes. All your data writes will then be marked as deduplication-enabled. Have you tested your filesystem or volume with "zdb -S"? Deduplication can help save space when you make backups, e.g. with rsync into different directories ( /path/filesystem/backup-1 /path/filesystem/backup-2 ... etc ). Otherwise it is better to use compression.
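If you haven't run it yet, something like this (pool name assumed) simulates the dedup table on the existing data without enabling dedup, and prints an estimated ratio at the end:

zdb -S rpool   # dedup simulation; can take a long time and use a lot of RAM
# The last line of the summary looks roughly like:
#   dedup = 4.33, compress = 1.00, copies = 1.00, dedup * compress / copies = 4.33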
 
I have tested the simulation commands and the deduplication ratio shows as 4.33, which is a big saving. One thing that I don't understand is how it is going to save space when the VMs were created on Proxmox storage that was not enabled for thin provisioning.
 
Deduplication is a block-level feature. It knows nothing about file systems or images.
 
If the checksums of two or more blocks are identical, only one block is physically stored while the rest are replaced with references. Since ZFS is copy-on-write, there is no maintenance overhead with this: a write either results in a new block or a reference to an existing block.
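A quick way to see this block-level behaviour on a scratch dataset (names and paths are only examples):

zfs create -o dedup=on tank/dedup-test
cp big-image.raw /tank/dedup-test/copy1
cp big-image.raw /tank/dedup-test/copy2
zpool list tank   # the DEDUP column should climb towards 2.00x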
 
I'm going to have to encourage you NOT to enable deduplication; it can cripple your system with the overhead it takes. If you don't follow this advice, you had better have boatloads of RAM. Even on our production NAS at work, where we have hundreds of gigabytes of RAM, we do NOT run deduplication ... the gains do not outweigh the drawbacks in performance. It is much cheaper to add more storage than to buy the amount of RAM it takes to enable this feature and have it run sanely. Opting for compression is usually a much better option; it has very little in the way of a performance hit.
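If the goal is mainly to save space, compression is the much cheaper knob; a minimal sketch (dataset name is an example):

zfs set compression=lz4 rpool/data             # cheap on CPU, applies to new writes
zfs get compression,compressratio rpool/data   # compressratio shows the space actually saved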
 

I have to test how it is going to react. At the moment I have more than 8 GB of RAM for the 2x2TB mirrored drives, and also SSD drives as an L2ARC read cache if more than the RAM is needed.

One thing that I don't understand: my VMs are not in a thin-provisioned storage container... so how is dedup going to reduce their size, when these virtual disks get a fixed number of GB when they are created (restored from backup, etc.)?
 
Forget about thin provisioning. It has nothing to do with dedup. You write a block with hash H1. If you later write another block with the same hash (restore, new VM, whatever), ZFS will simply store a reference to the first block instead of writing the full block again.
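As a side note on the hash matching: if you are worried about two different blocks ending up with the same hash, ZFS can be told to byte-compare candidates before linking them (dataset name is an example):

zfs set dedup=sha256,verify tank/vmdata   # dedupe on SHA-256, but verify matches byte-for-byte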

So you want to have 8GB of RAM for: ARC, L2ARC headers AND DDT (dedup table that keeps all the hashes of all the written blocks in the pool)?

Please, use the above advice: DON'T!
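For anyone who enables it anyway, the in-core size of the DDT can be checked on a live pool with something like this (pool name assumed):

zpool status -D rpool
# The dedup section at the bottom reports the number of DDT entries and their
# size on disk and in core; that in-core size is RAM competing with the ARC.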
 
Okay guys, I am not going to test dedup on my production server, but I will play with it on a test server. My production server has 64 GB DDR4 RAM, an Intel Xeon E5-1650v3, 2x2TB drives mirrored, and 2x300 GB SSD drives: a 50 GB partition on one SSD is used for the ZIL, and a 150 GB partition on the other SSD for the read cache.
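For reference, that kind of layout is normally attached to the pool roughly like this (device paths are placeholders):

zpool add rpool log /dev/disk/by-id/ssd1-part2     # ~50 GB partition as SLOG (ZIL)
zpool add rpool cache /dev/disk/by-id/ssd2-part2   # ~150 GB partition as L2ARC read cache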

I sell VPSes to friends who develop websites and similar, so the dedup ratio was 4.46% without compression on, and I thought I would save lots of space on the Windows and CentOS VMs and also get better I/O with dedup, because identical operating systems read/write the same disk sectors.
 
@sigxcpu: Hopefully he meant 4.46 (as a factor, not as a percentage). It also depends on the alignment of the virtual machines.

@bitblue: Still working on ZFS... you're like me, you can't be stopped by the fascinating world of ZFS.

I have to say that I spent a lot of time trying out all the high-end features of ZFS, but not all of them are usable for me at the moment. I will stick to non-deduplicated, compressed disks.

Yet I have to say: don't use dedup. I tried it on a 24-core, 128 GB RAM machine with two 12x SAS 15k shelves over two 4 Gbit connections, and it repeatedly crashed my node. It did not matter how full my disks were or which OS I tried: Illumos, FreeBSD and Proxmox (v3 and v4) all crashed (on this machine and on less powerful ones). I never had problems when I did not enable dedup.

I also confirmed that L2ARC on SSD is slow if you do not have enough RAM, because the amount of ARC used to manage the L2ARC headers is non-negligible, which leaves less ARC for actual caching. I had worse throughput than without L2ARC, and the L2ARC was also filling very slowly.
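On ZFS on Linux (which Proxmox uses), the RAM eaten by the L2ARC headers can be read straight from the ARC kstats, for example:

grep -E '^(size|l2_hdr_size|l2_size)' /proc/spl/kstat/zfs/arcstats
# l2_hdr_size is RAM spent only on tracking what sits in the L2ARC;
# with little RAM this directly shrinks the useful ARC.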

Another thing is compression. It is amazing that you can speed up throughput by enabling LZ4 compression. It seems odd at first, but it is really measurable. For compressible data, the CPU time spent compressing is less than the time it would take to write the uncompressed data to disk. This can also be seen if you back up a VM from Proxmox with LZO, GZIP or no compression; LZO is almost always the fastest. Parallel compression would yield even better results.
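The Proxmox backup observation is easy to reproduce with vzdump's compression modes (the VM ID and storage name are placeholders):

vzdump 100 --compress lzo --storage local    # usually the fastest option
vzdump 100 --compress gzip --storage local   # smaller archive, but slower single-threaded gzip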
 
The reasons for the increased performance with LZ4:
1) Writes are done in chunks of the same size as the disk block size, so no I/O is wasted
2) Compression reduces the amount of data that has to be written to disk, so less I/O is used
 
