ZFS Dedup not working on backups

Jul 16, 2018
Hello!

We have two Proxmox hosts and we use the included vzdump tool to back up the VMs and containers that run on them. Our backup location is a ZFS server shared over NFS to both Proxmox hosts. This ZFS server has a RAID-10-like configuration (two mirrored vdevs) with 4 disks, a 40GB SSD cache partition, 20GB of RAM, and around 7.5TB of usable disk space. The backups are configured to use LZO compression, and we have compression disabled on the ZFS server.

Now, the issue we are facing, and really can't see why, is that ZFS deduplication is hardly saving any space. We keep up to 15 copies of the same VM at all times, so my thought was that deduplication would save lots of space, but it doesn't.

This is the storage.cfg file on our hosts:
Code:
root@server:~# cat /etc/pve/storage.cfg

nfs: bkp_una-semana
    export /backup-pool/una-semana
    path /mnt/pve/bkp_una-semana
    server 172.16.16.220
    content backup
    maxfiles 7

nfs: bkp_dos-semanas
    export /backup-pool/dos-semanas
    path /mnt/pve/bkp_dos-semanas
    server 172.16.16.220
    content backup
    maxfiles 14

nfs: bkp_shared
    export /backup-pool/shared
    path /mnt/pve/bkp_shared
    server 172.16.16.220
    content backup
    maxfiles 2

And this is the information of the ZFS:
Code:
[root@zfs ~]# zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
backup-pool              15.8T  1.74T   104K  /backup-pool
backup-pool/dos-semanas  5.78T  1.74T  5.78T  /backup-pool/dos-semanas
backup-pool/shared       4.31T  1.74T  4.31T  /backup-pool/shared
backup-pool/una-semana   5.64T  1.74T  5.64T  /backup-pool/una-semana

[root@zfs ~]# zpool status -D backup-pool
  pool: backup-pool
 state: ONLINE
  scan: none requested
config:

    NAME                                        STATE     READ WRITE CKSUM
    backup-pool                                 ONLINE       0     0     0
      mirror-0                                  ONLINE       0     0     0
        wwn-0x61866da060a4140024ffab24dcf460a0  ONLINE       0     0     0
        wwn-0x61866da060a4140024ffac17eb788c94  ONLINE       0     0     0
      mirror-1                                  ONLINE       0     0     0
        wwn-0x61866da060a4140024ffacf4f8a09606  ONLINE       0     0     0
        wwn-0x61866da060a4140024ffadcd059719fd  ONLINE       0     0     0
    cache
      wwn-0x6002248024fe90ddf48ef8ab67378f5d    ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 131831224, size 709B on disk, 229B in core

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     126M   15.7T   15.7T   15.7T     126M   15.7T   15.7T   15.7T
     2    26.5K   3.31G   3.31G   3.31G    54.4K   6.80G   6.80G   6.80G
     4        2    256K    256K    256K        8      1M      1M      1M
   512        1    512B    512B      4K      536    268K    268K   2.09M
 Total     126M   15.7T   15.7T   15.7T     126M   15.7T   15.7T   15.7T

[root@zfs ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          19880        1087       18603           9         189       18516
Swap:             0           0           0

[root@zfs ~]# zfs get dedup
NAME                     PROPERTY  VALUE          SOURCE
backup-pool              dedup     on             local
backup-pool/dos-semanas  dedup     on             inherited from backup-pool
backup-pool/shared       dedup     on             inherited from backup-pool
backup-pool/una-semana   dedup     on             inherited from backup-pool

[root@zfs ~]# zfs get compression
NAME                     PROPERTY     VALUE     SOURCE
backup-pool              compression  off       local
backup-pool/dos-semanas  compression  off       inherited from backup-pool
backup-pool/shared       compression  off       inherited from backup-pool
backup-pool/una-semana   compression  off       inherited from backup-pool

[root@zfs ~]# zfs get sharenfs
NAME                     PROPERTY  VALUE               SOURCE
backup-pool              sharenfs  rw=@172.16.16.0/24  local
backup-pool/dos-semanas  sharenfs  rw=@172.16.16.0/24  inherited from backup-pool
backup-pool/shared       sharenfs  rw=@172.16.16.0/24  inherited from backup-pool
backup-pool/una-semana   sharenfs  rw=@172.16.16.0/24  inherited from backup-pool

As far as I know, ZFS dedup works at the block-write level, so the fact that the backups are made over LAN instead of locally shouldn't be an issue, nor should the fact that LZO is enabled for vzdump. This is why I really can't see the cause of this lack of deduplication. Any idea or comment is welcome!
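(As a quick cross-check, the pool-wide savings can also be read directly from the dedupratio property; given that almost all DDT entries above sit in the refcnt-1 bucket, it reports close to 1.00x:)

Code:
[root@zfs ~]# zpool get dedupratio backup-pool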

Thank you very much!

Juan
 
Hi

We're about to set up and test a very similar environment later in the week. We looked at dedup for backups on a different hosting platform a while back using VDO on Linux. We found that even with small changes to the snapshot, any form of compression changed the file so significantly that dedup was ineffective. Have you tried the same tests without compressing the backups? I've been testing dedup on ZFS over NFS outside of Proxmox today and the results look very promising.
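(If you want to test without compression, vzdump can be told to skip it either globally or per job; a minimal sketch, assuming the stock /etc/vzdump.conf and a made-up VMID:)

Code:
# /etc/vzdump.conf - disable compression for all backup jobs
compress: 0

# or for a single run (VMID 100 is just an example):
vzdump 100 --compress 0 --storage bkp_una-semana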

David
 
Hi,
Now, the issue we are facing, and really can't see why, is that ZFS deduplication is hardly saving any space.
The problem is the block order.
vzdump can make backups online.
With this feature, the blocks come out of order into the backup file.
And ZFS has a default recordsize of 128K, and two records must be byte-identical for dedup to work.
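(A minimal way to see this alignment effect for yourself, on a throwaway dataset with dedup enabled; the pool and paths here are made up: two byte-identical files dedup record for record, while the same data shifted by a single byte lines up with different 128K records and matches nothing in the DDT:)

Code:
# assumes: zfs create -o dedup=on tank/dedup-test
dd if=/dev/urandom of=/tank/dedup-test/a bs=128K count=1024
cp /tank/dedup-test/a /tank/dedup-test/b   # dedups fully

# same payload, shifted by one byte -> no 128K record matches
(printf 'x'; cat /tank/dedup-test/a) > /tank/dedup-test/c

zpool get dedupratio tank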
 
Hi Wolfgang,

So in this case, if we ran a job once the backup file had been written that copied the resulting file to a tmp file on the ZFS target, deleted the original, and renamed the tmp file back to the original name, we'd get in-order writes. It's not ideal, but could it give us the dedup we're hoping to see?

David
 
Hi Wolfgang,

So in this case, if we ran a job once the backup file had been written that copied the resulting file to a tmp file on the ZFS target, deleted the original, and renamed the tmp file back to the original name, we'd get in-order writes. It's not ideal, but could it give us the dedup we're hoping to see?

David

No, that's not how it works. Also, 20GB of RAM is way too little for ZFS deduplication. IIRC @LnxBil has some tooling around extracting vzdump backups on a ZFS host to achieve differential/snapshot-based storage of guest backups?
 
Hello!

Thank you very much for the information. So, if I understood correctly, the "snapshot" mode (online backups) won't ever save space with ZFS dedup because of the order in which it writes blocks, no matter whether the ZFS storage is local or shared over NFS, and no matter whether backup compression is on, correct? Changing the mode to either "suspend" or "shutdown" is not possible for us, so this could be a big complication for us.

I am aware of the 5GB of RAM per 1TB of data to dedup, but right now I can't assign that to the NFS server, so I'm complementing the RAM with the 40GB SSD cache for ZFS, and I changed secondarycache to metadata.
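(For reference, that is the following property change, which is inherited by the child datasets:)

Code:
[root@zfs ~]# zfs set secondarycache=metadata backup-pool
[root@zfs ~]# zfs get secondarycache backup-pool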

Regarding the recordsize, both Proxmox hosts (which use ZFS for VM storage) and the backup ZFS server have the default value:

Code:
root@server:~# zfs get recordsize
NAME          PROPERTY    VALUE  SOURCE
storage-pool  recordsize  128K   default

[root@zfs ~]# zfs get recordsize
NAME                     PROPERTY    VALUE  SOURCE
backup-pool              recordsize  128K   default
backup-pool/dos-semanas  recordsize  128K   default
backup-pool/shared       recordsize  128K   default
backup-pool/una-semana   recordsize  128K   default

So, I would really like to confirm whether dedup is possible with my current setup; my understanding is that it is not.

It seems that the simplest way to achieve dedup on "snapshot" mode backups is to use PVE-zsync instead of vzdump, is this right?

Thank you very much!

Best regards,

Juan
 
Hello!

Thank you very much for the information. So, if I understood correctly, the "snapshot" mode (online backups) won't ever save space with ZFS dedup because of the order in which it writes blocks, no matter whether the ZFS storage is local or shared over NFS, and no matter whether backup compression is on, correct? Changing the mode to either "suspend" or "shutdown" is not possible for us, so this could be a big complication for us.

The backup mode does not change at all how vzdump and the VMA file format work. Snapshot vs. suspend vs. stop just changes what vzdump does to get a consistent view of the backed-up disks. You simply won't get any deduplication out of stored VMA files.

I am aware of the 5GB of RAM per 1TB of data to dedup, but right now I can't assign that to the NFS server, so I'm complementing the RAM with the 40GB SSD cache for ZFS, and I changed secondarycache to metadata.

That won't help either, as ZFS needs the whole dedup table in RAM to get any kind of reasonable performance. Unless you have carefully evaluated your use case and really understand how ZFS dedup is implemented, I don't recommend turning it on.
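(You can put a rough number on that from the zpool status -D output above: 131,831,224 DDT entries at 229 bytes each in core is about 28 GiB, already more than the machine's 20GB of RAM, and the table only grows as more unique blocks are written:)

Code:
# in-core DDT size, from the dedup line in zpool status -D
echo $((131831224 * 229 / 1024 / 1024 / 1024)) GiB   # -> 28 GiB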

Regarding the recordsize, both Proxmox hosts (which use ZFS for VM storage) and the backup ZFS server have the default value:

Code:
root@server:~# zfs get recordsize
NAME          PROPERTY    VALUE  SOURCE
storage-pool  recordsize  128K   default

[root@zfs ~]# zfs get recordsize
NAME                     PROPERTY    VALUE  SOURCE
backup-pool              recordsize  128K   default
backup-pool/dos-semanas  recordsize  128K   default
backup-pool/shared       recordsize  128K   default
backup-pool/una-semana   recordsize  128K   default

So, I would really like to confirm whether dedup is possible with my current setup; my understanding is that it is not.

It seems that the simplest way to achieve dedup on "snapshot" mode backups is to use PVE-zsync instead of vzdump, is this right?

pve-zsync is not for backups, but for quick disaster recovery; those are two different things. You can get deduplication by extracting the VMA files on your backup host (which gives you the data "in order" again) and deduplicating the result, but you need to do the conversion both ways yourself. There are people doing this with ZFS (by extracting a new backup over the previous one, then taking a snapshot), but it is not something that PVE does or supports out of the box.
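(A rough sketch of that extract-and-snapshot approach on the backup host; every path, file name, and dataset here is made up, and it assumes lzo-compressed qemu dumps plus a ZFS dataset per guest. Unchanged records end up byte-identical across runs, so ZFS snapshots can share them:)

Code:
# hypothetical example - adjust all names to your setup
DUMP=/backup-pool/una-semana/dump/vzdump-qemu-100-2018_07_16-01_00_00.vma.lzo
NEW=/backup-pool/extracted/vm100-incoming   # must not exist yet
CUR=/backup-pool/extracted/vm100            # dataset backup-pool/extracted/vm100

# unpack the dump back into raw disk images ("in-order" data);
# if your vma build does not accept '-' for stdin, decompress to a file first
lzop -dc "$DUMP" | vma extract - "$NEW"

# overwrite the previous images in place so unchanged blocks stay
# shared with older snapshots, then snapshot the dataset
rsync --inplace --no-whole-file "$NEW"/*.raw "$CUR"/
rm -rf "$NEW"
zfs snapshot backup-pool/extracted/vm100@$(date +%F)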
 
Hello Fabian,

Thanks! That's a lot of really useful information. So, in my scenario, one way to have dedup actually save space is to extract the backup files (and re-compress them if wanted) on the ZFS backup server, correct? From what I have read, another way would be to copy or move the backups into the same ZFS backup pool; would this also work?

Regarding PVE-zsync, the documentation says that its main goal is offsite backup, which is what we want. Also, it's incremental and can keep multiple copies. The only shortcoming compared to vzdump is that there is no GUI.
Can you please explain why you don't see it as fit for backups?

Thank you very much!

Juan Correa.
 
Hello Fabian,

Thanks! That's a lot of really useful information. So, in my scenario, one way to have dedup actually save space is to extract the backup files (and re-compress them if wanted) on the ZFS backup server, correct? From what I have read, another way would be to copy or move the backups into the same ZFS backup pool; would this also work?

Compression and deduplication don't mix well, and I don't know what your second question means.

Regarding PVE-zsync, the documentation says that its main goal is offsite backup, which is what we want. Also, it's incremental and can keep multiple copies. The only shortcoming compared to vzdump is that there is no GUI.
Can you please explain why you don't see it as fit for backups?

It's offsite replication. It can be part of a (bigger) backup strategy, but it is not what most people would consider a "backup" on its own. Not only does it not have a GUI, it's also not integrated with the rest of PVE: a user cannot set it up on their own, or restore from it on their own (unless they are root).
 
Hello Fabian,

thank you very much for your help!

My second question was whether, instead of decompressing the backups, just copying or moving them would be enough for dedup to work. Anyway, never mind that; it's much easier to set up a cron job that decompresses the backups than to copy or move them and then delete the original.

So, what I really need is a backup that runs at the host level (we can't back up via an agent at the guest level) and that is incremental both when "extracting" the backup from the VMs and when saving it to disk. Because this is not an option with vzdump, I tried to achieve the closest thing to it that vzdump can offer, and that was ZFS dedup.
Now, vzdump + ZFS dedup (if I can get it to work, have enough RAM, etc.) only solves my space issue (for example, we need to keep 15 copies of some VMs that are over 100GB; using 1.5TB to back them up is not viable), but there is also the time backups take.
That is why, after searching and reading about many options, I came across this one and then started to look at how to implement PVE-zsync, because, at least on paper, it addresses both our need for more space-efficient backups and for faster backups via incrementality.

We need the backups for 2 things only: disaster recovery, and occasionally restoring a copy of a VM in case something is deleted or broken inside it. It's OK if we lose the ability to let VM admins back up or restore on request, as long as we gain incrementality on backups. Would you consider PVE-zsync a valid tool for this, or is it a no-go from the start?
I'm asking because, if PVE-zsync can cover these needs, instead of focusing on getting vzdump + ZFS dedup to work I will directly start testing how to move to PVE-zsync.

Again, thank you very much, this insight you are sharing is going to be really helpful for us.

Juan.
 
So, what I really need is a backup that runs at the host level (we can't back up via an agent at the guest level) and that is incremental both when "extracting" the backup from the VMs and when saving it to disk. Because this is not an option with vzdump, I tried to achieve the closest thing to it that vzdump can offer, and that was ZFS dedup.
Now, vzdump + ZFS dedup (if I can get it to work, have enough RAM, etc.) only solves my space issue (for example, we need to keep 15 copies of some VMs that are over 100GB; using 1.5TB to back them up is not viable), but there is also the time backups take.
That is why, after searching and reading about many options, I came across this one and then started to look at how to implement PVE-zsync, because, at least on paper, it addresses both our need for more space-efficient backups and for faster backups via incrementality.

We need the backups for 2 things only: disaster recovery, and occasionally restoring a copy of a VM in case something is deleted or broken inside it. It's OK if we lose the ability to let VM admins back up or restore on request, as long as we gain incrementality on backups. Would you consider PVE-zsync a valid tool for this, or is it a no-go from the start?
I'm asking because, if PVE-zsync can cover these needs, instead of focusing on getting vzdump + ZFS dedup to work I will directly start testing how to move to PVE-zsync.

Yes, pve-zsync does that (incremental offsite replication with retention of the last X states, with manual recovery only).
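(For the archive, a job along those lines might look like this; the VMID, target host, and dataset are placeholders:)

Code:
# create a recurring job (added to cron automatically):
# keep the last 15 snapshots of VM 100 on the backup host
pve-zsync create --source 100 --dest 172.16.16.220:backup-pool/zsync \
    --name vm100 --maxsnap 15

# list configured jobs / trigger one sync by hand
pve-zsync list
pve-zsync sync --source 100 --dest 172.16.16.220:backup-pool/zsync --maxsnap 15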
 
