Replication needs so much additional storage space

Vynn

New Member
May 11, 2022
Hi,

I have a two-node cluster running. Each node has a ZFS pool with 2 TB of storage, and the two pools act as replication targets for each other.

All the virtual disks are thick provisioned. Replication between the two nodes works fine, but it increases the amount of storage needed for each disk by up to 80%.

For example, disk 3 of VM 121 is 450 GB, but it occupies 693 GB of storage on each node:

Bash:
> zfs list -o space
NAME                           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
PVE_ZFS_POOL                46.2G  1.71T        0B     96K             0B      1.71T
...
PVE_ZFS_POOL/vm-121-disk-3   510G   693G        0B    229G           464G         0B

Bash:
> zfs list -t all
NAME                                                           USED  AVAIL     REFER  MOUNTPOINT
PVE_ZFS_POOL                                               1.71T  46.2G       96K  /PVE_ZFS_POOL
...
PVE_ZFS_POOL/vm-121-disk-3                                  693G   510G      229G  -
PVE_ZFS_POOL/vm-121-disk-3@__replicate_121-0_1655755215__     0B      -      229G  -

Because of the additional storage allocated for the replication snapshots, I lose 70% of my overall storage. Is there any way to delete these snapshots right after the sync, or is there another solution? Why do these snapshots need so much storage? I don't want to use thin provisioning.

Thanks for any suggestion.
 
Hi,
unfortunately, you can't delete the snapshots, because they are required for incremental sync. Without them, the whole disk would have to be re-sent every time. If you replicate more often, there should be less data to sync in a single run and less data held by the snapshot.
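To make the incremental requirement concrete: an incremental send needs a common base snapshot on both nodes, and only the blocks changed since that snapshot are transferred. Roughly sketched with plain ZFS commands (this is only an illustration of what the replication does under the hood, not the exact commands Proxmox VE runs; the names in angle brackets are placeholders):

Bash:
# Incremental send: only blocks changed since the common base snapshot are sent.
# The base snapshot (the last __replicate_* snapshot) must exist on both nodes;
# without it, the whole 450 GB disk would have to be sent again in full.
> zfs send -i PVE_ZFS_POOL/vm-121-disk-3@<previous-replication-snapshot> \
      PVE_ZFS_POOL/vm-121-disk-3@<new-replication-snapshot> \
      | ssh <other-node> zfs receive -F PVE_ZFS_POOL/vm-121-disk-3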
 
The replication runs every 30 minutes. Even if there is no change on the disk, the amount of storage needed by the VM disk does not decrease.

Bash:
2022-06-21 09:30:00 121-0: start replication job
2022-06-21 09:30:00 121-0: guest => VM 121, running => 1380186
2022-06-21 09:30:00 121-0: volumes => PVE_ZFS_POOL:PVE_ZFS_POOL:vm-121-disk-3
2022-06-21 09:30:00 121-0: freeze guest filesystem
2022-06-21 09:30:05 121-0: create snapshot '__replicate_121-0_1655796600__' on PVE_ZFS_POOL:vm-121-disk-3
2022-06-21 09:30:05 121-0: thaw guest filesystem
2022-06-21 09:30:05 121-0: using secure transmission, rate limit: none
2022-06-21 09:30:08 121-0: incremental sync 'PVE_ZFS_POOL:vm-121-disk-3' (__replicate_121-0_1655794800__ => __replicate_121-0_1655796600__)
2022-06-21 09:30:09 121-0: send from @__replicate_121-0_1655794800__ to PVE_ZFS_POOL/vm-121-disk-3@__replicate_121-0_1655796600__ estimated size is 113M
2022-06-21 09:30:09 121-0: total estimated size is 113M
2022-06-21 09:30:10 121-0: successfully imported 'PVE_ZFS_POOL:vm-121-disk-3'
2022-06-21 09:30:11 121-0: delete previous replication snapshot '__replicate_121-0_1655794800__' on PVE_ZFS_POOL:vm-121-disk-3
2022-06-21 09:30:12 121-0: (remote_finalize_local_job) delete stale replication snapshot '__replicate_121-0_1655794800__' on PVE_ZFS_POOL:vm-121-disk-3
2022-06-21 09:30:12 121-0: end replication job

In this sync, 113 MB were transferred. The size of the VM disk is 450 GB, but the space consumed in the ZFS pool is still 693 GB.

Bash:
> zfs list -t all
PVE_ZFS_POOL/vm-121-disk-3                                  693G   550G      229G  -
PVE_ZFS_POOL/vm-121-disk-3@__replicate_121-0_1655796600__   132M      -      229G  -

I made a test: I created an empty 10 GB disk.

Bash:
NAME                           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
PVE_ZFS_POOL/vm-216-disk-1  56.8G  10.3G        0B     56K          10.3G         0B

Then I copied 4 GB of data onto that disk:

Bash:
NAME                           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
PVE_ZFS_POOL/vm-216-disk-1  52.7G  10.3G       56K   4.11G          6.21G         0B

The consumed space is still 10.3 GB, which is fine. But with the first replication, the consumed space in the ZFS pool jumps to 14.6 GB:

Code:
NAME                           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
PVE_ZFS_POOL/vm-216-disk-1  52.3G  14.6G      252K   4.29G          10.3G         0B

These additional 4.3 GB are not released by further replications, even if there is no change on the disk and even if I run the replication every 5 minutes. The space is only released when I delete the data on the VM disk.

What is ZFS storing here? I see no reason for this additional space. Is this the normal behavior of ZFS replication?

This is a real problem. Even though only 50% of my pool is used for VM disks, it is very close to running out of space because of this replication behavior.
 
this is just how zfs snapshots work for fully provisioned zvols. think of it like this:

- you have a volume with size 10G (it takes 10G of reservation + overhead)
- you write 4G (it still takes 10G + overhead: 4G of used data, 6G of reservation)
- you take a snapshot
- you now need 4G + overhead for the data referenced by the snapshot, but also still 10G (reservation) + overhead so that the volume can always be fully (re)written
- you write another 2G (it now takes 10G + 4G + overhead: the 2G are contained in the 10G, which is now 2G used and 8G reservation)
- you take another snapshot (total usage now 10G (reservation) + 4G (used by snapshot #1) + 2G (used by snapshot #2) + overhead)

it's actually a bit more complicated, because every snapshot might contain data unique to that snapshot as well as data shared with other snapshots, so the accounting only shows you the unique data when you examine a single snapshot.

you can trade this safety of always being able to fully write every zvol for less space usage by using thin-provisioned zvols - the volume itself then only uses as much space as is actually written, not the full size. but there is then no guarantee that enough free space is available to fully write it, with bad results including potential data corruption or server outages if you (or users/services inside your guests using such volumes!) ever do.
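as a quick check, you can see whether a given zvol is thick- or thin-provisioned by comparing its refreservation to its volsize (a sketch using one of the volumes from this thread):

Bash:
# thick-provisioned: refreservation ~= volsize (the full size is always reserved)
# thin-provisioned:  refreservation = none   (only written data counts against the pool)
> zfs get volsize,refreservation,usedbysnapshots PVE_ZFS_POOL/vm-121-disk-3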
 
Ok, thanks for the great explanation. I understand now that with thick provisioning the full disk size has to stay reserved, because in theory the disk could be completely rewritten with completely different data.

So the only solution is thin provisioning. I ticked the "Thin provisioning" checkbox in the settings of the pool in the datacenter. Will this automatically reduce the reserved space and free up space in the pool, or do I have to do something else?
 
that checkbox only affects newly allocated volumes - for existing ones you have to clear the corresponding property (refreservation) with zfs set
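for reference, the "Thin provisioning" checkbox corresponds to the sparse option of the ZFS storage definition in /etc/pve/storage.cfg; with sparse set, newly created volumes get no refreservation. an illustrative snippet (the storage and pool name are taken from this thread, the other lines are assumptions about the setup):

Code:
zfspool: PVE_ZFS_POOL
        pool PVE_ZFS_POOL
        content images,rootdir
        sparse 1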
 
Switching a volume from thick provisioning to thin provisioning:

Bash:
> zfs list -o space
NAME                           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
PVE_ZFS_POOL                96.2G  1.66T        0B     96K             0B      1.66T
PVE_ZFS_POOL/vm-215-disk-2   148G  88.4G     2.44G   34.4G          51.6G         0B

> zfs get reservation,refreserv PVE_ZFS_POOL/vm-215-disk-2
NAME                           PROPERTY        VALUE      SOURCE
PVE_ZFS_POOL/vm-215-disk-2  reservation     none       default
PVE_ZFS_POOL/vm-215-disk-2  refreservation  51.6G      local

> zfs set refreservation=none PVE_ZFS_POOL/vm-215-disk-2

> zfs get reservation,refreserv PVE_ZFS_POOL/vm-215-disk-2
NAME                           PROPERTY        VALUE      SOURCE
PVE_ZFS_POOL/vm-215-disk-2  reservation     none       default
PVE_ZFS_POOL/vm-215-disk-2  refreservation  none       local

> zfs list -o space
NAME                           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
PVE_ZFS_POOL                 148G  1.61T        0B     96K             0B      1.61T
PVE_ZFS_POOL/vm-215-disk-2   148G  36.8G     2.44G   34.4G             0B         0B

This releases all the reserved space.

In Proxmox, I can only set thin provisioning for the whole pool.

Q1: Is it safe to mix thick and thin provisioning that way?

I would leave important VMs thick provisioned and change the less important VMs to thin provisioning with zfs set refreservation=none [volume].

Q2: The reservation property is also set to none; that is correct for a thin-provisioned volume, right?

Q3: What will happen if I set the reservation property?

Bash:
> zfs set reservation=51.6G PVE_ZFS_POOL/vm-215-disk-2
> zfs get reservation,refreserv PVE_ZFS_POOL/vm-215-disk-2
NAME                           PROPERTY        VALUE      SOURCE
PVE_ZFS_POOL/vm-215-disk-2  reservation     51.6G      local
PVE_ZFS_POOL/vm-215-disk-2  refreservation  none       local

> zfs list -o space
NAME                           AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
PVE_ZFS_POOL                 133G  1.63T        0B     96K             0B      1.63T
PVE_ZFS_POOL/vm-215-disk-2   148G  36.8G     2.44G   34.4G             0B         0B

Q4: Does that mean the disk is thick provisioned, but replication might fail because of the missing reservation?
Would that be an advisable configuration: sacrificing reliable replication in order to avoid data corruption from thin provisioning in case the pool runs out of space?

Sorry for so many more questions :cool: and thank you so much for your great help!
 
Q1: yes
Q2: yes
Q3/Q4: reservation also reserves space - but it reserves space for the dataset/volume and its descendants, whereas refreservation only covers the dataset/volume itself - see man zfsprops. for PVE, it doesn't make much sense to set reservation. you can either reserve the space or be thin-provisioned, there is no meaningful "halfway" in between.
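if you want to get rid of the reservation from the test in your previous post, you can simply clear it again (a sketch, using the same volume as in your example):

Bash:
# remove the reservation again - the volume stays thin-provisioned (refreservation=none)
> zfs set reservation=none PVE_ZFS_POOL/vm-215-disk-2
> zfs get reservation,refreservation PVE_ZFS_POOL/vm-215-disk-2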
 
