ZFS Storage Replication - transferred size bigger than actual disk size

rholighaus

We are using ZFS storage replication. A container running on host carrier-1 is replicated to carrier-2.

Code:
NAME                           USED  AVAIL     REFER  MOUNTPOINT
rpool/data/subvol-115-disk-1  3.29G  96.7G     3.26G  /rpool/data/subvol-115-disk-1
rpool/data/subvol-115-disk-2   333G  82.6G      317G  /rpool/data/subvol-115-disk-2
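
For context, this is how one can check how much new data each replication snapshot actually holds on the source (just a plain zfs list, with the columns I find useful here):

Code:
# List the replication snapshots and the space each one holds, to see
# roughly how much new data an incremental run should have to carry.
zfs list -t snapshot -o name,used,referenced -r rpool/data/subvol-115-disk-2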

The replication log, however, says it wants to transfer 4.92 TB:

Code:
2020-01-17 07:49:56 115-1: start replication job
2020-01-17 07:49:56 115-1: guest => CT 115, running => 1
2020-01-17 07:49:56 115-1: volumes => rpool:subvol-115-disk-1,rpool:subvol-115-disk-2
2020-01-17 07:49:58 115-1: freeze guest filesystem
2020-01-17 07:49:58 115-1: create snapshot '__replicate_115-1_1579243796__' on rpool:subvol-115-disk-1
2020-01-17 07:49:58 115-1: create snapshot '__replicate_115-1_1579243796__' on rpool:subvol-115-disk-2
2020-01-17 07:49:58 115-1: thaw guest filesystem
2020-01-17 07:49:58 115-1: incremental sync 'rpool:subvol-115-disk-1' (__replicate_115-1_1579169701__ => __replicate_115-1_1579243796__)
2020-01-17 07:49:59 115-1: send from @__replicate_115-1_1579169701__ to rpool/data/subvol-115-disk-1@__replicate_115-2_1579189875__ estimated size is 19.0M
2020-01-17 07:49:59 115-1: send from @__replicate_115-2_1579189875__ to rpool/data/subvol-115-disk-1@__replicate_115-1_1579243796__ estimated size is 14.4M
2020-01-17 07:49:59 115-1: total estimated size is 33.4M
2020-01-17 07:49:59 115-1: TIME        SENT   SNAPSHOT rpool/data/subvol-115-disk-1@__replicate_115-2_1579189875__
2020-01-17 07:49:59 115-1: rpool/data/subvol-115-disk-1@__replicate_115-1_1579169701__    name    rpool/data/subvol-115-disk-1@__replicate_115-1_1579169701__    -
2020-01-17 07:50:00 115-1: 07:50:00   4.37M   rpool/data/subvol-115-disk-1@__replicate_115-2_1579189875__
2020-01-17 07:50:00 115-1: TIME        SENT   SNAPSHOT rpool/data/subvol-115-disk-1@__replicate_115-1_1579243796__
2020-01-17 07:50:01 115-1: 07:50:01   2.99M   rpool/data/subvol-115-disk-1@__replicate_115-1_1579243796__
2020-01-17 07:50:02 115-1: 07:50:02   2.99M   rpool/data/subvol-115-disk-1@__replicate_115-1_1579243796__
2020-01-17 07:50:03 115-1: 07:50:03   2.99M   rpool/data/subvol-115-disk-1@__replicate_115-1_1579243796__
2020-01-17 07:50:07 115-1: incremental sync 'rpool:subvol-115-disk-2' (__replicate_115-1_1579169701__ => __replicate_115-1_1579243796__)
2020-01-17 07:50:07 115-1: send from @__replicate_115-1_1579169701__ to rpool/data/subvol-115-disk-2@__replicate_115-2_1579189875__ estimated size is 3.25T
2020-01-17 07:50:07 115-1: send from @__replicate_115-2_1579189875__ to rpool/data/subvol-115-disk-2@__replicate_115-1_1579243796__ estimated size is 1.68T
2020-01-17 07:50:07 115-1: total estimated size is 4.92T
2020-01-17 07:50:07 115-1: TIME        SENT   SNAPSHOT rpool/data/subvol-115-disk-2@__replicate_115-2_1579189875__
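
The huge estimate can be reproduced outside of the replication job with a dry-run send; adding -c (compressed WRITE records) should make the estimate drop to roughly the on-disk size. The snapshot names below are taken from the log above; this is only a sketch to illustrate the difference:

Code:
# Dry-run incremental send the way the replication job does it (no -c):
# the estimated size reflects the decompressed, logical data.
zfs send -nv -i @__replicate_115-2_1579189875__ \
    rpool/data/subvol-115-disk-2@__replicate_115-1_1579243796__

# Same dry run with -c (compressed stream): the estimate should be
# close to the compressed on-disk size instead.
zfs send -nvc -i @__replicate_115-2_1579189875__ \
    rpool/data/subvol-115-disk-2@__replicate_115-1_1579243796__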


zfs get all rpool/data/subvol-115-disk-2 shows that this dataset is heavily compressed:

Code:
rpool/data/subvol-115-disk-2  used                  333G                           -
rpool/data/subvol-115-disk-2  available             82.5G                          -
rpool/data/subvol-115-disk-2  referenced            317G                           -
rpool/data/subvol-115-disk-2  compressratio         23.17x                         -
rpool/data/subvol-115-disk-2  usedbysnapshots       15.9G                          -
rpool/data/subvol-115-disk-2  usedbydataset         317G                           -
rpool/data/subvol-115-disk-2  written               212M                           -
rpool/data/subvol-115-disk-2  logicalused           7.51T                          -
rpool/data/subvol-115-disk-2  logicalreferenced     7.47T                          -
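
So the estimates line up with the logical (uncompressed) numbers rather than with what is actually on disk: a plain zfs send decompresses the blocks for the stream, so the stream size tracks logicalused/logicalreferenced, not the compressed referenced size. A parseable view of the relevant properties (just a convenience command, -Hp prints raw byte values):

Code:
# Physical vs logical usage in bytes: without -c the send stream size
# follows the logical numbers, not the compressed on-disk ones.
zfs get -Hp -o property,value \
    used,referenced,logicalused,logicalreferenced,compressratio \
    rpool/data/subvol-115-disk-2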

Unfortunately, all the compressed data is replicated uncompressed over our replication network, which takes forever and also clogs up the network.

Any idea how to fix that?
 
Maybe Bug 1824 and the published (but not implemented) patch will do the trick.
I'm currently testing the patch (zfs send -Rpvc instead of zfs send -Rpv) and will report back.

Either the replication code has to check whether compression is enabled on the source dataset and then use the -c flag (although compression is enabled by default in PVE, afaik), or it could just use the flag unconditionally. A rough sketch of the conditional variant is below.
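
A hypothetical wrapper for the conditional variant could look like this (dataset and variable names are only illustrative; this is not what the PVE replication code actually does):

Code:
#!/bin/sh
# Illustrative sketch only: choose the send flags depending on whether
# the source dataset has compression enabled.
DS="rpool/data/subvol-115-disk-2"   # example dataset

if [ "$(zfs get -H -o value compression "$DS")" != "off" ]; then
    SEND_FLAGS="-Rpvc"   # keep blocks compressed in the stream
else
    SEND_FLAGS="-Rpv"    # current behaviour
fi

echo "would run: zfs send $SEND_FLAGS $DS@<snapshot>"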

Comments, opinions?
 
The problem is that -c can fail if the source and target don't have the same compression features available. We could maybe add it as an option? The same is true for -L and -e.
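
One way to sanity-check that beforehand, at least for the common cases, would be to compare the relevant pool features on both nodes before deciding on -c/-L/-e (just a sketch, and the feature list is not exhaustive):

Code:
# Run on both the source and the target node and compare the output;
# the features a compressed/large-block stream relies on should be
# enabled or active on both sides.
zpool get feature@lz4_compress,feature@large_blocks,feature@embedded_data rpool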
 