ZFS Replication running away?

rholighaus

I have the following file system that gets replicated to two other PVE nodes:

Code:
root@carrier:~# zfs list -ospace -tall -r rpool/data/subvol-104-disk-2
NAME                                                         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
rpool/data/subvol-104-disk-2                                  855G   330G      249G   81.1G             0B         0B
rpool/data/subvol-104-disk-2@2019-04-18-000000                   -  1.64G         -       -              -          -
rpool/data/subvol-104-disk-2@2019-04-26-080000                   -   457M         -       -              -          -

[snapshots removed to shorten the list]

rpool/data/subvol-104-disk-2@2019-05-20-080000                   -  10.0M         -       -              -          -
rpool/data/subvol-104-disk-2@2019-05-20-090000                   -  15.3M         -       -              -          -
rpool/data/subvol-104-disk-2@2019-05-20-100000                   -  14.1M         -       -              -          -
rpool/data/subvol-104-disk-2@2019-05-20-110000                   -  12.8M         -       -              -          -
rpool/data/subvol-104-disk-2@__replicate_104-1_1558344610__      -  8.28M         -       -              -          -
rpool/data/subvol-104-disk-2@2019-05-20-120000                   -  7.79M         -       -              -          -
rpool/data/subvol-104-disk-2@__replicate_104-2_1558348775__      -  8.75M         -       -              -          -
rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__      -  9.42M         -       -              -          -
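
For anyone wanting to double-check the same thing on the source: the per-snapshot written property shows how much data was really written in each interval, and written@<snapshot> shows the amount written since one specific snapshot (commands only, no output from my system included here):

Code:
# data written per snapshot interval on the source
zfs list -t snapshot -o name,used,written,refer -r rpool/data/subvol-104-disk-2
# data written since the last snapshot that job 104-2 replicated
zfs get written@__replicate_104-2_1558348775__ rpool/data/subvol-104-disk-2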

Replication to one node (replication job 104-2) went fine; the other one (replication job 104-1) is causing problems, but only for the second subvol (subvol-104-disk-2; the numbering still starts at 1 from the old PVE subvol naming scheme):

Code:
2019-05-20 14:59:16 104-1: incremental sync 'rpool:subvol-104-disk-2' (__replicate_104-1_1558344610__ => __replicate_104-1_1558357140__)
2019-05-20 14:59:16 104-1: rpool/data/subvol-104-disk-2@__replicate_104-1_1558344610__    name    rpool/data/subvol-104-disk-2@__replicate_104-1_1558344610__    -
2019-05-20 14:59:17 104-1: send from @__replicate_104-1_1558344610__ to rpool/data/subvol-104-disk-2@2019-05-20-120000 estimated size is 157M
2019-05-20 14:59:17 104-1: send from @2019-05-20-120000 to rpool/data/subvol-104-disk-2@__replicate_104-2_1558348775__ estimated size is 47.7M
2019-05-20 14:59:17 104-1: send from @__replicate_104-2_1558348775__ to rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__ estimated size is 1.52T
2019-05-20 14:59:17 104-1: total estimated size is 1.52T

1.52TB is very strange. The current transfer has already reached 144GB and is still running.
Any idea what is happening?

I had migrated the LXC from one PVE node to the current one (restart method).
Replication to the former node runs fine (job 104-2), while replication to the other node (job 104-1) seems to cause problems.
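
The estimate can also be reproduced outside of the replication job with a dry-run send over the same snapshot range (snapshot names taken from the log above; -n does a dry run, -v/-P print the estimated sizes):

Code:
zfs send -nvP -I @__replicate_104-1_1558344610__ \
    rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__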

EDIT: Log from the send job that is still running:

Code:
2019-05-20 15:35:12 104-1: 15:35:12    225G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
2019-05-20 15:35:13 104-1: 15:35:13    225G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
2019-05-20 15:35:14 104-1: 15:35:14    225G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
2019-05-20 15:35:15 104-1: 15:35:15    226G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__

Here is the receiving side. Note that only 6.90G has been allocated (USEDCHILD), so there must be a lot of wasted data in the send job:

Code:
NAME                                                         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
rpool/data/subvol-104-disk-2                                  777G   281G      248G   25.8G             0B      6.90G
rpool/data/subvol-104-disk-2@2019-04-18-000000                   -  1.64G         -       -              -          -
rpool/data/subvol-104-disk-2@2019-04-26-080000                   -   457M         -       -              -          -

[snapshots omitted to shorten the list]

rpool/data/subvol-104-disk-2@2019-05-20-060000                   -  58.0M         -       -              -          -
rpool/data/subvol-104-disk-2@2019-05-20-070000                   -  10.0M         -       -              -          -
rpool/data/subvol-104-disk-2@2019-05-20-080000                   -  9.98M         -       -              -          -
rpool/data/subvol-104-disk-2@2019-05-20-090000                   -  15.2M         -       -              -          -
rpool/data/subvol-104-disk-2@2019-05-20-100000                   -  14.1M         -       -              -          -
rpool/data/subvol-104-disk-2@2019-05-20-110000                   -  12.8M         -       -              -          -
rpool/data/subvol-104-disk-2@__replicate_104-1_1558344610__      -  8.22M         -       -              -          -
rpool/data/subvol-104-disk-2@2019-05-20-120000                   -  7.75M         -       -              -          -
rpool/data/subvol-104-disk-2@__replicate_104-2_1558348775__      -     0B         -       -              -          -
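
One generic sanity check when an incremental send behaves unexpectedly is to verify that the common base snapshot really is the same snapshot on both nodes by comparing its guid (the receives do apply here, so the bases presumably match, but this rules it out quickly):

Code:
# run on both the sending and the receiving node and compare the values
zfs get -H -o value guid rpool/data/subvol-104-disk-2@__replicate_104-1_1558344610__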
 
By now the transfer has reached 770GB:

Code:
2019-05-20 17:01:31 104-1: 17:01:31    771G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
2019-05-20 17:01:32 104-1: 17:01:32    771G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
2019-05-20 17:01:33 104-1: 17:01:33    771G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
2019-05-20 17:01:34 104-1: 17:01:34    771G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
2019-05-20 17:01:35 104-1: 17:01:35    771G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__

And it has created about 18GB more data on the receiving side:

Code:
root@carrier-3:~# zfs list -ospace -tall -r rpool/data/subvol-104-disk-2 | head
NAME                                                         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
rpool/data/subvol-104-disk-2                                  759G   298G      248G   25.8G             0B      24.1G
 
Looks like here is one explanation - I still have to dig deeper into why the compression ratio is so huge (the subvolume holds a Zimbra file system which uses OpenLDAP, and OpenLDAP makes use of sparse files, so that could be one reason):

Code:
rpool/data/subvol-104-disk-2  refcompressratio            27.51x                                                          -
rpool/data/subvol-104-disk-2  written                     93.9G                                                           -
rpool/data/subvol-104-disk-2  logicalused                 11.9T                                                           -
rpool/data/subvol-104-disk-2  logicalreferenced           4.50T
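
To confirm that the estimate simply reflects the uncompressed (logical) size of the data, the default estimate can be compared with a compressed-stream estimate for the same range (-c keeps blocks compressed as they are on disk; this needs a ZFS version that supports compressed sends, and I don't know whether the PVE replication job itself can be told to use it, so this only confirms the cause):

Code:
# default (uncompressed) stream estimate
zfs send -nv -I @__replicate_104-1_1558344610__ \
    rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
# compressed stream estimate
zfs send -nvc -I @__replicate_104-1_1558344610__ \
    rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__

If the second estimate comes out roughly refcompressratio times smaller, the 1.52T is just the logical view of a much smaller amount of on-disk data.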