ZFS Replication running away?

Discussion in 'Proxmox VE: Installation and configuration' started by rholighaus, May 20, 2019.

  1. rholighaus

    rholighaus Member
    Proxmox Subscriber

    I have the following file system that gets replicated to two other PVE nodes:

    Code:
    root@carrier:~# zfs list -ospace -tall -r rpool/data/subvol-104-disk-2
    NAME                                                         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
    rpool/data/subvol-104-disk-2                                  855G   330G      249G   81.1G             0B         0B
    rpool/data/subvol-104-disk-2@2019-04-18-000000                   -  1.64G         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-04-26-080000                   -   457M         -       -              -          -
    
    [snapshots removed to shorten list]
    
    rpool/data/subvol-104-disk-2@2019-05-20-080000                   -  10.0M         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-05-20-090000                   -  15.3M         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-05-20-100000                   -  14.1M         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-05-20-110000                   -  12.8M         -       -              -          -
    rpool/data/subvol-104-disk-2@__replicate_104-1_1558344610__      -  8.28M         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-05-20-120000                   -  7.79M         -       -              -          -
    rpool/data/subvol-104-disk-2@__replicate_104-2_1558348775__      -  8.75M         -       -              -          -
    rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__      -  9.42M         -       -              -          -
    
    Replication to one node (job 104-2) went well; the other one (job 104-1) is causing problems, but only for the second subvol (subvol-104-disk-2; the container still uses the old PVE subvol numbering that started at 1):

    Code:
    2019-05-20 14:59:16 104-1: incremental sync 'rpool:subvol-104-disk-2' (__replicate_104-1_1558344610__ => __replicate_104-1_1558357140__)
    2019-05-20 14:59:16 104-1: rpool/data/subvol-104-disk-2@__replicate_104-1_1558344610__    name    rpool/data/subvol-104-disk-2@__replicate_104-1_1558344610__    -
    2019-05-20 14:59:17 104-1: send from @__replicate_104-1_1558344610__ to rpool/data/subvol-104-disk-2@2019-05-20-120000 estimated size is 157M
    2019-05-20 14:59:17 104-1: send from @2019-05-20-120000 to rpool/data/subvol-104-disk-2@__replicate_104-2_1558348775__ estimated size is 47.7M
    2019-05-20 14:59:17 104-1: send from @__replicate_104-2_1558348775__ to rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__ estimated size is 1.52T
    2019-05-20 14:59:17 104-1: total estimated size is 1.52T
    
    1.52TB is very strange. The current transfer has already reached 144GB and is still running.
    Any idea what is happening?

    I had migrated the LXC container from one PVE node to the current one (restart method).
    Replication to the former node ran well (job 104-2), while replication to another node (job 104-1) seems to cause problems.
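
    For reference, the same incremental estimate can be reproduced by hand with a dry-run send (a sketch using the snapshot names from the log above; run on the sending node):

    Code:
    # -n = dry run (nothing is transferred), -v = print the size estimate,
    # -I = include all intermediate snapshots, as the replication job does
    zfs send -nv -I @__replicate_104-1_1558344610__ \
        rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__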

    EDIT: Log from the pending send job:

    Code:
    2019-05-20 15:35:12 104-1: 15:35:12    225G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
    2019-05-20 15:35:13 104-1: 15:35:13    225G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
    2019-05-20 15:35:14 104-1: 15:35:14    225G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
    2019-05-20 15:35:15 104-1: 15:35:15    226G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
    Here is the receiving side. Note that only 6.90G has been allocated (USEDCHILD), so the send job must be transferring a lot of wasted data:

    Code:
    NAME                                                         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
    rpool/data/subvol-104-disk-2                                  777G   281G      248G   25.8G             0B      6.90G
    rpool/data/subvol-104-disk-2@2019-04-18-000000                   -  1.64G         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-04-26-080000                   -   457M         -       -              -          -
    
    [snapshots omitted to shorten list]
    
    rpool/data/subvol-104-disk-2@2019-05-20-060000                   -  58.0M         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-05-20-070000                   -  10.0M         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-05-20-080000                   -  9.98M         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-05-20-090000                   -  15.2M         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-05-20-100000                   -  14.1M         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-05-20-110000                   -  12.8M         -       -              -          -
    rpool/data/subvol-104-disk-2@__replicate_104-1_1558344610__      -  8.22M         -       -              -          -
    rpool/data/subvol-104-disk-2@2019-05-20-120000                   -  7.75M         -       -              -          -
    rpool/data/subvol-104-disk-2@__replicate_104-2_1558348775__      -     0B         -       -              -          -
     
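    To check whether the incoming stream actually turns into allocated space, the receiving dataset can be polled while the job runs (a sketch; run on the receiving node):

    Code:
    # Re-run the space listing every 5 minutes; USED/USEDCHILD should grow
    # roughly in step with what the sender reports as transferred
    watch -n 300 'zfs list -ospace rpool/data/subvol-104-disk-2'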
    #1 rholighaus, May 20, 2019
    Last edited: May 20, 2019
  2. rholighaus

    By now we have reached 770GB transferred:

    Code:
    2019-05-20 17:01:31 104-1: 17:01:31    771G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
    2019-05-20 17:01:32 104-1: 17:01:32    771G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
    2019-05-20 17:01:33 104-1: 17:01:33    771G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
    2019-05-20 17:01:34 104-1: 17:01:34    771G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
    2019-05-20 17:01:35 104-1: 17:01:35    771G   rpool/data/subvol-104-disk-2@__replicate_104-1_1558357140__
    And it has created 18GB more data on the receiving side:

    Code:
    root@carrier-3:~# zfs list -ospace -tall -r rpool/data/subvol-104-disk-2 | head
    NAME                                                         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
    rpool/data/subvol-104-disk-2                                  759G   298G      248G   25.8G             0B      24.1G
    
     
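    As an aside, a runaway job like this can be inspected and, if need be, kept from rescheduling with the pvesr CLI (a sketch; 104-1 is the job ID from this thread):

    Code:
    # List all replication jobs with their state and last sync time
    pvesr status
    # Temporarily disable the misbehaving job (a send already in progress
    # is not killed, but no new runs are scheduled)
    pvesr disable 104-1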
  3. rholighaus

    Looks like here is one explanation; I need to dig deeper into why there is such a huge compression ratio (the subvolume holds a Zimbra file system, which uses OpenLDAP, which in turn makes use of sparse files, so that could be one reason):

    Code:
    rpool/data/subvol-104-disk-2  refcompressratio            27.51x                                                          -
    rpool/data/subvol-104-disk-2  written                     93.9G                                                           -
    rpool/data/subvol-104-disk-2  logicalused                 11.9T                                                           -
    rpool/data/subvol-104-disk-2  logicalreferenced           4.50T                                                           -
     
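    That would explain the numbers: a plain zfs send stream carries the logical (uncompressed) data, so a dataset that logically references 4.50T while compressing 27.51x can legitimately produce a multi-terabyte incremental stream even though it occupies only ~330G on disk. The properties above come from something like (a sketch):

    Code:
    # Compare physical vs logical usage; a large gap means the plain send
    # stream will be far bigger than the on-disk footprint
    zfs get refcompressratio,written,logicalused,logicalreferenced \
        rpool/data/subvol-104-disk-2
    # On ZFS 0.7+, zfs send -c would keep blocks compressed in the stream;
    # whether PVE replication uses it depends on the PVE version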