HowTo defrag a ZFS pool?

Discussion in 'Proxmox VE: Installation and configuration' started by udo, Apr 13, 2018.

  1. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Hi,
    I have a ZFS pool with an MSSQL VM which changes a lot of data. I use ZFS for disaster recovery - sending snapshots with pve-zsync to another cluster node and with znapzend to a remote host.
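    Roughly, the pve-zsync job looks like this (target host and dataset are placeholders here):
    Code:
    # recurring sync job for VM 200 to the other cluster node
    pve-zsync create --source 200 --dest 192.0.2.10:tank/backup --name mssql --maxsnap 7 --verbose
    # the same job can also be triggered manually
    pve-zsync sync --source 200 --dest 192.0.2.10:tank/backup --name mssql --verbose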

    After a short time of use, the pool shows high fragmentation:
    Code:
    zpool get capacity,size,health,fragmentation
    NAME       PROPERTY       VALUE   SOURCE
    pve02pool  capacity       73%     -
    pve02pool  size           1.73T   -
    pve02pool  health         ONLINE  -
    pve02pool  fragmentation  40%     -
    
    I have read that defragmentation isn't possible on ZFS. Is that still true?

    And the REFER values blow up, which doesn't fit with the snapshots:
    Code:
    zfs list -t snapshot
    NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
    pve02pool/vm-200-disk-2@rep_default_2018-04-11_11:45:01   476M      -   605G  -
    pve02pool/vm-200-disk-2@2018-04-11-180000                62.6M      -   685G  -
    pve02pool/vm-200-disk-2@2018-04-12-000000                30.9M      -   684G  -
    pve02pool/vm-200-disk-2@2018-04-12-060000                11.3M      -   684G  -
    pve02pool/vm-200-disk-2@2018-04-12-120000                94.2M      -   684G  -
    pve02pool/vm-200-disk-2@rep_default_2018-04-12_17:45:16  2.01M      -   684G  -
    pve02pool/vm-200-disk-2@2018-04-12-180000                   2M      -   684G  -
    pve02pool/vm-200-disk-2@2018-04-13-000000                 142G      -   880G  -
    pve02pool/vm-200-disk-2@rep_default_2018-04-13_05:45:07  2.45M      -   727G  -
    pve02pool/vm-200-disk-2@2018-04-13-060000                2.42M      -   727G  -
    pve02pool/vm-200-disk-2@2018-04-13-120000                91.7M      -   727G  -
    pve02pool/vm-200-disk-2@rep_default_2018-04-13_13:30:05  46.7M      -   727G  -
    
    If the only way is storage migration, it's not really usable - if I migrate the 1TB volume with storage migration, the volume uses its full space afterwards. That would mean: migrate away, migrate back, then write zeros inside the VM to free the space again.
    And, if I see it right, a new ZFS sync (with pve-zsync and znapzend) would then transmit the whole VM disk again, because it's a new volume (and 600GB over a WAN connection takes some days).
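    For comparison, an incremental send only transfers the blocks changed between two snapshots - but after a storage migration there is no common snapshot on the target any more, so only a full send is possible (snapshot names and target host are placeholders):
    Code:
    # dry run: -n -v prints an estimate of how much data would be sent
    zfs send -n -v -i pve02pool/vm-200-disk-2@old pve02pool/vm-200-disk-2@new
    # incremental send: only blocks changed between @old and @new cross the WAN
    zfs send -i pve02pool/vm-200-disk-2@old pve02pool/vm-200-disk-2@new | ssh backuphost zfs receive -F tank/backup/vm-200-disk-2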

    Any hints?

    Udo
     
  2. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Hi,

    AFAIK, and as the git log shows, there is no defrag option.

    But the fragmentation level only tells you how fragmented the free space is that new data will be written into, not how fragmented the already written data is.
    Normally, if you stay under 70% pool usage, you won't run into performance problems.
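    Both values can be watched together with, for example:
    Code:
    zpool list -o name,size,allocated,free,capacity,fragmentation pve02pool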

    Do you have any performance problems with 73%?

    Yes, this is true.
     
  3. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Bad...
    Yesterday the monitoring showed some strange things during heavy IO on this pool.
    The biggest problem is that 70% was reached after two weeks!
    I had to learn before that at 93% nothing works anymore... So I removed this one VM; after migrating it back it works for now, but it doesn't look like I should run it like this for another week...

    zfs list shows that the volume uses 1.01T, but the REFER of 727G plus all the snapshots is much less...
    usedbysnapshots, however, shows a higher value:
    Code:
    zfs get used,usedbydataset,usedbysnapshots pve02pool/vm-200-disk-2
    NAME                     PROPERTY         VALUE     SOURCE
    pve02pool/vm-200-disk-2  used             1.01T     -
    pve02pool/vm-200-disk-2  usedbydataset    647G      -
    pve02pool/vm-200-disk-2  usedbysnapshots  382G      -
    
    This doesn't fit with the output of zfs list -t snapshot:
    Code:
    zfs list -t snapshot | grep 200-disk-2
    pve02pool/vm-200-disk-2@rep_default_2018-04-11_11:45:01   476M      -   605G  -
    pve02pool/vm-200-disk-2@2018-04-11-180000                62.6M      -   685G  -
    pve02pool/vm-200-disk-2@2018-04-12-000000                30.9M      -   684G  -
    pve02pool/vm-200-disk-2@2018-04-12-060000                11.3M      -   684G  -
    pve02pool/vm-200-disk-2@2018-04-12-120000                94.2M      -   684G  -
    pve02pool/vm-200-disk-2@rep_default_2018-04-12_17:45:16  2.01M      -   684G  -
    pve02pool/vm-200-disk-2@2018-04-12-180000                   2M      -   684G  -
    pve02pool/vm-200-disk-2@2018-04-13-000000                 142G      -   880G  -
    pve02pool/vm-200-disk-2@rep_default_2018-04-13_05:45:07  2.45M      -   727G  -
    pve02pool/vm-200-disk-2@2018-04-13-060000                2.42M      -   727G  -
    pve02pool/vm-200-disk-2@2018-04-13-120000                 110M      -   727G  -
    pve02pool/vm-200-disk-2@rep_default_2018-04-13_14:45:06  36.9M      -   647G  -
    
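    How much was actually written between two of these snapshots can be checked with the written@ property, e.g.:
    Code:
    zfs get written@2018-04-12-180000 pve02pool/vm-200-disk-2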
    Perhaps it has something to do with the block size?
    Code:
    zfs get volblocksize pve02pool/vm-200-disk-2
    NAME                     PROPERTY      VALUE     SOURCE
    pve02pool/vm-200-disk-2  volblocksize  8K        default
    
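    (If it is the block size: volblocksize can only be set when a zvol is created, so trying a different value would mean recreating the volume and copying the data over, roughly like this - name and size are just examples:)
    Code:
    # new sparse zvol with a larger block size
    zfs create -s -V 700G -o volblocksize=16K pve02pool/vm-200-disk-2-new
    zfs get volblocksize pve02pool/vm-200-disk-2-new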
    Udo
     
  4. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Do you use "thin provision"?
    What RAID level do you use?
     
  5. fabian

    fabian Proxmox Staff Member
    Staff Member

    the 'used' value of one snapshot just tells you how much space is used only in that snapshot (or, in other words, how much space you are guaranteed to get back if you delete it). data that is stored in more than one snapshot is not counted in any individual snapshot's 'used' value, but only in 'usedbysnapshots' of the dataset itself.
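    A dry run of zfs destroy shows how much space deleting a snapshot (or a range of them) would actually reclaim, e.g.:
    Code:
    # -n = dry run, -v = print the space that would be reclaimed; '%' selects a range of snapshots
    zfs destroy -nv pve02pool/vm-200-disk-2@2018-04-11-180000%2018-04-12-120000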

     
  6. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Hi,
    it's a RAID1 (mirror) with two big SSDs.
    Due to the storage migration the volume is thick at first, but with writing zeros inside the VM and the discard option it behaves like thin provisioning.
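    Roughly, that means enabling discard on the virtual disk and trimming inside the guest (disk slot, storage name and controller are just examples - discard needs a controller that supports it, e.g. VirtIO SCSI):
    Code:
    # enable discard on the VM disk
    qm set 200 --scsi1 pve02pool:vm-200-disk-2,discard=on
    # then inside a Linux guest:   fstrim -av
    # inside a Windows guest:      Optimize-Volume -DriveLetter C -ReTrim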

    Udo
     
  7. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Hi Fabian,
    thanks for this info.

    Now, after deleting some older snapshots (e.g. the ones with the written zeros), it looks better capacity-wise - except for the fragmentation:
    Code:
    zpool get capacity,size,health,fragmentation
    NAME       PROPERTY       VALUE   SOURCE
    pve02pool  capacity       56%     -
    pve02pool  size           1.73T   -
    pve02pool  health         ONLINE  -
    pve02pool  fragmentation  36%     -
    
    Udo
     
  8. fabian

    fabian Proxmox Staff Member
    Staff Member

    The fragmentation refers to the free space. If you are on SSDs, I wouldn't worry about 36% fragmentation.
     
    udo likes this.
  9. puertorico

    puertorico Member

    I just saw this and have a thought: which SSDs are you using?

    When I was first introduced to ZFS, I thought that running on any fast consumer SSD like a Samsung 950 EVO or Pro was good enough; we had several 1TB disks in RAID1. We found out the hard way that every SSD we owned was not good enough for our workload with containers.

    They were slow to start with, and in a very short time the disks got much slower - in some cases well below the performance of regular HDDs. I tried everything possible: changed the server and the HBA, added up to 64 GB of RAM. Nothing worked.

    Then one guy here on the forums pointed me in the direction that consumer devices are not meant for ZFS.
    Today we have changed all the disks on all our nodes to run RAID1 with Intel DC S3710. In our tests we see insane speed improvements, even at 61% fragmentation.
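    A quick way to see the difference between such drives is a small synchronous write test with fio (file path and runtime are just examples) - consumer SSDs without power-loss protection typically collapse on this pattern:
    Code:
    # 4k synchronous random writes, similar to sync-heavy VM/container workloads
    fio --name=syncwrite --filename=/pve02pool/fio-test --size=1G --bs=4k --rw=randwrite --sync=1 --runtime=60 --time_based --group_reporting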
     