ZFS Pool Optimization

Alek5

New Member
Sep 29, 2021
2
0
1
72
Hi,
I have a PVE box set up with two ZFS pools:
Bash:
root@pve:~# zpool status -v ONE_Pool
  pool: ONE_Pool
 state: ONLINE
  scan: scrub in progress since Tue Nov 29 11:48:09 2022
    194G scanned at 6.91G/s, 2.67M issued at 97.7K/s, 948G total
    0B repaired, 0.00% done, no estimated completion time
config:

    NAME                                                STATE     READ WRITE CKSUM
    ONE_Pool                                        ONLINE       0     0     0
      raidz1-0                                          ONLINE       0     0     0
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6HJPTTJ        ONLINE       0     0     0
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N6HJPPKV        ONLINE       0     0     0
        ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N4AY4UFU        ONLINE       0     0     0
    logs   
      ata-KINGSTON_SA400S37240G_50026B7782F57CEF-part1  ONLINE       0     0     0
    cache
      ata-KINGSTON_SA400S37240G_50026B7782F57CEF-part2  ONLINE       0     0     0

errors: No known data errors

root@pve:~# zpool status -v TWO_Pool
  pool: TWO_Pool
 state: ONLINE
  scan: scrub repaired 0B in 01:54:32 with 0 errors on Sun Nov 13 02:18:35 2022
config:

    NAME                        STATE     READ WRITE CKSUM
    TWO_Pool                   ONLINE       0     0     0
      raidz1-0                  ONLINE       0     0     0
        scsi-35000c500565d8e37  ONLINE       0     0     0
        scsi-35000c500565daf43  ONLINE       0     0     0
        scsi-35000c500565ddb63  ONLINE       0     0     0

errors: No known data errors

Bash:
root@pve:~# arc_summary

------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Nov 29 12:05:18 2022
Linux 5.15.74-1-pve                                           2.1.6-pve1
Machine: pve (x86_64)                                  2.1.6-pve1

ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                   100.0 %   39.3 GiB
        Target size (adaptive):                       100.0 %   39.3 GiB
        Min size (hard limit):                          6.2 %    2.5 GiB
        Max size (high water):                           16:1   39.3 GiB
        Most Frequently Used (MFU) cache size:         13.7 %    5.1 GiB
        Most Recently Used (MRU) cache size:           86.3 %   32.1 GiB
        Metadata cache size (hard limit):              75.0 %   29.4 GiB
        Metadata cache size (current):                 15.8 %    4.7 GiB
        Dnode cache size (hard limit):                 10.0 %    2.9 GiB
        Dnode cache size (current):                     0.5 %   13.6 MiB

ARC hash breakdown:
        Elements max:                                              30.7M
        Elements current:                              25.2 %       7.7M
        Collisions:                                                 1.6G
        Chain max:                                                    14
        Chains:                                                     1.3M

ARC misc:
        Deleted:                                                    1.5G
        Mutex misses:                                             198.4k
        Eviction skips:                                            30.6k
        Eviction skips due to L2 writes:                            2.5M
        L2 cached evictions:                                     2.9 TiB
        L2 eligible evictions:                                   6.0 TiB
        L2 eligible MFU evictions:                      9.7 %  597.9 GiB
        L2 eligible MRU evictions:                     90.3 %    5.4 TiB
        L2 ineligible evictions:                                 3.0 TiB

ARC total accesses (hits + misses):                                 3.3G
        Cache hit ratio:                               61.5 %       2.0G
        Cache miss ratio:                              38.5 %       1.3G
        Actual hit ratio (MFU + MRU hits):             61.2 %       2.0G
        Data demand efficiency:                        23.5 %       1.2G
        Data prefetch efficiency:                       2.5 %     338.7M

Cache hits by cache type:
        Most frequently used (MFU):                    78.0 %       1.6G
        Most recently used (MRU):                      21.7 %     437.1M
        Most frequently used (MFU) ghost:               0.3 %       5.9M
        Most recently used (MRU) ghost:                 0.5 %      10.3M

Cache hits by data type:
        Demand data:                                   14.2 %     287.1M
        Demand prefetch data:                           0.4 %       8.5M
        Demand metadata:                               85.3 %       1.7G
        Demand prefetch metadata:                     < 0.1 %     566.8k

Cache misses by data type:
        Demand data:                                   73.7 %     933.1M
        Demand prefetch data:                          26.1 %     330.3M
        Demand metadata:                                0.1 %       1.7M
        Demand prefetch metadata:                       0.1 %     635.8k

DMU prefetch efficiency:                                           46.7M
        Hit ratio:                                     16.4 %       7.7M
        Miss ratio:                                    83.6 %      39.1M

L2ARC status:                                                   DEGRADED
        Low memory aborts:                                             4
        Free on write:                                             20.8M
        R/W clashes:                                                 246
        Bad checksums:                                                21
        I/O errors:                                                    0

L2ARC size (adaptive):                                          20.3 GiB
        Compressed:                                    89.2 %   18.1 GiB
        Header size:                                    0.6 %  132.9 MiB
        MFU allocated size:                            19.8 %    3.6 GiB
        MRU allocated size:                            80.0 %   14.5 GiB
        Prefetch allocated size:                        0.2 %   34.3 MiB
        Data (buffer content) allocated size:          97.8 %   17.7 GiB
        Metadata (buffer content) allocated size:       2.2 %  417.5 MiB

L2ARC breakdown:                                                  816.9M
        Hit ratio:                                      2.8 %      23.3M
        Miss ratio:                                    97.2 %     793.6M
        Feeds:                                                    882.8k

L2ARC writes:
        Writes sent:                                    100 %     443.3k

L2ARC evicts:
        Lock retries:                                              21.0k
        Upon reading:                                                172

Solaris Porting Layer (SPL):
        spl_hostid                                                     0
        spl_hostid_path                                      /etc/hostid
        spl_kmem_alloc_max                                       1048576
        spl_kmem_alloc_warn                                        65536
        spl_kmem_cache_kmem_threads                                    4
        spl_kmem_cache_magazine_size                                   0
        spl_kmem_cache_max_size                                       32
        spl_kmem_cache_obj_per_slab                                    8
        spl_kmem_cache_reclaim                                         0
        spl_kmem_cache_slab_limit                                  16384
        spl_max_show_tasks                                           512
        spl_panic_halt                                                 0
        spl_schedule_hrtimeout_slack_us                                0
        spl_taskq_kick                                                 0
        spl_taskq_thread_bind                                          0
        spl_taskq_thread_dynamic                                       1
        spl_taskq_thread_priority                                      1
        spl_taskq_thread_sequential                                    4

Yesterday I moved one VM disk from pool ONE to pool TWO and it took far too long:
Code:
drive-scsi2: transferred 2.0 TiB of 2.0 TiB (100.00%) in 15h 22m 23s, ready
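For reference, that works out to an average of only about 38 MiB/s; a rough back-of-the-envelope check from the figures in the log above:
Bash:
# average throughput of the move: 2.0 TiB in 15h 22m 23s (numbers from the log above)
secs=$(( 15*3600 + 22*60 + 23 ))   # 55343 seconds
mib=$(( 2 * 1024 * 1024 ))         # 2.0 TiB expressed in MiB
awk -v m="$mib" -v s="$secs" 'BEGIN { printf "%.1f MiB/s\n", m/s }'   # prints ~37.9 MiB/s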
Any advice on how to get better performance?
 
Any advice on how to get better performance?
The only configuration parameters you can vary are:
- use one pool
- use more vdevs (see the sketch after this list)
- use better hardware (-> SSD instead of HDD)
- use more hardware (and spread it across more vdevs)
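A minimal sketch of what "more vdevs" could look like, assuming four disks and placeholder /dev/disk/by-id names (not the disks from this system):
Bash:
# hypothetical example: a striped mirror (RAID10-style) pool built from four disks
# the device names are placeholders; two mirror vdevs give roughly twice the IOPS of a single raidz1 vdev
zpool create NEW_Pool \
    mirror /dev/disk/by-id/ata-DISK_A /dev/disk/by-id/ata-DISK_B \
    mirror /dev/disk/by-id/ata-DISK_C /dev/disk/by-id/ata-DISK_D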

BTW: Your L2ARC cache hit ratio is very bad and IMHO it is just making things slower in this setup.
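If you decide to drop the L2ARC, the cache device can be removed on the fly; a sketch using the partition name from the zpool status output above (double-check it against your own output before running):
Bash:
# remove the L2ARC (cache) partition from ONE_Pool; pool data is not touched
zpool remove ONE_Pool ata-KINGSTON_SA400S37240G_50026B7782F57CEF-part2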
 
SA400S37240G
And for SLOG/L2ARC you are using a consumer QLC SSD. The SLOG workload will wear it out very quickly, and it is terribly slow (it may not even be faster than an HDD for sync writes... and sync writes are exactly what a SLOG handles).
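If you stop using that SSD as a SLOG, the log device can likewise be removed without affecting the pool data; a sketch using the partition name from the zpool status output above (verify it first):
Bash:
# remove the SLOG (log) partition from ONE_Pool
# the log vdev only holds in-flight sync writes, so removal is non-destructive
zpool remove ONE_Pool ata-KINGSTON_SA400S37240G_50026B7782F57CEF-part1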

If you really want more performance, buy an additional HDD (or several) and recreate the pool as a striped mirror. If you then still need more performance, get two very small enterprise-grade SSDs and add them as a mirrored special device. And if you still need more performance, throw those HDDs out and use only enterprise SSDs instead.
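A sketch of the special-device step, assuming two hypothetical enterprise SSDs (placeholder device names, not disks from this system); the special vdev should be mirrored because losing it loses the pool:
Bash:
# hypothetical example: add two small enterprise SSDs as a mirrored special vdev
# metadata written from now on lands on the SSDs; zpool may warn that the
# replication level differs from the raidz1 vdev
zpool add ONE_Pool special mirror \
    /dev/disk/by-id/ata-ENTERPRISE_SSD_1 /dev/disk/by-id/ata-ENTERPRISE_SSD_2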
 
What I would pay attention to:

scan: scrub in progress since Tue Nov 29 11:48:09 2022

L2ARC status: DEGRADED
Low memory aborts: 4
Free on write: 20.8M
R/W clashes: 246
Bad checksums: 21
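If the running scrub was competing with the disk move for I/O, it can be paused and resumed later:
Bash:
# pause the scrub currently running on ONE_Pool
zpool scrub -p ONE_Pool
# resume it later; it continues where it left off
zpool scrub ONE_Pool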
 
