What are the metadata and dnode caches in ZFS ARC, and what are their limits?

mailinglists

Hi.

Looking at:
Code:
ARC size (current):                                   100.2 %   20.0 GiB
        Target size (adaptive):                       100.0 %   20.0 GiB
        Min size (hard limit):                         95.3 %   19.1 GiB
        Max size (high water):                            1:1   20.0 GiB
        Most Frequently Used (MFU) cache size:         98.3 %   18.0 GiB
        Most Recently Used (MRU) cache size:            1.7 %  316.7 MiB
        Metadata cache size (hard limit):              75.0 %   15.0 GiB
        Metadata cache size (current):                 13.2 %    2.0 GiB
        Dnode cache size (hard limit):                 10.0 %    1.5 GiB
        Dnode cache size (current):                     1.5 %   22.9 MiB

I see that ZFS now has separate dnode and metadata caches.

I have some idea what the metadata cache might be (fs attributes and map), but I have no idea what the dnode cache would be.
When setting up a ZFS "special" device for metadata caching, does it also contain the dnode cache?
If anyone can point to some docs about this, I will be grateful.

Since I have max and min ARC cache values defined, I wonder: do these dnode and metadata cache types fall inside those limits, or do I need to count them separately?

I see that options exist:
Code:
root@p35:~# ls -la /sys/module/zfs/parameters/ | grep -i 'dnode\|meta'
-rw-r--r--  1 root root 4096 Oct 12 16:50 dbuf_metadata_cache_max_bytes
-rw-r--r--  1 root root 4096 Oct 12 16:50 dbuf_metadata_cache_shift
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_aliquot
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_bias_enabled
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_debug_load
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_debug_unload
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_df_max_search
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_df_use_largest_segment
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_force_ganging
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_fragmentation_factor_enabled
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_lba_weighting_enabled
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_preload_enabled
-rw-r--r--  1 root root 4096 Oct 12 16:50 spa_load_verify_metadata
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_dnode_limit
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_dnode_limit_percent
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_dnode_reduce_percent
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_adjust_restarts
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_limit
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_limit_percent
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_min
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_prune
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_strategy
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_metaslab_fragmentation_threshold
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_metaslab_segment_weight_enabled
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_metaslab_switch_threshold
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_special_class_metadata_reserve_pct
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_trim_metaslab_skip
 
Hi.

Looking at:
Code:
ARC size (current):                                   100.2 %   20.0 GiB
        Target size (adaptive):                       100.0 %   20.0 GiB
        Min size (hard limit):                         95.3 %   19.1 GiB
        Max size (high water):                            1:1   20.0 GiB
        Most Frequently Used (MFU) cache size:         98.3 %   18.0 GiB
        Most Recently Used (MRU) cache size:            1.7 %  316.7 MiB
        Metadata cache size (hard limit):              75.0 %   15.0 GiB
        Metadata cache size (current):                 13.2 %    2.0 GiB
        Dnode cache size (hard limit):                 10.0 %    1.5 GiB
        Dnode cache size (current):                     1.5 %   22.9 MiB

I see that ZFS now has separate dnode and metadata caches.

those are not really separate caches, but limits inside the cache for specific types of data

I have some idea what the metadata cache might be (fs attributes and map), but I have no idea what the dnode cache would be.
When setting up a ZFS "special" device for metadata caching, does it also contain the dnode cache?

the special device is not related to the ARC at all. it also does not cache metadata, it stores it (this is a very important difference when you think about pool reliability/failure modes!).

If anyone can point to some docs about this, I will be grateful.

there is man zfs-module-parameters, which is rather on the terse side and assumes a bit of knowledge of ZFS internals. you might be interested in https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/ZFS%20on%20Linux%20Module%20Parameters.html , which is basically the same info but annotated with hints on when to change which values. there are also older docs that describe how ZFS is organized internally, but most of that is not up to date. e.g., https://web.archive.org/web/2008123...rg/os/community/zfs/docs/ondiskformat0822.pdf

Since I have max and min ARC cache values defined, I wonder: do these dnode and metadata cache types fall inside those limits, or do I need to count them separately?

inside, and in general you'd only need to tune them if you run into issues.
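To make the "inside" part concrete, here is a minimal sketch of how the numbers in the arc_summary output above nest. It assumes the metadata limit is a percentage of the ARC max size and the dnode limit is a percentage of the metadata limit, which matches the figures shown (1.5 GiB is 10 % of 15.0 GiB, which is 75 % of 20.0 GiB). The function name `arc_sub_limits` is made up for illustration and is not part of any ZFS tool.

```python
GIB = 1024 ** 3


def arc_sub_limits(arc_max_bytes, meta_limit_percent=75, dnode_limit_percent=10):
    """Return (metadata_limit, dnode_limit) in bytes.

    Assumption: both limits live inside the ARC. The metadata limit is a
    share of arc_max, and the dnode limit is a share of the metadata limit.
    The defaults of 75 and 10 are the percentages shown by arc_summary above.
    """
    meta_limit = arc_max_bytes * meta_limit_percent // 100
    dnode_limit = meta_limit * dnode_limit_percent // 100
    return meta_limit, dnode_limit


meta, dnode = arc_sub_limits(20 * GIB)
print(f"metadata limit: {meta / GIB:.1f} GiB")  # metadata limit: 15.0 GiB
print(f"dnode limit:    {dnode / GIB:.1f} GiB")  # dnode limit:    1.5 GiB
```

Neither value is added on top of the 20 GiB; they only cap how much of the ARC those types of data may occupy.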

I see that options exist:
Code:
root@p35:~# ls -la /sys/module/zfs/parameters/ | grep -i 'dnode\|meta'
-rw-r--r--  1 root root 4096 Oct 12 16:50 dbuf_metadata_cache_max_bytes
-rw-r--r--  1 root root 4096 Oct 12 16:50 dbuf_metadata_cache_shift
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_aliquot
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_bias_enabled
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_debug_load
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_debug_unload
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_df_max_search
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_df_use_largest_segment
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_force_ganging
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_fragmentation_factor_enabled
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_lba_weighting_enabled
-rw-r--r--  1 root root 4096 Oct 12 16:50 metaslab_preload_enabled
-rw-r--r--  1 root root 4096 Oct 12 16:50 spa_load_verify_metadata
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_dnode_limit
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_dnode_limit_percent
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_dnode_reduce_percent
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_adjust_restarts
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_limit
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_limit_percent
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_min
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_prune
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_arc_meta_strategy
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_metaslab_fragmentation_threshold
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_metaslab_segment_weight_enabled
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_metaslab_switch_threshold
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_special_class_metadata_reserve_pct
-rw-r--r--  1 root root 4096 Oct 12 16:50 zfs_trim_metaslab_skip

only those with zfs_arc prefix affect the limits/behaviour of ARC
 
Hi,

My monitoring tool is complaining about high ARC dnode usage: "ZFS ARC dnode size > 90% dnode max size on Hypervisor" but my ARC hit rate is still fine with around 95-98%.

Do I need to increase the ARC size, or is it not a problem if the ARC dnode size is very high? I'm already using 8 GB of ARC for 2x 3TB + 5x 200GB drives and don't really have unused RAM to increase it.

Is it possible to manually increase the "Dnode cache size (hard limit)" of 10.0 % to something higher?

For example editing "/etc/modprobe.d/zfs.conf" and adding the line "options zfs zfs_arc_dnode_limit_percent=15"?
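A rough sketch of what I have in mind, in Python. It only uses the parameter name and the /sys/module/zfs/parameters/ directory that appear earlier in this thread; the helper names `modprobe_line` and `set_runtime` are hypothetical, and whether the module applies a value written at runtime without a reload is an assumption on my side.

```python
from pathlib import Path

PARAM = "zfs_arc_dnode_limit_percent"


def modprobe_line(percent):
    """Build the line to put into /etc/modprobe.d/zfs.conf."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return f"options zfs {PARAM}={percent}"


def set_runtime(percent, params_dir="/sys/module/zfs/parameters"):
    """Write the value to the module parameter file (needs root).

    Assumption: the parameter is writable at runtime, as the -rw-r--r--
    permissions in the listing earlier in this thread suggest.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    Path(params_dir, PARAM).write_text(f"{percent}\n")


print(modprobe_line(15))  # options zfs zfs_arc_dnode_limit_percent=15
```

Calling `set_runtime(15)` as root would try the change on the running system, while the modprobe line is what would make it persistent.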



arc_summary
Code:
ZFS Subsystem Report                            Tue Jan 12 03:45:49 2021
Linux 5.4.78-2-pve                                            0.8.5-pve1
Machine: Hypervisor (x86_64)                                  0.8.5-pve1


ARC status:                                                      HEALTHY
        Memory throttle count:                                         0


ARC size (current):                                   100.1 %    8.0 GiB
        Target size (adaptive):                       100.0 %    8.0 GiB
        Min size (hard limit):                        100.0 %    8.0 GiB
        Max size (high water):                            1:1    8.0 GiB
        Most Frequently Used (MFU) cache size:         45.1 %    3.1 GiB
        Most Recently Used (MRU) cache size:           54.9 %    3.7 GiB
        Metadata cache size (hard limit):              75.0 %    6.0 GiB
        Metadata cache size (current):                 40.8 %    2.4 GiB
        Dnode cache size (hard limit):                 10.0 %  614.4 MiB
        Dnode cache size (current):                    96.1 %  590.4 MiB


ARC hash breakdown:
        Elements max:                                               1.0M
        Elements current:                              58.8 %     602.4k
        Collisions:                                                 1.1M
        Chain max:                                                     5
        Chains:                                                    20.9k


ARC misc:
        Deleted:                                                    6.7M
        Mutex misses:                                                406
        Eviction skips:                                               98


ARC total accesses (hits + misses):                               174.5M
        Cache hit ratio:                               95.9 %     167.3M
        Cache miss ratio:                               4.1 %       7.2M
        Actual hit ratio (MFU + MRU hits):             95.5 %     166.6M
        Data demand efficiency:                        97.1 %     138.2M
        Data prefetch efficiency:                      12.1 %       3.3M


Cache hits by cache type:
        Most frequently used (MFU):                    53.6 %      89.6M
        Most recently used (MRU):                      46.0 %      77.0M
        Most frequently used (MFU) ghost:             < 0.1 %      76.2k
        Most recently used (MRU) ghost:                 0.1 %      95.3k
        Anonymously used:                               0.3 %     553.1k


Cache hits by data type:
        Demand data:                                   80.2 %     134.2M
        Demand prefetch data:                           0.2 %     397.7k
        Demand metadata:                               19.3 %      32.3M
        Demand prefetch metadata:                       0.2 %     399.7k


Cache misses by data type:
        Demand data:                                   56.8 %       4.1M
        Demand prefetch data:                          40.5 %       2.9M
        Demand metadata:                                2.2 %     159.6k
        Demand prefetch metadata:                       0.5 %      35.0k


DMU prefetch efficiency:                                           68.4M
        Hit ratio:                                      2.0 %       1.4M
        Miss ratio:                                    98.0 %      67.1M


L2ARC not detected, skipping section


Solaris Porting Layer (SPL):
        spl_hostid                                                     0
        spl_hostid_path                                      /etc/hostid
        spl_kmem_alloc_max                                       1048576
        spl_kmem_alloc_warn                                        65536
        spl_kmem_cache_expire                                          2
        spl_kmem_cache_kmem_limit                                   2048
        spl_kmem_cache_kmem_threads                                    4
        spl_kmem_cache_magazine_size                                   0
        spl_kmem_cache_max_size                                       32
        spl_kmem_cache_obj_per_slab                                    8
        spl_kmem_cache_obj_per_slab_min                                1
        spl_kmem_cache_reclaim                                         0
        spl_kmem_cache_slab_limit                                  16384
        spl_max_show_tasks                                           512
        spl_panic_halt                                                 0
        spl_schedule_hrtimeout_slack_us                                0
        spl_taskq_kick                                                 0
        spl_taskq_thread_bind                                          0
        spl_taskq_thread_dynamic                                       1
        spl_taskq_thread_priority                                      1
        spl_taskq_thread_sequential                                    4

VDEV cache disabled, skipping section

ZIL committed transactions:                                       630.6k
        Commit requests:                                          207.1k
        Flushes to stable storage:                                207.1k
        Transactions to SLOG storage pool:            0 Bytes          0
        Transactions to non-SLOG storage pool:        8.8 GiB     249.6k
Removed tunables section because post was exceeding limits.

Edit:
OK, increasing the "Dnode cache size" by 5% (to 15%) worked:
Dnode cache size (hard limit): 15.0 % 921.6 MiB
Do I need to decrease something else like "Metadata cache size" by 5% so in total it is the same?
 
Me again. The monitoring tool again complained about the dnode cache being full, so I increased it again from 15% to 20%. But now it's full again ...

What exactly is the dnode cache used for and, more importantly, is it a problem if the dnode cache is full, or will ZFS still work fine?

Code:
ZFS Subsystem Report                            Mon Jun 07 16:11:41 2021
Linux 5.4.114-1-pve                                           2.0.4-pve1
Machine: Hypervisor (x86_64)                                  2.0.4-pve1

ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                    99.5 %    8.0 GiB
        Target size (adaptive):                       100.0 %    8.0 GiB
        Min size (hard limit):                        100.0 %    8.0 GiB
        Max size (high water):                            1:1    8.0 GiB
        Most Frequently Used (MFU) cache size:         50.4 %    3.0 GiB
        Most Recently Used (MRU) cache size:           49.6 %    2.9 GiB
        Metadata cache size (hard limit):              70.0 %    5.6 GiB
        Metadata cache size (current):                 71.5 %    4.0 GiB
        Dnode cache size (hard limit):                 20.0 %    1.1 GiB
        Dnode cache size (current):                    97.7 %    1.1 GiB

ARC hash breakdown:
        Elements max:                                             591.4k
        Elements current:                              77.9 %     460.6k
        Collisions:                                               523.9k
        Chain max:                                                     4
        Chains:                                                    12.2k

ARC misc:
        Deleted:                                                    6.3M
        Mutex misses:                                                375
        Eviction skips:                                              327

ARC total accesses (hits + misses):                               241.5M
        Cache hit ratio:                               97.2 %     234.8M
        Cache miss ratio:                               2.8 %       6.7M
        Actual hit ratio (MFU + MRU hits):             97.0 %     234.2M
        Data demand efficiency:                        99.2 %     197.5M
        Data prefetch efficiency:                       5.7 %       5.3M

Cache hits by cache type:
        Most frequently used (MFU):                    35.1 %      82.4M
        Most recently used (MRU):                      64.7 %     151.8M
        Most frequently used (MFU) ghost:             < 0.1 %       5.0k
        Most recently used (MRU) ghost:               < 0.1 %      32.6k
        Anonymously used:                               0.2 %     509.8k

Cache hits by data type:
        Demand data:                                   83.5 %     196.0M
        Demand prefetch data:                           0.1 %     303.9k
        Demand metadata:                               16.2 %      38.1M
        Demand prefetch metadata:                       0.2 %     369.1k

Cache misses by data type:
        Demand data:                                   22.6 %       1.5M
        Demand prefetch data:                          74.6 %       5.0M
        Demand metadata:                                2.1 %     141.6k
        Demand prefetch metadata:                       0.7 %      47.4k

DMU prefetch efficiency:                                           30.9M
        Hit ratio:                                     36.6 %      11.3M
        Miss ratio:                                    63.4 %      19.6M
 
Me again. The monitoring tool again complained about the dnode cache being full, so I increased it again from 15% to 20%. But now it's full again ...

What exactly is the dnode cache used for and, more importantly, is it a problem if the dnode cache is full, or will ZFS still work fine?
Did you find an answer? I wonder too. dnode_size keeps increasing no matter what zfs_arc_dnode_limit_percent I set.
I've been tracking dnode_size while doing a dump of 120+ GB, and I can see that it keeps increasing and ends up above the limit set.

Update: by tracking dnode_size minute by minute while running the backup, I finally found a point where dnode_size starts to decrease, so I've set the limit accordingly. It's just pretty strange behavior that it keeps increasing despite the limit set...
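For anyone who wants to do the same kind of tracking, here is a minimal sketch of how dnode_size can be compared against the limit. It assumes a Linux system where the counters are exposed in /proc/spl/kstat/zfs/arcstats as "name type data" rows and that the fields are called dnode_size and arc_dnode_limit. The sample text below is made-up data for illustration, not output from my machine.

```python
def parse_arcstats(text):
    """Turn arcstats text into a {name: value} dict.

    Data rows look like "<name> <type> <value>"; header lines either have
    a different number of fields or a non-numeric last field, so they are
    skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def dnode_usage_percent(stats):
    """dnode_size as a percentage of arc_dnode_limit."""
    return 100.0 * stats["dnode_size"] / stats["arc_dnode_limit"]


# Hypothetical sample: 480 MiB used against a 640 MiB limit.
SAMPLE = """\
name                            type data
dnode_size                      4    503316480
arc_dnode_limit                 4    671088640
"""

print(f"dnode usage: {dnode_usage_percent(parse_arcstats(SAMPLE)):.1f} %")
# dnode usage: 75.0 %
```

On a real host the text would come from reading /proc/spl/kstat/zfs/arcstats instead of SAMPLE, and running this once a minute during a backup gives the same kind of curve I described above.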
 