ZFS ARC growing without limit on PVE 8.1

niorix

New Member
Mar 26, 2024
Hi,
After upgrading from PVE 7.4 to PVE 8.1, the ZFS ARC grows without limit while seeding torrents.
My hardware:
8 GB RAM
NVMe SSD for the hypervisor and VMs
2 TB USB HDD with ZFS for data and torrents
I have 1 VM (Home Assistant, 4 GB RAM) and 1 LXC (NAS: Transmission, Nextcloud, Samba; 2 GB RAM).
There was no such problem on Proxmox 7.4.
Graph:
Blue: ARC current size
Purple: ARC target max size
ARC size.png
Logs:
Code:
cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=2147483648

arcstat 5
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
11:08:42     0       0     0       0     0      0    0   2.0G   2.0G   3.3G
11:08:47     0       0     0       0     0      0    0   2.0G   2.0G   3.3G
11:08:52     0       0     0       0     0      0    0   2.0G   2.0G   3.3G
11:08:57     0       0     0       0     0      0    0   2.0G   2.0G   3.3G
11:09:02     0       0     0       0     0      0    0   2.0G   2.0G   3.3G
11:09:07     0       0     0       0     0      0    0   2.0G   2.0G   3.3G
11:09:12     0       0     0       0     0      0    0   2.0G   2.0G   3.3G
11:09:17     0       0     0       0     0      0    0   2.0G   2.0G   3.3G
11:09:22    29       0     0      29    79      0    0   2.0G   2.0G   3.3G
11:09:27     0       0     0       0     0      0    0   2.0G   2.0G   3.3G

-----------Start torrent seeding-------------------------------------------

11:09:52  1.4K     105    98     869    52    440    0   2.0G   2.0G   3.3G
11:09:57     4       0     0       3   100      1    0   2.0G   2.0G   3.3G
11:10:02     9       2   100       6   100      1    0   2.0G   2.0G   3.3G
11:10:07    10       3   100       6   100      1    0   2.0G   2.0G   3.3G
11:10:12  3.4K     286    97    2.0K    26   1.1K    0   2.0G   2.0G   3.3G
11:10:17   755      75    96     455     9    223    1   2.0G   2.0G   3.3G

---------------------------------------------------------------------------
11:54:06   954     170    92     525    20    257    0   6.3G   595M    17M
11:54:11  1.6K     225    93     935    11    450    0   6.1G   595M    67M
11:54:16  2.3K     269    93    1.3K     6    645    0   6.2G   593M   -28M
11:54:21  1.7K     260    94     955    13    471    1   6.1G   591M   149M
11:54:26   954     193    94     507    57    249    0   6.0G   729M   281M
11:54:31  1.0K     245    94     527    20    241    0   6.0G   1.3G   262M
11:54:36  2.8K     212    94    1.7K     5    845    0   6.2G   1.9G    45M
11:54:41  2.5K     330    93    1.5K    28    727    2   6.2G   1.8G    26M
11:54:46  2.3K     318    92    1.4K     9    647    0   6.4G   1.6G   -16M
11:54:51  1.9K     309    92    1.0K    19    512    2   6.2G   1.5G    59M
11:54:56  1.3K     221    94     819    23    276    3   6.1G   1.7G    92M
11:55:01  2.2K     287    91    1.2K    29    668    0   6.0G   1.7G   130M
11:55:09  2.8K     228    93    1.8K    18    700    1   6.3G   1.6G   -10M
11:55:14   383      63    88      76    96    242    0   6.3G   1.5G    55M
11:55:19  1.6K     202    92     972    14    456    0   6.3G   1.4G    29M
11:55:24  3.0K     211    91    1.9K    10    911    1   6.2G   1.3G    16M
11:55:29  1.5K     276    92     906    19    271    1   6.2G   1.3G    18M
11:55:34  1.4K     281    93     672    22    483    1   6.0G   1.3G   182M
11:55:39  2.1K     159    91    1.3K     6    641    0   6.2G   2.0G   -53M
11:55:44  1.2K     298    93     617    31    303    3   6.1G   1.9G   119M
11:55:49   352     157    92     135    98     57   17   6.0G   2.0G   215M
11:55:54   777     339    94     379    43     57    0   5.8G   2.0G   347M
11:55:59  3.0K     118    93    2.0K     7    893    1   5.9G   2.0G   197M
11:56:04  3.1K     184    92    1.8K    31   1.1K    1   6.0G   2.0G    51M
11:56:09  2.2K     118    93    1.4K    10    693    1   6.2G   1.8G   -12M
11:56:14  2.2K     197    94    1.4K     8    658    0   6.3G   1.7G    17M
11:56:19  3.1K     153    94    2.0K    38    888    1   6.3G   1.7G  -3.0M
11:56:24  3.4K     248    93    2.0K    19   1.1K    1   6.3G   1.5G    20M
Connection closing...Socket close.
-------OOM kill...---------------------------------------------------------


arc_summary -s arc

------------------------------------------------------------------------
ZFS Subsystem Report                            Thu Mar 28 11:25:55 2024
Linux 6.5.13-3-pve                                            2.2.3-pve1
Machine: pve (x86_64)                                         2.2.3-pve1

ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                   217.2 %    4.3 GiB
        Target size (adaptive):                       100.0 %    2.0 GiB
        Min size (hard limit):                         11.9 %  243.4 MiB
        Max size (high water):                            8:1    2.0 GiB
        Anonymous data size:                            0.0 %    0 Bytes
        Anonymous metadata size:                        0.0 %    0 Bytes
        MFU data target:                               70.8 %    3.1 GiB
        MFU data size:                                  5.8 %  258.6 MiB
        MFU ghost data size:                                   954.5 MiB
        MFU metadata target:                            5.4 %  240.0 MiB
        MFU metadata size:                              0.9 %   40.4 MiB
        MFU ghost metadata size:                               125.0 MiB
        MRU data target:                               18.5 %  822.7 MiB
        MRU data size:                                 93.0 %    4.0 GiB
        MRU ghost data size:                                   154.2 MiB
        MRU metadata target:                            5.2 %  230.8 MiB
        MRU metadata size:                              0.2 %    9.7 MiB
        MRU ghost metadata size:                                20.4 MiB
        Uncached data size:                             0.0 %    0 Bytes
        Uncached metadata size:                         0.0 %    0 Bytes
        Bonus size:                                   < 0.1 %  146.9 KiB
        Dnode cache target:                            10.0 %  204.8 MiB
        Dnode cache size:                               0.4 %  752.4 KiB
        Dbuf size:                                    < 0.1 %  430.3 KiB
        Header size:                                    0.2 %   10.5 MiB
        L2 header size:                                 0.0 %    0 Bytes
        ABD chunk waste size:                         < 0.1 %   24.0 KiB

ARC hash breakdown:
        Elements max:                                              66.2k
        Elements current:                              69.2 %      45.8k
        Collisions:                                                50.6k
        Chain max:                                                     3
        Chains:                                                     1.0k

ARC misc:
        Deleted:                                                    1.6M
        Mutex misses:                                               6.2k
        Eviction skips:                                             2.9M
        Eviction skips due to L2 writes:                               0
        L2 cached evictions:                                     0 Bytes
        L2 eligible evictions:                                  29.8 GiB
        L2 eligible MFU evictions:                     42.0 %   12.5 GiB
        L2 eligible MRU evictions:                     58.0 %   17.3 GiB
        L2 ineligible evictions:                               167.3 GiB

Any ideas?

EDIT:
The problem occurs only with intensive random reads of highly fragmented data (torrent seeding in my case).
Other workloads, such as copying files within the system, sequentially copying highly fragmented data (e.g. torrents), or Samba access, do not cause problems.
Moreover, if I copy the downloaded torrents to another location and then copy them back (which eliminates the fragmentation), torrent seeding no longer causes the ARC to overflow.
 
Hi niorix,

I see that you have 8 GB of RAM in total. That is not enough to run ZFS and also give around 6 GB to your apps (the Home Assistant VM and your LXC).
Step 1: upgrade the memory; this will be better in the long term.
Alternative: change this
cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=2147483648

to this:
cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=147483648

and reboot the machine.
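
For anyone checking these numbers: the values are plain byte counts, so they can be verified with shell arithmetic. Below is a minimal sketch (the helper names gib_to_bytes and bytes_to_mib are made up for illustration and are not part of ZFS or Proxmox). It shows that 2147483648 is exactly 2 GiB, while 147483648 comes out at about 140 MiB rather than a round figure.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking zfs_arc_max values.

# Convert a whole number of GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Convert a byte count to whole MiB (rounded down).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

gib_to_bytes 2          # the original limit: 2147483648
bytes_to_mib 147483648  # the suggested lower limit: 140
```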
 
Hi Netwerkfix,
Could you tell me why on PVE 7.4 the ARC never grows beyond the configured value, while on PVE 8.1 with the same settings it uses memory well in excess of the configured limit? Is this normal, or is it a kernel (or ZFS) bug?

P.S.
I tried setting the limit to 146 MB. ZFS could not care less about my limit:
canvas.png
I got an OOM kill again.
 
> Could you tell me why on PVE 7.4 the ARC never grows beyond the configured value, while on PVE 8.1 with the same settings it uses memory well in excess of the configured limit? Is this normal, or is it a kernel (or ZFS) bug?
>
> P.S.
> I tried setting the limit to 146 MB. ZFS could not care less about my limit:

It could be that newer ZFS versions (that come with the newer kernel of PVE 8) ignore the max when it is less than the (default) min. Please try setting both zfs_arc_max and zfs_arc_min (to the same value, for example).
 
Set primarycache to metadata only.
Since you have so little RAM for ZFS, the default value of all makes no sense. Don't set primarycache to none, since that will slow ZFS down drastically (it still has to copy blocks into RAM for checksums, compression, deduplication, etc.) and is not recommended at all.

metadata only should at least use your limited memory more effectively.
Whether it actually uses less memory is another story; that depends on your ZFS pool size. But in theory it should consume less memory.

Otherwise, as others have said, putting more memory into that system, if possible, is a far better solution.

Cheers :-)
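
In case it helps, this is roughly how the change would be applied. A minimal sketch, assuming the torrent data lives in a dataset called data/storage/torrent (the name is taken from elsewhere in this thread; yours may differ). The helper primarycache_cmds is hypothetical and only prints the commands, so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch: print (not run) the commands for switching one dataset to
# metadata-only caching and then reading the property back.
# The dataset name below is an assumption; substitute your own.

primarycache_cmds() {
    dataset=$1
    echo "zfs set primarycache=metadata $dataset"
    echo "zfs get primarycache $dataset"
}

primarycache_cmds data/storage/torrent
```

Setting the property on only the torrent dataset leaves the other datasets at their default of all.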
 
> It could be that newer ZFS versions (that come with the newer kernel of PVE 8) ignore the max when it is less than the (default) min. Please try setting both zfs_arc_max and zfs_arc_min (to the same value, for example).

I tried setting
cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=147483648
options zfs zfs_arc_max=14748364
The limit was applied, but it had no effect. The ARC grew beyond the limit and consumed all the RAM.

> The limits can also be set online, so try resetting the maximum on a system where the maximum has already been surpassed.

I set the limit to 2 GB and saw the ARC allocate about 3 GB. I then tried setting the limit to 1 GB (and less) using
echo "$[1 * 1024*1024*1024]" >/sys/module/zfs/parameters/zfs_arc_max
No effect...

> Set primarycache to metadata only.
> Since you have so little RAM for ZFS, the default value of all makes no sense. Don't set primarycache to none, since that will slow ZFS down drastically (it still has to copy blocks into RAM for checksums, compression, deduplication, etc.) and is not recommended at all.
>
> metadata only should at least use your limited memory more effectively.
> Whether it actually uses less memory is another story; that depends on your ZFS pool size. But in theory it should consume less memory.
>
> Otherwise, as others have said, putting more memory into that system, if possible, is a far better solution.
>
> Cheers :)

At first glance it seems to work!

But I still don't understand why on PVE 7.4 the ARC never grows beyond the configured value, while on PVE 8.1 with the same settings it uses memory well in excess of the configured limit...
It seems the problem occurs when reading highly fragmented data. There are no problems when seeding torrents if I first copy the torrent data to eliminate the fragmentation.
 
> I tried setting
> cat /etc/modprobe.d/zfs.conf
> options zfs zfs_arc_max=147483648
> options zfs zfs_arc_max=14748364
> The limit was applied, but it had no effect. The ARC grew beyond the limit and consumed all the RAM.

It looks like you set max twice, and 14(0) MB is very little. Try options zfs zfs_arc_min=268435456 zfs_arc_max=1073741824

> But I still don't understand why on PVE 7.4 the ARC never grows beyond the configured value, while on PVE 8.1 with the same settings it uses memory well in excess of the configured limit...

That would be a serious ZFS bug. Can you show that c (current) grows beyond c_max (Max size) in the output of arc_summary -r?

EDIT: Indeed, the first post already shows this. I'm confused and could not find a reason (or known ZFS issue) to explain the 217% current size.
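
One way to make that comparison without reading the full arc_summary output is to pull the counters straight from the kstat file. A minimal sketch, assuming the usual three-column layout (name, type, value) of /proc/spl/kstat/zfs/arcstats; arc_check is a hypothetical helper name. In that file, size is the current ARC size, c is the adaptive target and c_max is the configured maximum.

```shell
#!/bin/sh
# Sketch: compare the ARC's actual size with its target (c) and limit (c_max).
# The stats path can be overridden, e.g. to test against a saved copy.

arc_check() {
    stats=${1:-/proc/spl/kstat/zfs/arcstats}
    awk '
        $1 == "size"  { size = $3 }
        $1 == "c"     { c = $3 }
        $1 == "c_max" { cmax = $3 }
        END {
            print "size=" size " c=" c " c_max=" cmax
            if (size + 0 > cmax + 0)
                print "OVER: size exceeds c_max"
            else
                print "OK: size is within c_max"
        }
    ' "$stats"
}

# Usage on the host (not executed here):
#   arc_check
```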
 
> Limit was applied, but no effect.

If the zfs_arc_max setting is "too low" (from ZFS's point of view) it will ignore it and print an error message about that in the system log.

The way around that is to set the zfs_arc_min value first (ie change the ARC minimum size). After that, ZFS will accept the value you give it for the maximum size rather than ignore it.

At least, that's the theory. :)
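
The ordering described above can be sketched as a small function. This is only an illustration: apply_arc_limits is a hypothetical helper, and the optional third argument (a directory standing in for /sys/module/zfs/parameters) is an assumption added so the function can be tried against a scratch directory instead of a live system.

```shell
#!/bin/sh
# Sketch: write zfs_arc_min first, then zfs_arc_max.
# On a real host this must run as root and the third argument is omitted.

apply_arc_limits() {
    min=$1
    max=$2
    dir=${3:-/sys/module/zfs/parameters}

    if [ "$min" -gt "$max" ]; then
        echo "arc_min ($min) must not exceed arc_max ($max)" >&2
        return 1
    fi
    echo "$min" > "$dir/zfs_arc_min" || return 1
    echo "$max" > "$dir/zfs_arc_max" || return 1
}

# Dry run against a scratch directory:
scratch=$(mktemp -d)
apply_arc_limits 268435456 1073741824 "$scratch"
cat "$scratch/zfs_arc_min" "$scratch/zfs_arc_max"
rm -r "$scratch"
```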
 
In my first post:
Code:
ARC size (current):                                   217.2 %    4.3 GiB
        Target size (adaptive):                       100.0 %    2.0 GiB
        Min size (hard limit):                         11.9 %  243.4 MiB
        Max size (high water):                            8:1    2.0 GiB
It can be seen that the minimum and maximum values are set correctly.

and
Code:
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
11:56:24  3.4K     248    93    2.0K    19   1.1K    1   6.3G   1.5G    20M
The target size was reduced due to lack of memory, but the ARC continues to grow.
I'll do more testing and post later.
 
> That would be a serious ZFS bug. Can you show that c (current) grows beyond c_max (Max size) in the output of arc_summary -r?

Can I use this instead?
Code:
cat /proc/spl/kstat/zfs/arcstats | grep '^c \|^c_min\|^c_max\|^size'
c                               4    268435456
c_min                           4    268435456
c_max                           4    1073741824
size                            4    31894368
 
Hello, that is all wrong.

This works on all my Proxmox machines.
You must set both zfs_arc_min and zfs_arc_max.
After you run update-initramfs -u -k all as root (and reboot), it will work.

The zfs_flags option is set because I don't have ECC RAM in my desktop PC.
I also set it on my servers.

Code:
# https://forum.proxmox.com/threads/disable-zfs-arc-or-limiting-it.77845/
# https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2020.04%20Root%20on%20ZFS.html
#
# cat /sys/module/zfs/parameters/zfs_arc_min
# cat /sys/module/zfs/parameters/zfs_arc_max
#
# https://www.reddit.com/r/zfs/comments/8102nf/any_experience_with_the_unsupported_openzfs/
options zfs zfs_flags=0x10
#
# Set Min ARC size => 1GB == 1073741824
options zfs zfs_arc_min=1073741824

# Set Max ARC Size
# 2 GB
#options zfs zfs_arc_max=2147483648
# 3 GB
#options zfs zfs_arc_max=3221225472
# 4 GB
options zfs zfs_arc_max=4294967296

# sudo update-initramfs -u -k all
 
So, I configured ZFS like this:
Code:
cat /etc/modprobe.d/zfs.conf
# Set Min ARC size => 256MB

options zfs zfs_arc_min=268435456
# Set Max ARC Size
# 1 GB

options zfs zfs_arc_max=1073741824
# 2 GB
#options zfs zfs_arc_max=2147483648
# 3 GB
#options zfs zfs_arc_max=3221225472
Code:
zfs get primarycache
NAME                  PROPERTY      VALUE         SOURCE
data                  primarycache  all           default
data/pve              primarycache  all           default
data/storage          primarycache  all           default
data/storage/torrent  primarycache  all           local
Initial state:
The LXC named nas is running
The Home Assistant VM is stopped
arc_min = 256 MB
arc_max = 1 GB (I also tried 2 GB and 3 GB with the same result)

arcstat
Code:
time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
12:47:55     2       0     0       2   100      0    0   238M   256M   4.0G
cat /proc/spl/kstat/zfs/arcstats | grep '^c \|^c_min\|^c_max\|^size'
Code:
c                               4    268697600
c_min                           4    268435456
c_max                           4    1073741824
size                            4    249602648
arc_summary -r
in arc_summary_init.txt

I started transmission-daemon and waited. After some time I got an OOM kill.

State before OOM:
arcstat
Code:
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
13:37:37   857     159    93     466    10    229    0   6.3G  1024M   263M
13:37:42  1.4K     143    93     871     5    433    0   6.4G  1024M   142M
13:37:47  1.5K     169    92     879     5    437    0   6.4G  1024M   120M
13:37:52   841     159    93     456     9    225    0   6.4G  1024M   167M
13:37:57  3.3K     159    93    2.1K     2   1.0K    0   6.6G   867M   -19M
13:38:02  2.1K     164    93    1.3K     3    637    0   6.7G   866M  -114M
13:38:07  1.4K     145    93     863     4    428    0   6.7G   500M   -76M
I can see that the target size is decreasing due to lack of memory, but the cache is not evicted and continues to grow until the RAM is exhausted.

cat /proc/spl/kstat/zfs/arcstats | grep '^c \|^c_min\|^c_max\|^size'
Code:
Tue Apr  2 01:37:55 PM +07 2024

c                               4    1072168960
c_min                           4    268435456
c_max                           4    1073741824
size                            4    7173696928
Tue Apr  2 01:38:00 PM +07 2024
c                               4    909517119
c_min                           4    268435456
c_max                           4    1073741824
size                            4    6968620368
Tue Apr  2 01:38:05 PM +07 2024
c                               4    708970124
c_min                           4    268435456
c_max                           4    1073741824
size                            4    7135511896
Tue Apr  2 01:38:10 PM +07 2024
c                               4    451144509
c_min                           4    268435456
c_max                           4    1073741824
size                            4    7115670280
c does not exceed c_max, but the actual size is far larger than c.
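
To put a number on that gap, the same counters can be sampled in a loop. A minimal sketch, assuming the arcstats layout shown above; watch_arc is a hypothetical helper, and the demo feeds it the first reading from this post rather than a live system.

```shell
#!/bin/sh
# Sketch: sample the ARC counters COUNT times, INTERVAL seconds apart, and
# print the target (c), the actual size, and size as a percentage of c_max.

watch_arc() {
    count=$1
    interval=$2
    stats=${3:-/proc/spl/kstat/zfs/arcstats}
    i=0
    while [ "$i" -lt "$count" ]; do
        awk '
            $1 == "c"     { c = $3 }
            $1 == "c_max" { cmax = $3 }
            $1 == "size"  { size = $3 }
            END {
                if (cmax > 0)
                    printf "c=%s size=%s pct_of_c_max=%.0f\n", c, size, size * 100 / cmax
            }
        ' "$stats"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Demo with the 01:37:55 PM reading from this post:
sample=$(mktemp)
cat > "$sample" <<'EOF'
c                               4    1072168960
c_min                           4    268435456
c_max                           4    1073741824
size                            4    7173696928
EOF
watch_arc 1 0 "$sample"   # prints: c=1072168960 size=7173696928 pct_of_c_max=668
rm -f "$sample"
```

On the host, something like watch_arc 12 5 would take a reading every 5 seconds for a minute.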

arc_summary -r
in arc_summary_last.txt

iotop
Code:
Total DISK READ:        35.08 M/s | Total DISK WRITE:        22.30 K/s
Current DISK READ:      36.87 M/s | Current DISK WRITE:       2.91 M/s
    278 be/3 root        0.00 B/s    7.17 K/s ?unavailable?  [jbd2/dm-1-8]
   1061 be/4 root      296.30 K/s    6.37 K/s ?unavailable?  pmxcfs
   1060 be/4 root       52.57 K/s    3.19 K/s ?unavailable?  pmxcfs
   1177 be/3 root        0.00 B/s    3.19 K/s ?unavailable?  [jbd2/dm-6-8]
 102956 be/4 www-data  174.43 K/s 1631.24 B/s ?unavailable?  pveproxy worker
 107148 be/4 root       58.94 K/s  815.62 B/s ?unavailable?  python3 /usr/sbin/iotop -oq -d 5
    550 be/0 root        8.70 M/s    0.00 B/s ?unavailable?  [z_rd_int]
    661 be/4 _rpc       17.52 K/s    0.00 B/s ?unavailable?  rpcbind -f -w
   1243 be/4 root       23.10 K/s    0.00 B/s ?unavailable?  rrdcached -B -b /var/lib/rrdcached/db/ -j /var/lib/rrdcached/journal/ -p /var/run/rrdcached.pid -l unix:/var/run/rrdcached.sock
   1059 be/4 root      523.30 K/s    0.00 B/s ?unavailable?  pmxcfs [server]
   1104 be/4 root     1631.24 B/s    0.00 B/s ?unavailable?  pmxcfs
   1055 be/4 root       27.08 K/s    0.00 B/s ?unavailable?  master -w
   1062 be/4 root       30.27 K/s    0.00 B/s ?unavailable?  cron -f
   1069 be/4 root      618.88 K/s    0.00 B/s ?unavailable?  pve-firewall
   1096 be/4 root        3.21 M/s    0.00 B/s ?unavailable?  pvestatd
   1172 be/4 root      171.25 K/s    0.00 B/s ?unavailable?  lxc-start -F -n 101
   1221 be/4 root      202.31 K/s    0.00 B/s ?unavailable?  init
   1314 be/3 root       11.95 K/s    0.00 B/s ?unavailable?  systemd-journald
   1505 be/4 root      175.23 K/s    0.00 B/s ?unavailable?  cron -f -P
   1521 be/4 root       73.28 K/s    0.00 B/s ?unavailable?  php-fpm: master process (/etc/php/8.2/fpm/php-fpm.conf)
   1522 be/4 statd      49.38 K/s    0.00 B/s ?unavailable?  redis-server.sock
   1651 be/4 root       13.54 K/s    0.00 B/s ?unavailable?  pvescheduler
  30971 be/4 root       29.47 K/s    0.00 B/s ?unavailable?  sshd: root@pts/1
  79633 be/4 postfix   320.19 K/s    0.00 B/s ?unavailable?  transmission-daemon -f --log-level=error
  79635 be/4 postfix     4.44 M/s    0.00 B/s ?unavailable?  transmission-daemon -f --log-level=error
 102384 be/4 postfix     7.97 K/s    0.00 B/s ?unavailable?  pickup -l -t unix -u -c
 102947 be/4 root       79.65 K/s    0.00 B/s ?unavailable?  pvedaemon worker
 102948 be/4 root       41.42 K/s    0.00 B/s ?unavailable?  pvedaemon worker
 102949 be/4 root       33.45 K/s    0.00 B/s ?unavailable?  pvedaemon worker
 102955 be/4 www-data  815.62 B/s    0.00 B/s ?unavailable?  pveproxy worker
 105474 be/4 root      815.62 B/s    0.00 B/s ?unavailable?  [kworker/u8:2-writeback]
 105881 be/4 root      129.83 K/s    0.00 B/s ?unavailable?  python3 /usr/sbin/arcstat 5
 107140 be/4 root      815.62 B/s    0.00 B/s ?unavailable?  python3 /usr/sbin/arcstat 5
 107255 be/0 root        8.61 M/s    0.00 B/s ?unavailable?  [z_rd_int]
 107257 be/0 root        7.00 M/s    0.00 B/s ?unavailable?  [z_rd_int]
 108897 be/4 root      815.62 B/s    0.00 B/s ?unavailable?  python3 /usr/sbin/iotop -o
 109917 be/4 root        2.39 K/s    0.00 B/s ?unavailable?  [kworker/u8:0-events_power_efficient]
 111324 be/0 root       26.28 K/s    0.00 B/s ?unavailable?  [z_rd_iss]
There is a heavy disk load from the z_rd_int threads...
Full logs are in the attachments.
 


Ouch, that sounds like a bug. :( :( :(

This is one of the systems here (a little HP Microserver Gen8):

Bash:
root@someserver:~# cat /etc/modprobe.d/zfs.conf
# 2024.03.27 JC Limit ZFS ARC to 1GB max
options zfs zfs_arc_min=536870912
options zfs zfs_arc_max=1073741824

Bash:
root@someserver:~# arcstat
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
06:02:18     0       0     0       0     0      0    0   1.0G  1024M   9.6G

Maybe as an experiment, copy the two lines above used on my (working) system and see if anything changes?
 
Oh hang on. Are you rebooting your system after changing the values in that file?

Asking because the values in /etc/modprobe.d/ files are only applied when a system boots.

To apply values to a running system, write them into the appropriate virtual file under the /sys/ structure. Like this:
Bash:
# echo 536870912 > /sys/module/zfs/parameters/zfs_arc_min
# echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max

Note that those two echo commands need to be run as the root user, as other users don't have permission to write into those two files.
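
A quick way to tell whether the file and the running module agree is to compare the two directly. A minimal sketch, assuming the options are written as "options zfs name=value" lines; check_param is a hypothetical helper, and both paths can be overridden so it can be tried on sample files first.

```shell
#!/bin/sh
# Sketch: report whether the value configured in a modprobe file matches the
# value the running zfs module is using.

check_param() {
    name=$1
    conf=${2:-/etc/modprobe.d/zfs.conf}
    dir=${3:-/sys/module/zfs/parameters}

    # Take the last uncommented "options zfs" line that sets this parameter.
    want=$(sed -n "s/^options[[:space:]]\{1,\}zfs[[:space:]].*${name}=\([0-9]\{1,\}\).*/\1/p" "$conf" | tail -n 1)
    have=$(cat "$dir/$name")

    if [ "$want" = "$have" ]; then
        echo "$name: configured and running values match ($have)"
    else
        echo "$name: configured '$want' but running '$have'"
        return 1
    fi
}

# Usage on the host (not executed here):
#   check_param zfs_arc_min
#   check_param zfs_arc_max
```

A mismatch would suggest the file was edited but the system has not been rebooted since (or the initramfs was not rebuilt).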
 
> Oh hang on. Are you rebooting your system after changing the values in that file?

Yes, of course. I updated the initramfs and rebooted the PVE node.
My current parameters are:
Code:
root@pve:~# cat /sys/module/zfs/parameters/zfs_arc_min
268435456
root@pve:~# cat /sys/module/zfs/parameters/zfs_arc_max
1073741824

Edit: I'll repeat it again:
The problem occurs only with intensive random reads of highly fragmented data (torrent seeding in my case).
Other workloads, such as copying files within the system, sequentially copying highly fragmented data (e.g. torrents), or Samba access, do not cause problems.
Moreover, if I copy the downloaded torrents to another location and then copy them back (which eliminates the fragmentation), torrent seeding no longer causes the ARC to overflow.
 