Over the last few days I had some odd failures, so I started monitoring memory use much more closely. On all of my servers that do not have an explicit ARC limit set, the ZFS ARC grows until it consumes 100% of RAM, and then I start losing processes to the OOM killer. I don't know exactly when this started, but I recently did some housekeeping and updated everything to current versions, and I believe it began shortly after that. All of the affected servers are running pve 7.3.3 and kernel 5.15.74.1.
Previously, the ARC was automatically limited to 25% of system RAM on systems without an explicit ARC limit. I am also running a couple of machines on the 5.19 kernel, but they are small servers with limited RAM and already had explicit ARC limits set, so I can't say whether that kernel is affected.
Setting a value in /sys/module/zfs/parameters/zfs_arc_max as described here to limit the ARC size DOES seem to be respected, and it appears to fix the problem.
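
For anyone who wants to check their own machines, here is a rough sketch of how I'd compare the current ARC size against total RAM and optionally write a limit. It assumes the usual ZFS-on-Linux locations (/proc/spl/kstat/zfs/arcstats and /proc/meminfo); the function names and the 25% figure are just my choices for illustration, not anything official, and writing to zfs_arc_max needs root.

```python
from pathlib import Path

# Standard ZFS-on-Linux locations; adjust if your system differs.
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")
MEMINFO = Path("/proc/meminfo")
ARC_MAX = Path("/sys/module/zfs/parameters/zfs_arc_max")


def parse_arcstats(text):
    """Return {name: value} from arcstats text, skipping the header lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: "<name> <type> <integer value>"
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def parse_mem_total(text):
    """Return MemTotal from /proc/meminfo text, converted from kB to bytes."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def arc_limit_bytes(mem_total, fraction):
    """Hypothetical helper: an ARC limit as a fraction of total RAM, in bytes."""
    return int(mem_total * fraction)


def main(fraction=0.25, apply=False):
    stats = parse_arcstats(ARCSTATS.read_text())
    mem_total = parse_mem_total(MEMINFO.read_text())
    print(f"ARC size:  {stats['size'] / mem_total:.1%} of RAM")
    print(f"ARC c_max: {stats['c_max'] / mem_total:.1%} of RAM")
    limit = arc_limit_bytes(mem_total, fraction)
    print(f"Proposed zfs_arc_max: {limit} bytes")
    if apply:
        # Requires root; zfs_arc_max takes a value in bytes.
        ARC_MAX.write_text(str(limit))


# Only run on hosts where the ZFS module is actually loaded.
if __name__ == "__main__" and ARCSTATS.exists():
    main()
```

By default this only reports; call main(apply=True) as root to actually write the limit. A value written this way does not survive a reboot.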