Hello,
We have begun experimenting with ZFS encryption on JBOD SSDs and NVMes for certain VMs where Ceph with size=3 would be inappropriate. Each SSD and NVMe is understood and expected to be a single point of failure.
Servers have either 768GB or 1.5TB of RAM at this point, and periodically (every few hours, or sometimes as little as once per day) we see load averages climb from a normal 50/80/100 up to 600/800, which is not normal. RAM consumed on the host then drops drastically, by hundreds of GBs. The system is typically unresponsive for around 5 minutes, and this sometimes seems to coincide with VMs being rebooted.
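My suspicion is that the sudden RAM drop is the ARC being reclaimed. A quick way to watch it during a spike (standard OpenZFS kstats on Linux, nothing Proxmox-specific, so this should work on any ZFS box):

    # current ARC size and its target maximum, in bytes
    awk '/^size|^c_max/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats

    # or poll once per second while the load climbs (arcstat ships with zfsutils-linux)
    arcstat 1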
NB: Proxmox boot disk is NOT ZFS, but other drives are.
I believe the issue is due to ZFS's ARC claiming large amounts of RAM, and I want to curb that behavior, as we don't use spinning disks and don't need the caching per se. In essence, ZFS is being used here solely for its encryption feature, and only where we believe it to be a better fit than our external Ceph clusters.
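For reference, what I'm planning to test is capping the ARC via the standard OpenZFS module parameters (the values below are just examples, not recommendations):

    # /etc/modprobe.d/zfs.conf
    # cap the ARC at 16 GiB and floor it at 8 GiB (example values)
    options zfs zfs_arc_max=17179869184
    options zfs zfs_arc_min=8589934592

    # apply at runtime without a reboot, then persist it for the next boot
    echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
    update-initramfs -u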
That said, I'm curious whether anyone else is seeing similar behavior and has experimented with drastically reducing ZFS memory allocation. Also, has anyone assessed the performance impact of doing so when using SSD/NVMe only (no spinning disks) and only standalone disks (no ZFS RAID)?
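On the "don't need the caching" point, the other knob I've been eyeing is per-dataset rather than module-wide (tank/vmdata is a placeholder name for our VM dataset):

    # keep only metadata in the ARC for the VM datasets; data reads go straight to disk
    zfs set primarycache=metadata tank/vmdata

My understanding is that data blocks then bypass the ARC entirely, which should hurt far less on NVMe than it would on spinning disks, but I'd love to hear real-world numbers from anyone who has tried it.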
Thanks,
Marco