Dear Proxmox community,
After several searches in the forum, I couldn't find much information regarding ZFS storage and its performance tuning. Thus, I'd like to start this thread to share best practices, tests, and tuning tips on how you design your data storage.
Recently, I built a home NAS/server, and to determine whether to use the Local ZFS Pool Backend or to create a ZFS pool with datasets and use a bind mount, I ran the tests below.
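For reference, this is roughly how each of the two variants can be attached to a container from the host. It's only a sketch: the container ID (101), storage name (local-zfs), dataset name (rpool/share), and mount points are placeholders, not part of my actual configuration.
Code:
# Variant 1: let Proxmox allocate an 80 GB volume on the ZFS storage backend
pct set 101 -mp0 local-zfs:80,mp=/mnt/zfs-disk

# Variant 2: bind mount an existing dataset from the host into the container
zfs create rpool/share
pct set 101 -mp1 /rpool/share,mp=/mnt/bind-dataset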
The fio tests:
Code:
# SEQ Write with 4 vCPU:
sync; fio --filename=testfile-seq-160g --size=160G --direct=1 --rw=write --bs=1M --ioengine=libaio --numjobs=4 --iodepth=32 --name=seq-write-test --group_reporting --ramp_time=4
# SEQ Read with 4 vCPU:
sync; fio --filename=testfile-seq-160g --direct=1 --rw=read --bs=1M --ioengine=libaio --numjobs=4 --iodepth=32 --name=seq-read-test --group_reporting --readonly --ramp_time=4
# Random write with 4 vCPU:
sync; fio --filename=testfile-rand-4g --size=4G --direct=1 --rw=randwrite --bs=4k --ioengine=libaio --numjobs=4 --iodepth=32 --name=rand-write-test --group_reporting --ramp_time=4 --time_based --runtime=300
# Random read with 4 vCPU:
sync; fio --filename=testfile-rand-4g --direct=1 --rw=randread --bs=4k --ioengine=libaio --numjobs=4 --iodepth=32 --name=rand-read-test --group_reporting --readonly --ramp_time=4
# Mixed Random Read/Write database file with 8 vCPU:
sync; fio --filename=database-testfile --size=4G --direct=1 --rw=randrw --bs=8k --ioengine=libaio --iodepth=32 --numjobs=8 --rwmixread=70 --name=db-mixed-rw-test --group_reporting --ramp_time=4
# Multi-Threaded Application Simulation, e.g. a data analytics tool, with 16 vCPU:
sync; fio --filename=testfile-seq-readwrite --size=160G --direct=1 --rw=readwrite --bs=64k --ioengine=libaio --iodepth=16 --numjobs=16 --name=multi-thread-app --group_reporting --ramp_time=4
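If you want to repeat the comparison, a small wrapper like the one below runs the same job against both mount points and keeps fio's JSON output for later comparison. The mount points /mnt/zfs-disk and /mnt/bind-dataset and the output file names are just placeholders, not part of the setup above.
Code:
#!/bin/bash
# Sketch: run the sequential write job against both test targets and keep
# fio's JSON output for comparison. Paths below are placeholders.
for target in /mnt/zfs-disk /mnt/bind-dataset; do
    sync
    fio --filename="$target/testfile-seq-160g" --size=160G --direct=1 \
        --rw=write --bs=1M --ioengine=libaio --numjobs=4 --iodepth=32 \
        --name=seq-write-test --group_reporting --ramp_time=4 \
        --output-format=json --output="seq-write-$(basename "$target").json"
done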
Detailed ZFS pool information: a RAID 10 layout consisting of 4 HDDs and 1 NVMe SSD (SLOG), created with the following command:
Code:
# zpool create \
-o ashift=12 \
-O encryption=on -O keylocation=file:///root/zfs-pool.key -O keyformat=raw \
-O acltype=posixacl -O xattr=sa -O dnodesize=auto \
-O compression=zstd-7 \
-O normalization=formD \
rpool mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd log /dev/nvme0n1p1
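After creating the pool, it's worth double-checking that the layout and the dataset properties ended up as intended. These are standard zpool/zfs commands, nothing specific to this setup.
Code:
# Verify the mirror pairs and the separate log device
zpool status rpool

# Confirm the properties inherited by new datasets
zfs get compression,recordsize,xattr,atime rpool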
The results below come from fio tests run in an LXC container with two storage setups mounted:
1. a disk created with the Local ZFS module,
2. a bind-mounted dataset.
| blocksize=128K | SEQ Write | SEQ Read | Random write (4GB) | Random read (4GB) | Mixed Random Read/Write, 4GB database file (read; write) | Multi-Threaded Application Simulation (read; write) |
|---|---|---|---|---|---|---|
| Local ZFS module | 484MB/s | 1581MB/s | 12.4MB/s | 78.9MB/s | 29.8MB/s; 12.8MB/s | 434MB/s; 434MB/s |
| Bind-mounted dataset | 255MB/s | 1602MB/s | 13.3MB/s | 82.6MB/s | 29.3MB/s; 12.6MB/s | 576MB/s; 576MB/s |
EDIT: after @waltar's and @LnxBil's valuable comments, I adjusted those tests and re-ran them to get more accurate results. Unfortunately, even with the parameter --size=10*$mb_memory I was still getting a much higher sequential read speed (about 1600 MB/s) for a 160 GB file, despite the host having only 16 GB of RAM. I guess it could be due to a combination of several ZFS features such as caching (ARC), prefetching, and striping across the mirrors, which are particularly effective for sequential workloads.
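If someone wants to verify how much of that read traffic is actually served from memory rather than from the disks, the ARC statistics can be checked on the host. This is a generic check, nothing specific to my pool, and the 4 GiB cap is only an example value.
Code:
# ARC hit/miss counters and current ARC size (bytes), read on the PVE host
grep -wE '^(hits|misses|size)' /proc/spl/kstat/zfs/arcstats

# Optionally cap the ARC (here: 4 GiB) before re-running the benchmarks,
# so cached reads distort the sequential results less
echo $((4 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max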
Conclusion:
You might say that I could have expected such numbers, as there are many good resources about tuning ZFS performance all around the internet, e.g. [1], [2], [3], [4].
However, I was mainly curious about the raw data and a direct comparison between Local ZFS module and bind-mounted dataset performance.
So far, I'll stick with:
- For all LXC containers and VMs - a disk created with the Local ZFS module (with blocksize=128K)
- For file-share applications (e.g. Nextcloud) - a bind-mounted dataset with recordsize=64K
- For media apps (e.g. Plex) - a bind-mounted dataset with recordsize=512K or 1M and compression=zstd-12 (see the sketch after this list).
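For the dataset-backed shares, that roughly translates into creating the datasets with the properties set up front. Pool and dataset names below are placeholders for my layout, and the storage name local-zfs is assumed.
Code:
# File-share dataset (e.g. Nextcloud): smaller records for mixed file sizes
zfs create -o recordsize=64K rpool/nextcloud

# Media dataset (e.g. Plex): large records plus stronger compression
zfs create -o recordsize=1M -o compression=zstd-12 rpool/media

# For VM/container disks, the block size can be set on the Proxmox storage itself
# (affects newly created volumes only)
pvesm set local-zfs --blocksize 128k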
If you've performed any tests or have any tips, please share and comment.