While I have always been aware of ZFS write amplification, I had never really tested or investigated the exact amount occurring on our Proxmox server builds.
I recently wrote a post investigating the performance of the various local storage options (https://forum.proxmox.com/threads/quick-and-dirty-io-performance-testing.82846/). While I am still working on updating that post with ZFS performance metrics, I decided at the same time to measure the write amplification of fio commands sent to the ZFS pool.
The system specs are below (as I explained in the earlier post, this is a test system built from parts I had lying around):
Supermicro X11SLH-F
Xeon E3-1246 v3
32GB DDR3 ECC
Onboard Intel C226 chipset, 6 x SATA3 (6 Gbps)
1 x 300GB Intel 320 SSD (Proxmox installed to this drive: 31GB ext4 root partition, 8GB swap, remainder is the default lvm-thin)
5 x 120GB Intel 320 SSD (Used for ZFS and other storage options for testing)
The Intel 320 SSDs come in at around 2100 fsyncs/second as measured by pveperf.
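For anyone who wants to check their own drives, pveperf just takes a path on the filesystem to test (the mount point below is a placeholder; without an argument it tests the root filesystem):
pveperf /testpool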
All tests were run with ashift=13, since an ashift that is too high should have a negligible effect on write amplification, while one that is too low will absolutely cause tremendous undesired write amplification.
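For reference, the ashift is set at pool creation and cannot be changed afterwards; it looks roughly like this (a sketch with a placeholder pool name and device paths, not my exact commands):
zpool create -o ashift=13 testpool /dev/disk/by-id/ata-INTEL_SSDSA2CW120G3_SERIAL1   # single drive
zpool create -o ashift=13 testpool mirror /dev/disk/by-id/ata-INTEL_SSDSA2CW120G3_SERIAL1 /dev/disk/by-id/ata-INTEL_SSDSA2CW120G3_SERIAL2   # mirror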
I experimented with various ZFS block sizes, and the measurements of actual data written were taken from the SMART data of the physical SSDs (LBAs written before each test compared to after it).
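This is roughly how each measurement works, in case anyone wants to reproduce it (a sketch rather than my exact commands; the SMART attribute name, the 512-byte LBA size, and /dev/sdb are assumptions you may need to adjust for your drive model):
BEFORE=$(smartctl -A /dev/sdb | awk '/Total_LBAs_Written/ {print $10}')
# run the fio job inside the VM here (it writes a known amount of data, e.g. 1 GiB)
AFTER=$(smartctl -A /dev/sdb | awk '/Total_LBAs_Written/ {print $10}')
echo "scale=2; ($AFTER - $BEFORE) * 512 / (1024^3)" | bc   # write amplification = bytes that hit the SSD / 1 GiB written by fio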
I used fio to write to the virtual disk device directly from within a VM, so the guest OS partitioning and filesystem are removed from the equation for now; these are synthetic fio tests, not real-world operating measurements (I hope and intend to do tests with real-world workloads in the future).
The first tests were run against a single-drive ZFS pool and a ZFS mirror, both using the default 8k block size (the zvol volblocksize, which Proxmox exposes as the storage Block Size option).
As expected, the write amplification values were identical for the single drive and the mirror.
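If you want to confirm which block size a given VM disk actually ended up with, you can query the zvol directly (the dataset name below is a placeholder for your own pool and VM disk):
zfs get volblocksize testpool/vm-100-disk-0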
fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~5X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=1M --numjobs=1 --iodepth=1 --size=1G --name seq_read --filename=/dev/sdX (Write Amp ~2X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~6.15X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=8 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~4.38X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=64 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~2.8X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=8k --numjobs=1 --iodepth=64 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~2.15X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=512 --numjobs=1 --iodepth=64 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~14.8X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=16k --numjobs=1 --iodepth=64 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~2.13X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=32k --numjobs=1 --iodepth=64 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~2.06X)
Then the above tests were rerun with sync=disabled on the pool (set from the Proxmox host with zfs set sync=disabled <poolname>).
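For completeness, the current setting can be checked, and reverted once testing is done, like this (pool name is a placeholder):
zfs get sync testpool
zfs set sync=standard testpool   # back to the default behaviour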
fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~1X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=1M --numjobs=1 --iodepth=1 --size=1G --buffered=0 --name seq_read --filename=/dev/sdX (Write Amp ~1X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~1.55X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=8 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~1.44X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=64 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~1.44X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=8k --numjobs=1 --iodepth=64 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~1X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=512 --numjobs=1 --iodepth=64 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~5.44X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=16k --numjobs=1 --iodepth=64 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~1.31X)
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=32k --numjobs=1 --iodepth=64 --size=1G --buffered=0 --name XXX --filename=/dev/sdX (Write Amp ~1X)
The tests were repeated a third time with sync=standard on the pool and fio's default buffered I/O (--buffered=1, with --direct and --sync omitted, so sync=0 is implied).
fio --ioengine=libaio --rw=write --bs=4k --numjobs=1 --iodepth=1 --size=1G --buffered=1 --name XXX --filename=/dev/sdX (Write Amp ~1X)
fio --ioengine=libaio --rw=write --bs=1M --numjobs=1 --iodepth=1 --size=1G --buffered=1 --name seq_read --filename=/dev/sdX (Write Amp ~1X)
fio --ioengine=libaio --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --size=1G --buffered=1 --name XXX --filename=/dev/sdX (Write Amp ~1X)
fio --ioengine=libaio --rw=randwrite --bs=4k --numjobs=1 --iodepth=8 --size=1G --buffered=1 --name XXX --filename=/dev/sdX (Write Amp ~1X)
fio --ioengine=libaio --rw=randwrite --bs=4k --numjobs=1 --iodepth=64 --size=1G --buffered=1 --name XXX --filename=/dev/sdX (Write Amp ~1X)
fio --ioengine=libaio --rw=randwrite --bs=8k --numjobs=1 --iodepth=64 --size=1G --buffered=1 --name XXX --filename=/dev/sdX (Write Amp ~1X)
fio --ioengine=libaio --rw=randwrite --bs=512 --numjobs=1 --iodepth=64 --size=1G --buffered=1 --name XXX --filename=/dev/sdX (Write Amp ~4X)
fio --ioengine=libaio --rw=randwrite --bs=16k --numjobs=1 --iodepth=64 --size=1G --buffered=1 --name XXX --filename=/dev/sdX (Write Amp ~1X)
fio --ioengine=libaio --rw=randwrite --bs=32k --numjobs=1 --iodepth=64 --size=1G --buffered=1 --name XXX --filename=/dev/sdX (Write Amp ~1X)
I will be continuing these tests with two additional goals: ZFS RAIDZ1 and RAIDZ2 measurements, and real-world-usage write amplification values.
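For the planned RAIDZ runs, the pools would be created along these lines across the five test SSDs (again just a sketch; disk1 through disk5 stand in for the real /dev/disk/by-id paths):
zpool create -o ashift=13 testpool raidz1 disk1 disk2 disk3 disk4 disk5
zpool create -o ashift=13 testpool raidz2 disk1 disk2 disk3 disk4 disk5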
From what I have read, the best (lowest) write amplification we can expect from a ZFS volume when performing a sync write would be ~2X. Does anyone have any insight into whether I am understanding correctly what I have read about ZFS write amplification for sync writes?