Hi,
I want to rebuild my pool and I will most likely use 6x Intel S3710 200GB as a striped mirror for the new VM pool.
Right now I'm using 4x S3710 200GB + 1x S3700 200GB as a raidz, and I'm seeing a write amplification from guest to NAND of around factor 20, which I would really like to lower.
These are the parameters I can think of that might influence performance and write amplification (a short sketch of how I would set a few of them on the host follows the list):
- ashift of the pool
- atime of the pool on/off
- zfs_txg_timeout of the pool
- volblocksize of the zvol
- thin vs non thin
- with and without SLOG
- with and without L2ARC
- ZFS native encryption on/off
- encryption algorithm
- ZFS compression on/off
- compression algorithm
- discard fstab option vs fstrim -a as cron
- virtio SCSI vs virtio block
- virtio SCSI blocksize of 512B vs 4K
- cache mode of virtio
- ssd emulation on/off
- io thread on/off
- blocksize of the guest OSs filesystem (4K for ext4 for example)
- stride and stripe-width for ext4 inside guest
- sync vs async writes
- random 4k vs sequential 1M IO
- ...
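For reference, this is roughly how I would check or set a few of the host-side knobs from that list; "tank" and the zvol name are just placeholders for my actual pool:

zfs set atime=off tank
zpool get ashift tank
zfs get volblocksize tank/vm-100-disk-0
cat /sys/module/zfs/parameters/zfs_txg_timeout        # default is 5 seconds
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout  # only if I test a non-default value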
Right now I have created two identical Debian 10 VMs with only fio and qemu-guest-agent installed. One VM is using "
args: -global scsi-hd.physical_block_size=4k
" so virtio uses a 4K block size, and the other one is using the default 512B block size. Now I wanted to create different partitions and format them with ext4, but with different values for stride and stripe-width. Then I would back up both VMs so I could import them later, after destroying and recreating the pool with different ZFS/virtio configs.
For the benchmark, my idea is to collect SMART attributes on the host (my SSDs report real NAND writes in 32MiB units every second), run some fio tests inside the VM that write a fixed amount of data, and after all tests are done collect the SMART attributes again, so I can see how much was actually written to the NAND and calculate the total write amplification.
Fio tests that sound useful would be (a sketch of matching fio command lines follows the list):
- 4K random sync writes
- 4K random async writes
- 1M sequential sync writes
- 1M sequential async writes
- 4K async random reads
- 1M async sequential reads
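Just to show what I have in mind, this is roughly how those tests could look, assuming libaio, direct IO and a 10G test file on the ext4 mount under test; file name, size and iodepth are just placeholders:

fio --name=randwrite-4k-sync  --filename=/mnt/test/fio.dat --size=10G --bs=4k --rw=randwrite --ioengine=libaio --iodepth=1  --direct=1 --sync=1
fio --name=randwrite-4k-async --filename=/mnt/test/fio.dat --size=10G --bs=4k --rw=randwrite --ioengine=libaio --iodepth=32 --direct=1
fio --name=seqwrite-1m-sync   --filename=/mnt/test/fio.dat --size=10G --bs=1M --rw=write     --ioengine=libaio --iodepth=1  --direct=1 --sync=1
fio --name=seqwrite-1m-async  --filename=/mnt/test/fio.dat --size=10G --bs=1M --rw=write     --ioengine=libaio --iodepth=32 --direct=1
fio --name=randread-4k-async  --filename=/mnt/test/fio.dat --size=10G --bs=4k --rw=randread  --ioengine=libaio --iodepth=32 --direct=1
fio --name=seqread-1m-async   --filename=/mnt/test/fio.dat --size=10G --bs=1M --rw=read      --ioengine=libaio --iodepth=32 --direct=1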
And which ZFS/virtio settings would be promising to test with a 6x SSD striped mirror?
Is there anything I didn't think of that might make the benchmarks not comparable?
I think these combinations sound good (a sketch of the matching host-side commands for configuration 1 follows its list):
1.) 4K volblocksize + ashift=12
ashift=12
atime=off
zfs_txg_timeout=default
volblocksize=4K
thin
without SLOG
without L2ARC
native encryption=on
encryption algorithm=aes-256-ccm
compression=lz4
discard using fstab
virtio SCSI
virtio SCSI blocksize=512B + 4K
cachemode=none
ssd emulation=on
io thread=on
ext4 blocksize=4K
stride and stripe-width: -b 4096 -E stride=1 -E stripe-width=3
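To make that concrete, here is a sketch of the host-side commands for this first configuration; the pool name "tank", the plain sdX device names, the zvol name/size and the passphrase keyformat are all placeholders (in practice I would use /dev/disk/by-id paths):

# 6-disk striped mirror with ashift=12, lz4, native encryption (aes-256-ccm)
zpool create -o ashift=12 \
    -O atime=off -O compression=lz4 \
    -O encryption=aes-256-ccm -O keyformat=passphrase \
    tank mirror sda sdb mirror sdc sdd mirror sde sdf

# thin provisioned (sparse) zvol with 4K volblocksize for the VM disk
zfs create -s -V 32G -o volblocksize=4K tank/vm-100-disk-0

# inside the guest: ext4 with 4K blocks plus the stride/stripe-width from above,
# and discard via fstab
mkfs.ext4 -b 4096 -E stride=1,stripe-width=3 /dev/sda1
echo '/dev/sda1 /data ext4 defaults,discard 0 2' >> /etc/fstab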
2.) 16K volblocksize + ashift=12 + aes-256-ccm
ashift=12
atime=off
zfs_txg_timeout=default
volblocksize=16K
thin
without SLOG
without L2ARC
native encryption=on
encryption algorithm=aes-256-ccm
compression=lz4
discard using fstab
virtio SCSI
virtio SCSI blocksize=512B + 4K
cachemode=none
ssd emulation=on
io thread=on
ext4 blocksize=4K
stride and stripe-width: -b 4096 -E stride=4 -E stripe-width=12
3.) 16K volblocksize + ashift=12 + aes-256-gcm
ashift=12
atime=off
zfs_txg_timeout=default
volblocksize=16K
thin
without SLOG
without L2ARC
native encryption=on
encryption algorithm=aes-256-gcm
compression=lz4
discard using fstab
virtio SCSI
virtio SCSI blocksize=512B + 4K
cachemode=none
ssd emulation=on
io thread=on
ext4 blocksize=4K
stride and stripe-width: -b 4096 -E stride=4 -E stripe-width=12
4.) 16K volblocksize + ashift=12 + aes-128-ccm
ashift=12
atime=off
zfs_txg_timeout=default
volblocksize=16K
thin
without SLOG
without L2ARC
native encryption=on
encryption algorithm=aes-128-ccm
compression=lz4
discard using fstab
virtio SCSI
virtio SCSI blocksize=512B + 4K
cachemode=none
ssd emulation=on
io thread=on
ext4 blocksize=4K
stride and stripe-width: -b 4096 -E stride=4 -E stripe-width=12
5.) 16K volblocksize + ashift=13
ashift=13
atime=off
zfs_txg_timeout=default
volblocksize=16K
thin
without SLOG
without L2ARC
native encryption=on
encryption algorithm=aes-256-ccm
compression=lz4
discard using fstab
virtio SCSI
virtio SCSI blocksize=512B + 4K
cachemode=none
ssd emulation=on
io thread=on
ext4 blocksize=4K
stride and stripe-width: -b 4096 -E stride=4 -E stripe-width=12
Does anyone know how to calculate the stride and stripe-width for VMs on top of ZFS? The manuals always refer to physical disks on a real SW/HW RAID with a defined stripe size that the OS has direct access to. Here I only use ZFS, which is not a RAID with a fixed stripe size, and there is also virtio sitting between the host's ZFS and the guest OS.
Is it even possible to calculate which stride and stripe-width to use? I've got a write amplification from guest to host of around factor 7, and that is quite high. So I hoped I could optimize how the guest's ext4 writes data to the virtio device, so that virtio isn't amplifying as much because of all the mixed block sizes in that chain.
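For what it's worth, this is how I arrived at the numbers above: I just applied the usual RAID formula (stride = chunk size / ext4 block size, stripe-width = stride x number of data disks) and assumed that the volblocksize plays the role of the chunk size and the 3 mirror vdevs play the role of the data disks. I'm not at all sure that assumption holds with ZFS and virtio in between:

# assumed mapping: volblocksize ~ RAID chunk size, 3 mirror vdevs ~ data disks
ext4_block=4096
volblocksize=16384      # the 16K cases; 4096 for the 4K case
data_disks=3
stride=$((volblocksize / ext4_block))    # 16384 / 4096 = 4
stripe_width=$((stride * data_disks))    # 4 * 3 = 12
echo "mkfs.ext4 -b $ext4_block -E stride=$stride,stripe-width=$stripe_width"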