Hi, if you set logbias=throughput the ZIL won't store the complete sync write, only its metadata.
Ok, so I will do a 16K sync write test comparing 128bit AES vs 256bit AES. If it is padding to round up data to the key size, then 256bit AES should double the write amplification compared to 128bit.
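For reference, a minimal sketch of how such a comparison could be set up; the pool/dataset names and the fio parameters are only illustrative assumptions, not the exact setup used here:

# One dataset per key length (names are examples)
zfs create -o encryption=aes-128-gcm -o keyformat=passphrase tank/aes128
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/aes256

# 16K sync writes against each dataset; comparing what fio reports with the
# writes seen further down the stack (pool, SSD) gives the amplification
fio --name=aes128 --directory=/tank/aes128 --rw=write --bs=16k --size=1G --sync=1 --ioengine=psync
fio --name=aes256 --directory=/tank/aes256 --rw=write --bs=16k --size=1G --sync=1 --ioengine=psync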
Just for encryption or in general?
One interesting thing I saw in my spreadsheet:

Both cases, if it is possible and you have the time. I see cases where ashift=13 can be better on some particular SSD models.
Thx a lot!
                 | Write amplification guest -> host | Write amplification host -> SSD NAND
sync 4K write    | 9.48x                             | 1.25x
async 4K write   | 4.88x                             | 2.36x
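As a rough sketch of how ratios like these can be measured (the pool and device names are assumptions; not every SSD exposes its NAND writes, on many consumer drives that is a vendor-specific SMART attribute):

# Bytes written inside the guest: taken from fio's own output
# Bytes written by the host to the pool: watch the pool during the test
zpool iostat -v tank 1
# Host writes as seen by the NVMe SSD (1 data unit = 512,000 bytes)
smartctl -a /dev/nvme0n1 | grep -i 'data units written'
# Write amplification per hop = bytes written at the lower layer / bytes written at the upper layer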
I tested it here:
16K sync writes/reads that are 50% compressible, read/written to an XFS partition on a zvol (volblocksize=8K) on a 4-disk striped mirror (ashift=12):
                     aes-256-gcm + lz4   aes-128-gcm + lz4   aes-256-gcm + no compression   no encryption + lz4
Write Performance:   8 MiB/s             8.09 MiB/s          7.78 MiB/s                     10.1 MiB/s
Read Performance:    29.2 MiB/s          31.9 MiB/s          29.8 MiB/s                     38.8 MiB/s
W.A. fio -> guest:   1.48x               1.48x               1.48x                          1.48x
W.A. guest -> host:  7.23x               7.21x               8.1x                           3.67x
W.A. host -> NAND:   1.13x               1.15x               1.15x                          1.12x
W.A. total:          12.09x              12.25x              13.78x                         6.09x
R.A. total:          0.5x                0.5x                1.0x                           0.5x
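For context, a sketch of a fio job that produces 16K sync writes with roughly 50% compressible data, similar to what was measured above (the file path and size are assumptions):

fio --name=16k-sync --filename=/mnt/xfs/testfile --size=4G \
    --rw=randwrite --bs=16k --sync=1 --ioengine=psync \
    --buffer_compress_percentage=50 --refill_buffers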
You said a big volblocksize gives you write amplification; didn't you also increase the block/cluster size in the guest to match?

I used ext4 with "stripe-width" and XFS with "sw" to match the stripe width of the guest FS to the blocksize of the zvol. That showed no difference.
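For reference, a sketch of those alignment options, assuming an 8K volblocksize zvol exposed to the guest as /dev/vdb (device name and values are just examples):

# ext4: stride and stripe-width are given in filesystem blocks (4K here), so 8K = 2 blocks
mkfs.ext4 -b 4096 -E stride=2,stripe-width=2 /dev/vdb
# XFS: stripe unit (su) in bytes/KiB, stripe width (sw) in multiples of su
mkfs.xfs -d su=8k,sw=1 /dev/vdb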
I would also consider doing tests with a raw file on top of a dataset. You get no Proxmox snapshots in the UI, but for now you are just testing performance and amplification.

I will test that.
Personally I am done with 4K blocks on virtualization for any new guest OS I install; I think it's just too inefficient.

It's not that easy. At least with Linux it looks like I'm forced to use a 4K block size. I tried to increase the ext4 blocksize above 4K, but it told me that's not possible because the FS block size can't be greater than the memory page size, which is 4K. So I would need to switch to huge pages, and I'm not sure how to do that, or whether KVM or my physical hardware supports that at all.
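A quick way to confirm the limit being hit here is to check the kernel page size (the device name below is just an example):

# ext4 cannot use a block size larger than the kernel page size
getconf PAGE_SIZE            # typically 4096 on x86_64
mkfs.ext4 -b 8192 /dev/vdb   # mke2fs warns that an 8K block size is not usable on most systems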
With ext4 use -C and also enable the bigalloc feature. Snip below from one such partition; this was originally a 4K block partition, and when I moved it to 64K clusters (alongside changing volblocksize to 64K) the performance improvement was astounding.

# tune2fs -l /dev/sdb1
tune2fs 1.44.5 (15-Dec-2018)
Filesystem volume name: <none>
Last mounted on: /home2
Filesystem UUID: 6fada182-ef45-416d-a5a0-7f85352561c2
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index sparse_super2 filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize bigalloc metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 128000
Block count: 131071728
Reserved block count: 0
Free blocks: 100263536
Free inodes: 125964
First block: 0
Block size: 4096
Cluster size: 65536

Thanks, I will try that. Didn't see that cluster option in the ext4 manual.
Also look at the largefile stuff, sparse_super2 and flex_bg; these reduce the number of inodes and keep them from being spread out, so sequential access is much more likely. I know these things mostly benefit spindles, but with regard to write amplification I would expect fragmentation to increase it, so they still might be useful to you.

Thanks. I created an ext4 with "mkfs.ext4 -b 4096 -O extent -O bigalloc -O has_journal -C 32k" on top of a 32K volblocksize zvol and ran two fio tests (32K random read/write, one sync and one async), and there was only a minimal write amplification change compared to a default ext4 on an 8K volblocksize zvol. I need to do some more tests, but it looks like clustering will only help with large files.
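If the lower inode count is worth testing as well, a sketch of how it might be combined with the clustering used above (the device name is an example; flex_bg is already enabled by default on current mke2fs versions):

# "largefile" usage type lowers the inode ratio; -C requires bigalloc
mkfs.ext4 -T largefile -O bigalloc -C 64k /dev/vdb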
# Check the currently active LBA format of the NVMe drive (formatted with 512-byte sectors here)
smartctl -a /dev/nvme0n1
...
Namespace 1 Formatted LBA Size: 512
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 4096 0 1
# Install nvme-cli and switch namespace 1 to the 4K LBA format (Id 1 above); this erases all data on the drive
apt install nvme-cli
nvme format --lbaf=1 /dev/nvme0n1
smartctl -a /dev/nvme0n1
...
Namespace 1 Formatted LBA Size: 4096
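The active LBA format can also be checked with nvme-cli itself; a small sketch against the same device:

# Lists the supported LBA formats and marks the one currently in use
nvme id-ns -H /dev/nvme0n1 | grep 'LBA Format'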