Volblocksize for 6/8 drives striped mirror?

Dunuin

Distinguished Member
Jun 30, 2020
14,794
4,631
258
Germany
Hi,

Right now I am using 2x Intel S3700 100GB as a mirror for system+swap, and 4x S3710 200GB + 1x S3700 200GB as a raidz1 with 32K volblocksize for VM storage. My VM storage is using around 200-400GB of the 600GiB capacity (I set the quota to 80% so the pool can't get full and slow).

I would like to switch to a striped mirror to get more performance and a lower volblocksize, so I bought another 2x S3710 200GB + 1x S3700 200GB.

The question now is how to set up the storage.
I would like to keep the 2x S3700 100GB as a mirror for the OS.

But what to do with the other SSDs?

Option A): Two striped mirrors of 4 SSDs each.

Option B): One striped mirror of 6x S3710 200GB and one mirror of 2x S3700 200GB.

Option C): One big striped mirror of 6x S3710 200GB + 2x S3700 200GB.
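
For concreteness, a minimal sketch of what option C could look like on the CLI, assuming ashift=12 (device paths and the pool name "vmpool" are placeholders; options A/B would just split the same mirror lines across two pools):

# striped mirror of 4 mirror vdevs (raid10-style)
zpool create -o ashift=12 vmpool \
    mirror /dev/disk/by-id/ata-S3710_A /dev/disk/by-id/ata-S3710_B \
    mirror /dev/disk/by-id/ata-S3710_C /dev/disk/by-id/ata-S3710_D \
    mirror /dev/disk/by-id/ata-S3710_E /dev/disk/by-id/ata-S3710_F \
    mirror /dev/disk/by-id/ata-S3700_G /dev/disk/by-id/ata-S3700_H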

But I've got some questions...
1.) Should option C be faster than option B? Data would be striped across 4 mirrors instead of only 3, so in theory it should be faster, but the S3700s are a little slower than the S3710s, so the pool might need to wait for the S3700s to finish their writes.

2.) The best volblocksize for a striped mirror of 4 drives with ashift=12 is 8K. But what about 6 or 8 drives? Should that be 12K for 6 and 16K for 8 drives? (See the storage.cfg note below these questions for where the volblocksize is set in Proxmox.)

3.) Could a striped mirror of only 6 drives be a performance problem if the volblocksize is 12K, because 12K is not a power of two?

4.) Which setup would help the most with write amplification? Right now I've got a write amplification of around 20x (7x from VM to host and 3x from host to the SSDs' NAND) and it would be great if I could lower that. (A zpool iostat sketch for watching the host layer follows at the end of this post.)

5.) Is it possible to switch the VirtIO SCSI blocksize from 512B to 4K? KVM on TrueNAS allows this, but I wasn't able to find a way (GUI or CLI) to tell Proxmox to use 4K. Right now VirtIO SCSI is writing 512B blocks to a 32K zvol and I would like to test whether writing 4K blocks to the 32K zvol results in lower write amplification and fewer IOPS.
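
On question 2, whatever value turns out to be right: in Proxmox the volblocksize for newly created zvols comes from the blocksize option of the ZFS storage definition in /etc/pve/storage.cfg (storage/pool names and the 8k value below are just examples; existing zvols keep their old volblocksize):

zfspool: vmpool
    pool vmpool
    content images,rootdir
    blocksize 8k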

How would you set up these 8 drives?
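
On question 4, a rough way to watch the host-to-pool layer while comparing layouts is zpool iostat; the NAND side still has to come from the drives' SMART data (e.g. smartctl -a). The pool name is a placeholder:

# per-vdev write bandwidth and IOPS, refreshed every 60 seconds
zpool iostat -v vmpool 60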
 
Very good questions, which interest me as well. Maybe this can help with number 5: I add args: -global scsi-hd.physical_block_size=4k to my VMs (as shown in #3282) so that the virtual SCSI drives are detected with Sector size (logical/physical): 512/4096 bytes. I think it reduces the amplification and the required number of I/O operations per block.
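
A minimal sketch of where such an override lives when editing the VM config directly (the VM ID 100 is just an example; qm set 100 --args '...' should do the same):

# /etc/pve/qemu-server/100.conf
args: -global scsi-hd.physical_block_size=4k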
 
Very good questions, which interest me as well. Maybe this can help with number 5: I add args: -global scsi-hd.physical_block_size=4k to my VMs (as shown in #3282) so that the virtual SCSI drives are detected with Sector size (logical/physical): 512/4096 bytes. I think it reduces the amplification and the required number of I/O operations per block.
If it is only possible to set that globally for all VMs, that's a problem if running 512B-formatted zvols with 4K causes trouble.
 
If it is only possible to set that globally for all VMs, that's a problem if running 512B-formatted zvols with 4K causes trouble.
The parameter is called global, but it is global for all SCSI drives (not IDE, SATA, or VirtIO block devices) of that particular VM. It is not for all VMs, but indeed, if you have your virtual drives spread over multiple pools, it would be a problem. It is the best I found so far; maybe there are better solutions.
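
Assuming a Linux guest, what each virtual disk actually reports can be checked from inside the VM (sda is an example device name):

# logical/physical sector size as seen by the guest
lsblk -o NAME,LOG-SEC,PHY-SEC
cat /sys/block/sda/queue/logical_block_size
cat /sys/block/sda/queue/physical_block_size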
 
The parameter is called global, but it is global for all SCSI drives (not IDE, SATA, or VirtIO block devices) of that particular VM.
If it only affects all VirtIO SCSI drives of that particular VM, that would be fine. My idea was to keep the existing VMs at 512B and create the new VMs with 4K.

It is not for all VMs, but indeed, if you have your virtual drives spread over multiple pools, it would be a problem. It is the best I found so far; maybe there are better solutions.
Why is it a problem with different pools? Most of my VMs have a swap-formatted zvol on another pool.
 
Why is it a problem with different pools? Most of my VMs have a swap-formatted zvol on another pool.
It makes all virtual SCSI drives attached to the VirtIO SCSI controller of that VM report a physical block size of 4K (the logical size stays at 512B). This is fine if all virtual SCSI drives are on zpools with ashift=12 or higher. It should also work for other zpools, but it might not be optimal and could increase write amplification for ashift<12 (if you have virtual SCSI drives on ashift<12 pools attached to that VM). However, this should not be a problem for swap, as I would expect the pool holding swap to use ashift=12 because memory pages are 4K on Linux.
 
It makes all virtual SCSI drives attached to the VirtIO SCSI controller of that VM report a physical block size of 4K (the logical size stays at 512B). This is fine if all virtual SCSI drives are on zpools with ashift=12 or higher. It should also work for other zpools, but it might not be optimal and could increase write amplification for ashift<12 (if you have virtual SCSI drives on ashift<12 pools attached to that VM). However, this should not be a problem for swap, as I would expect the pool holding swap to use ashift=12 because memory pages are 4K on Linux.
Ah ok. That would be fine. All my pools are ashift 12.
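
For the record, a quick way to double-check the ashift actually in use on a pool (the pool name is an example; zdb shows the per-vdev value even when the pool property reads 0 for auto):

zpool get ashift vmpool
zdb -C vmpool | grep ashift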
 
2.) The best volblocksize for a striped mirror of 4 drives with ashift=12 is 8K. But what about 6 or 8 drives? Should that be 12K for 6 and 16K for 8 drives?

3.) Could a striped mirror of only 6 drives be a performance problem if the volblocksize is 12K, because 12K is not a power of two?
Still need to find an answer. New drives should arrive soon.

Edit:
And I've got another question, about the KVM cache modes. Right now I am using "cache=none". Which of the KVM cache modes will allow the SSDs to use their internal write cache? They all have power-loss protection and therefore should be able to cache async and sync writes, but the drives show an internal write amplification of around factor 3 to 3.5 (around 300GB per day is written to the SSDs, but at least 900GB per day to the NAND), so I was wondering whether they aren't using the internal cache, which would explain that write amplification.
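
In case it helps with testing the cache modes, the mode can be switched per virtual disk from the CLI (VM ID, storage, and volume names are examples):

qm set 100 --scsi0 vmpool:vm-100-disk-0,cache=writeback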

Edit:
And another question. Let's say I have a ZFS pool of 8 drives as a striped mirror (raid10) using ashift=12 and volblocksize=16K. Now I create a zvol, attach it via VirtIO SCSI with the virtio blocksize set to 4K, and install Linux on it using ext4. Ext4 has the stride and stripe-width parameters that can be set while creating the filesystem. Can I optimize the write amplification by changing these parameters, or does that not work because the VirtIO SCSI layer and too much abstraction sit in between? If I were to set up a mdadm raid10 inside the VM, I would use a blocksize of 4K, a chunk size of 16K, a stride of 4 (4*4K=16K) and a stripe-width of 16 (16*4K=64K). Is there anything I can do inside the guest to optimize the writes? VirtIO SCSI writing 4K blocks to a 16K zvol creates a lot of overhead, but I can't increase the blocksize of the guest's filesystem to 16K because Linux works with 4K pages.
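
For the mdadm comparison above, those numbers would translate into something like the following (the device name is a placeholder; whether the same tuning actually helps on a zvol behind VirtIO SCSI is exactly the open question):

# 4K blocks, 16K chunk -> stride=4; 4 data disks -> stripe_width=16 (64K)
mkfs.ext4 -b 4096 -E stride=4,stripe_width=16 /dev/md0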
 