optimal use of 6 x 2TB nvme

brosky

Hi,

I'm scratching my head over the following storage setup: I have 6 x 2TB NVMe drives and need to provide maximum storage with some redundancy. IOPS is not that important, and in theory the NVMes (INTEL SSDPE2KX020T8) are fast enough for my needs (LXC -> PostgreSQL storage).

So I created 3 mirrored vdevs in one pool:

root@pve171:~# zpool status -v
  pool: storagezfs
 state: ONLINE
config:

        NAME          STATE     READ WRITE CKSUM
        storagezfs    ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            nvme4n1   ONLINE       0     0     0
            nvme5n1   ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            nvme6n1   ONLINE       0     0     0
            nvme7n1   ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            nvme9n1   ONLINE       0     0     0
            nvme10n1  ONLINE       0     0     0

root@pve171:~# zpool list -v
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG   CAP  DEDUP  HEALTH  ALTROOT
storagezfs   5.44T   408K  5.44T        -         -    0%    0%  1.00x  ONLINE  -
  mirror     1.81T   152K  1.81T        -         -    0%  0.00%     -  ONLINE
    nvme4n1      -      -      -        -         -     -     -      -  ONLINE
    nvme5n1      -      -      -        -         -     -     -      -  ONLINE
  mirror     1.81T   104K  1.81T        -         -    0%  0.00%     -  ONLINE
    nvme6n1      -      -      -        -         -     -     -      -  ONLINE
    nvme7n1      -      -      -        -         -     -     -      -  ONLINE
  mirror     1.81T   152K  1.81T        -         -    0%  0.00%     -  ONLINE
    nvme9n1      -      -      -        -         -     -     -      -  ONLINE
    nvme10n1     -      -      -        -         -     -     -      -  ONLINE
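For reference, a layout like this is typically created with something along these lines (a sketch only; device names taken from the output above, including the ashift=12 mentioned below):

zpool create -o ashift=12 storagezfs \
    mirror /dev/nvme4n1 /dev/nvme5n1 \
    mirror /dev/nvme6n1 /dev/nvme7n1 \
    mirror /dev/nvme9n1 /dev/nvme10n1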

I've set ashift=12.

The NVMe specs say the logical block size is 512B, but ashift=12 (4K) should still be fine, since that's the safe default for flash.
When adding the pool as storage, what block size would be optimal: 4K (to match the ashift) or 8K (the default)?
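For context: the block size set when adding a ZFS storage in Proxmox is stored as the blocksize option in /etc/pve/storage.cfg and becomes the volblocksize of newly created zvols (VM disks); LXC subvols are datasets, where the recordsize property applies instead. A sketch of what the entry might look like, using the pool name from above (the storage ID and the 8k value are just placeholders):

# /etc/pve/storage.cfg (sketch)
zfspool: storagezfs
        pool storagezfs
        content rootdir,images
        blocksize 8k
        sparse 1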

Other suggestions on my storage config are welcome.
 
Postgres writes with an 8K block size. With 6 disks in raidz1 you would need to increase the volblocksize from 8K to 32K, and with raidz2 to 16K (both assuming ashift=12), because otherwise you would lose 50% of your total storage with raidz1, or 67% with raidz2, to padding overhead. Writing with a block size smaller than the volblocksize is always bad: performance would drop to 1/4 (32K) or 1/2 (16K), and SSD wear would increase by a factor of 4 or 2 for 8K writes. So raidz1/2/3 isn't an option if you primarily want to run Postgres.
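To illustrate where those numbers come from (a back-of-the-envelope sketch; raidz pads every allocation up to a multiple of parity+1 sectors):

8K volblocksize at ashift=12 = 2 data sectors of 4K each

raidz1: 2 data + 1 parity = 3 sectors, padded to a multiple of (1+1) -> 4 sectors
        16K allocated per 8K of data -> ~50% of capacity lost
raidz2: 2 data + 2 parity = 4 sectors, padded to a multiple of (2+1) -> 6 sectors
        24K allocated per 8K of data -> ~67% of capacity lost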

It's hard to find information on what volblocksize to use with striped mirrors. I wasn't able to find anything in the OpenZFS documentation or in the ZFS developers' blog posts, and the Proxmox staff never answered my questions about how to calculate the volblocksize for a striped mirror. But I've heard it should be "4K (at ashift=12) * number of mirrors".
So according to that, with a striped mirror like you set up, best performance should be at 12K; but since the volblocksize can only be a power of two (2^X), 8K or 16K would be the next best. According to my own benchmarks, 16K generally gives better performance, but in your case with Postgres I would use 8K.
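If you want to verify that yourself, you can create a test zvol with an explicit volblocksize (a sketch; the name pgtest and the size are placeholders):

zfs create -V 50G -o volblocksize=8k storagezfs/pgtest
zfs get volblocksize storagezfs/pgtest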

And SSDs lie about their logical/physical sector size. They always report 512B or 4K, but internally they use a much bigger block size like 8K, 16K or whatever. No manufacturer gives the real block sizes in their datasheets.
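You can at least check which LBA formats the controller advertises and which one is active, e.g. with nvme-cli (device name taken from the pool above; the --lbaf index is an assumption and must be taken from the actual output):

nvme id-ns -H /dev/nvme4n1 | grep "LBA Format"
# if a 4K format is listed, the namespace can be reformatted to it
# (this DESTROYS all data on the namespace)
nvme format /dev/nvme4n1 --lbaf=1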

So the best approach would be to test different volblocksize/ashift combinations, run fio benchmarks, and then compare the results.
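For example, something along these lines against a test zvol, with an 8K random-write pattern to roughly mimic Postgres (all names and parameters here are just placeholder starting points):

fio --name=volblocktest --filename=/dev/zvol/storagezfs/pgtest \
    --rw=randwrite --bs=8k --iodepth=32 --numjobs=4 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting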
 