Writing with a higher blocksize to a disk with a lower sector size is always fine, just not the other way round. You shouldn't lose any performance or capacity when using ashift 12 with a 512B/512B sector disk. If you read https://www.delphix.com/blog/delphi...or-how-i-learned-stop-worrying-and-love-raidz you will see that padding overhead is the result of a bad relation between volblocksize, ashift and your number of data-bearing disks. So it's no problem to go from ashift 9 to ashift 12, 13 or 14; you just also need to increase your volblocksize by a factor of 8, 16 or 32. Whether that is useful or not depends on your workload and on how high you can go with your volblocksize.

Why would I want to do that in the first place? One thing that is not debatable (at least I am under that impression) is the disk/ashift relationship: 512B/512B sectors = ashift 9 (2^9 = 512), and 512B/4096B or 4096B/4096B (not quite sure if the latter exists) = ashift 12 (2^12 = 4096).
If an ashift that doesn't match the sector size is used, padding issues start.
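To make that concrete, here is a small Python sketch of that arithmetic (nothing ZFS-specific, just the powers of two described above):

```python
# ashift is the base-2 exponent of the smallest block ZFS will write:
# sector size in bytes = 2^ashift.
for ashift in (9, 11, 12, 13, 14):
    print(f"ashift={ashift} -> {2 ** ashift} byte sectors")
# ashift=9  -> 512 byte sectors
# ashift=12 -> 4096 byte sectors

# Going from ashift 9 to 12, 13 or 14 multiplies the sector size by
# 2^3 = 8, 2^4 = 16 or 2^5 = 32, so the volblocksize has to grow by
# the same factor to keep the same number of data sectors per block.
```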
Let's say your workload contains a Postgres DB that is reading/writing 8K blocks. For raidz1/2/3 you would want to use an ashift of 9, because every disk combination with an ashift of 12 or above (see the spreadsheet shown in the blog post) would require a volblocksize of at least 16K if you don't want to lose too much space to padding overhead. On the other hand, a 4-disk striped mirror would be totally fine with an ashift of 12. With an ashift of 13, only a 2-disk mirror would be useful, and an ashift of 11 would be fine for an 8-disk striped mirror.
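For anyone who wants to play with these numbers, here is a rough Python sketch of the allocation math from the linked blog post. The function is my own simplification (it ignores compression and other special cases), so treat it as an approximation of the on-disk behaviour rather than a definitive implementation:

```python
import math

def raidz_allocated_sectors(volblocksize, ashift, ndisks, parity):
    """Approximate sectors ZFS allocates for one zvol block on a
    raidz vdev, following the math in the Delphix blog post."""
    sector = 2 ** ashift
    data = math.ceil(volblocksize / sector)        # data sectors needed
    stripes = math.ceil(data / (ndisks - parity))  # rows across the vdev
    total = data + stripes * parity                # add parity sectors
    # raidz pads every allocation up to a multiple of (parity + 1) sectors
    pad_unit = parity + 1
    return math.ceil(total / pad_unit) * pad_unit

# 8K blocks on a 4-disk raidz1 with ashift=12:
print(raidz_allocated_sectors(8192, 12, 4, 1))  # 4 sectors = 16K allocated
# Only 8K of those 16K are data, so half the allocation goes to parity
# and padding, which is why a 16K+ volblocksize is suggested there.
```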
In general using ashift=9 is preferable, as long as all your disks allow it, as it widens the range of volblocksizes you can choose from. But one problem with it is upgradeability. Ashift can only be set once, at creation of the pool. If you choose an ashift of 9 there, you will be limited to 512B/512B physical/logical sector HDDs, and these are getting more and more rare until at some point in the future they completely disappear from the market. If 512B/4K HDDs are the only thing left to buy (or at least the only thing you can afford), you would need to destroy that pool and recreate it with an ashift of 12. So many people just use an ashift of 12 right away, even when only using 512B/512B disks, so they can easily replace the disks later with anything they have lying around.
Yup, 1K would be the minimum volblocksize then, but it wouldn't make much sense to use it. For deduplication you don't want the volblocksize to be too high. For block-level compression you don't want the volblocksize to be too low. For workloads with big files you don't want the volblocksize to be too low, as your data-to-metadata ratio will get worse. For workloads with a lot of smaller files you want the volblocksize to be lower than most of the small files. So it's really hard to choose a good volblocksize, because it depends on so many factors and most people don't fully understand their own workload. If you optimize your volblocksize for one thing, you always make it worse for something else. So most of the time it's more useful to choose something in the middle as a compromise, especially because PVE only lets you set the volblocksize globally for the whole ZFS storage, for all virtual disks (see my feature request). I would go with a 4K volblocksize on a 4-disk striped mirror using ashift 9, as this is above the minimum useful volblocksize for your pool layout, and most filesystems are based on 4K blocks, so it's nice to match that.

Or you could answer yes to my example, which relies on my actual configuration: 4 disks in RAID10, so that means two mirrors. Ashift is 9 (i.e. 512B per disk), and 512B x 2 = 1024B. Right? I've never seen anyone using this. OK, this is the minimum, but is it the optimum?
In my case, for instance, all VMs have an underlying filesystem of 4K (compression on). What is the math afterwards to calculate the theoretical (at least) optimal value for the zvol block size those VMs will be installed on? I know it has something to do with reads and writes of 4K files when the block size is 1K. For instance, with some calculations you can see the doubling/quadrupling happening during writing and reading, and judge whether that number (1K here) would be good or bad and whether to increase it. So with all the above happening, should I use
1K, 2K, 4K, 8K (which is the default) or 16K?
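To illustrate the kind of back-of-the-envelope math meant here, a small Python sketch (my own simplification: it ignores caching, compression and metadata, so it only captures the read-modify-write intuition):

```python
# What a random 4K guest write roughly costs at different volblocksizes.
GUEST_BLOCK = 4096

for vbs in (1024, 2048, 4096, 8192, 16384):
    if vbs >= GUEST_BLOCK:
        # ZFS rewrites the whole zvol block: read-modify-write.
        amp = vbs // GUEST_BLOCK
        note = f"rewrites one {vbs} B block -> {amp}x write amplification"
    else:
        # The 4K write is split across several smaller zvol blocks:
        # no extra data written, but more blocks to checksum and track.
        note = f"split into {GUEST_BLOCK // vbs} blocks of {vbs} B (more metadata)"
    print(f"volblocksize={vbs:>5}: {note}")
```

By this simplistic measure a 4K volblocksize lines up exactly with a 4K guest filesystem, which is why matching the two is attractive when the pool layout allows it.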
That article is written by one of the creators of ZFS himself, and he goes deep into the details of how ZFS works on the block level, explaining why there is padding overhead by using examples. You can even get the formula to calculate the optimal volblocksize for each raidz1/2/3 setup if you look at the spreadsheet linked in that article.

PS: Thank you for the tip about SSDs, but I am aware of that.
Also, about the link explaining the different raidz levels in comparison with the disks being used, compression on/off and volblocksizes: apart from the bold letters I can't follow it, and to tell you the truth I don't right now. For you it is perfect; for me it is complicated (probably extra knowledge missing on my part).