SSD emulation on ZFS

stevensedory
Oct 26, 2019
Hello,

Since we're using local ZFS for our VM storage, should we set all hard disks to "SSD emulation"?

I know that on Windows this stops things like defrag from running, which kills CoW storage like ZFS. Does it do something similar on Linux guests?

Are there any other ZFS best practices we should be following?
 
Does it do something similar on Linux guests?
Linux no longer uses the disk as an entropy generator.

For Windows, you should use 4k zvol blocksize instead of the default 8k.
The cache mode of the vdisk should be "none".
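
A minimal sketch of how this maps to the Proxmox CLI (the storage name local-zfs and VM ID 100 are placeholders). Note that volblocksize is fixed when a zvol is created, so the 4k setting only applies to newly created disks:

    # New zvols on this storage will use a 4k volblocksize
    # (existing zvols keep theirs and must be recreated or moved)
    pvesm set local-zfs --blocksize 4k

    # Re-attach the disk with cache=none and SSD emulation enabled
    qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=none,ssd=1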
 
Linux no longer uses the disk as an entropy generator.

For Windows, you should use 4k zvol blocksize instead of the default 8k.
The cache mode of the vdisk should be "none".

Thanks for the tip. I just transferred a Windows VM to a 4k zvol, and it makes a big difference compared to 8k.
You just mentioned Windows, but would a Linux VM or container be positively affected by a 4k zvol too, or would it run just fine with the default of 8k?
 
You just mentioned Windows, but would a Linux VM or container be positively affected by a 4k zvol too, or would it run just fine with the default of 8k?

Containers do not use an additional filesystem on ZFS; they run directly on a ZFS dataset and use its variable recordsize. There is no fixed blocksize.
For VMs, it'll work with any blocksize, but it can lead to read and write amplification if the blocksize does not match the guest's. The default blocksize for e.g. ext4 is also 4K in most cases, so it'll be faster with 4K blocks.
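
To check for a mismatch, you can compare the zvol's blocksize on the host with the block size of the guest filesystem (dataset and device names below are just examples):

    # On the Proxmox host: blocksize of the VM's zvol
    zfs get volblocksize rpool/data/vm-100-disk-0

    # Inside a Linux guest: ext4 block size (typically 4096)
    tune2fs -l /dev/sda1 | grep 'Block size'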
 
Thanks for explaining, but that makes me wonder which workload benefits from the default 8K block size. Would that be containers?
Because you just implied that the average VM will benefit from the 4k size, or am I mistaken?
 
Hi,

4k, 8k.... it depends a lot on your load, on your ZFS pool layout (mirror, raidz, ...), and on the underlying block size of the HDDs. As a general rule, a bigger block size will help a lot if you mostly operate on large files. Small blocks also need a lot of metadata for bigger files.
But by default ZFS has a limit on how much metadata the ARC cache can hold. If you have a lot of metadata and it does not fit in the ARC, you will need to read it from disk.
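
On the OpenZFS releases of that era you can inspect and raise that metadata limit via module parameters; a rough sketch (parameter availability depends on your ZFS version, and the 4 GiB value is only an example):

    # Current metadata limit and usage (bytes)
    cat /sys/module/zfs/parameters/zfs_arc_meta_limit
    grep -E '^arc_meta_(used|limit)' /proc/spl/kstat/zfs/arcstats

    # Raise the limit at module load time
    echo 'options zfs zfs_arc_meta_limit=4294967296' >> /etc/modprobe.d/zfs.conf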

Yes, many filesystems use 512/4k by default. But you can create the filesystem before installation of the OS with the desired block size (16-32k, for example), and then install the OS without reformatting the OS disk.

Some applications like databases use blocks larger than 4k (16k for MySQL, 64k for MSSQL). Now suppose you have 4k-native HDDs in a raidz1 (3 HDDs) and your application/OS needs to write a 4k zvol block. ZFS has to split that block across 2 disks (data) plus parity on the 3rd disk. 4k / 2 data disks = 2k per disk, but your 4k HDDs can write a minimum of only 4k.
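
If you want the zvol blocksize to match a database's page size, it has to be set when the zvol is created, for example (pool and dataset names are placeholders):

    # 16k volblocksize to match InnoDB's default page size;
    # on a 3-disk raidz1 with 4k sectors this splits cleanly into
    # 4 data sectors + 2 parity sectors, with no padding waste
    zfs create -V 100G -o volblocksize=16k tank/mysql-data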

In the end, these are only some of the things you should think about before making a decision.

Good luck / Bafta.
 
Would that be containers?

As I tried to explain, no. They are stored on ZFS datasets (filesystems) and have a variable recordsize of up to 128k.
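
You can see this directly on a container's dataset (the subvol name is an example):

    # Containers live on a dataset, not a zvol; recordsize is only an
    # upper bound, smaller files are stored in smaller blocks
    zfs get recordsize rpool/data/subvol-101-disk-0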

4k, 8k.... it depends a lot on your load.

The only thing I would add to this very well written summary is compression. It makes things even more complicated with respect to understanding and optimisation. If you really want to optimize for disk usage, I can only recommend forcing the ashift to 9 so that you end up with 512-byte blocks. You will get a much higher compression ratio, because a single changed 4K block in your VM cannot be compressed effectively with ashift 12 (4K blocks): you still need to store a full 4K block. With 512-byte blocks, you can end up using 3, 2 or even 1 block and save the rest.
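
A sketch of what that looks like; ashift can only be chosen at pool creation time, and 512-byte blocks only make sense on drives that really are 512-native (device names are placeholders):

    # Force 512-byte allocation when creating the pool
    zpool create -o ashift=9 tank mirror /dev/sda /dev/sdb

    # Check how well compression pays off afterwards
    zfs get compressratio tank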

If you have a lot of metadata and it does not fit in the ARC, you will need to read it from disk.

I'm really looking forward to the new allocation classes that let you store the metadata on (mirrored) SSDs alongside the spinners.
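
With the allocation classes in OpenZFS 0.8+, that looks roughly like this (pool and device names are examples):

    # Add a mirrored special vdev; metadata is then allocated on the SSDs
    zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

    # Optionally also send small data blocks (here: <= 16k) to the special vdev
    zfs set special_small_blocks=16k tank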
 
It makes things even more complicated with respect to understanding and optimisation. If you really want to optimize for disk usage, I can only recommend forcing the ashift to 9 ...

Yes. It is a hard decision to make when you start with a new pool. We know what the data is at present, but in the future your data could change (compressible data / incompressible data). In some cases you can get lucky ;).

Thanks @LnxBil for your kind words.

Good luck / Bafta !
 