SSD emulation on ZFS

stevensedory

Member
Oct 26, 2019
Hello,

Since we're using local ZFS for our VM storage, I am wondering: should we set all hard disks to "SSD emulation"?

I know that on Windows this stops things like defrag from running, which kills CoW storage like ZFS. Does it do something similar on Linux guests?

Are there any other ZFS best practices we should be following?
 
Does it do something similar on Linux guests?
Linux no longer uses the disk as an entropy generator.

For Windows, you should use 4k zvol blocksize instead of the default 8k.
The cache mode of the vdisk should be "none".
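
For reference, a rough sketch of how both could be set from the CLI, assuming a zfspool storage named local-zfs and VMID 100 (both made-up names, adjust to your setup):

Code:
# Make newly created disks on this storage use a 4k zvol blocksize
# (existing zvols keep their volblocksize; it cannot be changed in place)
pvesm set local-zfs --blocksize 4k

# Enable SSD emulation and cache=none on an existing disk of VM 100
qm set 100 --scsi0 local-zfs:vm-100-disk-0,ssd=1,cache=none

The same options are available in the GUI under the storage settings and the VM's hardware tab.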
 
Linux no longer uses the disk as an entropy generator.

For Windows, you should use 4k zvol blocksize instead of the default 8k.
The cache mode of the vdisk should be "none".

Thanks for the tip, I just transferred a Windows VM to a 4k zvol, and there's a big difference compared to 8k.
You just mentioned Windows; would a Linux VM or container also benefit from a 4k zvol, or would it run just fine with the default of 8k?
 
You just mentioned Windows; would a Linux VM or container also benefit from a 4k zvol, or would it run just fine with the default of 8k?

Containers do not use an additional filesystem on ZFS; they live directly on a ZFS dataset and use its variable recordsize. There is no blocksize to tune for them.
For VMs, any blocksize will work, but a mismatch can lead to read and write amplification. The default blocksize for e.g. ext4 is in most cases also 4K, so it'll be faster with 4K block sizes.
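
If you want to check whether host and guest block sizes actually match, something like this should do it (dataset and device names are just examples):

Code:
# On the host: the volblocksize of the VM's zvol
zfs get volblocksize rpool/data/vm-100-disk-0

# Inside a Linux guest: the block size of an ext4 filesystem
tune2fs -l /dev/sda1 | grep 'Block size'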
 
Thanks for explaining, but that makes me wonder which workload benefits from the default 8K block size. Would that be containers?
Because you just implied that the average VM will benefit from the 4k size, or am I mistaken?
 
Hi,

4k, 8k.... it depends a lot on your load, on your ZFS pool layout (mirror, raidz, ...), and on the underlying block size of the HDDs. As a general rule, a bigger block size will help a lot if you mostly operate on large files. Small blocks also need a lot of metadata for bigger files.
By default, ZFS limits how much of the ARC cache can be used for metadata. If you have a lot of metadata and it does not fit in the ARC, you will need to read it from disk.
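
If you want to see how much of the ARC is currently holding metadata, something like the following should work (field names as in OpenZFS 0.8; they may differ between versions):

Code:
# Current ARC metadata usage vs. the configured limit
grep -E 'arc_meta_(used|limit)' /proc/spl/kstat/zfs/arcstats

# Or the summarized view, if arc_summary is installed
arc_summary | grep -i meta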

Yes, many filesystems use 512 B / 4k by default. But you can create the filesystem before installing the OS with the desired block size (16-32k, for example), then install the OS without reformatting the OS disk.

Some applications like databases use blocks larger than 4k (16k for MySQL, 64k for MSSQL). Now imagine you have 4k-native HDDs in a raidz1 (3 disks). Your application/OS needs to write a 4k zvol block, so ZFS has to write it across 2 data disks plus parity on the 3rd: 4k / 2 data disks = 2k per disk. But a 4k-native HDD can write a minimum of only 4k.
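
So for a database VM it can make sense to create the disk with a matching volblocksize up front, roughly like this (pool and volume names are made up; volblocksize can only be set at creation time):

Code:
# Hypothetical 50G disk for a MySQL VM, 16k blocks to match InnoDB pages
zfs create -V 50G -o volblocksize=16k rpool/data/vm-101-disk-0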

In the end, these are just some of the things you should consider before making a decision.

Good luck / Bafta.
 
Would that be containers?

As I tried to explain, no. They are stored on ZFS datasets/filesystems and use a variable recordsize of up to 128k.
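
You can verify this on any container dataset (the subvol name below is just an example of the PVE naming scheme):

Code:
# Containers live on a plain dataset with a variable recordsize
zfs get recordsize rpool/data/subvol-101-disk-0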

4k, 8k.... it depends a lot on your load.

The only thing I would add to this very well written summary is compression. It makes things even more complicated with respect to understanding and optimisation. If you really want to optimize for disk usage, I can only recommend forcing ashift to 9 so that you end up with 512-byte blocks. You will get a much higher compression ratio, because a single changed 4K block in your VM cannot be compressed effectively with ashift=12 (4K sectors): you still have to store a full 4K block. With 512-byte blocks, the same data can end up using 3, 2 or even 1 block, and you save the rest.
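
As a sketch, that would look roughly like this at pool creation time (device names are placeholders; note that ashift cannot be changed on an existing pool):

Code:
# Force 512-byte sectors when creating the pool
zpool create -o ashift=9 tank mirror /dev/sda /dev/sdb
zfs set compression=lz4 tank

# Later, check how well the data actually compresses
zfs get compressratio tank

Keep in mind this trades write performance on 4k-native disks for the better compression ratio.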

If you have a lot of metadata and it does not fit in the ARC, you will need to read it from disk.

I'm really looking forward to the new allocation classes that let you store them on (mirrored) SSDs alongside the spinners.
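
For reference, with OpenZFS 0.8 or later that looks roughly like this (device names are placeholders):

Code:
# Add a mirrored special vdev that will hold the pool's metadata
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1

# Optionally let it also store small data blocks
zfs set special_small_blocks=4k tank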
 
It makes things even more complicated with respect to understanding and optimisation. If you really want to optimize for disk usage, I can only recommend forcing ashift to 9 so that you end up with 512-byte blocks.

Yes. It is a hard decision to make when you start a new pool. We know what the data looks like at present, but in the future your data could change (compressible data / incompressible data). In some cases you get lucky ;).

Thanks @LnxBil for your kind words.

Good luck / Bafta!
 