Choosing ZFS volblocksize for a container's storage: Same logic as for VMs?

Sep 1, 2022
Hello,

The topic is pretty much my entire question:

I've started to get my head around the logic for choosing a volblocksize for a VM's zVol where the virtual disk lives, and have started using datasets to segment out my VM storage so I can match the appropriate volblocksize to the needs of the VM.

What's the logic for choosing the volblocksize for, say, a Linux container for mixed use self-hosted stuff? The same as for a VM's storage? Or some other metric?

Thanks!
 
Proxmox containers on ZFS use a filesystem (dataset), not a volume (zvol). Recordsize (128K by default) roughly corresponds to volblocksize (8K by default), but unless you mostly store large incompressible files (video, or a Proxmox Backup Server datastore) there is usually no need to tune it.
 
Yep. Datasets use recordsize, zvols use volblocksize. Volblocksize is a fixed value: no matter what you write to that zvol, it will be stored in blocks matching that volblocksize. Recordsize is dynamic and only defines the upper limit. With the default 128K recordsize, writing a 10KB file won't create a 128K record but a 16K record, a 20KB file a 32K record, a 4K file a 4K record, and so on. If a file is bigger than your recordsize, it will be split into multiple records of your recordsize, so a 300KB file would create 3x 128K records. Recordsize is therefore much more forgiving than volblocksize, and you won't completely screw things up by not using the perfect recordsize.
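If you want to see the difference on your own pool, you can compare the two properties side by side. A minimal sketch with made-up names (a VM zvol rpool/data/vm-100-disk-0 and a container subvol rpool/data/subvol-101-disk-0 -- substitute your own):

# volblocksize is fixed when the zvol is created and can't be changed afterwards
zfs get volblocksize rpool/data/vm-100-disk-0

# recordsize is only an upper limit and can be changed at any time
# (it only affects data written after the change)
zfs get recordsize rpool/data/subvol-101-disk-0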
 
Oh--and I mean this not at all sarcastically--how wonderful. :)

Seriously, understanding the volblocksize for ZVOLs, and how to pick the right size, and how to configure it in PVE, was confusing enough to take me a few hours to do right the first time. Even then, I'd have been completely lost if I hadn't managed to google up just the right thread here, almost by accident.

Records are much easier to understand; their function tracks my understanding of traditional HDD sectors. I'm old enough that this is helpful. :cool:

This sounds much simpler, indeed. Pretty much just set it and go, right? :)
 
You can change the recordsize to optimize for a specific workload. For example, a 16K recordsize should be great for a dataset storing a MySQL DB that only writes 16K blocks, especially when using deduplication, where dedup with 16K records should be more efficient than much bigger records that don't match MySQL's native blocksize.
A recordsize of 1M might be good for workloads that only store big files, for example a PBS datastore that usually stores chunk files of 1-4MB.

But in general performance should be fine without changing the default recordsize, as ZFS sizes records dynamically as needed.

For raidz1/2/3 and volblocksize I can recommend this blog post: https://www.delphix.com/blog/delphi...or-how-i-learned-stop-worrying-and-love-raidz
It explains in detail how things work at the lowest level and which volblocksize to use to avoid wasting space on padding overhead.
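If you do want to tune it, it's just a dataset property. A rough sketch, using made-up dataset names (tank/mysql and tank/pbs-datastore):

# 16K records to match the 16K pages MySQL/InnoDB writes
zfs set recordsize=16K tank/mysql

# 1M records for a PBS datastore holding large chunk files
zfs set recordsize=1M tank/pbs-datastore

# note: a changed recordsize only applies to newly written data;
# existing files keep the record size they were written with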
 
Hello again.

I had not originally planned to do it this way, but I find myself bringing up a MariaDB instance in a container. I want to store the DB itself in an appropriate filesystem for best performance on what is already kind of a potato node.

Based on our prior conversation, I think what I want to do is create a dataset with recordsize 16k. Is that correct?
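For reference, here is roughly what I'm planning to run; the dataset name rpool/dbstore and storage ID dbStore are just placeholders I picked:

# dedicated dataset for the database, 16K records to match InnoDB's page size
zfs create -o recordsize=16K rpool/dbstore

# register it with PVE so the CT's mount point can live there;
# subvols PVE creates underneath inherit the 16K recordsize from the parent
pvesm add zfspool dbStore --pool rpool/dbstore --content rootdir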
 
I'm back. :P

I had to restore my MariaDB CT from backup on Sunday. I have a filesystem/PVESM-managed storage area called ctStore where my CTs live; the root disk of the MariaDB CT lives there.

I have a separate filesystem set up for my database storage, with a 16k blocksize, which I imported via pvesm. Proxmox treats it like a pool, and my container lists it as a "mount point" where a "subvol" lives.


When I restored the CT, before starting the CT for the first time, I noticed that both the root disk and my DB filesystem were restored to ctStore. From the GUI, I used Volume Action to move the dbStore back to the database storage, and then restarted the container.

Was this the correct procedure? Everything seems to work, but I want to make sure I'm still taking advantage of 16k block size for my DB storage. MariaDB apparently tanks performance if you fail to do that properly.
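In case it helps, this is how I checked that the moved mount point actually sits on the 16K dataset again (rpool/dbstore is just my placeholder name for the database storage; as far as I understand, the move copies the data, so it should have been rewritten with 16K records on the target):

# list the subvols under the database dataset and the recordsize they inherited
zfs list -r rpool/dbstore
zfs get -r recordsize rpool/dbstore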
 
