MySQL tuning: recordsize / volblocksize, VM or LXC?

sukerman

Hi All,

I'm trying to run MariaDB with the correct block sizes to get the best performance, either in a VM or an LXC. With an LXC, I can set the recordsize on particular datasets on the host so that the container uses them. I guess this is also one less layer of filesystem abstraction compared to a VM, so maybe a good idea.

The problem is backup: we have 2 TB of data, and LXC backup is slow compared to VM backup, which uses the dirty bitmap to back up only the changes.

The pool recordsize is 128K. Do I need to create a pool with the correct recordsize (16K) and then create the VM inside it?

Using a VM in the normal pool, the zvol has a volblocksize, but I don't know if this is what I should set, and in fact if I try I get:

Code:
cannot set property for 'pool-hdd/vm-1001-disk-0': 'volblocksize' is readonly

I really want to use a VM, as then I can back it up to PBS hourly without much overhead; backing up an LXC is too slow to do regularly with this much data.

Any ideas?

Thanks,

Jack
 
VMs use zvols and LXCs use datasets.
Zvols ignore the recordsize and always use the volblocksize.
Datasets ignore the volblocksize and always use the recordsize.
Recordsize can be seen as an "up to" value: if you set a recordsize of 128K, ZFS can decide whether to write a record as 4K, 8K, 16K, 32K, 64K or 128K.
Volblocksize is fixed. No matter what your workload looks like, with an 8K volblocksize everything is written as 8K blocks: a 4K write wastes 4K of space (nothing smaller than 8K can be written), and a 128K write wastes IOPS (it has to be written as 16x 8K blocks).
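A quick way to see which property applies where is to check both properties on a dataset and on a zvol (the names below just follow the usual Proxmox naming and are only examples):

Code:
# dataset backing an LXC: recordsize applies, volblocksize is shown as "-"
zfs get recordsize,volblocksize pool-hdd/subvol-101-disk-0

# zvol backing a VM: volblocksize applies, recordsize is shown as "-"
zfs get recordsize,volblocksize pool-hdd/vm-1001-disk-0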

And yes, the volblocksize is read-only and can only be set at creation time of that zvol. To change it you need to set the volblocksize for your complete pool (Datacenter -> Storage -> YourZfsPool -> Edit -> Block Size) and then destroy and recreate all your virtual disks (doing a backup and restoring it will also destroy and recreate the zvols).
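On the CLI the whole cycle could look roughly like this (storage name, VM ID and archive path are placeholders; the pvesm step does the same as the GUI edit above):

Code:
# set the pool-wide block size on the ZFS storage (same as the GUI edit)
pvesm set local-zfs --blocksize 16k

# back up the VM, then restore it over itself so its zvols get recreated
vzdump 1001 --storage local --mode snapshot
qmrestore /var/lib/vz/dump/<archive>.vma.zst 1001 --force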

MariaDB (InnoDB) uses 16K pages, so you probably want a 16K volblocksize for that. Also keep in mind that which volblocksize is sensible really depends on your pool layout. Any raidz1/2/3 with 4 or more disks and an ashift of 12 would be terrible.
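If you want to double-check the page size your MariaDB instance actually uses, something like this should show it (16384 bytes = 16K is the InnoDB default):

Code:
mysql -e "SELECT @@innodb_page_size;"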
 
Thanks! I see no option to do this in the GUI when adding a HDD to the VM, so I guess I do this on the CLI on the host?
 
You can't define the volblocksize for a single VM. You define it pool-wide using the GUI: Datacenter -> Storage -> YourZfsPool -> Edit -> Block Size.

You could use the CLI to manually create some zvols with individual volblocksizes and attach them to a VM, but that doesn't make much sense, because as soon as you restore that VM from a backup or migrate it, the zvol will be created again with the pool-wide volblocksize, ignoring the volblocksize that was used before.
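If you want to try it anyway, a rough sketch could look like this (size, VM ID and disk name are made up), keeping the limitation above in mind:

Code:
# create a zvol with an individual volblocksize (only possible at creation time)
zfs create -V 100G -o volblocksize=16k pool-hdd/vm-1001-disk-1

# attach it to the VM as a passed-through block device
qm set 1001 --scsi1 /dev/zvol/pool-hdd/vm-1001-disk-1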
 
OK, great, so basically a separate pool and set the recordsize on the pool to 16K (depending on the underlying raid structure...). Thank you!

So let's say I have 2 NVMe SSDs and create a raidz1 pool across them, would I set the recordsize to 32K, or still 16K?

Sorry if it's a dumb question, I'm not sure about the internals of raid/ashift.
 
OK, great, so basically a separate pool and set the recordsize on the pool to 16K (depending on the underlying raid structure...). Thank you!
Volblocksize, not recordsize. The default 128K recordsize should be fine in most cases as it can dynamically adapt to the workload.
So let's say I have 2 NVMe SSDs and create a raidz1 pool across them, would I set the recordsize to 32K, or still 16K?
You can't create a raidz1 pool of 2 disks. That always needs at least 3 disks. Raidz1 is the equivalent of a raid5; I guess you mean a mirror as the equivalent of a raid1. I would leave the recordsize at 128K but change the volblocksize for that pool to 16K.
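Creating such a mirror on the CLI might look roughly like this (device names are examples; in practice you would use /dev/disk/by-id/ paths):

Code:
# 2-disk mirror (raid1 equivalent), ashift=12 for 4K-sector NVMe SSDs
zpool create -o ashift=12 nvmepool mirror /dev/nvme0n1 /dev/nvme1n1

# leave the recordsize at its 128K default; the 16K volblocksize is then
# set on the Proxmox storage as described above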
 
I've done some testing.

AMD Ryzen 9 5900X - 12c/24t - 3.7 GHz/4.8 GHz
128 GB ECC 2666 MHz
2×1.92TB Samsung PM9A3 U.2 NVMe Server SSD in ZFS mirror

Very basic test with fio using 16K blocks; not sure if this is representative of a MySQL load, but anyhow...

Code:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=16k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

pool 128K / blocksize 8k

READ: bw=609MiB/s (639MB/s), 609MiB/s-609MiB/s (639MB/s-639MB/s), io=3070MiB (3219MB), run=5039-5039msec
WRITE: bw=204MiB/s (213MB/s), 204MiB/s-204MiB/s (213MB/s-213MB/s), io=1026MiB (1076MB), run=5039-5039msec

pool 128K / blocksize 128k

READ: bw=619MiB/s (649MB/s), 619MiB/s-619MiB/s (649MB/s-649MB/s), io=3070MiB (3219MB), run=4964-4964msec
WRITE: bw=207MiB/s (217MB/s), 207MiB/s-207MiB/s (217MB/s-217MB/s), io=1026MiB (1076MB), run=4964-4964msec

pool 16K / blocksize 8k

READ: bw=1861MiB/s (1951MB/s), 1861MiB/s-1861MiB/s (1951MB/s-1951MB/s), io=3070MiB (3219MB), run=1650-1650msec
WRITE: bw=622MiB/s (652MB/s), 622MiB/s-622MiB/s (652MB/s-652MB/s), io=1026MiB (1076MB), run=1650-1650msec

pool 16K / blocksize 16k

READ: bw=1895MiB/s (1987MB/s), 1895MiB/s-1895MiB/s (1987MB/s-1987MB/s), io=3070MiB (3219MB), run=1620-1620msec
WRITE: bw=633MiB/s (664MB/s), 633MiB/s-633MiB/s (664MB/s-664MB/s), io=1026MiB (1076MB), run=1620-1620msec

Code:
root@titan:~# zfs get all | grep size
pool         recordsize            128K                   default
pool         dnodesize             legacy                 default
pool/r-128k  recordsize            128K                   default
pool/r-128k  dnodesize             legacy                 default
pool/r-16k   recordsize            16K                    local
pool/r-16k   dnodesize             legacy                 default

It appears that the blocksize has no effect and the recordsize is the important setting. I created the datasets, then added them in 'Storage'. You can then add them as disks to the VM and they have the correct blocksize.
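For reference, the steps look roughly like this (the pvesm options are from memory, the GUI "Add Storage" dialog does the same):

Code:
# dataset with a 16K recordsize
zfs create -o recordsize=16k pool/r-16k

# add it as a ZFS storage with a 16K block size
pvesm add zfspool r-16k --pool pool/r-16k --blocksize 16k --content images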

Hope that helps.
 

Attachment: Captura de pantalla 2022-01-14 a las 17.17.01.png
What do you mean by "blocksize"? The volblocksize? The block size used by fio for reads/writes?
As already said, if you write/read to a dataset/LXC, only the recordsize is used and not the volblocksize.
If you write/read to a zvol/VM, only the volblocksize is used and not the recordsize.
 
I am referring to the block size box when you add the ZFS dataset in the GUI. I set the recordsize of the dataset on the CLI, then add the dataset as storage in the GUI. I assumed this was the volblocksize used by VMs that add HDDs from that storage.
 
I am referring to the block size box when you add the ZFS dataset in the GUI. I set the recordsize of the dataset on the CLI, then add the dataset as storage in the GUI. I assumed this was the volblocksize used by VMs that add HDDs from that storage.
If you mean the "Block Size" box when adding a ZFS storage using "Datacenter -> Storage -> Add -> ZFS", then yes.
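To verify what a VM disk created on such a storage actually got, you can check the zvol directly (the disk name is just an example):

Code:
# should report 16K if the storage's block size was applied
zfs get volblocksize pool/r-16k/vm-100-disk-0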
 
