Tuning the qcow2 L2 cache parameter seems to be very important even for "normal" qcow2 disk sizes...
Here is a good explanation of the issue, copied and pasted from blogs.igalia.com/berto/2015/12/17/improving-disk-io-performance-in-qemu-2-5-with-the-qcow2-l2-cache/:
A qcow2 file is organized in units of constant size called clusters. The virtual disk seen by the guest is also divided into guest clusters of the same size. QEMU defaults to 64KB clusters, but a different value can be specified when creating a new image:
qemu-img create -f qcow2 -o cluster_size=128K hd.qcow2 4G
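If you want to check the cluster size of an existing image, qemu-img info reports it (for qcow2 images the output includes a cluster_size field):

qemu-img info hd.qcow2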
In order to map the virtual disk as seen by the guest to the qcow2 image in the host, the qcow2 image contains a set of tables organized in a two-level structure. These are called the L1 and L2 tables.
There is one single L1 table per disk image. This table is small and is always kept in memory.
There can be many L2 tables, depending on how much space has been allocated in the image. Each table is one cluster in size. In order to read or write data to the virtual disk, QEMU needs to read its corresponding L2 table to find out where that data is located. Since reading the table for each I/O operation can be expensive, QEMU keeps a cache of L2 tables in memory to speed up disk access.
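To make that concrete: each L2 entry is 8 bytes (that is where the /8 in the formula further down comes from), so with the default 64KB clusters one L2 table holds 65536 / 8 = 8192 entries and maps 8192 * 64KB = 512MB of virtual disk. The arithmetic can be checked in a shell:

echo $(( 65536 / 8 * 65536 / 1048576 ))   # MB of virtual disk mapped by one L2 table -> 512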
The L2 cache can have a dramatic impact on performance. As an example, here’s the number of I/O operations per second that I get with random read requests in a fully populated 20GB disk image:
L2 CACHE SIZE    AVERAGE IOPS
1 MB             5100
1.5 MB           7300
2 MB             12700
2.5 MB           63600
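The post doesn't show how the benchmark was run; a random-read test along these lines could be reproduced from inside the guest with fio (the device path and all job parameters here are illustrative assumptions, not the author's actual setup):

fio --name=randread --filename=/dev/vdb --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=32 --runtime=60 --time_based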
In order to choose the cache size we need to know how it relates to the amount of allocated space.
The amount of virtual disk that can be mapped by the L2 cache (in bytes) is:
disk_size = l2_cache_size * cluster_size / 8
With the default values for cluster_size (64KB) that is
disk_size = l2_cache_size * 8192
So in order to have a cache that can cover n GB of disk space with the default cluster size we need:
l2_cache_size = disk_size_GB * 131072
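For example, taking the 20 GB image from the benchmark above (disk_size_GB=20 is just that illustrative value), the required cache size in bytes works out in a shell as:

disk_size_GB=20
echo $(( disk_size_GB * 131072 ))   # -> 2621440 bytes (2.5 MB)

Note that 2.5 MB is exactly the cache size where the IOPS jump in the table above.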
QEMU has a default L2 cache of 1MB (1048576 bytes) so using the formulas we’ve just seen we have
1048576 / 131072 = 8 GB of virtual disk covered by that cache.
This means that if the size of your virtual disk is larger than 8 GB you can speed up disk access by increasing the size of the L2 cache. Otherwise you’ll be fine with the defaults.
How to configure the cache size
Cache sizes can be configured using the -drive option on the command line, or the 'blockdev-add' QMP command.
There are three options available, and all of them take bytes:
- l2-cache-size: maximum size of the L2 table cache
- refcount-cache-size: maximum size of the refcount block cache
- cache-size: maximum size of both caches combined
There are two things that need to be taken into account:
- Both the L2 and refcount block caches must have a size that is a multiple of the cluster size.
- If you only set one of the options above, QEMU will automatically adjust the others so that the L2 cache is 4 times bigger than the refcount cache.
This means that these three options are equivalent:
-drive file=hd.qcow2,l2-cache-size=2097152
-drive file=hd.qcow2,refcount-cache-size=524288
-drive file=hd.qcow2,cache-size=2621440
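A quick sanity check of the 4:1 ratio and the combined total behind those three lines:

echo $(( 524288 * 4 ))         # L2 cache derived from the refcount cache -> 2097152
echo $(( 2097152 + 524288 ))   # combined cache-size -> 2621440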
Although I’m not covering the refcount cache here, it’s worth noting that it’s used much less often than the L2 cache, so it’s perfectly reasonable to keep it small:
-drive file=hd.qcow2,l2-cache-size=4194304,refcount-cache-size=262144
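Applying the disk_size = l2_cache_size * 8192 formula from above, the 4 MB L2 cache in that example covers quite a large disk with the default cluster size:

echo $(( 4194304 * 8192 / 1073741824 ))   # -> 32 (GB of virtual disk covered)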
The problem with a large cache size is that it obviously needs more memory. QEMU has a separate L2 cache for each qcow2 file, so if you’re using many big images you might need a considerable amount of memory if you want to have a reasonably sized cache for each one. The problem gets worse if you add backing files and snapshots to the mix.
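As a rough illustration (ten images is a made-up number), ten qcow2 files each configured with the 4 MB + 256 KB caches from the previous example would spend about 42 MB on caches alone, before counting any backing files or snapshots:

echo $(( 10 * (4194304 + 262144) / 1048576 ))   # -> 42 (MB, rounded down)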
Managing these parameters can give a huge performance gain for some use cases, and being able to edit them in the qcow2 disk creation panel in Proxmox could be a big improvement ;-)