Tuning the qcow2 L2 cache parameter seems to be very important even for "normal" qcow2 disk sizes...
Here is a good explanation of the issue, copied and pasted from blogs.igalia.com/berto/2015/12/17/improving-disk-io-performance-in-qemu-2-5-with-the-qcow2-l2-cache/:
A qcow2 file is organized in units of constant size called clusters. The virtual disk seen by the guest is also divided into guest clusters of the same size. QEMU defaults to 64KB clusters, but a different value can be specified when creating a new image:
qemu-img create -f qcow2 -o cluster_size=128K hd.qcow2 4G
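If you want to check the cluster size of an existing image, qemu-img info reports it (for qcow2 images the output includes a cluster_size field):

qemu-img info hd.qcow2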
In order to map the virtual disk as seen by the guest to the qcow2 image in the host, the qcow2 image contains a set of tables organized in a two-level structure. These are called the L1 and L2 tables.
There is one single L1 table per disk image. This table is small and is always kept in memory.
There can be many L2 tables, depending on how much space has been allocated in the image. Each table is one cluster in size. In order to read or write data to the virtual disk, QEMU needs to read its corresponding L2 table to find out where that data is located. Since reading the table for each I/O operation can be expensive, QEMU keeps a cache of L2 tables in memory to speed up disk access.
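To make that concrete: each L2 entry is 8 bytes (that is where the /8 in the formula further down comes from), so with the default 64KB clusters one L2 table holds 65536 / 8 = 8192 entries and maps 8192 * 64KB = 512MB of virtual disk. The arithmetic can be checked in a shell:

echo $(( 65536 / 8 * 65536 / 1048576 ))   # MB of virtual disk mapped by one L2 table -> 512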
The L2 cache can have a dramatic impact on performance. As an example, here’s the number of I/O operations per second that I get with random read requests in a fully populated 20GB disk image:
L2 CACHE SIZE    AVERAGE IOPS
1 MB             5100
1.5 MB           7300
2 MB             12700
2.5 MB           63600
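The post doesn't show how the benchmark was run; a random-read test along these lines could be reproduced from inside the guest with fio (the device path and all job parameters here are illustrative assumptions, not the author's actual setup):

fio --name=randread --filename=/dev/vdb --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=32 --runtime=60 --time_based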
In order to choose the cache size we need to know how it relates to the amount of allocated space.
The amount of virtual disk that can be mapped by the L2 cache (in bytes) is:
disk_size = l2_cache_size * cluster_size / 8
With the default values for cluster_size (64KB) that is
disk_size = l2_cache_size * 8192
So in order to have a cache that can cover n GB of disk space with the default cluster size we need:
l2_cache_size = disk_size_GB * 131072
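For example, taking the 20 GB image from the benchmark above (disk_size_GB=20 is just that illustrative value), the required cache size in bytes works out in a shell as:

disk_size_GB=20
echo $(( disk_size_GB * 131072 ))   # -> 2621440 bytes (2.5 MB)

Note that 2.5 MB is exactly the cache size where the IOPS jump in the table above.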
QEMU has a default L2 cache of 1MB (1048576 bytes) so using the formulas we’ve just seen we have
1048576 / 131072 = 8 GB of virtual disk covered by that cache.
This means that if the size of your virtual disk is larger than 8 GB you can speed up disk access by increasing the size of the L2 cache. Otherwise you’ll be fine with the defaults.
How to configure the cache size
Cache sizes can be configured using the -drive option on the command line, or the 'blockdev-add' QMP command.
There are three options available, and all of them take bytes:
- l2-cache-size: maximum size of the L2 table cache
- refcount-cache-size: maximum size of the refcount block cache
- cache-size: maximum size of both caches combined
There are two things that need to be taken into account:
- Both the L2 and refcount block caches must have a size that is a multiple of the cluster size.
- If you only set one of the options above, QEMU will automatically adjust the others so that the L2 cache is 4 times bigger than the refcount cache.
This means that these three options are equivalent:
-drive file=hd.qcow2,l2-cache-size=2097152
-drive file=hd.qcow2,refcount-cache-size=524288
-drive file=hd.qcow2,cache-size=2621440
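A quick sanity check of the 4:1 ratio and the combined total behind those three lines:

echo $(( 524288 * 4 ))         # L2 cache derived from the refcount cache -> 2097152
echo $(( 2097152 + 524288 ))   # combined cache-size -> 2621440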
Although I’m not covering the refcount cache here, it’s worth noting that it’s used much less often than the L2 cache, so it’s perfectly reasonable to keep it small:
-drive file=hd.qcow2,l2-cache-size=4194304,refcount-cache-size=262144
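Applying the disk_size = l2_cache_size * 8192 formula from above, the 4 MB L2 cache in that example covers quite a large disk with the default cluster size:

echo $(( 4194304 * 8192 / 1073741824 ))   # -> 32 (GB of virtual disk covered)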
The problem with a large cache size is that it obviously needs more memory. QEMU has a separate L2 cache for each qcow2 file, so if you’re using many big images you might need a considerable amount of memory if you want to have a reasonably sized cache for each one. The problem gets worse if you add backing files and snapshots to the mix.
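As a rough illustration (ten images is a made-up number), ten qcow2 files each configured with the 4 MB + 256 KB caches from the previous example would spend about 42 MB on caches alone, before counting any backing files or snapshots:

echo $(( 10 * (4194304 + 262144) / 1048576 ))   # -> 42 (MB, rounded down)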
Managing these parameters can give a huge performance gain for some use cases, and being able to edit them in the qcow2 disk creation panel in Proxmox could be a big improvement ;-)