IOPS limit

Hey all,

I have a weird issue. I set an IOPS limit of 500 on a KVM VM. It seems to work from within the VM; see this graph from Zabbix: http://imgur.com/a/ZhMO0

So far so good. This VM is the only VM living on the storage platform. If I look at the IO activity on the storage platform, it's doing over 12k IOPS: http://imgur.com/a/DGDKZ

Is there any reason you can think of why there would be such a huge difference between the two numbers?
 
Hi,

Maybe your storage is making snapshots or doing some syncing, or the storage OS itself needs a lot of I/O.

You have to analyse on the storage side which process the I/O comes from.
 
Well, that's the issue. No other background processes are running, and the OS is running on a different disk. Furthermore, when I stop the VM, all IOPS are gone, so we can be sure it is caused by the VM.

The other interesting part is that this is constant behavior. Right at this moment the VM reports 150 IOPS (iostat), while iostat on the storage array (ZFS based) reports about 3000. That's about the same multiplier as during the peaks: 20-fold.

Now, this is a ZFS array, and I know it writes a journal (no ZIL in place), but 20 times the IOPS is just a bit out there. I wonder if there's something wrong in the KVM process itself.
 
I checked it on an LVM and it is OK.
I also set the burst to 150 IOPS and got a maximum of 200 IOPS.
virtio was the bus interface.
If I read and write at the same time I can go up to 400 IOPS, which is correct: 150 in and 150 out.
I think this is ZFS behavior.

Consider that ZFS always writes the data twice.
Then, if you have a RAID, how often it writes depends on the RAID level.
So 20 times more can in reality be only 5 times more, and I did not even calculate the RAID level.
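
To put some rough numbers on that, a back-of-the-envelope sketch, assuming sync writes go through the ZIL (so data is written twice) and a RAIDZ2 vdev with, say, 4 data disks plus 2 parity disks. The disk counts here are my own illustrative assumption, not the poster's actual layout:

# Back-of-the-envelope sketch only; the pool layout below is assumed,
# not taken from the poster's array.

def write_amplification(zil_double_write=True, parity_disks=2, data_disks=4):
    """Rough factor by which one logical write gets multiplied on disk."""
    factor = 1.0
    if zil_double_write:
        factor *= 2  # sync data lands in the ZIL first, then in the pool
    factor *= (data_disks + parity_disks) / data_disks  # RAIDZ parity overhead
    return factor

amp = write_amplification()  # 2 * (6/4) = 3.0
print(f"expected amplification: {amp:.1f}x")
print(f"'real' extra work behind an observed 20x: {20 / amp:.1f}x")

With these assumed numbers, the observed 20x boils down to roughly 6-7x of unexplained work, in the same ballpark as the estimate above.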
 
Hm, that sounds reasonable, although a 20 times higher IOPS count is a difference I can't explain. The array is a RAIDZ2 array, so, yeah, there'll be more writes, but 20 times is still weird.

Thanks for checking on the LVM. I didn't have access to a setup to test that.
 
You also have to consider the ZFS volblocksize, the guest blocksize, guest block alignment, ZFS RAIDZ and compression. In the worst case you get write amplification. Assume you have to update one 4K block on a misaligned disk: this involves reading two 8K blocks from all the disks that hold the data and checksums of those blocks, then both blocks are partially rewritten, the checksums have to be recomputed for the two 8K blocks, and the two blocks including checksums have to be written back to disk. This is really huge. Mirrored ZFS would be better; there you would only need to write 2 times two 8K blocks in a misaligned setup.
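
A tiny sketch of that worst case, just counting blocks at the zvol layer under the assumptions above (8K volblocksize, a 4K guest write that straddles two volblocks); the offset is made up for illustration:

# Illustrative block counting only - offsets and sizes are assumed, not
# measured on the poster's setup.

VOLBLOCK = 8 * 1024   # ZFS zvol volblocksize (Proxmox default: 8K)
WRITE_SZ = 4 * 1024   # guest write size
OFFSET   = 6 * 1024   # misaligned start offset inside the zvol

def touched_volblocks(offset, size, volblock=VOLBLOCK):
    """Return the indices of the volblocks a guest write overlaps."""
    first = offset // volblock
    last = (offset + size - 1) // volblock
    return list(range(first, last + 1))

blocks = touched_volblocks(OFFSET, WRITE_SZ)
print(f"a 4K write at offset {OFFSET} touches {len(blocks)} x 8K volblocks: {blocks}")
# Each partially written volblock has to be read, modified, re-checksummed
# and written back, so this single 4K write costs 2 reads + 2 writes at the
# zvol layer - before RAIDZ parity multiplies the physical writes again.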

That's consistent with the graph. You have to read a lot when you want to write. Please try to optimize your whole stack:
  • get the volblocksize of the ZFS volume (should be 8K, as that is the default on Proxmox)
  • (just for interest, get the ashift value. It should be 9 or 12, whereas 9 is better for compression on non-4K-sector disks)
  • get the alignment of your virtual disk (blocks should be aligned to the ZFS volblocksize; see the sketch after this list)
  • ideally the guest fs blocksize should be the volblocksize or a multiple thereof
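
For the alignment check, a minimal sketch (the sector numbers are hypothetical examples, not values read from this VM):

# Hypothetical alignment check: the partition start would normally come
# from the guest's partition table (e.g. fdisk/parted); it is hard-coded here.

def is_aligned(partition_start_bytes, volblocksize=8 * 1024):
    """True if the partition starts on a volblocksize boundary."""
    return partition_start_bytes % volblocksize == 0

print(is_aligned(2048 * 512))  # sector 2048, 512-byte sectors (1 MiB): True
print(is_aligned(63 * 512))    # sector 63, old MS-DOS default: False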
 
