High IO Delay

masterdaweb

Apr 17, 2017
Hi,

I'm facing high IO delay.

When I run "iotop", it shows only low read/write rates, about 1-3 MB/s.

Server specs:

- mdadm RAID 0 (2 disks)
- ext3
- qcow2 images
- about 18 VMs running

I attached images below:

Attachments: iodelay.png, gui.png
 
Are your disks slow? High IO delay just means that the CPU waits for IO to finish while it has nothing better to do.
If your VMs are accessing many small files and your storage cannot keep up, this is no surprise (btw, mdraid is not supported/recommended by us, and RAID 0 is especially dangerous).
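To make "the CPU waits for IO" concrete, here is a minimal Bash sketch that computes the share of CPU time spent in iowait between two samples of the `cpu` line in /proc/stat. It assumes the standard field order (user nice system idle iowait irq softirq steal); the function names are hypothetical, and the Proxmox GUI graph may be derived slightly differently.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: estimate iowait % from two /proc/stat "cpu" lines.
# Field order after "cpu": user nice system idle iowait irq softirq steal ...

# iowait_pct PREV_LINE NOW_LINE -> prints iowait as a percentage of all CPU time
iowait_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, p, " "); split(b, q, " ")
    total = 0
    for (i = 2; i <= 9; i++) total += q[i] - p[i]   # user .. steal
    io = q[6] - p[6]                                # iowait is the 5th counter
    if (total > 0) printf "%.1f\n", 100 * io / total
    else print "0.0"
  }'
}

# sample_iowait [SECONDS] -> reads /proc/stat twice and prints the iowait %
sample_iowait() {
  local prev now
  prev=$(grep '^cpu ' /proc/stat)
  sleep "${1:-1}"
  now=$(grep '^cpu ' /proc/stat)
  iowait_pct "$prev" "$now"
}
```

Usage: `sample_iowait 5`. A high value while iotop shows only a few MB/s is typical of many small random I/Os that the disks cannot serve fast enough.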
 
Hi @dcsapak

Do you think that if I change from qcow2 to raw format, I'll get better IO performance? Roughly how much better would it be?

Although RAID 0 is dangerous, it improves IO performance, and we have remote backups.
 
Raw will give better I/O performance, but you'll also lose some nice features. If you only have two 7.2k SATA drives, you won't get many I/O operations per second. If your workload is totally random, you'll have a very, very slow system.
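Switching an existing disk from qcow2 to raw can be sketched with `qemu-img convert` (`-f` is the source format, `-O` the output format, `-p` shows progress). The file names are hypothetical, the VM must be shut down first, and its configuration has to be pointed at the new image afterwards. With `DRY_RUN=1` the helper only prints the command.

```bash
#!/usr/bin/env bash
# run: execute a command, or just print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# to_raw SRC.qcow2 DST.raw -> convert a qcow2 image to raw (VM must be stopped)
to_raw() {
  local src=$1 dst=$2
  if [ -e "$dst" ]; then
    echo "refusing to overwrite $dst" >&2
    return 1
  fi
  run qemu-img convert -p -f qcow2 -O raw "$src" "$dst"
}
```

The raw file gives up qcow2-only features such as internal snapshots, which is the trade-off mentioned above.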

I/O performance can only be increased by using some kind of cache (e.g. a hardware RAID controller with cache and BBU) or by using SSDs. Two disks are not fast at all.
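To put rough numbers on "not a lot of possible I/Os", a common rule of thumb estimates a hard disk's random IOPS as 1 / (average seek time + half a rotation). The sketch below assumes an 8.5 ms average seek for a 7.2k SATA drive (an assumed typical figure, not a measurement) and perfect striping across RAID 0 members; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# hdd_iops RPM SEEK_MS [DISKS] -> rough random IOPS estimate
# per disk: 1000 / (seek_ms + half-rotation ms), multiplied by the number of disks
hdd_iops() {
  awk -v rpm="$1" -v seek="$2" -v n="${3:-1}" 'BEGIN {
    half_rot = 0.5 * 60000 / rpm          # ms for half a revolution
    printf "%.0f\n", n * 1000 / (seek + half_rot)
  }'
}
```

`hdd_iops 7200 8.5 2` gives about 158 IOPS for the whole two-disk array. Shared by 18 VMs, that is fewer than 10 random IOPS per VM on average, which is consistent with high IO delay while iotop shows only 1-3 MB/s.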

You can monitor the I/O delay of each individual disk with iostat. Sometimes a failing disk can be noticed by very slow reads on that one disk.
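`iostat -x 5` (from sysstat) prints per-device `r_await`/`w_await` every 5 seconds. The same kind of numbers can be derived directly from /proc/diskstats; this sketch assumes the standard field layout (reads completed in field 4, ms reading in field 7, writes completed in field 8, ms writing in field 11), and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# disk_await PREV_LINE NOW_LINE -> average ms per completed read and write
# between two /proc/diskstats lines for the same device
disk_await() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, p, " "); split(b, q, " ")
    r = q[4] - p[4]; rms = q[7] - p[7]      # reads completed, ms spent reading
    w = q[8] - p[8]; wms = q[11] - p[11]    # writes completed, ms spent writing
    printf "%s r_await=%.1fms w_await=%.1fms\n", q[3],
      (r > 0 ? rms / r : 0), (w > 0 ? wms / w : 0)
  }'
}

# watch_disk DEV [SECONDS] -> sample /proc/diskstats twice for one device (e.g. sda)
watch_disk() {
  local prev now
  prev=$(awk -v d="$1" '$3 == d' /proc/diskstats)
  sleep "${2:-5}"
  now=$(awk -v d="$1" '$3 == d' /proc/diskstats)
  disk_await "$prev" "$now"
}
```

Running `watch_disk sda` and `watch_disk sdb` side by side shows whether one RAID 0 member is consistently much slower than its partner.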
 
I had this problem on my server some time ago. Check your HDDs; they may have 4 KiB (4096-byte) physical sectors.

Use: cat /sys/class/block/sdX/queue/physical_block_size
If it prints 4096, you may have the same problem.

If you use 512-byte logical sectors on a disk with 4 KiB physical sectors, you lose some performance. And if your partitions were not aligned to the physical sector size when you created the volumes, you lose a lot of performance. If this matches your situation, search Google for "4k hdd linux" for more details.
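The sector-size check above can be extended to partition alignment. A minimal Bash sketch, assuming the sysfs layout where /sys/class/block/<partition>/start reports the partition start in 512-byte sectors; the helper names and the sda/sda1 example are hypothetical.

```bash
#!/usr/bin/env bash
# is_aligned START_SECTOR PHYS_BLOCK_SIZE -> success if the byte offset
# (start * 512) is a multiple of the physical block size
is_aligned() {
  (( ($1 * 512) % $2 == 0 ))
}

# check_part DISK PART -> report sector sizes and alignment, e.g. check_part sda sda1
check_part() {
  local q=/sys/class/block/$1/queue
  local logical physical start
  logical=$(cat "$q/logical_block_size")
  physical=$(cat "$q/physical_block_size")
  start=$(cat "/sys/class/block/$2/start")
  echo "$1: logical=$logical physical=$physical, $2 starts at sector $start"
  if is_aligned "$start" "$physical"; then
    echo "$2 is aligned"
  else
    echo "$2 is NOT aligned to $physical bytes"
  fi
}
```

Current partitioning tools usually start the first partition at sector 2048 (1 MiB), which is aligned for 4096-byte physical sectors; the old DOS default of sector 63 is not.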
 
@LnxBil and @lifespirit, thank you for your reply.

I've been thinking about getting a Hybrid SoftRaid server with 2x 2 TB HDDs + 2x 480 GB SSDs.

So my question is: is it possible to use both hard disks for VM data and both SSDs as cache? If so, how do I do this?
 
If you don't reboot your Proxmox host often, you can use ZFS ZIL/L2ARC ...
You can partition each SSD into a 10 GB and a 400 GB partition.
From scratch: zpool create tank hdd1 hdd2 log ssd1-10g ssd2-10g cache ssd1-400g ssd2-400g
(leaving 70 GB for over-provisioning). ZFS will balance your data across the devices, sometimes better than a simple RAID 0, and it performs better than lvm-cache, but its cache contents are lost when rebooting.
You can also use hardware RAID card solutions like LSI CacheCade Pro2, but in my case its performance was bad; random access was not good.
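The ZFS layout described above can be sketched as follows. Device names (/dev/sda and /dev/sdb for the HDDs, /dev/sdc and /dev/sdd for the SSDs) are hypothetical, and these commands wipe the disks, so `DRY_RUN=1` only prints them. As in the post, the two log partitions are striped rather than mirrored.

```bash
#!/usr/bin/env bash
# run: execute a command, or just print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical device names: adjust before use; these commands destroy data.
HDDS=(/dev/sda /dev/sdb)
SSDS=(/dev/sdc /dev/sdd)

# split each SSD into a 10 GiB log partition and a 400 GiB cache partition,
# leaving the rest of the SSD unpartitioned as over-provisioning
partition_ssds() {
  local ssd
  for ssd in "${SSDS[@]}"; do
    run parted -s "$ssd" mklabel gpt mkpart zil 1MiB 10GiB mkpart l2arc 10GiB 410GiB
  done
}

# striped pool on the HDDs, SLOG on partition 1 and L2ARC on partition 2 of each SSD
create_pool() {
  run zpool create tank "${HDDS[@]}" \
    log "${SSDS[0]}1" "${SSDS[1]}1" \
    cache "${SSDS[0]}2" "${SSDS[1]}2"
}
```

Sizes here are in GiB: a 480 GB SSD is about 447 GiB, so this leaves roughly 37 GiB unpartitioned; shrink the cache partition if you want the full 70 GB of over-provisioning. NVMe devices name partitions with a `p` suffix (e.g. nvme0n1p1), so adjust the partition names accordingly.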
 
I wish I could use ZFS to speed things up, but the problem is that I can't provision VMs on ZFS by cloning a template from NFS storage.

For example, I have a Node X that only stores templates (NFS) and another Node Y that stores VMs (ZFS). I can't clone from NFS to ZFS because ZFS is not shared (https://pve.proxmox.com/wiki/Storage).
 
I couldn't quite see why you were blocked from creating VMs,
but you can mount the NFS export as a normal folder on Node Y and create the VM on ZFS there.
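A minimal sketch of that approach on Node Y, assuming the template disk is a plain image file on the NFS export and the target VM already exists without a disk. The export, mount point, storage ID `local-zfs`, VM ID and file name are all hypothetical; `qm importdisk` copies an image into a Proxmox storage, and the imported disk then shows up as an unused disk of that VM.

```bash
#!/usr/bin/env bash
# run: execute a command, or just print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical values: NFS export on Node X, mount point and target storage on Node Y
NFS_EXPORT=nodex:/srv/templates
MNT=/mnt/templates
ZFS_STORAGE=local-zfs

# mount the NFS export as a plain directory on Node Y
mount_templates() {
  run mkdir -p "$MNT"
  run mount -t nfs "$NFS_EXPORT" "$MNT"
}

# import_template VMID IMAGE -> copy a template image into the ZFS storage
# for an existing (diskless) VM
import_template() {
  run qm importdisk "$1" "$MNT/$2" "$ZFS_STORAGE"
}
```

After the import, attach the unused disk to the VM in the GUI (or with `qm set`) and boot from it.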
 
I use LVM cache for this, and it works fine. If you're interested, this is my configuration:
1. I have 2 HDDs (both 2 TB) + 2 SSDs (128 GB + 512 GB).
2. I installed Proxmox with the stock settings (ext4) on the 128 GB SSD. During installation it creates the LVM volume group pve, two logical volumes (root and swap) and the thin pool pve-data.
3. First, I added the 512 GB SSD and both HDDs to the volume group. Before that, I aligned the partitions on the HDDs to the physical block size.
4. I deleted the thin pool pve-data and recreated it on the HDDs as a stripe (RAID 0 emulation); if you want, you can create RAID 1 or RAID 10 with LVM instead. A thin pool is an LVM logical volume that contains the other logical volumes holding the virtual machine disks. I added the cache to the thin pool, so all my VMs use the cache. You don't have to create a thin pool, though: a plain logical volume works with a cache in the same way. You can also migrate disks between qcow2 and LVM. For qcow2 -> LVM, use 'qemu-img convert -f qcow2 -O raw file.qcow2 /dev/pve/vm-disk'; a plain 'dd if=file.qcow2 of=/dev/pve/vm-disk' only works for raw images, because it copies the qcow2 container format as-is.
5. Create the cache metadata volume. Size: cache_volume_size/1000
lvcreate -L 606M -n data_cache_meta pve /dev/sda1 /dev/sdb1
6. Create the cache volume:
lvcreate -L 592G -n data_cache pve /dev/sda1 /dev/sdb1
7. Combine the metadata and data volumes into a cache pool:
lvconvert --type cache-pool --poolmetadata pve/data_cache_meta pve/data_cache
8. Attach the cache pool to the logical volume or thin pool:
lvconvert --type cache --cachepool pve/data_cache pve/data
The cache can act as a write cache and a read cache. The write cache can work in parallel (SSD+HDD, i.e. writethrough) or in line (SSD->HDD, i.e. writeback), but the in-line write cache is very unsafe: data can be lost after a power failure.
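Steps 5-8 can be collected into one script. This sketch assumes the volume group `pve`, the thin pool `pve/data` and the SSD partitions /dev/sda1 and /dev/sdb1 from the post; `meta_size_mib` applies the post's cache_volume_size/1000 rule of thumb. `--cachemode writethrough` presumably corresponds to the "parallel" cache described above and `writeback` to the "line" cache. With `DRY_RUN=1` the commands are only printed.

```bash
#!/usr/bin/env bash
# run: execute a command, or just print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

VG=pve
CACHE_PVS=(/dev/sda1 /dev/sdb1)   # SSD partitions, as in the post

# meta_size_mib CACHE_GIB -> metadata size in MiB (cache size / 1000, rounded down)
meta_size_mib() {
  echo $(( $1 * 1024 / 1000 ))
}

# setup_cache CACHE_GIB [MODE] -> create a cache pool and attach it to $VG/data
setup_cache() {
  local gib=$1 mode=${2:-writethrough}
  run lvcreate -L "$(meta_size_mib "$gib")M" -n data_cache_meta "$VG" "${CACHE_PVS[@]}"
  run lvcreate -L "${gib}G" -n data_cache "$VG" "${CACHE_PVS[@]}"
  run lvconvert --type cache-pool --poolmetadata "$VG/data_cache_meta" "$VG/data_cache"
  run lvconvert --type cache --cachepool "$VG/data_cache" --cachemode "$mode" "$VG/data"
}
```

`DRY_RUN=1 setup_cache 592` reproduces the commands from steps 5-8 (606M of metadata for a 592G cache), with an explicit writethrough cache mode added.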

My results:
Actual DISK READ: 8 M/s | Actual DISK WRITE: 5 M/s
I/O wait: about 2%

When VMs start:
Actual DISK READ: 20 M/s | Actual DISK WRITE: 10 M/s
I/O wait: about 5%
 
I tried ZFS before. My impression is that ZFS is a filesystem for really demanding projects. Its cache can't just be switched on; it has to be tuned. With default settings, ZFS didn't give good I/O on 2 HDDs, and it needs many very specific settings. If you have a lot of time and many HDDs/SSDs, you can configure ZFS really well, but I couldn't. So one day I reformatted all my HDDs and created an LVM volume group.

I was mistaken in my previous post: the in-line cache (SSD -> HDD) is UNsafe. The cache is cleared on every boot, so if some data hasn't been written when the server goes down (power loss), you lose it.
 

Is 2x 480 GB SSD too much for cache? I'll be using 2x 2 TB HDDs for storing VMs.
 
You can cache the full HDD size, why not? Think about which VMs you have and how much disk I/O all of them generate. The cache is filled with hot data; it is not balanced evenly. You can arrange logical volumes in any way you can imagine, for example:
small RAID0 on SSD + HDD_RAID1/SSD_Cache
small RAID1 on SSD + HDD_RAID1/SSD_Cache
no SSD + HDD0/SSD_Cache
no SSD + HDD1/SSD_Cache
These are 4 very simple configurations, but you can create any combination. It all depends on your VMs.
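As an example of the "no SSD + HDD1/SSD_Cache" layout, here is a sketch using LVM alone: a RAID 1 logical volume mirrored across the two HDDs, cached by a cache pool on the SSDs. The volume group name, device names and sizes are hypothetical; with `lvcreate --type cache-pool`, LVM creates the cache metadata volume automatically instead of the manual steps 5-7 above. `DRY_RUN=1` only prints the commands.

```bash
#!/usr/bin/env bash
# run: execute a command, or just print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

VG=vmdata                          # hypothetical volume group
HDD_PVS=(/dev/sda1 /dev/sdb1)      # hypothetical HDD partitions
SSD_PVS=(/dev/sdc1 /dev/sdd1)      # hypothetical SSD partitions

build_layout() {
  run vgcreate "$VG" "${HDD_PVS[@]}" "${SSD_PVS[@]}"
  # mirrored data volume placed on the two HDDs only
  run lvcreate --type raid1 -m 1 -L 1800G -n data "$VG" "${HDD_PVS[@]}"
  # cache pool on the SSDs; its metadata LV is created automatically
  run lvcreate --type cache-pool -L 400G -n data_cache "$VG" "${SSD_PVS[@]}"
  # attach the cache pool to the mirrored volume
  run lvconvert --type cache --cachepool "$VG/data_cache" "$VG/data"
}
```

Swapping `--type raid1 -m 1` for a striped volume (`-i 2`) gives the "HDD0/SSD_Cache" variant instead, at the cost of redundancy.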
 