[SOLVED] trouble with IO delay

I have a problem with the IO delay and I'm hoping for the combined knowledge of the forum:

First of all, a few remarks about me: I am an IT-interested person with a lot of half-knowledge ;) and relatively new to Proxmox. As part of an upcoming software update of my current server (Ubuntu 14.04 acting as DC, file server, mail server, OpenVPN, media server, smart home visualisation, etc.) in my small home network, I decided to give the hardware an update as well. It is now an HP MicroServer Gen8 (Xeon E3-1220L CPU with 16 GB RAM, 2x WD30EFRX), which I think has more than enough power for my needs. With Proxmox I wanted to separate the individual programs/services from each other a little, so that I don't have to take everything down at once if an update causes problems ....

Since Linux does not support the built-in RAID controller, I installed the system in AHCI mode. In principle it feels like it runs quite well with the current 2 VMs and 1 LXC container. But when I copy large amounts of data (both within a VM and from outside into a VM), the IO delay peaks at 60% and more. That is obviously not workable, and certainly not in the spirit of the inventor ..... :(.

Now I am at a bit of a loss. I've read a few things on the net but can't find any clues. I installed Proxmox with the default settings in RAIDZ1, so you shouldn't be able to do much wrong there. The disks are an older model, but they shouldn't be the bottleneck, should they? What can I check / test to get to the root of the problem and solve it?

Thank you for your willingness to help.

[edit] added type of HDD and installed RAID level [/edit]
 
Hi,

But when I copy large amounts of data (both within a VM and from outside into a VM), the IO delay peaks at 60% and more. That is obviously not workable, and certainly not in the spirit of the inventor ..... :(.

I mean, IO wait is just the symptom here; it only tells you how long processes waited for IO requests to finish, which is naturally higher when copying lots of data at once.
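
If you want to see whether the disks themselves are the limit (and not just the queueing on top), watching per-device latency helps. A minimal sketch using iostat from the sysstat package (package name assumed for Debian-based Proxmox):

    # install the sysstat tools if they are missing (Debian/Proxmox)
    apt install sysstat

    # extended per-device stats every second; a high await (ms) and %util
    # near 100 on the WD30EFRX would confirm the spinners are saturated
    iostat -x 1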

Now I am at a bit of a loss. I've read a few things on the net but can't find any clues. I installed Proxmox with the default settings in RAIDZ1, so you shouldn't be able to do much wrong there. The disks are an older model, but they shouldn't be the bottleneck, should they? What can I check / test to get to the root of the problem and solve it?

I mean, the WD30EFRX aren't exactly speedsters; they were released almost 9 years ago, and a spinner-only pool is always on the slow end. The 5400 rpm won't help either.

RAIDZ1 with two disks sounds wrong, are you sure it's not a RAID1 (mirror)?
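
You can verify the actual layout with zpool status; rpool is the default pool name of a Proxmox installation, adjust if yours differs:

    # with two disks you should see a single "mirror-0" vdev, not "raidz1-0"
    zpool status rpool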

What could help is adding a small but fast SSD as a ZIL (separate log) device:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_zfs_add_cache_and_log_dev
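
A minimal sketch of what that looks like, assuming the default pool name rpool and a placeholder device path:

    # add the SSD (or a partition of it) as a separate log device;
    # the by-id path below is a placeholder for your actual SSD
    zpool add rpool log /dev/disk/by-id/ata-YOUR-SSD-part1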

For lots of smaller files we found that a special device mirror, which is used for metadata and small writes only, also helps a lot: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_zfs_special_device
But for that you need at least two fast SSDs, which may be overkill for your setup; in that case you may fare better just swapping the two spinners for SSDs and being done with it.
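
For completeness, a sketch of how such a special mirror would be added (pool name and device paths are placeholders again):

    # add a mirrored special vdev for metadata (and optionally small blocks)
    zpool add rpool special mirror /dev/disk/by-id/ata-SSD-A /dev/disk/by-id/ata-SSD-B

    # optionally route small blocks (here: up to 4K) to the special vdev too
    zfs set special_small_blocks=4K rpool/data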
 
@thomas
Thanks for your answer. So if I understand your post correctly, the bottleneck really is the old, slow hard drives, and this behaviour while copying large files is normal with them ...
The large amounts of data I'm copying come mainly from the migration of my old system, so I will keep watching how the system behaves. And perhaps I'll have to invest in new hard drives in the future.
In any case, I like Proxmox. To me it looks very simple to split my programs into different VMs/LXCs so I can update them without breaking everything :cool:.
 
Hello @Lueghi, I'm in the same boat... Gen8 MicroServer... and I'm having the same issues... I've even tried a few things to solve it, but haven't been able to.

The first thing to remember is that the disk trays in the Gen8 run at different SATA speeds.
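
You can check what each disk actually negotiated in the kernel log:

    # the kernel logs the negotiated speed per port,
    # e.g. "SATA link up 6.0 Gbps" vs. "SATA link up 3.0 Gbps"
    dmesg | grep -i 'SATA link up'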

Anyway, my problems are with Ceph... I built a cluster of 3 MicroServer Gen8 nodes, created a Ceph pool with one SSD in each node, and connected the nodes over a 10Gb network so I have shared storage. I have a VM on one node with this write graph:

[screenshot: VM write throughput graph]

The node where this VM is running has an IO delay like this:

[screenshot: IO delay on the VM's node]

The problem is the other two nodes, where the delay looks like this:

[screenshots: IO delay on the two other nodes]

In theory, I think nothing should produce these results, so maybe something in the hardware is really causing this IO delay...

Did you ever find a solution to your problem?

Can anyone point me in some directions on how to debug this...
 
Sorry, but with Ceph I can't help. My problem with the iowait really is the slow hard drives. ZFS produces a lot of disk traffic compared to other file systems, and with slow hard drives that can't be avoided. I have to accept it or invest in new hard drives ....
 
