IO delay randomly appearing, proxmox 5.1-36

dark.mania

Nov 20, 2018
Hi everyone,

I have been having a recurring problem on my Proxmox lately.

For about a week now, my Proxmox server has had periods of high IO delay, like tonight:
io.png
the same graph, but on the daily scale:
io2.png


When checking /var/log/syslog, the only alert that could be a cause or a consequence of this IO delay comes from rrdcached.
So far, the only way I have found to get past the problem is the following:
systemctl stop rrdcached.service
rm -rf /var/lib/rrdcached/db/pv2-storage/*
rm -rf /var/lib/rrdcached/db/pv2-vm/*
rm -rf /var/lib/rrdcached/db/pv2-node/*
rm -rf /var/lib/rrdcached/journal/
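After clearing the stale RRD data, the services have to be brought back up; a minimal sketch, assuming the standard Proxmox 5.x units (restarting the status daemons is what lets the RRD files be recreated):
systemctl start rrdcached.service
systemctl restart pve-cluster.service pvestatd.service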


It had only happened twice before (10 months ago and 8 months ago), but in the last 8 days alone I have already had to apply this "fix" 5 times!

Here is what I get with the following command (only rrdcached errors ...):
zgrep --color=always "error\|fail\|crit" /var/log/syslog | ccze -A
syslog.png

And here is what syslog shows (the full syslog just before the "rrdcached" errors appeared):
zgrep --color=always "20 20:3" /var/log/syslog | ccze -A
syslog1.png

Every time this happens, all my VMs slow down to the point of being unusable...
So I'm starting to worry that there is a real issue with my configuration...

Here is some information on the current configuration:
Standalone Proxmox server (so I doubt there is any quorum problem involved)
VMs running on it:
- 1 Windows
- 6 Linux

pveversion:
pveversion.png

pveperf:
pveperf.png

iostat -x 2 5 (first run, right after boot):
iostat0.png

iostat -x 2 5:
iostat.png

Two "iotop" results taken 10 s apart ... -> "kworker" appears and disappears quickly, even when at 99.99%:
iotop.png
iotop1.png
 
If you are having high IO because of ZFS, just buy a small, cheap enterprise-grade SSD for < $100 and use it as a cache.
Also check your disk scheduler; it might help to use deadline when the disks are congested.
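For reference, adding an SSD as a ZFS read cache (L2ARC) is a one-liner; this is only a sketch, assuming a pool named rpool and a spare SSD at /dev/sdf (both hypothetical, and only relevant if the host actually runs ZFS):
zpool add rpool cache /dev/sdf
zpool status rpool   # the SSD should now show up under a "cache" section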
 

I don't remember using ZFS (at least I didn't install Proxmox 5 on ZFS), so I don't think it would affect my server / VMs in any way (and why would it happen now [this week, randomly] and not every day?).
Here is a screenshot of pveperf (executed twice in a row). I had to fix / reboot / restart all the VMs last night, so the "load" (clients loading websites) is higher right now than it was yesterday when the IO delay started... (when things are "normal", the server gets higher "buffered reads" and "fsyncs/second"):
pveperf2.png


I am currently checking all the HDDs (RAID 5) to see if one of them is dying (maybe you have some tips other than smartctl?).

Oh, and the server is from OVH and has to stay up at all times (client websites run on it), so I can't change its hardware configuration as I wish (other than requesting a disk swap if a drive dies).



I'm sorry in advance for my lack of knowledge, but I don't know anything about your last line...
I'll search online for it, but could you guide me, or do you already have a tutorial somewhere?
Here is what I have found on my server about it so far.
All five disks (/dev/sda to /dev/sde) show this:
cat_scheduler.png
Should I change it to "noop deadline [cfq]"? (deadline is already the selected scheduler...)

I based my search on this:
https://www.cyberciti.biz/faq/linux-change-io-scheduler-for-harddisk/
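For reference, the scheduler can be checked and switched at runtime through sysfs; a sketch, assuming /dev/sda (repeat for sdb..sde; the change is not persistent across reboots unless you also set elevator=deadline on the kernel command line and run update-grub):
cat /sys/block/sda/queue/scheduler               # the entry in [brackets] is the active scheduler
echo deadline > /sys/block/sda/queue/scheduler   # switch to deadline immediately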
 
After checking all the disks, none of them seem to have any problems. Here is the smartctl output for each disk after running the following command:
smartctl -t short -d scsi /dev/sda
sda.png sdb.png sdc.png sdd.png sde.png
Only /dev/sdd shows a higher "non-medium error count" (on writes, from what I can see in the table above that line). I don't know much about what that means, so tell me if it's important!
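For completeness, the self-test result and the error counters can be read back afterwards (same -d scsi flag as above, one device at a time):
smartctl -l selftest -d scsi /dev/sda   # result of the short self-test started earlier
smartctl -H -d scsi /dev/sda            # overall health verdict
smartctl -a -d scsi /dev/sda            # full output, including the error counter log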

And here is the result for:
mdadm --detail /dev/md4
md4.png
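To rule out a rebuild or a periodic consistency check running in the background (which would also cause IO load on an otherwise idle array), the md state can be checked as well; a sketch, assuming the array is md4 as above:
cat /proc/mdstat                     # [UUUUU] means all members are up; a resync/check shows a progress bar
cat /sys/block/md4/md/sync_action    # "idle" when no resync or check is running
cat /sys/block/md4/md/mismatch_cnt   # non-zero after a check may indicate inconsistent stripes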

I don't know what to check now.

The only thing I can think of is to back up the VMs, reinstall the server, and move to Proxmox 4.X (if that's even possible), since it seems that only Proxmox 5.X has this kind of IO delay problem...
 
Hi again,
I took some time and read your posts in more detail.
It seems you have 5 HDDs in RAID 5, right?
It also seems you are using software RAID, so you actually installed Debian and then Proxmox on top of it.
Can you show me the output of: cat /proc/mdstat ?
Can you also show me lvs, vgs and pvs?

Your setup lacks IOPS and is currently unsuitable for your workload. That is why you have high IO wait.
The main problem is RAID 5, which is very, very slow.
While it might have run "decently" in the past, your customers' needs are increasing, and there are no resources left for them.

What I would do is build a new server, set it up properly this time (maybe hire a competent person, a sysadmin by trade), and move all customer VMs to the new server. Then decommission the old one. If the new server is set up correctly, you should not experience slowdowns, and you will also get a nice maximum IOPS.
 
For the installation I used an OVH template:
they ask which template you want among the ones they provide (for Proxmox there is only one template for 4.X and one for 5.X, not much choice), and then they ask whether you want RAID and which type.
With 5 disks I thought it was a good idea to set it up as RAID 5...

Here is /proc/mdstat:
mdstat.png

And lvs / vgs / pvs (I didn't know these commands):
lvs-vgs-pvs.png

As I said in my previous answer, the IO delay seems to happen randomly, e.g. when not many clients are using the server (like yesterday)...
For example, yesterday I started fixing the problem around 23:00; then from 00:00 (all VMs stopped) to 01:00 (all VMs started again), I stayed on the server and watched iotop / top / iostat / the Proxmox graphs, and the IO delay was still just as high.
My thought is that a VM may be the cause of the IO delay, but that would be strange, since the IO delay persists even with all VMs stopped -> no workload on the server at all...

Otherwise, why would it work like a charm right now (under load) and go down when nobody is around:
graph.png

Next time it happens I'll run the same tests and take more screenshots (I'm pretty sick, so I did all this half asleep...).
 
Your fsyncs/second are definitely too low.
This is probably due to using software RAID for the disks.
Good hardware RAID controllers are provisioned with a write cache that gives two orders of magnitude more fsyncs/second.
Depending on your workload, the lack of fast fsync response can produce high I/O delay.
In practice, your VMs all try to write in sync mode at the same time to guarantee data persistence, and this generates contention on the physical disks.
Use cases that rely on fsync write semantics include mail servers, for example, but also database workloads.
You can mitigate the problem by using the writeback or unsafe QEMU disk cache setting, BUT be aware that this is an UNSAFE mode in case of power failure.
Your VM disks can be corrupted by a sudden loss of power (i.e. use a good UPS so the VMs can be shut down gently in that case).
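For what it's worth, the cache mode can be changed per disk either in the GUI (VM -> Hardware -> Hard Disk -> Cache) or on the CLI; a sketch, with VM ID 100 and a virtio disk on local-lvm used purely as hypothetical examples:
qm config 100 | grep virtio0                                     # show the current disk line
qm set 100 --virtio0 local-lvm:vm-100-disk-1,cache=writeback     # re-set the same disk with cache=writeback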
 
I will write it one last time.
Your problems are caused by a lack of IOPS.
The lack of IOPS is because of RAID 5 on HDDs.
Even hardware RAID 5 with a cache would probably become too slow for your use case.
Use RAID 10.

Gambling with your customers' data by lying to the VMs or to the applications running on the hypervisor about data having been written seems unacceptable to me, so I did not even suggest it. But hey, it is your business, do what you like. In that case, you can also disable barriers on the ext filesystem. IOPS will increase dramatically.
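Purely to illustrate what disabling barriers means (and with the same warning: this trades data safety for IOPS), it is an ext4 mount option that can be applied with a remount; the path below is just the Proxmox default local storage directory, used as an example:
mount -o remount,barrier=0 /var/lib/vz   # UNSAFE: data can be lost or corrupted on power failure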

On a personal note, taking into account everything written in this thread, I would kindly suggest you stop playing system administrator while running a business and hire a real one.
 
Well, after reading Lucio Magini's post I was looking into the RAID5 / RAID10 cost in IOPS...

So from what you say, that should be the best solution for me (even if it reduces the total disk space available).

And yes, we are already considering hiring an administrator for this job, but before that I preferred to at least understand what is wrong and where.

I'll check it all over the rest of the week and post here some time later for those reading this thread in the future.

So for now my problem is "pending", with a solution.

Thanks a lot for the help!
 
Well, after reading Lucio Magini's post I was looking into the RAID5 / RAID10 cost in IOPS...

Hi,

IOPS is only one problem... any RAID5 has roughly the same IOPS as a single HDD. A RAID10 is better for IOPS (about 2x the IOPS of one HDD if you use 4 HDDs); see the rough math below.
The biggest problem I see is that your HDDs are very large (6 TB). You should test how long a rebuild takes when one HDD fails and you replace it with a new one. It could take many days... If another disk fails during that time (any disk with RAID5, or one from the same mirror with RAID10) -> end of the game!
Also, with mdraid (RAID10 or whatever) it is quite possible to end up with corrupted data, and by the time you become aware of it, it will be too late (as I have seen myself with RAID1 on smaller HDDs), because mdraid has no check that tells you that What You Write is the same as What You Read.
Read your HDD specifications (regarding unrecoverable read errors per TB) and you will find the manufacturer's optimistic value! Do the math and judge for yourself... after how many weeks / months you could lose some data.
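As a rough back-of-the-envelope model (assuming ~150 random IOPS per 7.2k-rpm HDD; real mdraid figures are often lower):
RAID5, 5 disks, random writes ≈ (5 x 150) / 4 ≈ 190 IOPS (write penalty 4: read data + read parity + write data + write parity)
RAID10, 4 disks, random writes ≈ (4 x 150) / 2 ≈ 300 IOPS (write penalty 2: two mirrored writes)
Random reads scale with the number of disks in both layouts; it is the write path where RAID5 hurts.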

So from what you say, that should be the best solution for me (even if it reduces the total disk space available).

Maybe yes, maybe not. Without any information about what you run in your VMs, nobody can say what is best for your case.
And yes, we are already considering hiring an administrator for this job, but before that I preferred to at least understand what is wrong and where.

The best option is to hire an administrator, as the other guy already said. It's like having a brake problem on my car: if I don't know anything about cars, the best thing I can do for my safety is to go to someone who understands cars and mechanics ;)

Sorry, I really do not want to upset you with my long response.

Have a nice day !
 
No problem with the long answer (as if mine weren't long).

Just in case you want to know more about the VMs, currently there are:
6 Linux VMs, each with 1 GB RAM, 1 socket / 1 core and 5 GB to 250 GB of disk (most of them running at less than 10% CPU and ~600 MB of RAM used)
1 Windows Server 2016 VM with 6 GB RAM, 4 sockets / 2 cores and 3 drives (70 GB, 200 GB and 1 TB) (using less than 5% CPU and 2 GB of RAM)

Today the server's IO delay stayed around 0.6% - 0.9% all day.
 

On local storage, some IOPS are also generated by the host itself, not only by the guest VMs.
So you can have I/O delay even with the VMs stopped.
With an fsync value of 0.33 there is no room for any I/O (that value means your system can complete one synchronous write to disk every 3 seconds(!)).
With an fsync value of 40 you have about 25% of the normal I/O capacity of a PC with a single mechanical SATA disk (usually about 200 IOPS).
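To quantify any improvement, re-running pveperf against the VM storage path directly reports BUFFERED READS and FSYNCS/SECOND; the path below is the Proxmox default local storage, adjust it to wherever the VM disks actually live:
pveperf /var/lib/vz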
 
