High I/O Delay – Please Help

Guillaume Soucy

Hi,

I have a dedicated server with these specs:

CPU(s) 8 x Intel(R) Xeon(R) CPU D-1521 @ 2.40GHz (1 Socket)
32 GB RAM - DDR4
Kernel Version Linux 4.15.18-7-pve #1 SMP PVE 4.15.18-27 (Wed, 10 Oct 2018 10:50:11 +0200)
PVE Manager Version pve-manager/5.2-10/6f892b40
6 x 12TB HDD – Set in Software RAID5.

Last night at 11 PM everything was fine, and at 1 AM it was still fine, but by 9 AM the I/O delay was higher than normal, and our VMs slowed down to the point where services were affected. The I/O delay stays at around 10, with peaks of 40.

We ran these VMs on a local test server with an Intel Pentium @ 3.00GHz, 16 GB of RAM and a single 10 TB 7200 RPM Seagate HDD. We never had to reboot that server even once and never saw high I/O delay. No issues at all. Why do we get this kind of issue on a server with this configuration?

We had the same issue one month ago, on Sunday, Nov 4, 2018; we rebooted the physical server and everything was fine again. But we can't afford to reboot the dedicated server every month. Rebooting is not an option, as it seems to fix the issue only temporarily.

Thanks,

Guillaume
 
May I have some help, please? It is still slow, and this is again happening on a Sunday.

When I run
Code:
ps -aux
I see a process that uses a lot of resources:

root 11579 41.0 0.0 0 0 ? DN 00:57 431:54 [md4_resync]

This process seems to use a lot of resources only one Sunday a month.

Can someone help me, please?
 
Move away from RAID 5 to RAID 10 so you get more IOPS. RAID 5 is way too slow for the majority of cases.

You probably had a problem with one of the disks and a resync is in progress; that is why you are experiencing high I/O. It will take some time to resync 12 TB.

To check what the RAID is doing, take a look at mdstat (cat /proc/mdstat).
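
As a rough sketch of what to look for in mdstat: an array that is busy shows a progress line containing "check", "resync", "recovery" or "reshape". The helper below (md_sync_status is a made-up name, not part of mdadm) prints that line for each busy array. The file argument exists only so it can be tried on a saved copy of /proc/mdstat.

```shell
# Print "<array>: <progress line>" for every md array that is currently
# running a check, resync, recovery or reshape.
# $1: path to an mdstat file, defaults to /proc/mdstat.
md_sync_status() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/ { array = $1 }              # remember the current array
        /(check|resync|recovery|reshape) *=/ {    # progress line of that array
            sub(/^[ \t]+/, "")                    # trim leading whitespace
            print array ": " $0
        }
    ' "$mdstat_file"
}
```

Running md_sync_status with no argument reads the live /proc/mdstat. No output means no array is being checked or rebuilt at that moment.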
 
OK, next time this happens I will check mdstat to see what is going on, but my provider (OVH Canada) ran a full system diagnosis a few days ago and couldn't find any issues, even with the disks.

Why does it happen only once a month, and only on the first Sunday? :confused:

Moving to RAID 10 is an option, but I first want to see whether it's possible to fix the problem without doing so, as this machine is a production server and the move would cause some downtime. :(

I will keep you updated, but I need to wait until Sunday, Feb 3rd, as the issue happens only on the first Sunday of the month.

Thanks guys!
 
Have you checked cron? Anything monthly? Are you scrubbing? Did you check the system logs?
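
When checking cron, keep in mind that crontab -l lists only the calling user's personal crontab; system-wide jobs live in /etc/crontab and the /etc/cron.* directories. A minimal sketch for searching those locations for md-related jobs follows. The function name find_md_cron_jobs and the search pattern are illustrative assumptions, and the directory argument exists only so it can be tried on a copy.

```shell
# List system-wide cron files that mention mdadm or its checkarray helper.
# $1: directory holding the cron configuration, defaults to /etc.
find_md_cron_jobs() {
    etc_dir="${1:-/etc}"
    for path in crontab cron.d cron.daily cron.weekly cron.monthly; do
        # Skip locations that do not exist on this system.
        if [ -e "$etc_dir/$path" ]; then
            # -r: recurse into directories, -l: print only matching file names
            grep -rlE 'mdadm|checkarray' "$etc_dir/$path" || true
        fi
    done
}
```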
 
I just checked the crontab and there is no cron job currently set for root.

This is the result of
Code:
crontab -l

Code:
root@pve:~# crontab -l
no crontab for root

How do I check whether any scrubs are scheduled? I have some FreeNAS systems on dedicated local machines that run a scrub once a week, on Sunday. That's not related to the current situation, but I know how much it can slow down a server! ;)

Guillaume
 
This may just be a scheduled mdadm check of the RAID, not even a resync. You can limit the speed the RAID is allowed to use for that via sysctl, for example.
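
The sysctl knobs in question are dev.raid.speed_limit_min and dev.raid.speed_limit_max, both in KiB/s per device. Lowering the maximum leaves more bandwidth for the VMs but stretches the check. The sketch below estimates that trade-off; check_hours is a made-up helper, 50000 is only an illustrative value, and the figure is a lower bound that assumes the disk sustains the limit for the whole pass.

```shell
# Estimate how many whole hours one pass over a member disk takes when the
# md check runs at a given speed limit.
# $1: disk size in TB (decimal), $2: speed limit in KiB/s.
check_hours() {
    size_tb="$1"
    limit_kib_s="$2"
    # 1 TB = 10^12 bytes; /1024 gives KiB, /limit gives seconds, /3600 hours.
    echo $(( size_tb * 1000000000000 / 1024 / limit_kib_s / 3600 ))
}

# Show the current limits:
#   sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
# Lower the ceiling until the next reboot (needs root):
#   sysctl -w dev.raid.speed_limit_max=50000
```

For a 12 TB disk, check_hours 12 50000 gives roughly 65 hours, so a throttled check on disks this size can run well past the Sunday it started on.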
 
Hey, I just found the root cause.

There was a cron job, as shown by the last line of /etc/cron.d/mdadm:


#
# cron.d/mdadm -- schedules periodic redundancy checks of MD devices
#
# Copyright © martin f. krafft <madduck@madduck.net>
# distributed under the terms of the Artistic Licence 2.0
#

# By default, run at 00:57 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
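
The guard in that cron line can be sketched as a standalone test: the schedule fields fire every Sunday at 00:57, and the day-of-month comparison lets only the first Sunday through. The function below is illustrative only (is_first_sunday is a made-up name) and assumes a date command that accepts -d with a YYYY-MM-DD argument, as GNU date does.

```shell
# Succeed when the given date is a Sunday that falls within the first seven
# days of its month, which is the condition the cron entry implements.
# $1: date in YYYY-MM-DD form.
is_first_sunday() {
    day="$1"
    # %u is the ISO weekday (7 = Sunday); %d is the zero-padded day of month.
    [ "$(date -d "$day" +%u)" -eq 7 ] && [ "$(date -d "$day" +%d)" -le 7 ]
}
```

With this, is_first_sunday 2018-11-04 and is_first_sunday 2019-02-03 both succeed, which matches the Sundays on which the slowdown was seen and expected.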

So I just commented out the last line of the file, and I will monitor until Sunday, Feb 3rd to confirm that the issue is completely resolved. ;)


Guillaume
 
I'm glad you found it, and I'm sure it will solve your problem of high I/O delay. However, data corruption will now remain silent, and with 12 TB disks I am sure there will be some. I just wanted to let you know.
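
For reference, the result of a completed check is exposed by the kernel in /sys/block/<array>/md/mismatch_cnt, so one option is to keep the check (perhaps throttled or moved to a quieter time) and read that counter afterwards. A minimal sketch, assuming the array is md4 as in the ps output above; md_mismatch_count is a made-up name and the sysfs root argument exists only so it can be tried on a copy.

```shell
# Print the mismatch count recorded by the most recent check of an md array.
# $1: array name, e.g. md4.
# $2: sysfs root, defaults to /sys.
md_mismatch_count() {
    array="$1"
    sysfs_root="${2:-/sys}"
    cat "$sysfs_root/block/$array/md/mismatch_cnt"
}

# A check can also be started by hand at a convenient time (needs root):
#   echo check > /sys/block/md4/md/sync_action
# and a running one can be stopped with:
#   echo idle > /sys/block/md4/md/sync_action
```

A value of 0 after a full check means no inconsistencies were found; anything else is worth investigating before it turns into a surprise during a rebuild.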
 
