IO delay randomly appearing, proxmox 5.1-36

dark.mania

Nov 20, 2018
Hi everyone,

I have been having a recurring problem on my Proxmox lately.

For about a week now, my Proxmox server has had periods of high IO delay, like tonight:
io.png
the same graph, but on the daily scale:
io2.png


When checking /var/log/syslog, the only alert that could be a cause or a consequence of this IO delay comes from rrdcached.
So far, the only way I have found to get past the problem is the following:
systemctl stop rrdcached.service
rm -rf /var/lib/rrdcached/db/pv2-storage/*
rm -rf /var/lib/rrdcached/db/pv2-vm/*
rm -rf /var/lib/rrdcached/db/pv2-node/*
rm -rf /var/lib/rrdcached/journal/
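After clearing the stale RRD data, the services have to be brought back up; a minimal sketch, assuming the standard Proxmox 5.x units (restarting the status daemons is what lets the RRD files be recreated):
systemctl start rrdcached.service
systemctl restart pve-cluster.service pvestatd.service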


It had only happened twice before (10 months ago and 8 months ago), but in the last 8 days alone I have already had to apply this "fix" 5 times!

Here is what I get with the following command (only rrdcached errors ...):
zgrep --color=always "error\|fail\|crit" /var/log/syslog | ccze -A
syslog.png

And here is what syslog shows (the full syslog just before the "rrdcached" errors appeared):
zgrep --color=always "20 20:3" /var/log/syslog | ccze -A
syslog1.png

Every time this happens, all my VMs slow down to the point of being unusable...
So I'm starting to worry that there is a real issue with my configuration...

Here is some information on the current configuration:
Standalone Proxmox server (so I doubt there is any quorum problem involved)
VMs running on it:
- 1 Windows
- 6 Linux

pveversion:
pveversion.png

pveperf:
pveperf.png

iostat -x 2 5 (first run, right after boot):
iostat0.png

iostat -x 2 5:
iostat.png

Two "iotop" results taken 10 s apart ... -> "kworker" appears and disappears quickly, even when at 99.99%:
iotop.png
iotop1.png
 
If you are having high IO because of ZFS, just buy a small, cheap enterprise-grade SSD for < $100 and use it as a cache.
Also check your disk scheduler; it might help to use deadline when the disks are congested.
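For reference, adding an SSD as a ZFS read cache (L2ARC) is a one-liner; this is only a sketch, assuming a pool named rpool and a spare SSD at /dev/sdf (both hypothetical, and only relevant if the host actually runs ZFS):
zpool add rpool cache /dev/sdf
zpool status rpool   # the SSD should now show up under a "cache" section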
 

I don't remember using ZFS (at least I didn't install Proxmox 5 on ZFS), so I don't think it would affect my server / VMs in any way (and why would it happen now [this week, randomly] and not every day?).
Here is a screenshot of pveperf (executed twice in a row). I had to fix / reboot / restart all the VMs last night, so the "load" (clients loading websites) is higher right now than it was yesterday when the IO delay started... (when things are "normal", the server gets higher "buffered reads" and "fsyncs/second"):
pveperf2.png


I am currently checking all the HDDs (RAID 5) to see if one of them is dying (maybe you have some tips other than smartctl?).

Oh, and the server is from OVH and has to stay up at all times (client websites run on it), so I can't change its hardware configuration as I wish (other than requesting a disk swap if a drive dies).



I'm sorry in advance for my lack of knowledge, but I don't know anything about your last line...
I'll search online for it, but could you guide me, or do you already have a tutorial somewhere?
Here is what I have found on my server about it so far.
All five disks (/dev/sda to /dev/sde) show this:
cat_scheduler.png
Should I change it to "noop deadline [cfq]"? (deadline is already the selected scheduler...)

I based my search on this:
https://www.cyberciti.biz/faq/linux-change-io-scheduler-for-harddisk/
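For reference, the scheduler can be checked and switched at runtime through sysfs; a sketch, assuming /dev/sda (repeat for sdb..sde; the change is not persistent across reboots unless you also set elevator=deadline on the kernel command line and run update-grub):
cat /sys/block/sda/queue/scheduler               # the entry in [brackets] is the active scheduler
echo deadline > /sys/block/sda/queue/scheduler   # switch to deadline immediately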
 
After checking all the disks, none of them seem to have any problems. Here is the smartctl output for each disk after running the following command:
smartctl -t short -d scsi /dev/sda
sda.png sdb.png sdc.png sdd.png sde.png
Only /dev/sdd shows a higher "non-medium error count" (on writes, from what I can see in the table above that line). I don't know much about what that means, so tell me if it's important!
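For completeness, the self-test result and the error counters can be read back afterwards (same -d scsi flag as above, one device at a time):
smartctl -l selftest -d scsi /dev/sda   # result of the short self-test started earlier
smartctl -H -d scsi /dev/sda            # overall health verdict
smartctl -a -d scsi /dev/sda            # full output, including the error counter log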

And here is the result for:
mdadm --detail /dev/md4
md4.png
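To rule out a rebuild or a periodic consistency check running in the background (which would also cause IO load on an otherwise idle array), the md state can be checked as well; a sketch, assuming the array is md4 as above:
cat /proc/mdstat                     # [UUUUU] means all members are up; a resync/check shows a progress bar
cat /sys/block/md4/md/sync_action    # "idle" when no resync or check is running
cat /sys/block/md4/md/mismatch_cnt   # non-zero after a check may indicate inconsistent stripes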

I don't know what to check now.

The only thing I can think of is to back up the VMs, reinstall the server, and move to Proxmox 4.X (if that's even possible), since it seems that only Proxmox 5.X has this kind of IO delay problem...
 
Hi again,
I took some time and read your posts in more detail.
It seems you have 5 HDDs in RAID 5, right?
It also seems you are using software RAID, so you actually installed Debian and then Proxmox on top of it.
Can you show me the output of: cat /proc/mdstat ?
Can you also show me lvs, vgs and pvs?

Your setup lacks IOPS and is currently unsuitable for your workload. That is why you have high IO wait.
The main problem is RAID 5, which is very, very slow.
While it might have run "decently" in the past, your customers' needs are increasing, and there are no resources left for them.

What I would do is build a new server, set it up properly this time (maybe hire a competent person, a sysadmin by trade), and move all customer VMs to the new server. Then decommission the old one. If the new server is set up correctly, you should not experience slowdowns, and you will also get a nice maximum IOPS.
 
For the installation I used an OVH template:
they ask which template you want among the ones they provide (for Proxmox there is only one template for 4.X and one for 5.X, not much choice), and then they ask whether you want RAID and which type.
With 5 disks I thought it was a good idea to set it up as RAID 5...

Here is /proc/mdstat:
mdstat.png

And lvs / vgs / pvs (I didn't know these commands):
lvs-vgs-pvs.png

As I said in my previous answer, the IO delay seems to happen randomly, e.g. when not many clients are using the server (like yesterday)...
For example, yesterday I started fixing the problem around 23:00; then from 00:00 (all VMs stopped) to 01:00 (all VMs started again), I stayed on the server and watched iotop / top / iostat / the Proxmox graphs, and the IO delay was still just as high.
My thought is that a VM may be the cause of the IO delay, but that would be strange, since the IO delay persists even with all VMs stopped -> no workload on the server at all...

Otherwise, why would it work like a charm right now (under load) and go down when nobody is around:
graph.png

Next time it happens I'll run the same tests and take more screenshots (I'm pretty sick, so I did all this half asleep...).
 
Your fsyncs/second are definitely too low.
This is probably due to using software RAID for the disks.
Good hardware RAID controllers are provisioned with a write cache that gives two orders of magnitude more fsyncs/second.
Depending on your workload, the lack of fast fsync response can produce high I/O delay.
In practice, your VMs all try to write in sync mode at the same time to guarantee data persistence, and this generates contention on the physical disks.
Use cases that rely on fsync write semantics include mail servers, for example, but also database workloads.
You can mitigate the problem by using the writeback or unsafe QEMU disk cache setting, BUT be aware that this is an UNSAFE mode in case of power failure.
Your VM disks can be corrupted by a sudden loss of power (i.e. use a good UPS so the VMs can be shut down gently in that case).
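For what it's worth, the cache mode can be changed per disk either in the GUI (VM -> Hardware -> Hard Disk -> Cache) or on the CLI; a sketch, with VM ID 100 and a virtio disk on local-lvm used purely as hypothetical examples:
qm config 100 | grep virtio0                                     # show the current disk line
qm set 100 --virtio0 local-lvm:vm-100-disk-1,cache=writeback     # re-set the same disk with cache=writeback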
 
I will write it one last time.
Your problems are caused by a lack of IOPS.
The lack of IOPS is because of RAID 5 on HDDs.
Even hardware RAID 5 with a cache would probably become too slow for your use case.
Use RAID 10.

Gambling with your customers' data by lying to the VMs or to the applications running on the hypervisor about data having been written seems unacceptable to me, so I did not even suggest it. But hey, it is your business, do what you like. In that case, you can also disable barriers on the ext filesystem. IOPS will increase dramatically.
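Purely to illustrate what disabling barriers means (and with the same warning: this trades data safety for IOPS), it is an ext4 mount option that can be applied with a remount; the path below is just the Proxmox default local storage directory, used as an example:
mount -o remount,barrier=0 /var/lib/vz   # UNSAFE: data can be lost or corrupted on power failure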

On a personal note, taking into account everything written in this thread, I would kindly suggest you stop playing system administrator while running a business and hire a real one.
 
Well, after reading Lucio Magini's post I was looking into the RAID5 / RAID10 cost in IOPS...

So from what you say, that should be the best solution for me (even if it reduces the total disk space available).

And yes, we are already considering hiring an administrator for this job, but before that I preferred to at least understand what is wrong and where.

I'll check it all over the rest of the week and post here some time later for those reading this thread in the future.

So for now my problem is "pending", with a solution.

Thanks a lot for the help!
 
Well, after reading Lucio Magini's post I was looking into the RAID5 / RAID10 cost in IOPS...

Hi,

IOPS is only one problem... any RAID5 has roughly the same IOPS as a single HDD. A RAID10 is better for IOPS (about 2x the IOPS of one HDD if you use 4 HDDs); see the rough math below.
The biggest problem I see is that your HDDs are very large (6 TB). You should test how long a rebuild takes when one HDD fails and you replace it with a new one. It could take many days... If another disk fails during that time (any disk with RAID5, or one from the same mirror with RAID10) -> end of the game!
Also, with mdraid (RAID10 or whatever) it is quite possible to end up with corrupted data, and by the time you become aware of it, it will be too late (as I have seen myself with RAID1 on smaller HDDs), because mdraid has no check that tells you that What You Write is the same as What You Read.
Read your HDD specifications (regarding unrecoverable read errors per TB) and you will find the manufacturer's optimistic value! Do the math and judge for yourself... after how many weeks / months you could lose some data.
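As a rough back-of-the-envelope model (assuming ~150 random IOPS per 7.2k-rpm HDD; real mdraid figures are often lower):
RAID5, 5 disks, random writes ≈ (5 x 150) / 4 ≈ 190 IOPS (write penalty 4: read data + read parity + write data + write parity)
RAID10, 4 disks, random writes ≈ (4 x 150) / 2 ≈ 300 IOPS (write penalty 2: two mirrored writes)
Random reads scale with the number of disks in both layouts; it is the write path where RAID5 hurts.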

So from what you say, that should be the best solution for me (even if it reduces the total disk space available).

Maybe yes, maybe not. Without any information about what you run in your VMs, nobody can say what is best for your case.
And yes, we are already considering hiring an administrator for this job, but before that I preferred to at least understand what is wrong and where.

The best option is to hire an administrator, as the other guy already said. It's like having a brake problem on my car: if I don't know anything about cars, the best thing I can do for my safety is to go to someone who understands cars and mechanics ;)

Sorry, I really do not want to upset you with my long response.

Have a nice day !
 
No problem with the long answer (as if mine weren't long).

Just in case you want to know more about the VMs, currently there are:
6 Linux VMs, each with 1 GB RAM, 1 socket / 1 core and 5 GB to 250 GB of disk (most of them running at less than 10% CPU and ~600 MB of RAM used)
1 Windows Server 2016 VM with 6 GB RAM, 4 sockets / 2 cores and 3 drives (70 GB, 200 GB and 1 TB) (using less than 5% CPU and 2 GB of RAM)

Today the server's IO delay stayed around 0.6% - 0.9% all day.
 

On local storage, some IOPS are also generated by the host itself, not only by the guest VMs.
So you can have I/O delay even with the VMs stopped.
With an fsync value of 0.33 there is no room for any I/O (that value means your system can complete one synchronous write to disk every 3 seconds(!)).
With an fsync value of 40 you have about 25% of the normal I/O capacity of a PC with a single mechanical SATA disk (usually about 200 IOPS).
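To quantify any improvement, re-running pveperf against the VM storage path directly reports BUFFERED READS and FSYNCS/SECOND; the path below is the Proxmox default local storage, adjust it to wherever the VM disks actually live:
pveperf /var/lib/vz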
 
