How to track down IO delay?

starkruzr

Well-Known Member
Hi,

I have two hosts clustered together: one is an i7-3770 and the other is a Xeon E3-1620. Somewhat recently the i7 machine started showing a lot of IO delay in the metrics console; it pegs at 99% and slows everything to a crawl for a few minutes at a time. I thought this might be the fault of the janky local storage contraption I had on there, a 4-disk Vantec hardware RAID USB3 DAS configured in RAID10, but I moved all the volumes running on that machine onto the 10G FreeNAS box I have serving NFS and am still getting large amounts of IO delay. How can I figure out where this is coming from?

Thanks.
 
Now that's an old CPU. Kernel version and PVE version?

Code:
apt install iotop

Run iotop and see if you can spot a process doing a lot of I/O.
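
A couple of invocations that usually make the culprit easier to spot (exact flag behaviour can differ a bit between iotop versions, so treat these as a starting point):

Code:
# show only threads that are actually doing I/O, with accumulated totals
iotop -o -a

# batch mode, 30 one-second samples with timestamps, handy for catching a spike
iotop -obt -n 30 > iotop.log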

Code:
vmstat 3

How many context switches?
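
For reference, this is roughly what the output looks like and the columns worth watching (the numbers here are made up for illustration):

Code:
vmstat 3 10
# procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
#  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
#  1  3      0 812344  95520 624812    0    0  5120   880  450 1200  5  3 40 52  0
#
# "wa" = % of CPU time spent waiting on I/O, "b" = processes in uninterruptible
# sleep (usually blocked on I/O), "cs" = context switches per second.
# High wa plus a nonzero b during the spikes means the CPU is mostly sitting
# idle waiting on storage rather than doing work.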
 
This is hard to narrow down. Could it be a network bottleneck? I'm seeing a similar issue myself on a 4-node cluster with Ceph; turning off VMs or moving them to other nodes helped but didn't eliminate it completely. Still working on figuring it out, and I'll post an update if I find something useful.
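
In case it helps anyone, these are the kinds of checks I'm running to rule the network in or out (iperf3 has to be installed on the nodes, and the IP below is just a placeholder):

Code:
# per-OSD commit/apply latency; consistently high values point at slow disks
ceph osd perf

# raw link throughput between two nodes
iperf3 -s                  # on one node
iperf3 -c <other-node-ip>  # on the other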
 
You're using Ceph, which makes it a whole different story to narrow it down. OP is not using a distributed filesystem and even had this issue with local storage. Haider, you should open a new thread for your problem.
 
I will. I was just trying to see what others see.