Periodic I/O Delay Spike Every 5 Minutes

CPU load is constant. The blue spike shows the IO Delay.
The green straight line in the first graph shows CPU Usage. (Or am I wrong?)
You're not wrong; I was looking at the average load and used the wrong term.
However, as I said before, there isn't a single CT whose usage graph correlates with the IO Delay graph.
I don't see IO Delay graphs for CTs, and because many processes are waiting for I/O, it might not show up as high disk usage (the wait time lowers it).
Since all processes in CTs also show up on the host, can't you use some standard Linux tool to find out which processes are blocked on I/O every five minutes? I don't know which tool or command to use for this, sorry, but I'm sure one must exist.
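One idea (just a sketch, assuming standard procps tools on the host): sample for processes stuck in uninterruptible sleep ("D" state) around the time of the spike, since those are the ones blocked on I/O.

Code:
# Log any process in "D" (uninterruptible sleep, usually waiting on I/O) once per second
while true; do
    date
    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
    sleep 1
done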
 
When the spike happens, most tasks get blocked because I/O is unavailable.
Is there a way to find out “what” is causing the I/O Delay spike rather than what's being blocked?
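One way to look at the cause rather than the victims (a sketch, assuming the sysstat package is installed on the host) is to log per-process disk throughput across a spike:

Code:
# One sample per second for 300 seconds (~one 5-minute cycle); the log path is just an example
pidstat -d 1 300 > /root/pidstat-io.log
# Afterwards look for processes with large kB_rd/s or kB_wr/s at the spike timestamps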
 
Just a guess out of the blue: any chance you have some monitoring tool installed in each container which updates itself every 5 minutes, effectively in the same second? Perhaps it runs something like "df" every five minutes.

Look for a cron entry like "*/5 * * * * root something" in /etc/cron.d and the other usual places.

If you find something like this, you can spread the executions out a bit by giving each instance a different minute offset.
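For example (a sketch; the script name is made up, use whatever the cron entries actually run):

Code:
# /etc/cron.d/monitoring inside each container, staggered by one minute per CT
# CT 101:
*/5    * * * * root /usr/local/bin/update-monitoring
# CT 102:
1-59/5 * * * * root /usr/local/bin/update-monitoring
# CT 103:
2-59/5 * * * * root /usr/local/bin/update-monitoring
# ...and so on with 3-59/5, 4-59/5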

Good luck
 
One important detail I forgot to add is this.

Before I formatted the 5x 8TB HDDs, I had them set up as a ZFS RAID-Z2 pool without using the hardware array (Add Storage option disabled), exported that via NFS and added it to the DC as NFS storage. I was seeing 80-90% IO Delay and realised the performance hit was too big, so I destroyed the ZFS pool, rebooted the server, created a hardware RAID 5, mounted it on a directory and added it to the node as storage. Things seemed to go back to normal at first, but then I started noticing this short spike every 5 minutes.

I am seeing the following cron jobs under /etc/cron.d/zfsutils-linux

Code:
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# TRIM the first Sunday of every month.
24 0 1-7 * * root if [ $(date +\%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/trim ]; then /usr/lib/zfs-linux/trim; fi

# Scrub the second Sunday of every month.
24 0 8-14 * * root if [ $(date +\%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/scrub ]; then /usr/lib/zfs-linux/scrub; fi

Not sure if this is relevant.
 
Those are just the default ZFS maintenance tasks, and they only run once per month.

5x 8TB HDDs in RAID 5 sounds really terrible. That means you've got a 32TB storage that can only handle ~100 IOPS. Do random 4K reads/writes and you'd need roughly 2.5 years to read or fill that entire array ;). Nothing I would want to run multiple OSes on, given the massive amount of small random I/O they cause.
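For reference, the back-of-the-envelope arithmetic behind that estimate (assuming the whole array sustains about 100 random IOPS):

Code:
# 100 IOPS * 4 KiB  ≈ 400 KiB/s of random throughput
# 32 TB / 400 KiB/s ≈ 78,000,000 s ≈ 900 days
echo $(( 32 * 10**12 / (100 * 4096) / 86400 ))   # ≈ 904 days, i.e. about 2.5 years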
 
Update: I migrated the disks for all CTs to the boot disk (1 TB Samsung SSD) mounted as Thin LVM and the I/O delay spike is gone.
 
Good to know. I think this could be proof that it has something to do with the "failed" disk in the other storage pool.
 
Update 2: Replaced all disks, created a new RAID 10 array mounted as LVM-Thin, and moved the CTs back to the new storage. The periodic I/O spike is happening again, though it seems to last for a much shorter period now.
 
Okay, so I ran some more tests.
When I run yabs on a CT, I see an I/O Delay spike corresponding to each disk test:


R+W disk test with 4k block size -> I/O Delay: 0-3%
R+W disk test with 64k block size -> I/O Delay: 10-15%
R+W disk test with 512k block size -> I/O Delay: 95-99%

Looks like the 512k bs tests are completely saturating the disk bandwidth.

Here's a dd disk speed test run from the same CT:

Code:
$ dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync

1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.27249 s, 472 MB/s

Now, is this expected behaviour (due to low IOPS) for 6x 8TB enterprise SAS HDDs @ 7200 RPM in a hardware RAID 10 mounted as LVM-Thin?
Does this indicate that the periodic load spike is happening because a CT is trying to perform 512k reads/writes every 5 minutes, or is there something wrong with my drives/array?
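Note that dd writing a single 1 GiB block with dsync only measures sequential throughput, so the 472 MB/s says little about random IOPS. To look at the random side directly, something like this fio run (a sketch; it writes a temporary 1 GiB test file inside the CT) is closer to what yabs measures:

Code:
# Random 4K mixed read/write, roughly what the yabs disk test does
fio --name=randrw4k --filename=/tmp/fiotest --size=1G --rw=randrw --rwmixread=50 \
    --bs=4k --ioengine=libaio --iodepth=64 --direct=1 --runtime=60 --time_based \
    --group_reporting
rm /tmp/fiotest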
 
The smaller the block size of your benchmark, the more IOPS you hit the array with and the higher your IO delay will become. It doesn't make sense to me, though, that the 64K block size causes more IO delay than the 4K block size.
 
What about disk bandwidth? Is it possible that the 512k block size tests are using up all the available bandwidth?
 
On the CT:

fio Disk Speed Tests (Mixed R/W 50/50):
Code:
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 3.33 MB/s      (834) | 53.74 MB/s     (839)
Write      | 3.36 MB/s      (841) | 54.25 MB/s     (847)
Total      | 6.70 MB/s     (1.6k) | 108.00 MB/s   (1.6k)
           |                      |                    
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 89.76 MB/s     (175) | 128.29 MB/s    (125)
Write      | 94.53 MB/s     (184) | 136.84 MB/s    (133)
Total      | 184.29 MB/s    (359) | 265.14 MB/s    (258)

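To put those numbers in perspective (rough arithmetic from the table above, not a new measurement):

Code:
# 4k test:   ~1.6k IOPS * 4 KiB   ≈   6.7 MB/s  -> seek-limited, IOPS are the bottleneck
# 512k test:  ~359 IOPS * 512 KiB ≈ 184   MB/s  -> bandwidth starts to matter
# 1m test:    ~258 IOPS * 1 MiB   ≈ 265   MB/s  -> roughly the array's mixed-load bandwidth ceiling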

Please note that there are other VMs running too. I've already decided to upgrade to SSDs; I'm just trying to evaluate whether the drives are still good and whether I can keep using this array for backups.


Also, is there an easy way to limit IOPS on all CTs to a certain value by default? It's impractical for us to limit IOPS on each CT (existing and new) manually.
 