Hi
Over the last month I've been experiencing random high IO issues on random nodes in my PVE4 cluster.
There is no indication of which VM might be causing the high IO, as all VMs can be idle at the time of the event.
If I run iotop on the VM with high IO, the disk reads and writes are minimal, only a few MB read or written on occasion.
What I do notice is that if I run top, id sits at 40-50%, even though there is no substantial resource use besides idle processes.
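For reference, these are roughly the commands I'm using to capture those numbers (iostat and pidstat come from the sysstat package; the exact flags are just what I find convenient, adjust as needed):

    top -b -n 1 | head -n 5     # CPU summary line: us/sy/id/wa
    iostat -x 1 5               # per-device utilisation and await
    iotop -o -P -a              # only processes actually doing IO, accumulated totals
    pidstat -d 1 5              # per-process read/write rates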
To resolve the issue, I need to migrate or shut down a VM or container, but not just any VM or container: it is always a specific one.
What I have noticed is that if I shut down the offending container the issue resolves, but if I start the container up again the issue resumes. Only after restarting the node can I start that specific container back up on that node.
I'll attach some relevant data to this post. If anyone has had a similar problem, please let me know. On the storage side I am using Ceph with an InfiniBand backbone.
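Since the storage is Ceph, I'll also include output from a few standard ceph CLI checks in case the latency is coming from the cluster rather than the node itself (nothing here is specific to my setup):

    ceph -s               # overall health and current IO activity
    ceph osd perf         # per-OSD commit/apply latency
    ceph health detail    # warnings such as slow requests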