Hi
Over the last month I've been experiencing random high IO issues on random nodes in my PVE4 cluster.
There is no indication of which VM might be causing the high IO, as all VMs can be idle at the time of the event.
If I run iotop on the VM with high IO, the disk reads and writes are minimal, only a few MB read or written on occasion.
What I do notice is that if I run top, id sits at 40-50%, even though there is no substantial resource use besides idle processes.
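For reference, these are roughly the commands I'm using to capture those numbers (iostat and pidstat come from the sysstat package; the exact flags are just what I find convenient, adjust as needed):

    top -b -n 1 | head -n 5     # CPU summary line: us/sy/id/wa
    iostat -x 1 5               # per-device utilisation and await
    iotop -o -P -a              # only processes actually doing IO, accumulated totals
    pidstat -d 1 5              # per-process read/write rates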
To resolve the issue, I need to migrate or shut down a VM or container, but not just any VM or container: it is always a specific one.
What I have noticed is that if I shut down the offending container the issue resolves, but if I start the container up again the issue resumes. Only after restarting the node can I start that specific container back up on that node.
I'll attach some relevant data to this post. If anyone has had a similar problem, please let me know. On the storage side I am using Ceph with an InfiniBand backbone.
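Since the storage is Ceph, I'll also include output from a few standard ceph CLI checks in case the latency is coming from the cluster rather than the node itself (nothing here is specific to my setup):

    ceph -s               # overall health and current IO activity
    ceph osd perf         # per-OSD commit/apply latency
    ceph health detail    # warnings such as slow requests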