IO Usage in Ceph

Volker Lieder

Nov 6, 2017
Hi,
we are using Proxmox 5.1-36 and Ceph Luminous.
Sometimes we see higher IO latency on some PGs, and when I run iostat -x 1 in a shell, I see one physical drive used by Ceph with high await and %util values.
Is it possible to identify the VM that is causing the await times and IO usage?
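For reference, this is roughly what I look at on the node (the device name is just an example from our setup):

    iostat -x 1        # one drive, e.g. /dev/sdd, shows high await and %util
    ceph-disk list     # map /dev/sdd back to its OSD id

What I am missing is the step from that OSD back to the VM that generates the load.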

Best regards,
Volker
 
Well, you can find out which client is doing heavy IO, but only in fairly general terms. It sounds to me as if some disks are slower than others or more congested. What do your crush map and OSD tree look like? Are there mixed disks in a root bucket?
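You can check that with something like the following (typical commands, adjust to your cluster):

    ceph osd tree              # crush hierarchy: which OSDs sit in which root/host bucket
    ceph osd df tree           # per-OSD utilization and PG count, uneven distribution is easy to spot
    ceph osd crush rule dump   # which rules/roots your pools actually use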
 
Hi Alwin,
the usage jumps between different OSDs, and every time I see it, it affects all three Ceph nodes, one OSD per hardware node.
The HDDs are all the same model. Is it possible to identify IO usage inside Ceph? Only one node shows roughly 15 MB/s in the Proxmox IO view, which is not that much. Or is the Proxmox view not showing this correctly?
Ceph runs over 56 Gbit InfiniBand, so I don't think that should be a bottleneck.
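As far as I can see, inside Ceph I only get rather coarse numbers, e.g.:

    ceph osd perf          # commit/apply latency per OSD
    ceph osd pool stats    # client IO rates per pool

but nothing that points to a single VM.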

Regards,
Volker
 
There are various perf counters, but I guess they might be rather general:
http://docs.ceph.com/docs/luminous/dev/perf_counters/

But with some disks showing high wait and utilization while others don't, it sounds to me as if either the PGs are not evenly distributed (crush map) or some of the OSDs are starving. Are you using a RAID controller for the OSDs?
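If you want to dig into the busy OSD, a rough way to do it (run on the node that hosts that OSD; osd.12, pool rbd and the image name are just placeholders) is:

    ceph daemon osd.12 perf dump            # raw perf counters of that one OSD
    ceph daemon osd.12 dump_historic_ops    # recent slow ops; object names look like rbd_data.<image-id>.<offset>
    rbd -p rbd info vm-100-disk-1 | grep block_name_prefix   # compare the rbd_data.<image-id> prefix to find the matching VM disk

That at least lets you map heavy ops on one OSD back to a specific RBD image, i.e. a specific VM disk.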

EDIT: corrected the link, the version was wrong.