Ceph OSD apply latency is high

Edwin Ye

Mar 30, 2019
Hello Sirs.

Has anyone encountered the same issue?

I found that one of the OSDs in our production Proxmox Ceph cluster had a high apply latency (around 500 ms).
This degraded the performance of the whole Ceph cluster. After I restarted the OSD, cluster performance returned to normal.

Why does a single OSD with high apply latency degrade the performance of the whole Ceph cluster?
How can I fix this issue?
If I want to monitor the apply latency of all OSDs, what would be a best-practice threshold in milliseconds?

Thank you in advance.

Edwin.
 
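Why does a single OSD with high apply latency degrade the performance of the whole Ceph cluster?
As Ceph is a distributed storage system, all Ceph services are connected with each other, and the weakest link determines the performance of the cluster. In a replicated pool, a write is only acknowledged once every OSD holding a copy has stored it, and data is spread across all OSDs, so many client requests end up waiting on the slow one.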
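To make the "weakest link" point concrete, here is a toy Python simulation (not Ceph code). It assumes a replicated pool where a write is acknowledged only after all replicas have stored it, and uses `random.sample` as a stand-in for CRUSH placement; the OSD count, replica count and latencies are made-up illustration values.

```python
from __future__ import annotations

import random
import statistics

# Illustration values only (hypothetical): 12 OSDs, 3 replicas per
# placement group, ~2 ms for a healthy OSD, one OSD stuck at ~500 ms.
NUM_OSDS = 12
REPLICAS = 3
HEALTHY_MS = 2.0
SLOW_OSD = 7
SLOW_MS = 500.0


def osd_latency(osd: int) -> float:
    """Fixed per-OSD write latency in ms (no jitter, for clarity)."""
    return SLOW_MS if osd == SLOW_OSD else HEALTHY_MS


def write_latency(acting_set: list[int]) -> float:
    """A replicated write is acknowledged only after every OSD in the
    acting set has stored it, so it takes as long as the slowest replica."""
    return max(osd_latency(osd) for osd in acting_set)


def simulate(num_writes: int, seed: int = 42) -> list[float]:
    """Latency of each write; random.sample stands in for CRUSH placement
    (assumption: placement is roughly uniform across OSDs)."""
    rng = random.Random(seed)
    return [write_latency(rng.sample(range(NUM_OSDS), REPLICAS))
            for _ in range(num_writes)]


if __name__ == "__main__":
    latencies = simulate(10_000)
    hit = sum(1 for ms in latencies if ms == SLOW_MS) / len(latencies)
    print(f"writes waiting on the slow OSD: {hit:.0%}")
    print(f"median write latency: {statistics.median(latencies):.1f} ms")
    print(f"mean write latency:   {statistics.mean(latencies):.1f} ms")
```

With one slow OSD out of 12 and 3 replicas, roughly a quarter of the writes include it, so the median stays healthy while the mean is dominated by the slow replica. A VM disk is spread over many objects, so a guest issuing many requests will almost always have some of them waiting on that OSD.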

How can I fix this issue?
This requires monitoring and checking the health of all involved subsystems (disks, controllers, network and the host itself), so you can find out why that OSD became slow.
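For the monitoring side, `ceph osd perf` prints commit and apply latency per OSD. Below is a minimal Python sketch that parses its JSON output (`ceph osd perf -f json`) and sorts OSDs slowest first. The key names (`osd_perf_infos`, `perf_stats`, `apply_latency_ms`, and the `osdstats` wrapper in newer releases) are an assumption based on recent releases; compare them with your own cluster's output before relying on it.

```python
from __future__ import annotations

import json
import subprocess


def apply_latencies(perf_json: str) -> dict[int, float]:
    """Map OSD id -> apply latency (ms) from `ceph osd perf -f json` output.

    Assumed layout: newer releases wrap the list in "osdstats", older ones
    put "osd_perf_infos" at the top level; both shapes are accepted here.
    """
    data = json.loads(perf_json)
    infos = data.get("osdstats", data).get("osd_perf_infos", [])
    return {int(info["id"]): float(info["perf_stats"]["apply_latency_ms"])
            for info in infos}


def slowest_first(latencies: dict[int, float]) -> list[tuple[int, float]]:
    """Sort (osd_id, ms) pairs so the weakest link comes first."""
    return sorted(latencies.items(), key=lambda kv: kv[1], reverse=True)


def read_from_cluster() -> dict[int, float]:
    """Run `ceph osd perf -f json` locally (needs the ceph CLI and a keyring)."""
    out = subprocess.run(["ceph", "osd", "perf", "-f", "json"],
                         capture_output=True, text=True, check=True).stdout
    return apply_latencies(out)
```

On a node with an admin keyring, iterating over `slowest_first(read_from_cluster())` puts the OSD that is dragging the cluster down at the top of the list.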

If I want to monitor the apply latency of all OSDs, what would be a best-practice threshold in milliseconds?
As low as possible. The right threshold depends on your hardware and performance requirements. For how to read the performance counters, see the link below.
https://access.redhat.com/documenta...tml/administration_guide/performance_counters
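Since there is no universal number, one option is to alert both on a fixed ceiling and on OSDs that are far slower than their peers. Below is a minimal Python sketch that takes a dict of OSD id to apply latency in ms (for example, parsed from `ceph osd perf -f json`). The 100 ms ceiling, 5x factor and 10 ms floor are hypothetical placeholders to tune for your hardware, not values from Ceph or Red Hat documentation.

```python
from __future__ import annotations

import statistics


def flag_slow_osds(latencies: dict[int, float], absolute_ms: float = 100.0,
                   factor: float = 5.0, floor_ms: float = 10.0) -> list[int]:
    """Return ids of OSDs whose apply latency looks unhealthy.

    An OSD is flagged if it is above `absolute_ms`, or if it is both more
    than `factor` times the cluster median and above `floor_ms` (the floor
    keeps tiny absolute values from tripping the ratio rule on fast disks).
    The defaults are hypothetical placeholders, not Ceph recommendations.
    """
    if not latencies:
        return []
    median = statistics.median(latencies.values())
    return sorted(
        osd for osd, ms in latencies.items()
        if ms > absolute_ms or (ms > factor * median and ms > floor_ms)
    )
```

For example, `flag_slow_osds({0: 2.0, 1: 3.0, 2: 2.5, 3: 500.0})` returns `[3]`. Comparing against the cluster median adapts to the hardware: on flash-backed OSDs a jump to tens of milliseconds stands out, while HDD-backed OSDs with a higher baseline are judged against their own peers.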