I currently have a 4-node cluster running Ceph on Proxmox 5.1, and I've recently noticed a lot of blocked requests reported as REQUEST_SLOW.
For example:
2019-02-12 11:47:33 cluster [WRN] Health check failed: 6 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-02-12 11:47:47 cluster [WRN] Health check update: 4 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-02-12 11:47:53 cluster [WRN] Health check update: 2 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-02-12 11:48:03 cluster [INF] Health check cleared: REQUEST_SLOW (was: 2 slow requests are blocked > 32 sec)
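To get a rough picture of how often these episodes occur and how long they last, log lines like the ones above could be tallied with a small script. This is only a minimal Python sketch: the regex assumes the exact cluster-log format shown above, and `parse_slow_requests`, `episodes`, and `SAMPLE` are names made up for illustration, not part of Ceph or Proxmox.

```python
import re
from datetime import datetime

# Assumption: lines look exactly like the cluster log excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) cluster \[(?P<level>\w+)\] "
    r"Health check (?P<event>failed|update|cleared): "
    r"(?:(?P<count>\d+) slow requests are blocked|REQUEST_SLOW)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_slow_requests(lines):
    """Return (timestamp, event, blocked_count) tuples; count is 0 on 'cleared'."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a REQUEST_SLOW health line
        count = int(m.group("count")) if m.group("count") else 0
        events.append((m.group("ts"), m.group("event"), count))
    return events


def episodes(events):
    """Group events into (start, end, peak_count, seconds) per failed..cleared run."""
    out, start, peak = [], None, 0
    for ts, event, count in events:
        if event == "failed":
            start, peak = ts, count
        elif start is not None:
            peak = max(peak, count)
            if event == "cleared":
                delta = datetime.strptime(ts, TS_FORMAT) - datetime.strptime(start, TS_FORMAT)
                out.append((start, ts, peak, delta.total_seconds()))
                start = None
    return out


# The four lines quoted above, used as sample input.
SAMPLE = [
    "2019-02-12 11:47:33 cluster [WRN] Health check failed: 6 slow requests are blocked > 32 sec (REQUEST_SLOW)",
    "2019-02-12 11:47:47 cluster [WRN] Health check update: 4 slow requests are blocked > 32 sec (REQUEST_SLOW)",
    "2019-02-12 11:47:53 cluster [WRN] Health check update: 2 slow requests are blocked > 32 sec (REQUEST_SLOW)",
    "2019-02-12 11:48:03 cluster [INF] Health check cleared: REQUEST_SLOW (was: 2 slow requests are blocked > 32 sec)",
]

if __name__ == "__main__":
    for start, end, peak, secs in episodes(parse_slow_requests(SAMPLE)):
        print(f"{start} -> {end}: peak {peak} blocked, {secs:.0f}s")
```

Run over a longer stretch of the log, this would show whether the episodes cluster around particular times (backups, scrubs, and so on), which might help narrow down the cause.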
There are currently 2 OSDs per node, 4TB each at 7.2K RPM. These use a journal disk, which is an NVMe SSD.
Latency is always shown as 0 for commit and 0-2 for apply.
Any suggestions on how to investigate what's causing this? It's leading to noticeable performance problems on VMs, particularly Windows Server 2016.