Ceph Apply/Commit latency too high

LibrarianBear

New Member
May 5, 2024
Hello Proxmox Community,

I hope everyone is doing well. I'm encountering significant commit and apply latency in my Ceph cluster setup and would greatly appreciate your insights and advice on diagnosing and resolving this issue.

Setup Overview:
  • Ceph Version: 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) - Reef (stable)
  • Proxmox VE Version: 8.2.0
  • Running Kernel: 6.8.4-2-pve
  • Cluster Configuration:
    • OSDs Distribution:
      • Host "light":
        • OSD.2 (SSD) - Samsung_SSD_870_QVO_4TB - bluestore
      • Host "nasprox":
        • OSD.1 (SSD) - Samsung_SSD_870_QVO_4TB - bluestore
      • Host "slim":
        • OSD.0 (SSD) - Samsung_SSD_870_QVO_4TB - bluestore
    • Weights: All OSDs have a weight of 3.63869.
    • Status: All OSDs are currently up.
Symptoms and Benchmark Results:

I've been experiencing high commit and apply latency within the Ceph cluster, as evidenced by the following OSD performance metrics obtained from ceph osd perf:

  • OSD.2 (Host "light"):
    • Commit Latency: 478 ms
    • Apply Latency: 478 ms
  • OSD.1 (Host "nasprox"):
    • Commit Latency: 434 ms
    • Apply Latency: 434 ms
  • OSD.0 (Host "slim"):
    • Commit Latency: 488 ms
    • Apply Latency: 488 ms
These latency figures are significantly higher than expected and are impacting the overall performance of my storage infrastructure.
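A quick way to keep an eye on these numbers is to parse the `ceph osd perf` output. This is just a sketch of my own (the `check_osd_latency` helper and the 50 ms threshold are not from any Ceph tool); the embedded sample mirrors the figures above, and on a live node you would pipe the real output in instead:

```shell
# Hypothetical helper: reads `ceph osd perf`-style output on stdin and
# prints any OSD whose commit latency exceeds the given threshold (ms).
check_osd_latency() {
    awk -v t="$1" 'NR > 1 && $2+0 > t {
        print "osd." $1 " commit latency " $2 " ms exceeds " t " ms"
    }'
}

# Sample data taken from this post; on a live cluster:
#   ceph osd perf | check_osd_latency 50
check_osd_latency 50 <<'EOF'
osd  commit_latency(ms)  apply_latency(ms)
  2                 478               478
  1                 434               434
  0                 488               488
EOF
```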

Networking:

I use a dedicated MikroTik CRS309-1G-8S+ switch (10GbE SFP+) for my Ceph cluster network, and a separate CRS309-1G-8S+ for my public network as well as my PVE cluster network. On some of the nodes I use a dual-port NIC for those two networks. MTU is set to 9000 in PVE on the Ceph cluster ports.

Ping tests report an average of ~0.150 ms between the nodes on the Ceph cluster network, and iperf reports full utilization of the available bandwidth.
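For reference, one way to confirm that MTU 9000 actually survives end-to-end (and not only on the local bridge) is a Don't-Fragment ping sized to the jumbo payload. The peer address below is a placeholder; only the payload arithmetic runs as-is:

```shell
# A 9000-byte MTU leaves 9000 - 20 (IP header) - 8 (ICMP header) = 8972
# bytes of ICMP payload. If a DF ping of this size gets through, jumbo
# frames work on the whole path.
mtu=9000
payload=$((mtu - 28))
echo "ping payload for MTU $mtu: $payload bytes"

# On a node (10.0.0.2 is a placeholder peer on the Ceph network):
#   ping -M do -s "$payload" -c 5 10.0.0.2   # -M do sets Don't Fragment
#   iperf3 -c 10.0.0.2 -t 30                 # sustained bandwidth check
```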


  1. What could be causing the high commit and apply latency (which are identical!) in my Ceph cluster?
  2. Are there specific OSD tuning parameters or configurations I should review or modify to optimize performance?
  3. How can I reduce commit and apply latency to improve overall cluster performance?
 
This is the smallest possible Ceph cluster, with no room for parallelization: each OSD has to participate in every write request. (On BlueStore, apply latency is reported as the same value as commit latency, which is why the two numbers always match.)

The Samsung 870 QVO is a QLC SSD with a small "TurboWrite" (SLC) cache of only a few gigabytes.
As soon as this cache is full, write performance drops to around 160 MB/s, and with Ceph's random IO pattern the cache is full most of the time.

TL;DR: do not use consumer grade SSDs for a Ceph cluster.
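You can verify this on one of the QVOs yourself: a 4k sync-write fio run (queue depth 1, fsync after every write) mimics Ceph's journaling pattern and exposes the post-cache latency. This is only a sketch: the block below just prints the command, the device path is a placeholder, and actually running it destroys data on the target.

```shell
# Build the classic "is this SSD Ceph-suitable" fio invocation:
# 4k random writes, queue depth 1, fsync per write, O_DIRECT.
fio_cmd="fio --name=qvo-synctest --filename=/dev/sdX \
--rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
--ioengine=sync --fsync=1 --direct=1 \
--runtime=120 --time_based"

echo "$fio_cmd"
# To actually run it (DESTROYS data on /dev/sdX -- use a spare disk):
#   eval "$fio_cmd"
```

On a drive with power-loss protection the fsync latency stays flat; on a consumer QLC drive you should see it collapse once the SLC cache fills.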
 
Don't use consumer SSDs (and the QVOs are the worst of all the crappy consumer SSD drives).

You need an SSD/NVMe with supercapacitors (power-loss protection) for Ceph and ZFS, to handle the fsyncs (or at minimum as a WAL/journal drive).
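If you do add a small enterprise SSD/NVMe for DB/WAL, here is a rough sketch of attaching it when recreating an OSD on Proxmox. The device paths and the ~2% sizing rule of thumb are my own assumptions, not from this thread; destroy and recreate one OSD at a time and let the cluster rebalance in between.

```shell
data_dev=/dev/sdb        # placeholder: the 4 TB QVO data device
db_dev=/dev/nvme0n1      # placeholder: enterprise SSD/NVMe with PLP

# Rule-of-thumb DB sizing: a few percent of the data device
# (here ~2% of 4 TB, computed in decimal GB).
osd_tb=4
db_gb=$((osd_tb * 1000 * 2 / 100))
echo "suggested DB size: ${db_gb} GB"

# Recreate the OSD with the DB/WAL on the fast device:
#   pveceph osd create "$data_dev" --db_dev "$db_dev"
# (see `pveceph help osd create` for the DB/WAL size options)
```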
 
Thank you all for the replies!!! Do you have a recommendation for either a non-wallet-breaking 2 TB enterprise SSD, or perhaps a high-quality 256 GB one for DB/WAL?

Which parameters should I monitor in the S.M.A.R.T. data for this application? I am asking since the cluster used to run much faster at the beginning...
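Not an authoritative list, but on Samsung SATA SSDs the wear-related S.M.A.R.T. attributes usually watched are the ones grepped for below. The sample lines imitate `smartctl -A /dev/sdX` output and the values are made up for illustration; on a real node you would pipe smartctl's output in instead:

```shell
# Filter a smartctl attribute table down to the wear indicators relevant
# on Samsung 8xx-series drives.
watch_wear() {
    grep -E 'Wear_Leveling_Count|Used_Rsvd_Blk_Cnt|Total_LBAs_Written'
}

# Fabricated sample; on a real node: smartctl -A /dev/sdX | watch_wear
watch_wear <<'EOF'
177 Wear_Leveling_Count     0x0013   095   095   000    Pre-fail  Always       -       123
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       52123456789
EOF
```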
 
