Ceph Apply/Commit latency too high

LibrarianBear

New Member
May 5, 2024
Hello Proxmox Community,

I hope everyone is doing well. I'm encountering significant commit and apply latency in my Ceph cluster setup and would greatly appreciate your insights and advice on diagnosing and resolving this issue.

Setup Overview:
  • Ceph Version: 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) - Reef (stable)
  • Proxmox VE Version: 8.2.0
  • Running Kernel: 6.8.4-2-pve
  • Cluster Configuration:
    • OSDs Distribution:
      • Host "light":
        • OSD.2 (SSD) - Samsung_SSD_870_QVO_4TB - bluestore
      • Host "nasprox":
        • OSD.1 (SSD) - Samsung_SSD_870_QVO_4TB - bluestore
      • Host "slim":
        • OSD.0 (SSD) - Samsung_SSD_870_QVO_4TB - bluestore
    • Weights: All OSDs have a weight of 3.63869.
    • Status: All OSDs are currently up.
Symptoms and Benchmark Results:

I've been experiencing high commit and apply latency within the Ceph cluster, as evidenced by the following OSD performance metrics obtained from ceph osd perf:

  • OSD.2 (Host "light"):
    • Commit Latency: 478 ms
    • Apply Latency: 478 ms
  • OSD.1 (Host "nasprox"):
    • Commit Latency: 434 ms
    • Apply Latency: 434 ms
  • OSD.0 (Host "slim"):
    • Commit Latency: 488 ms
    • Apply Latency: 488 ms
These latency figures are significantly higher than expected and are impacting the overall performance of my storage infrastructure.
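A quick way to keep an eye on these numbers is to parse the `ceph osd perf` output. This is just a sketch of my own (the `check_osd_latency` helper and the 50 ms threshold are not from any Ceph tool); the embedded sample mirrors the figures above, and on a live node you would pipe the real output in instead:

```shell
# Hypothetical helper: reads `ceph osd perf`-style output on stdin and
# prints any OSD whose commit latency exceeds the given threshold (ms).
check_osd_latency() {
    awk -v t="$1" 'NR > 1 && $2+0 > t {
        print "osd." $1 " commit latency " $2 " ms exceeds " t " ms"
    }'
}

# Sample data taken from this post; on a live cluster:
#   ceph osd perf | check_osd_latency 50
check_osd_latency 50 <<'EOF'
osd  commit_latency(ms)  apply_latency(ms)
  2                 478               478
  1                 434               434
  0                 488               488
EOF
```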

Networking:

I use a dedicated MikroTik CRS309-1G-8S+ switch (10GbE SFP+) for my Ceph cluster network, and a separate CRS309-1G-8S+ for my public network as well as my PVE cluster network. On some of the nodes I use a dual-port NIC for those two networks. MTU is set to 9000 in PVE on the Ceph cluster ports.

Ping tests report an average of ~0.150 ms between the nodes on the Ceph cluster network, and iperf reports full utilization of the available bandwidth.
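For reference, one way to confirm that MTU 9000 actually survives end-to-end (and not only on the local bridge) is a Don't-Fragment ping sized to the jumbo payload. The peer address below is a placeholder; only the payload arithmetic runs as-is:

```shell
# A 9000-byte MTU leaves 9000 - 20 (IP header) - 8 (ICMP header) = 8972
# bytes of ICMP payload. If a DF ping of this size gets through, jumbo
# frames work on the whole path.
mtu=9000
payload=$((mtu - 28))
echo "ping payload for MTU $mtu: $payload bytes"

# On a node (10.0.0.2 is a placeholder peer on the Ceph network):
#   ping -M do -s "$payload" -c 5 10.0.0.2   # -M do sets Don't Fragment
#   iperf3 -c 10.0.0.2 -t 30                 # sustained bandwidth check
```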


  1. What could be causing the high commit and apply latency (which are identical!) in my Ceph cluster?
  2. Are there specific OSD tuning parameters or configurations I should review or modify to optimize performance?
  3. How can I reduce commit and apply latency to improve overall cluster performance?
 
This is the smallest possible Ceph cluster, with no room for parallelization: each OSD has to participate in every write request. (On BlueStore, apply latency is reported as the same value as commit latency, which is why the two numbers always match.)

The Samsung 870 QVO is a QLC SSD with a small "TurboWrite" (SLC) cache of only a few gigabytes.
As soon as this cache is full, write performance drops to around 160 MB/s, and with Ceph's random IO pattern the cache is full most of the time.

TL;DR: do not use consumer grade SSDs for a Ceph cluster.
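You can verify this on one of the QVOs yourself: a 4k sync-write fio run (queue depth 1, fsync after every write) mimics Ceph's journaling pattern and exposes the post-cache latency. This is only a sketch: the block below just prints the command, the device path is a placeholder, and actually running it destroys data on the target.

```shell
# Build the classic "is this SSD Ceph-suitable" fio invocation:
# 4k random writes, queue depth 1, fsync per write, O_DIRECT.
fio_cmd="fio --name=qvo-synctest --filename=/dev/sdX \
--rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
--ioengine=sync --fsync=1 --direct=1 \
--runtime=120 --time_based"

echo "$fio_cmd"
# To actually run it (DESTROYS data on /dev/sdX -- use a spare disk):
#   eval "$fio_cmd"
```

On a drive with power-loss protection the fsync latency stays flat; on a consumer QLC drive you should see it collapse once the SLC cache fills.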
 
Don't use consumer SSDs (and the QVOs are the worst of all the crappy consumer SSD drives).

You need an SSD/NVMe with supercapacitors (power-loss protection) for Ceph and ZFS, to handle the fsyncs (or at minimum as a WAL/journal drive).
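If you do add a small enterprise SSD/NVMe for DB/WAL, here is a rough sketch of attaching it when recreating an OSD on Proxmox. The device paths and the ~2% sizing rule of thumb are my own assumptions, not from this thread; destroy and recreate one OSD at a time and let the cluster rebalance in between.

```shell
data_dev=/dev/sdb        # placeholder: the 4 TB QVO data device
db_dev=/dev/nvme0n1      # placeholder: enterprise SSD/NVMe with PLP

# Rule-of-thumb DB sizing: a few percent of the data device
# (here ~2% of 4 TB, computed in decimal GB).
osd_tb=4
db_gb=$((osd_tb * 1000 * 2 / 100))
echo "suggested DB size: ${db_gb} GB"

# Recreate the OSD with the DB/WAL on the fast device:
#   pveceph osd create "$data_dev" --db_dev "$db_dev"
# (see `pveceph help osd create` for the DB/WAL size options)
```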
 
Thank you all for the replies!!! Do you have a recommendation for either a non-wallet-breaking 2 TB enterprise SSD, or perhaps a high-quality 256 GB one for DB/WAL?

Which parameters should I monitor in the S.M.A.R.T. data for this application? I am asking since the cluster used to run much faster at the beginning...
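Not an authoritative list, but on Samsung SATA SSDs the wear-related S.M.A.R.T. attributes usually watched are the ones grepped for below. The sample lines imitate `smartctl -A /dev/sdX` output and the values are made up for illustration; on a real node you would pipe smartctl's output in instead:

```shell
# Filter a smartctl attribute table down to the wear indicators relevant
# on Samsung 8xx-series drives.
watch_wear() {
    grep -E 'Wear_Leveling_Count|Used_Rsvd_Blk_Cnt|Total_LBAs_Written'
}

# Fabricated sample; on a real node: smartctl -A /dev/sdX | watch_wear
watch_wear <<'EOF'
177 Wear_Leveling_Count     0x0013   095   095   000    Pre-fail  Always       -       123
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       52123456789
EOF
```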
 
