latency on ceph ssd

lapapa

Member
Jul 26, 2021
Hi,

I'm trying to lower write latency on the Ceph SSDs, without success.

Environment: small production site, v 7.1.10, 3 PCs, 1 Gbps network. 13 entry-level SSDs as Ceph OSDs, different sizes from 200 GB to 1 TB, older drives. 10 Linux VMs, not I/O demanding. I had a similar problem with Elasticsearch on HDDs: the default options used too many threads, and after lowering the number of threads from 10 to 1 everything was fine. atop shows 118 threads for one OSD. Is that too many? How can I lower that number? I've tried lowering the threads on the OSD, but it changed nothing.
During the day, work is done without scrubbing or rebalancing.

The attached screenshots show very small I/O but lots of waits, and the SSD is 100% busy. Not always, but with any I/O demand everything becomes very slow for all VMs.

Please help

[screenshots attached]
 
Hi,

SSDs and HDDs:
sda KINGSTON_SM2280S3G2120G
sdb TOSHIBA_HDWD130
sdc Samsung_SSD_870_EVO_500GB
sdd KINGSTON_SM2280S3G2120G
sde CT240BX500SSD1
sda TOSHIBA_HDWD120
sdb KINGSTON_SA400S37240G
sdc KINGSTON_SA400S37240G
sdd KINGSTON_SH103S3240G
sde SPCC_Solid_State_Disk
sdf WDC_WD40EZRZ-00GXCB0
sda INTEL_SSDSC2CT060A3
sdb TOSHIBA_HDWD120
sdc CT1000BX500SSD1
sdd CT1000BX500SSD1
sde Samsung_SSD_860_EVO_1TB
 
Update:

I've cloned 3 VMs from the SSD pool to the HDD pool and the situation is much better now. A stress test writes around 60 MB/s to disk and over the network (1 Gbps). No more slowdowns; latency on the HDDs goes up to 300 ms at most. I'm planning to change the SSDs' device class from ssd to hdd for now. Any other thoughts?
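For reference, a minimal sketch of how an OSD's device class could be reassigned with the standard Ceph CLI (osd.0 is only a placeholder ID; Ceph requires removing the old class before a new one can be set):

```python
import subprocess

# Placeholder OSD IDs -- "ceph osd tree" shows the real ones.
osd_ids = ["osd.0"]

for osd in osd_ids:
    # An existing device class has to be removed before a new one can be set.
    subprocess.run(["ceph", "osd", "crush", "rm-device-class", osd], check=True)
    subprocess.run(["ceph", "osd", "crush", "set-device-class", "hdd", osd], check=True)
```

Note that this only changes which CRUSH rules (and therefore which pools) select the drive; it does not change how the SSD itself behaves.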
 
I went through the list. I see two problems with your SSDs:
- None of the SSDs have Power-Loss Protection (PLP). Without PLP, every write has to be fully completed before the drive can report it as finished, which slows Ceph writes down. With PLP, the drive can report the data as written sooner, while it is still writing. It also causes less SSD wear.
- Some SSDs use QLC cells. With QLC, data is first written to an SLC cache; once that cache is full, write speed degrades to something like 10 MB/s. This bottlenecks writes on Ceph, because all drives have to report finished writes.

sda KINGSTON_SM2280S3G2120G: no PLP
sdb TOSHIBA_HDWD130: HDD
sdc Samsung_SSD_870_EVO_500GB: no PLP
sdd KINGSTON_SM2280S3G2120G: no PLP
sde CT240BX500SSD1: no PLP
sda TOSHIBA_HDWD120: HDD
sdb KINGSTON_SA400S37240G: no PLP, QLC
sdc KINGSTON_SA400S37240G: no PLP, QLC
sdd KINGSTON_SH103S3240G: no PLP
sde SPCC_Solid_State_Disk
sdf WDC_WD40EZRZ-00GXCB0: HDD
sda INTEL_SSDSC2CT060A3: no PLP
sdb TOSHIBA_HDWD120: HDD
sdc CT1000BX500SSD1: no PLP, QLC
sdd CT1000BX500SSD1: no PLP, QLC
sde Samsung_SSD_860_EVO_1TB: no PLP
 
Thanks for the info.
I don't understand how an HDD can be faster and have lower latency than an SSD. Now we have 5 production VMs on just 4 HDDs across 3 PCs and performance is OK. Is there no way to tell Ceph to use the SSDs like HDDs, with fewer threads and so on?

This is a small site with very low I/O and a small budget.
 
tell ceph to use ssd like hdd
It's just that these are consumer SSDs without PLP. They are slow if you force them to actually write the data to stable storage (sync writes). You'll find many threads about this on the forum.
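A rough way to see the sync-write penalty yourself is a small Python sketch like the one below; it writes 4 KiB blocks with O_DSYNC, similar in spirit to the fio/dd journal tests usually recommended (the file path is a placeholder and should point at a drive that is not already in use as an OSD):

```python
import os, time

path = "/mnt/testdisk/syncwrite.bin"   # placeholder -- a scratch location on the drive under test
block = b"\0" * 4096                   # 4 KiB blocks, roughly what small sync writes look like
iterations = 1000

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
start = time.monotonic()
for _ in range(iterations):
    os.write(fd, block)                # O_DSYNC: each write must reach stable storage
elapsed = time.monotonic() - start
os.close(fd)
os.remove(path)

print(f"{iterations / elapsed:.0f} sync IOPS, {elapsed / iterations * 1000:.2f} ms average latency")
```

On consumer SSDs without PLP this typically lands in the low hundreds of IOPS, which matches the 200 vs. 10,000+ IOPS numbers mentioned further down in the thread.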

Maybe you could buy small enterprise SSDs with PLP and put the WAL+DB of the HDD OSDs on them. This should boost HDD performance a bit, but I don't know by how much.
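If you go that route, a minimal sketch of creating an HDD OSD with its DB/WAL on a separate PLP SSD on Proxmox might look like this (device paths are placeholders; check `pveceph osd create --help` on your version for the exact option names):

```python
import subprocess

hdd = "/dev/sdX"      # placeholder: the HDD that becomes the OSD
db_ssd = "/dev/sdY"   # placeholder: the enterprise (PLP) SSD that holds RocksDB/WAL

# Assumes the pveceph CLI; the DB device option is expected to be --db_dev here.
subprocess.run(["pveceph", "osd", "create", hdd, "--db_dev", db_ssd], check=True)
```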
 
Performance is now acceptable without these 1 TB SSDs:
CT1000BX500SSD1
CT1000BX500SSD1
SPCC_Solid_State_Disk

Only one 1 TB SSD is now in Ceph and it is doing fine, with latency below 30 ms:
Samsung_SSD_860_EVO_1TB

This one also does not have PLP, so what is the secret?
 
Thanks for the info.
I don't understand how an HDD can be faster and have lower latency than an SSD. Now we have 5 production VMs on just 4 HDDs across 3 PCs and performance is OK. Is there no way to tell Ceph to use the SSDs like HDDs, with fewer threads and so on?
See this old blog:

https://www.sebastien-han.fr/blog/2...-if-your-ssd-is-suitable-as-a-journal-device/

The main reason is sync writes: without PLP, the SSD needs to write directly to a full NAND block.

For example, if you need to write 4k, it will write a full NAND block of 32 MB (the block size depends on the SSD: MLC, TLC, QLC, model, ...).

That's write amplification. That's why you'll get around 200 IOPS versus 10,000~20,000 IOPS for an SSD with PLP.

With PLP, the sync writes are kept in the SSD's buffer, and when it is full, a full NAND block is written, only once.
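To put rough numbers on that example (4 KiB sync write, 32 MB NAND block; real block sizes vary per drive and model):

```python
write_size = 4 * 1024            # 4 KiB application sync write
nand_block = 32 * 1024 * 1024    # 32 MB NAND block rewritten each time (example size)

print(f"write amplification: {nand_block / write_size:.0f}x")      # 8192x

# At ~200 sync IOPS of 4 KiB writes, the drive sustains well under 1 MB/s:
print(f"effective throughput: {200 * write_size / 1e6:.1f} MB/s")  # 0.8 MB/s
```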
 
