Ceph low performance

Mar 11, 2024
Hello.

I'm installing a 3 node cluster with these servers:

1 - DELL R660xs, 2 CPUs, 64 GB RAM, boot from HD - 2 x Crucial BX500 4 TB SSD (no RAID)
2 - DELL R660xs, 2 CPUs, 196 GB RAM, boot from HD - 2 x Crucial BX500 4 TB SSD (no RAID)
3 - DELL T440, 1 CPU, 128 GB RAM, boot from SSD - 2 x Crucial BX500 4 TB SSD (no RAID)

Network: 10 Gbit (tested)

Ceph pool on 6 OSDs (Crucial BX500 4 TB)

I experience high latency and terrible VM performance on Ceph.

With one Windows Server 2019 VM running (no activity):

root@pve2:~# ceph osd perf
osd  commit_latency(ms)  apply_latency(ms)
  5                   1                  1
  4                   1                  1
  0                 153                153
  3                   1                  1
  2                 103                103
  1                   1                  1

I suppose the problem is the BX500s, and I'm thinking of replacing them with Kingston SEDC600 enterprise SSDs, but I'd like to be sure before buying 5k euros' worth of disks.

Do you think the performance problem is caused by the SSDs not being enterprise grade?


Denis
Crucial BX500
Those are on the cheap and slow end of consumer SSDs. They will not perform well under sustained load and with the primarily synchronous writes that Ceph does.

The recommendation for enterprise SSDs with power loss protection (PLP) is there for good reasons. Performance under sustained continuous load is one of them.
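If you want numbers before spending the money, you can measure a disk's synchronous 4k write performance directly with fio, which is roughly the access pattern Ceph's journaling produces. A sketch of such a test (the device path `/dev/sdX` is a placeholder; this writes to the raw device and destroys its contents, so only run it against a disk with no data):

```shell
# Destructive! Replace /dev/sdX with an empty, unused disk.
# Single job, queue depth 1, sync writes: worst case for consumer
# SSDs without power loss protection, typical for Ceph.
fio --ioengine=libaio --filename=/dev/sdX --direct=1 --sync=1 \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --name=ceph-sync-test
```

Enterprise SSDs with PLP can acknowledge sync writes from their capacitor-backed cache and typically sustain tens of thousands of IOPS here; consumer drives like the BX500 often drop to a few hundred.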
Additionally, with just 3 nodes in a Ceph cluster, make sure you have at least 4 OSDs in each. With only 2 per node, you will likely run into trouble if one OSD fails: Ceph will recover the lost replicas to the only node it can, which is the same node, so the remaining OSD needs to accommodate the data of the lost one.
This means it is very likely to run full. It is better to have more, smaller OSDs per node. The downside is the additional RAM and CPU usage of each additional OSD service; that is what you need to balance.
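A back-of-the-envelope calculation shows the problem. The numbers below are assumed for illustration (4 TB OSDs, cluster at roughly 50% utilization), not taken from your cluster:

```shell
# Assumed example values, not measured:
osd_size_gb=4000        # each 4 TB OSD
used_per_osd_gb=2000    # ~50% utilization per OSD
# With only 2 OSDs per node, Ceph can re-create a failed OSD's
# replicas only on the node's other OSD, which must then hold
# its own data plus the recovered data:
after_failure_gb=$((used_per_osd_gb + used_per_osd_gb))
echo "surviving OSD would need ${after_failure_gb} GB of ${osd_size_gb} GB"
```

So even at 50% cluster utilization, a single OSD failure would completely fill the surviving OSD on that node, and Ceph stops accepting writes to full OSDs well before 100%.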