Basic Ceph Improvements

CelticWebs

Member
Mar 14, 2023
I've recently been working in a place that has a 7-node cluster running Ceph. The cluster consists of 3 compute nodes and 4 storage nodes. The storage nodes each have 12 x enterprise HDDs, while the compute nodes each have 5 x enterprise 2.5" SSDs. So while the 3 compute nodes are the ones that mostly run the virtual machines, the SSDs in each of the 3 compute nodes are also part of the Ceph cluster.

There are CRUSH rules set up to create SSD-only and HDD-only pools. The nodes are linked via dual 10Gb networking, so data throughput should, in theory, be decent.

Under load the Ceph cluster doesn't seem to perform as well as I'd expect. I know that's a vague statement, but it just feels laggy and not as responsive as it should be. I've done some tests with various bits of software, and the more I push it with repeated tests, the slower it gets (up to a point). So my question is this: what's the best way to test Ceph speed, and what sort of results should I expect?
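Is a plain rados bench run against each pool, something along these lines, the sort of test people would suggest? (The pool name here is just an example.)

# 60-second write test, keeping the objects so a read test can follow
rados bench -p ssd-pool 60 write --no-cleanup
# sequential read test against the objects written above
rados bench -p ssd-pool 60 seq
# remove the benchmark objects afterwards
rados -p ssd-pool cleanup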

I've also attached an image showing the pool setup. One thing I can see that I assume could be impacting performance is that the metadata pool is on HDD. Is this an issue? Would simply editing the pool and selecting SSD instead of HDD be worthwhile? Would moving it to SSD require anything more than just changing that setting on the pool before doing it? It's a live cluster, after all.

I've also attached some test results; you'll see there isn't a massive difference between the SSD and the HDD pools.
 

Attachments

  • Image 11-03-2024 at 14.09.jpeg (59.2 KB)
  • Screenshot 2024-01-30 at 01.40.14.png (95 KB)
  • Screenshot 2024-01-30 at 01.40.57.png (108.4 KB)
HDD-only OSDs are not recommended anymore.
The RocksDB part of an OSD creates so much random I/O that throughput drops to around 15 MB/s and is basically unusable.
Always put the RocksDB part of an OSD on an SSD or NVMe device.

The CephFS metadata pool should always be on an SSD-only pool.
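As a rough sketch (rule and pool names are just examples, double-check them against your own setup before touching a live cluster), moving a pool to SSD-only is a matter of giving it a device-class based CRUSH rule:

# create a replicated rule that only places data on OSDs with device class "ssd"
ceph osd crush rule create-replicated replicated-ssd default host ssd
# assign that rule to the metadata pool (pool name is an example)
ceph osd pool set cephfs_metadata crush_rule replicated-ssd

Ceph will then backfill the pool's data onto the SSD OSDs in the background, so this can be done on a running cluster.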
 
Thanks for the response. What's the RocksDB part? Also, can this be swapped to SSD on a live system?
 
An OSD uses RocksDB to manage the objects on the disk, and each object access requires a lookup in RocksDB. That is why RocksDB on HDD is not a good idea: HDDs are good at reading and writing larger sequential data, while Ceph has a very random I/O pattern.

When creating an OSD you can specify a different device (e.g. a logical volume) on an SSD for the RocksDB.
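As a sketch, assuming the Proxmox VE tooling (device paths and volume names are placeholders, and check the exact option names against your version):

# Proxmox VE wrapper: create an OSD on an HDD with its RocksDB on an NVMe device
pveceph osd create /dev/sdX --db_dev /dev/nvme0n1
# plain Ceph alternative via ceph-volume, using a pre-created logical volume for the DB
ceph-volume lvm create --data /dev/sdX --block.db ceph-db-vg/db-sdX

For existing HDD-only OSDs this normally means destroying and recreating them one at a time, letting the cluster rebalance in between.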
 
