Ceph uses false osd_mclock_max_capacity_iops_ssd value

Noah0302

Hello everyone,

I recently installed Ceph into my 3 node cluster, which worked out great at first.
But after a while I noticed that the Ceph pool would sometimes hang and stutter. That's when I looked into the configuration and saw this:
[screenshot: 1667321193992.png]
I use three identical SSDs and checked that every node uses SATA 6G and so on. Everything should be working fine, but it seems Ceph thinks OSD 1 is on SATA 3 or something.
There is probably a way to manually adjust the value, or to have Ceph recalculate it, but I have not found it anywhere I looked.

This also happens sometimes. I just tried to re-add the OSD to see if that works, but now I think I nuked my Ceph pool:
[screenshot: 1667321867982.png]
Might this SSD be dead, although the SMART values are OK?

If anyone could help me here, I would be very grateful!
Thanks!
 

Here is the Ceph manual. To set a custom mClock IOPS value, use the following command:

Code:
ceph config set osd.N osd_mclock_max_capacity_iops_[hdd,ssd] <value>
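
For example, to override the value for a single OSD and read it back afterwards (the OSD id and the 80000 figure below are only placeholders, use the IOPS you actually measured):

Code:
# placeholder id and value - substitute your OSD and your measured IOPS
ceph config set osd.1 osd_mclock_max_capacity_iops_ssd 80000
ceph config get osd.1 osd_mclock_max_capacity_iops_ssd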



What type of drives are these?
Thank you, that worked! Let's see if it improves performance...

They are cheap 480 GB PNY CS900 SSDs rated at 89k/83k 4K read/write IOPS, but with barely any cache.

I know that if I want to be serious about Ceph I should use better SSDs, like Samsung Pros or something similar, but this is more about testing in my homelab than anything else. I will probably upgrade down the line once I have tested Ceph a bit more.
I was just wondering why this one was not recognized like the others, since they are all exactly the same.
 
I benchmark all of my drives multiple times and then set a consistent value for all drives of the same type across the cluster. The variance between them is easily explained by differences at the time of benchmarking (which happens automatically when Ceph is installed or upgraded).
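
As a sketch of that approach, Ceph's config masks should also let you apply one value to a whole device class instead of setting it per OSD (the 16000 here is only an example figure, not a recommendation):

Code:
# assumes the class:ssd config mask; 16000 is a placeholder for your own measured IOPS
ceph config set osd/class:ssd osd_mclock_max_capacity_iops_ssd 16000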
 
Interesting.
Did you leave some buffer, or did you just set the maximum value? The docs describe some reservation based on the max IOPS.
 
Same EXACT problem here. I never ran into this before, but it's been a few months since I had a drive fail; somewhere along the line these values came in as defaults. I had been PULLING MY HAIR OUT trying to figure out what was going on. I stumbled across the values in the "Configuration Database" section, spent a couple of hours on Google, and figured it out.

[screenshot: 1672015645673.png]

I benchmarked the actual drives and got MUCH different results than the "automatic" values.
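
The per-OSD benchmark can be run with ceph tell; the OSD id below is a placeholder, and the byte/block-size arguments are just the ones matching the output shown:

Code:
# write 12288000 bytes in 4096-byte blocks on one OSD and report IOPS
ceph tell osd.6 bench 12288000 4096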

So the HDD drives absolutely can't take 600-800 IOPS:

Code:
{
    "bytes_written": 12288000,
    "blocksize": 4096,
    "elapsed_sec": 33.020831205999997,
    "bytes_per_sec": 372128.73059861764,
    "iops": 90.851740868803134
}

I wrote a script to walk the OSD list and set the values, and the cluster is responding MUCH better (sharing it in case it helps others).

I would suggest shipping some sort of lower defaults, or the ability to tune this in the web UI.


Code:
#!/bin/bash
#intel S3500 120G = 11,500 write IOPS (AKA MSD)
#intel S3700 200G = 32,000 write IOPS (AKA SSD)
#seagate 8TB drives = 120 write IOPS (AKA HDD)

ceph osd df > /tmp/osd.txt
grep hdd /tmp/osd.txt | awk '{ print $1}' > /tmp/hdd.list
grep msd /tmp/osd.txt | awk '{ print $1}' > /tmp/msd.list
grep ssd /tmp/osd.txt | awk '{ print $1}' > /tmp/ssd.list

HDDIOP=100
SSDIOP=16000
MSDIOP=5500

echo "Setting $(cat /tmp/hdd.list | wc -l) HDD to ${HDDIOP}"
while read OSD; do
        echo "setting ${HDDIOP} on OSD.${OSD}"
        ceph config set osd.${OSD} osd_mclock_max_capacity_iops_hdd ${HDDIOP}
done </tmp/hdd.list

echo "Setting $(cat /tmp/ssd.list | wc -l) SSD to ${SSDIOP}"
while read OSD; do
        echo "setting ${SSDIOP} on OSD.${OSD}"
        ceph config set osd.${OSD} osd_mclock_max_capacity_iops_ssd ${SSDIOP}
done </tmp/ssd.list

echo "Setting $(cat /tmp/msd.list | wc -l) MSD to ${MSDIOP}"
while read OSD; do
        echo "setting ${MSDIOP} on OSD.${OSD}"
        ceph config set osd.${OSD} osd_mclock_max_capacity_iops_ssd ${MSDIOP}
done </tmp/msd.list
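
To check afterwards that the overrides actually landed in the configuration database (not part of the original script, just a quick sanity check):

Code:
# list all stored mclock capacity overrides
ceph config dump | grep osd_mclock_max_capacity_iops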

REFS:
** https://docs.ceph.com/en/quincy/rad...ref/#confval-osd_mclock_max_capacity_iops_hdd
** https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/
 
