Ceph Firefly ceph.conf thread

RobFantini
Lowering Ceph scrub I/O priority

using version 0.80.8+
By default, a Ceph OSD's scrubbing thread does its disk I/O at the same priority as all other threads. It can be lowered for all OSDs with the ioprio options:

* in ceph.conf:
Code:
        osd_disk_thread_ioprio_class  = idle
        osd_disk_thread_ioprio_priority = 7
* to apply immediately:
Code:
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 7'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'

See http://dachary.org/?p=3268 for the background.
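
To check that the values actually took effect on a given OSD, the admin socket can be queried; a minimal check, assuming osd.0 and the default socket path:
Code:
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep ioprio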
 
HEALTH_WARN clock skew detected

I had that occur when network load was high. I think it was due to multiple nodes backing up at the same time. We back up one node at a time now, and made this adjustment to ceph.conf:

Code:
mon clock drift allowed = 1

I found the suggestion here
http://wiki.deimos.fr/Ceph_:_perfor...lution#health_HEALTH_WARN_clock_skew_detected
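
For what it's worth, the default for mon clock drift allowed is 0.05 s, so 1 s is quite generous. Before raising it you can check how far the monitor clocks actually drift (assuming NTP runs on the monitor nodes):
Code:
# show which monitors report skew and by how much
ceph health detail
# on each monitor node, check NTP peer status and offsets
ntpq -p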

Is there any drawback to setting it?

We have used it for 6 months and it seems to have helped settle the cluster.
 
RobFantini said: [quoted post on lowering Ceph scrub I/O priority, see above]

Hi,
but you must have the CFQ scheduler enabled for this. OK, this is also mentioned in Loic's blog.
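
For reference, a quick way to check and switch the scheduler for a given disk (sdX is just a placeholder; a runtime change does not survive a reboot unless also set on the kernel cmdline):
Code:
# the active scheduler is shown in brackets
cat /sys/block/sdX/queue/scheduler
# switch to cfq at runtime
echo cfq > /sys/block/sdX/queue/scheduler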

Udo
 
Hi Rob,
yes, cfq is the default, but for I/O-related things (esp. with SSDs) many people use the deadline scheduler. I have switched back to cfq to see if the ioprio settings work well.
Udo
Udo, what's the point of using CFQ if it performs worse (in terms of overall IOPS) than deadline or noop?
 
Hi,
with the cfq scheduler, normal read I/O can get a higher priority than the scrub-process I/O. If that works, the I/O with the cfq scheduler should be better than with the others...
cfq with prioritizing may be better than without it, but for most high-bandwidth workloads both are worse than deadline. I added elevator=deadline to my PVE kernel cmdline instead of playing with cfq priorities.
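
For anyone wanting to do the same, a rough sketch of that change on a Debian/PVE host using GRUB (assuming the existing default line only contains "quiet"; other boot loaders differ):
Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=deadline"
# then regenerate the grub config and reboot
update-grub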
 
here is another set of settings we've used for a few months:

* ceph.conf :
Code:
        osd_max_backfills = 1
      # osd-max-recovery-threads = 1
        osd-recovery-op-priority = 1
        osd-client-op-priority = 63
        osd-recovery-max-active = 1

* to apply now via the CLI:
Code:
ceph tell osd.* injectargs '--osd-max-backfills 1'
#  
#  this does not work, results in :  osd.0:  failed to parse arguments: --osd-max-recovery-threads,1
# ceph tell osd.* injectargs '--osd-max-recovery-threads 1'  
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
ceph tell osd.* injectargs '--osd-client-op-priority 63'
ceph tell osd.* injectargs '--osd-recovery-max-active 1'
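
To see whether recovery traffic is actually being held back once these are in, the recovery and client I/O rates can be watched while a rebalance runs (the pool name here is just an example):
Code:
# overall cluster state, including recovery/backfill progress
ceph -s
# per-pool client and recovery I/O rates
ceph osd pool stats rbd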

added those from reading some of Udo's posts and the following :

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/028926.html -
" During recovery/backfill, the entire cluster suffers degraded performancebecause of the IO storm that backfills cause. Client IO becomes extremely
latent. I've tried to decrease the impact that recovery/backfill has with
the following:
"

That user has journals on spinning disks.

Any comments on those settings?
 
RobFantini said: [quoted post on recovery/backfill settings, see above]
Hi Rob,
osd-max-recovery-threads doesn't work, because the option is osd_recovery_threads.
You can easily find the option with the admin socket:
Code:
ceph --admin-daemon /var/run/ceph/ceph-osd.11.asok config show | grep recovery
  "osd_min_recovery_priority": "0",
  "osd_recovery_threads": "1",
  "osd_recovery_thread_timeout": "30",
  "osd_recovery_delay_start": "0",
  "osd_recovery_max_active": "5",
  "osd_recovery_max_single_start": "5",
  "osd_recovery_max_chunk": "8388608",
  "osd_recovery_forget_lost_objects": "false",
  "osd_recovery_op_priority": "10",
  "osd_recovery_op_warn_multiple": "16",
I'm just filling the seventh node now, and to get a little bit more recovery performance I switched osd_recovery_max_active from 1 to 5 and osd_max_backfills to 2.
osd_client_op_priority 63 is the default.
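
For reference, a single option can also be read directly over the same admin socket (if config get is not available on this version, the config show | grep form above works just as well):
Code:
ceph --admin-daemon /var/run/ceph/ceph-osd.11.asok config get osd_recovery_threads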

Udo
 
I finished adding OSDs to our 6th node last night. I must say I am happy to see the little performance improvement. Each node can take up to 12 OSDs, but we are only using 4 OSDs in each for now. We plan to grow to full capacity by the end of this year. I have been using max backfills 1 and max recovery active 1. So far it looks good. I tested by adding a whole 2 TB OSD without stepping up. My lab-rat users did not seem to notice anything. :)
Right now we are still converting all XFS OSDs to ext4. Here is the config from this cluster:
Code:
         mon osd full ratio = 0.90
         mon osd nearfull ratio = 0.75
         osd client op priority = 60
         osd disk threads = 4
         osd max backfills = 1
         osd max scrubs = 1
         osd op threads = 8
         osd pool default min size = 1
         osd pool default pg num = 512
         osd pool default pgp num = 512
         osd pool default size = 3
         osd recovery max active = 1
         osd recovery op priority = 1
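
As an aside on the pg num values, the usual rule of thumb from the Ceph docs is (OSD count x 100) / pool size, rounded to a power of two, for the total across all pools. For this cluster that works out roughly as:
Code:
# 6 nodes x 4 OSDs, size 3 pools
24 * 100 / 3 = 800    -> ~1024 PGs in total
# at full capacity (12 OSDs per node)
72 * 100 / 3 = 2400   -> ~2048 PGs in total
Since that total is shared across all pools, a per-pool default of 512 is in a sensible range here.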
 

Thank you Udo.

Also - have you noticed any benefit (or not) from osd_disk_thread_ioprio_* ?
 
Hi Rob,
it feels good, but it's only a feeling, because deep scrubs are not running the whole time and it's hard to say whether the I/O operations hit the same PG (or OSD) that is currently scrubbing.

Real measurement is not easy, but the values during rados bench are closer together now.
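
For anyone who wants to repeat that comparison, a basic rados bench run looks like this (pool name is just an example; run it once while a deep scrub is active and once while it is not):
Code:
# 60 s write test, keeping the objects so a read test can follow
rados -p rbd bench 60 write --no-cleanup
# 60 s sequential read test against the objects written above
rados -p rbd bench 60 seq
# remove the benchmark objects afterwards (newer releases; otherwise delete the benchmark_data objects by hand)
rados -p rbd cleanup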

Udo
 
[quoted post with the cluster ceph.conf settings, see above]

Hello there

I wonder how your settings have evolved from that. Between Udo and yourself there have been some good suggestions.

I've an 8-node Ceph test cluster [plenty of good spare hardware to use] and am starting to tune the OSDs.

We've an all-SSD setup.

best regards
Rob Fantini
 
