Urgent: Ceph Help Needed

ejmerkel

Renowned Member
Sep 20, 2012
Hello,

We have a 3-node Proxmox/Ceph cluster. This morning, all of the OSDs on one node were marked down/out. Right now all the VMs are up, but the IO is getting killed by the recovery of the OSDs and they are basically not responding.

I have added the following in /etc/pve/ceph.conf

Code:
osd max backfills = 1
osd recovery max active = 1

How do I make this active to lessen the IO of the rebuild?

We already had one OSD out, but why would all of the OSDs on one server get marked down/out at once? Here is all I see in the logs.

Code:
2015-09-21 7:00:54.877290 mon.0 10.0.3.11:6789/0 9323720 : [INF] pgmap v12130886: 1536 pgs: 1431 active+clean, 77 active+degraded, 28 active+remapped; 3394 GB data, 9949 GB used, 53353 GB / 63302 GB avail; 1635 kB/s rd, 222 kB/s wr, 78 op/s; 63933/2613351 objects degraded (2.446%)
2015-09-21 07:03:54.142139 mon.1 10.0.3.12:6789/0 3973 : [INF] pgmap v12130959: 1536 pgs: 1518 active+degraded, 18 active+remapped; 3394 GB data, 9949 GB used, 53353 GB / 63302 GB avail; 16683 kB/s rd, 1012 kB/s wr, 868 op/s; 870311/2613351 objects degraded (33.302%)
2015-09-21 07:03:55.670321 mon.1 10.0.3.12:6789/0 3974 : [INF] mon.1 calling new monitor election
2015-09-21 07:03:55.673390 mon.0 10.0.3.11:6789/0 9323722 : [INF] mon.0 calling new monitor election
2015-09-21 07:03:55.675321 mon.0 10.0.3.11:6789/0 9323723 : [INF] mon.0 calling new monitor election
2015-09-21 07:03:55.893161 mon.0 10.0.3.11:6789/0 9323724 : [INF] mon.0@0 won leader election with quorum 0,1,2
2015-09-21 07:03:56.231606 mon.0 10.0.3.11:6789/0 9323725 : [INF] monmap e3: 3 mons at {0=10.0.3.11:6789/0,1=10.0.3.12:6789/0,2=10.0.3.13:6789/0}
2015-09-21 07:03:56.231748 mon.0 10.0.3.11:6789/0 9323726 : [INF] pgmap v12130959: 1536 pgs: 1518 active+degraded, 18 active+remapped; 3394 GB data, 9949 GB used, 53353 GB / 63302 GB avail; 870311/2613351 objects degraded (33.302%)
2015-09-21 07:03:56.232198 mon.0 10.0.3.11:6789/0 9323727 : [INF] mdsmap e1: 0/0/1 up
2015-09-21 07:03:56.251357 mon.0 10.0.3.11:6789/0 9323728 : [INF] osdmap e1155: 18 osds: 12 up, 17 in

My first priority is how to lessen the recovery so the VMs will become responsive again. Thanks in advance for any advice or help!

Best regards,
Eric
 
You can change the running config with "ceph tell osd.*":

Code:
ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-recovery-threads 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
ceph tell osd.* injectargs '--osd-client-op-priority 63'
ceph tell osd.* injectargs '--osd-recovery-max-active 1'
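Note that injectargs only changes the running daemons; the same values in /etc/pve/ceph.conf will then cover the next OSD restart. To check what a running OSD is actually using, something like this should work (osd.0 is just an example id):

Code:
# query the running config of a local OSD via its admin socket
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'backfills|recovery'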
 

Thank you, that seemed to help. I have a question regarding "--osd-client-op-priority 63": isn't that already the default? I suppose you were just wanting to make sure it was set correctly?

Eric
 
Hi,
yes, it's the default. You can check your values with
Code:
ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config show | grep priority
Udo

BTW, perhaps it's much faster to bring the OSDs on that node back to life, and to set "ceph osd set noout" beforehand to stop a rebuild onto the other disks...
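A rough sketch of that sequence, in case it helps (the OSD ids and the sysvinit-style service call are assumptions for this setup; use whatever "ceph osd tree" shows as down on the affected node):

Code:
# tell the cluster not to mark down OSDs out, so no rebalancing starts
ceph osd set noout

# restart the down OSDs on the affected node (osd.12 / osd.13 are example ids)
service ceph start osd.12
service ceph start osd.13

# watch recovery; once the PGs are active+clean again, remove the flag
ceph -w
ceph osd unset noout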