Moving OSDs from one node to another

cwilliams

Member
Oct 12, 2016
Hi,

I was attempting to rebalance my 3-node cluster by evening out the amount of disk space between my nodes. I had 14 600GB drives on one host and 14 146GB drives on another, with 6 600GB drives on the third host.

I pulled 7 drives from host 1 and 7 drives from host 2, after marking the OSDs down and out and destroying them. Now that I have put them into their new hosts, I am having an issue getting them added back in.

In this instance I am trying to add osd 7 after having just added osd 6 to this node.

I have removed the auth entries and the OSDs using:

Code:
ceph auth del osd.7
ceph osd rm osd.7
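(For completeness, I believe the fuller removal would also mark the osd out and drop its crush entry; just a sketch, with osd.7 as the example id:)

Code:
ceph osd out 7
ceph osd crush remove osd.7
ceph auth del osd.7
ceph osd rm 7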

Now when I try to add a new osd I receive an error. The process I follow is:

Code:
ceph-disk zap /dev/sdX
ceph-disk prepare --bluestore /dev/sdX --osd-id 7 --osd-uuid XXXXxxxxXXX
ceph-disk activate /dev/sdX1

The error I get is:

Code:
command_with_stdin: Error EEXIST: entity osd.6 exists but key does not match
mount_activate: Failed to activate
'['ceph', '--cluster', 'ceph', '--name', 'client.bootstrap-osd', '--keyring', '/var/lib/ceph/bootstrap-osd/ceph.keyring', '-i', '-', 'osd', 'new', u'XXXXxxxxXXX']' failed with status code 17

osd.6 does exist, because I created it a few minutes prior to this, so I don't know why ceph-disk is trying to make this new disk use osd.6, especially when I have assigned it the osd-id of 7.
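For what it's worth, I assume the cluster's view of ids 6 and 7 can be checked with something like this (just a sketch; the grep filters are only there to cut down the output):

Code:
# what the cluster currently has for ids 6 and 7
ceph osd tree | grep -E 'osd\.(6|7)'
ceph osd dump | grep -E '^osd\.(6|7)'
# whether auth entities are still registered for them
ceph auth get osd.6
ceph auth get osd.7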

I am only able to add one OSD back to this node; any others that I try just fail. I would like to learn from this rather than just revert to reinstalling Ceph altogether. This should be the same process that is followed to replace a failed OSD.

Is this not the correct way to add new OSDs?
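(For comparison, I assume the Proxmox-side wrapper below ends up running the same ceph-disk steps; I have been using ceph-disk directly so I can pick the osd id myself. /dev/sdX is the same placeholder device as above.)

Code:
pveceph createosd /dev/sdX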

I have googled extensively, but if anyone has any insight or has had this issue before I would really appreciate the help.

I'm using Proxmox 5.1 and Ceph 12.2.2.

Thanks
 
1. When you move an osd from one node to another, you don't need to destroy and create it. It will be automagically discovered and added (see the sketch below this list).
2. To remove an osd from the cluster, use
Code:
ceph osd purge {id} --yes-i-really-mean-it
3. In 'ceph osd' commands the numeric osd id is enough, e.g.
Code:
ceph osd rm 7
while 'ceph auth' and 'ceph osd crush' commands expect the full 'osd.X' name, e.g.
Code:
ceph auth del osd.7
4. Please post your crushmap and ceph osd tree.
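To illustrate point 1, roughly the sequence I mean is below. It is only a sketch: osd.7 is an example id, and it assumes the default 'osd crush update on start = true', so the osd re-registers under its new host when it starts.

Code:
# on the old host, before pulling the disk
systemctl stop ceph-osd@7
umount /var/lib/ceph/osd/ceph-7
# physically move the disk to the new host, then let udev pick it up,
# or trigger activation of all prepared disks manually:
ceph-disk activate-all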
 
Sure thing, here is the CRUSH map from the configuration panel:

Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 device7
device 8 device8
device 9 device9
device 10 device10
device 11 device11
device 12 device12
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 device19
device 20 device20
device 21 device21
device 22 device22
device 23 device23
device 24 device24
device 25 device25
device 26 device26
device 27 osd.27 class hdd
device 28 osd.28 class hdd
device 29 osd.29 class hdd
device 30 osd.30 class hdd
device 31 osd.31 class hdd
device 32 osd.32 class hdd
device 33 osd.33 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host prox1 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    # weight 3.274
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 0.546
    item osd.1 weight 0.546
    item osd.2 weight 0.546
    item osd.3 weight 0.546
    item osd.4 weight 0.546
    item osd.5 weight 0.546
}
host prox2 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    # weight 1.346
    alg straw2
    hash 0    # rjenkins1
    item osd.13 weight 0.133
    item osd.14 weight 0.133
    item osd.15 weight 0.133
    item osd.16 weight 0.133
    item osd.17 weight 0.133
    item osd.18 weight 0.133
    item osd.6 weight 0.546
}
host prox3 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    # weight 3.820
    alg straw2
    hash 0    # rjenkins1
    item osd.27 weight 0.546
    item osd.28 weight 0.546
    item osd.29 weight 0.546
    item osd.30 weight 0.546
    item osd.31 weight 0.546
    item osd.32 weight 0.546
    item osd.33 weight 0.546
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    # weight 8.440
    alg straw2
    hash 0    # rjenkins1
    item prox1 weight 3.274
    item prox2 weight 1.346
    item prox3 weight 3.820
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map


Here is the osd tree

Code:
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       8.43994 root default                          
-3       3.27411     host prox1                        
 0   hdd 0.54568         osd.0      up  1.00000 1.00000
 1   hdd 0.54568         osd.1      up  1.00000 1.00000
 2   hdd 0.54568         osd.2      up  1.00000 1.00000
 3   hdd 0.54568         osd.3      up  1.00000 1.00000
 4   hdd 0.54568         osd.4      up  1.00000 1.00000
 5   hdd 0.54568         osd.5      up  1.00000 1.00000
-5       1.34604     host prox2                        
 6   hdd 0.54568         osd.6      up  1.00000 1.00000
13   hdd 0.13339         osd.13     up  1.00000 1.00000
14   hdd 0.13339         osd.14     up  1.00000 1.00000
15   hdd 0.13339         osd.15     up  1.00000 1.00000
16   hdd 0.13339         osd.16     up  1.00000 1.00000
17   hdd 0.13339         osd.17     up  1.00000 1.00000
18   hdd 0.13339         osd.18     up  1.00000 1.00000
-7       3.81979     host prox3                        
27   hdd 0.54568         osd.27     up  1.00000 1.00000
28   hdd 0.54568         osd.28     up  1.00000 1.00000
29   hdd 0.54568         osd.29     up  1.00000 1.00000
30   hdd 0.54568         osd.30     up  1.00000 1.00000
31   hdd 0.54568         osd.31     up  1.00000 1.00000
32   hdd 0.54568         osd.32     up  1.00000 1.00000
33   hdd 0.54568         osd.33     up  1.00000 1.00000

Thanks!
 
So osd.6 is the one taken from prox1 to prox2?
I guess you are trying to add a disk which is already added to Ceph as osd.6.
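If it helps, I would expect something like this (a sketch, run on the node where you are adding the disk) to show whether the disk still carries the old osd data and where the cluster thinks osd.6 lives:

Code:
# which partitions ceph-disk associates with which osd on this node
ceph-disk list
# hostname and devices the cluster has recorded for osd.6
ceph osd metadata 6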
 
I swapped 7 disks from Prox2 to Prox3 to balance out the amount of space on each node. OSDs 6-12 were taken from Prox2 and added to Prox3; I believe it was OSDs 20-26 on Prox3's side.

I was able to add osd.6 just fine, but when I tried to add the next one it failed. It kept trying to use osd.6's auth key, so it seems like it is trying to register osd.7 as osd.6.
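In case it narrows things down, I assume the mismatch can be confirmed by comparing the key the cluster holds for osd.6 with the keyring on the osd.6 that is actually up (a sketch, run on the node where osd.6 is mounted):

Code:
# key the cluster has registered for osd.6
ceph auth get osd.6
# key stored by the osd that is currently running as osd.6
cat /var/lib/ceph/osd/ceph-6/keyring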
 
Are you sure you are using the right /dev/sdX device, and that the device isn't mounted?
 
Here is my lsblk

Code:
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0  68.4G  0 disk
├─sda1   8:1    0  1007K  0 part
├─sda2   8:2    0  68.4G  0 part
└─sda9   8:9    0     8M  0 part
sdb      8:16   0 558.9G  0 disk
├─sdb1   8:17   0   100M  0 part /var/lib/ceph/osd/ceph-6
└─sdb2   8:18   0 558.8G  0 part
sdc      8:32   0 558.9G  0 disk
├─sdc1   8:33   0   100M  0 part
└─sdc2   8:34   0 558.8G  0 part
sdd      8:48   0  68.4G  0 disk
├─sdd1   8:49   0  1007K  0 part
├─sdd2   8:50   0  68.4G  0 part
└─sdd9   8:57   0     8M  0 part
sde      8:64   0 558.9G  0 disk
sdf      8:80   0 558.9G  0 disk
sdg      8:96   0 558.9G  0 disk
sdh      8:112  0 558.9G  0 disk
sdi      8:128  0 136.8G  0 disk
sdj      8:144  0 136.8G  0 disk
├─sdj1   8:145  0   100M  0 part /var/lib/ceph/osd/ceph-13
└─sdj2   8:146  0 136.6G  0 part
sdk      8:160  0 136.8G  0 disk
├─sdk1   8:161  0   100M  0 part /var/lib/ceph/osd/ceph-14
└─sdk2   8:162  0 136.6G  0 part
sdl      8:176  0 136.8G  0 disk
├─sdl1   8:177  0   100M  0 part /var/lib/ceph/osd/ceph-15
└─sdl2   8:178  0 136.6G  0 part
sdm      8:192  0 136.8G  0 disk
├─sdm1   8:193  0   100M  0 part /var/lib/ceph/osd/ceph-16
└─sdm2   8:194  0 136.6G  0 part
sdn      8:208  0 136.8G  0 disk
├─sdn1   8:209  0   100M  0 part /var/lib/ceph/osd/ceph-17
└─sdn2   8:210  0 136.6G  0 part
sdo      8:224  0 136.8G  0 disk
├─sdo1   8:225  0   100M  0 part /var/lib/ceph/osd/ceph-18
└─sdo2   8:226  0 136.6G  0 part
sdq     65:0    0 558.9G  0 disk
├─sdq1  65:1    0   100M  0 part
└─sdq2  65:2    0 558.8G  0 part
zd0    230:0    0     8G  0 disk [SWAP]

/dev/sdb was correctly added as osd 6; now, when I run through the same steps for /dev/sdc as osd 7, it fails with the error above.
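In case the zap is leaving something behind, this is the more aggressive wipe I have seen suggested before re-preparing a disk. It is destructive, and /dev/sdc is just the device from my case, so double-check the name first:

Code:
# wipe filesystem/partition signatures and the start of the disk
wipefs --all /dev/sdc
dd if=/dev/zero of=/dev/sdc bs=1M count=200
# make the kernel re-read the partition table
partprobe /dev/sdc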
 
I was able to get OSDs 6-19 created on prox2; I rebooted the host and it worked after that. I am having the exact same issue on prox3, even after zapping all the fresh drives and rebooting the host.
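For anyone who hits the same thing: I assume the reboot mainly forced the partition table and udev state to be re-read and the osd services to start cleanly, so something like the following might do the same without a full reboot (an untested sketch, with /dev/sdX standing for the freshly zapped disk):

Code:
# re-read the partition table and let udev settle
partprobe /dev/sdX
udevadm settle
# restart the osd services on this node
systemctl restart ceph-osd.target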
 
