OSD issue after adding node to Ceph

RobFantini

Hello,
I just added a 4th node to a Ceph cluster.

After creating the OSDs in PVE, the new node's OSDs do not show up on the Ceph > OSD tab, and at the CLI they are not mounted.

Any clues on how to get the OSDs in use?
 
There were no errors when I added the 8 disks as OSDs.

On the PVE web page for the new node, Ceph > Disks shows partitions after I created the OSDs; before, the disks did not show as having partitions. Prior to adding the OSDs I ran ceph-disk zap on each disk.
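
A few checks on the new node can narrow down where OSD creation stalled (an editorial sketch, not from the thread; device names are placeholders):

Code:
# how does ceph-disk classify each drive (prepared / active / journal)?
ceph-disk list
# are any OSD data directories actually mounted?
df -h | grep /var/lib/ceph/osd
ls -l /var/lib/ceph/osd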
 
What's the output when you run: # ceph osd tree
and # ceph health detail

Code:
ceph6-ib  ~ # ceph osd tree                                                                                  
# id    weight  type name       up/down reweight                                                             
-1      20.02   root default                                                                                 
-2      7.28            host ceph4-ib
0       1.82                    osd.0   up      1
1       1.82                    osd.1   up      1
2       1.82                    osd.2   up      1
3       1.82                    osd.3   up      1
-3      7.28            host ceph3-ib
4       1.82                    osd.4   up      1
5       1.82                    osd.5   up      1
6       1.82                    osd.6   up      1
7       1.82                    osd.7   up      1
-4      5.46            host ceph2-ib
8       1.82                    osd.8   up      1
9       1.82                    osd.9   up      1
10      1.82                    osd.10  up      1

Code:
ceph6-ib  ~ # ceph health detail 
HEALTH_ERR 2 pgs inconsistent; 4 scrub errors
pg 2.75 is active+clean+inconsistent, acting [7,10,1]
pg 2.12b is active+clean+inconsistent, acting [1,10,6]
4 scrub errors
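
Side note: the scrub errors above are separate from the missing OSDs, and they are what the `ceph pg repair` hint mentioned later in the thread resolves. A minimal sketch of that repair, assuming the two PG IDs reported above:

Code:
# show the inconsistent PGs again
ceph health detail
# ask the primary OSD of each inconsistent PG to repair it
ceph pg repair 2.75
ceph pg repair 2.12b
# watch the cluster work through the repair
ceph -w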
 
Which one is your new node? ceph1-ib, which is not showing? Or ceph4-ib?
 
The only thing I can suggest is to make sure the new node is also a monitor, then try to recreate the OSD. Also look for issues in syslog; that should give you some clue.
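
A minimal sketch of those two checks, assuming the PVE 3.x-era `pveceph createmon` subcommand and the default syslog location:

Code:
# on the new node: create a monitor if it does not exist yet
pveceph createmon
# confirm it joined the quorum
ceph mon stat
# look for OSD-related errors around the creation attempt
grep -iE 'ceph|osd' /var/log/syslog | tail -n 50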
 
OK, I'll do that. AFAIR I had created the monitor for the new node after the OSDs.

To remove the OSDs, should I just run ceph-disk zap on each disk?
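
For OSDs that never registered in the CRUSH map (as here), zapping the disks is usually enough. For an OSD that does appear in `ceph osd tree`, the usual removal sequence is roughly the following (a general Ceph procedure, not specific to this thread; the OSD id and device are placeholders):

Code:
# take the OSD out and stop its daemon
ceph osd out 11
service ceph stop osd.11
# remove it from the CRUSH map, delete its auth key, and drop the OSD entry
ceph osd crush remove osd.11
ceph auth del osd.11
ceph osd rm 11
# finally wipe the disk before reuse
ceph-disk zap /dev/sdX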
 
I zapped one of the disks and the 2 journal SSDs,

then created a new OSD from the PVE web page.

Result: the OSD does not show up in ceph osd tree or the web GUI.
 
Nothing in the syslog of ceph6-ib?

Also just wondering, did you do # pveceph install on the new node?

OK, I just tried zapping all the disks and recreated the OSDs.

In syslog:
Code:
Jun  1 11:10:03 ceph6-ib pvedaemon[3571]: <root@pam> starting task UPID:ceph6-ib:0000D3FB:001653B1:538B424B:cephcreateosd:sdd:root@pam:
Jun  1 11:10:04 ceph6-ib kernel: sdd: unknown partition table
Jun  1 11:10:05 ceph6-ib kernel: sdd:
Jun  1 11:10:06 ceph6-ib kernel: sdb: sdb1
Jun  1 11:10:08 ceph6-ib kernel: sdd: sdd1
Jun  1 11:10:09 ceph6-ib kernel: XFS (sdd1): Mounting Filesystem
Jun  1 11:10:11 ceph6-ib kernel: XFS (sdd1): Ending clean mount
Jun  1 11:10:13 ceph6-ib kernel: sdd: sdd1
Jun  1 11:10:13 ceph6-ib pvedaemon[3571]: <root@pam> end task UPID:ceph6-ib:0000D3FB:001653B1:538B424B:cephcreateosd:sdd:root@pam: OK
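
The task ends with OK and the XFS filesystem mounts, so a next check (an editorial suggestion, not from the thread) is whether a ceph-osd daemon actually started on ceph6-ib and whether the cluster registered a new OSD id at all:

Code:
# did a ceph-osd process start for the new disk? (bracket trick excludes the grep itself)
ps aux | grep '[c]eph-osd'
# was an auth key registered for a new OSD?
ceph auth list | grep '^osd'
# does the cluster map list any new OSD ids?
ceph osd dump | grep '^osd'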

I used this command to install:
Code:
pveceph install -version firefly

And the current tree:
Code:
ceph6-ib  /var/log # ceph osd tree
# id    weight  type name       up/down reweight
-1      20.02   root default
-2      7.28            host ceph4-ib
0       1.82                    osd.0   up      1
1       1.82                    osd.1   up      1
2       1.82                    osd.2   up      1
3       1.82                    osd.3   up      1
-3      7.28            host ceph3-ib
4       1.82                    osd.4   up      1
5       1.82                    osd.5   up      1
6       1.82                    osd.6   up      1
7       1.82                    osd.7   up      1
-4      5.46            host ceph2-ib
8       1.82                    osd.8   up      1
9       1.82                    osd.9   up      1
10      1.82                    osd.10  up      1
 
This may be related to our issue:
Code:
Jun  1 10:35:08 ceph6-ib kernel: ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
Jun  1 10:35:24 ceph6-ib kernel: ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
Jun  1 10:35:40 ceph6-ib kernel: ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22

I'll check that out later on; I have to leave this until tonight.
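
The multicast join failures did turn out to be a physical problem (a disconnected IB cable, per the follow-up below). A quick way to check the link state, assuming the infiniband-diags package is installed:

Code:
# physical and logical state of the IB ports
ibstat
# carrier/link state of the ib1 interface used here
ip link show ib1
cat /sys/class/net/ib1/carrier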
 
Udo,
thanks for the info on ceph pg repair.

Now we've got HEALTH_OK,
and the multicast issue was because one of the 2 IB cables was not connected... the cluster was always OK.


So I'm back to not being able to add OSDs from one node.

I can add an OSD on node ceph6-ib, but it does not show up in the OSD tree.

Any more suggestions?

If not, I'll try removing the node from the Ceph monitors and the cluster, then reinstall.
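
If it comes to that, a rough sketch of pulling the node back out before reinstalling (assuming the PVE 3.x pveceph subcommands; the monitor id is a placeholder):

Code:
# remove the node's monitor, either via pveceph or directly via ceph
pveceph destroymon <monid>
ceph mon remove ceph6-ib
# then purge the Ceph config on the node and reinstall
pveceph purge
pveceph install -version firefly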
 
Here is some more info from the CLI.

Node ceph4-ib, /var/lib/ceph/osd:
Code:
 # ls -l /var/lib/ceph/osd
total 0
drwxr-xr-x 3 root root 218 Jun  2 17:04 ceph-0
drwxr-xr-x 3 root root 218 Jun  2 17:04 ceph-1
drwxr-xr-x 3 root root 218 Jun  2 17:04 ceph-2
drwxr-xr-x 3 root root 218 Jun  2 17:04 ceph-3

And at ceph6-ib there is nothing:
Code:
ls -l /var/lib/ceph/osd
total 0
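
Since /var/lib/ceph/osd is empty on ceph6-ib, the partitions were apparently prepared but never activated. A sketch worth trying, assuming the data partition is /dev/sdd1 as in the syslog above:

Code:
# how does ceph-disk see the drives on ceph6-ib?
ceph-disk list
# if a partition shows as "ceph data, prepared", try activating it by hand
ceph-disk activate /dev/sdd1
# then check again
ls -l /var/lib/ceph/osd
ceph osd tree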
 
OK, let's try this differently. Try to create an OSD from the CLI instead of the Proxmox GUI. It should not really matter, but it is worth trying. Not sure if you know how to use the CLI to manage Ceph, but the following are some steps to create an OSD through the CLI (a consolidated sketch follows the list).

1. You are going to need the ceph-deploy tool for this. If it is not installed, simply install it with # apt-get install ceph-deploy

2. From the same node, take a disk list of your ceph6 node: # ceph-deploy disk list <node_name>

3. See what the status of the disk drives is, then zap a chosen disk drive: # ceph-deploy disk zap <node_name>:/dev/sdX

4. After successful zapping, create an OSD on the disk drive: # ceph-deploy osd create <node_name>:/dev/sdX

Let's see what happens now.
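
The same steps consolidated into one pass (node name and device are placeholders, matching the list above):

Code:
apt-get install ceph-deploy
ceph-deploy disk list ceph6-ib
ceph-deploy disk zap ceph6-ib:/dev/sdd
ceph-deploy osd create ceph6-ib:/dev/sdd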
 

Step 2, from 2 different nodes:
Code:
ceph6-ib  ~ # ceph-deploy disk list ceph6-ib
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy disk list ceph6-ib
[ceph_deploy][ERROR ] ConfigError: Cannot load config: [Errno 2] No such file or directory: 'ceph.conf'; has `ceph-deploy new` been run in this directory?

Code:
ceph4-ib  ~ # ceph-deploy disk list ceph4-ib
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy disk list ceph4-ib
[ceph_deploy][ERROR ] ConfigError: Cannot load config: [Errno 2] No such file or directory: 'ceph.conf'; has `ceph-deploy new` been run in this directory?

I do not want to run `ceph-deploy new` until I hear back...

PS: I strongly like the CLI, and thank you for the very good Ceph threads.
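
A note on the ConfigError above: ceph-deploy looks for ceph.conf in the current working directory, so running `ceph-deploy new` should not be necessary on an existing cluster. One workaround (an assumption, not a tested fix from this thread) is to run ceph-deploy from a directory that already contains the cluster's ceph.conf, e.g. /etc/ceph on a Proxmox node:

Code:
# /etc/ceph/ceph.conf points at the cluster config managed by Proxmox
cd /etc/ceph
ceph-deploy disk list ceph6-ib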