OSD issue after adding node to Ceph

RobFantini

Hello,
I just added a 4th node to a Ceph cluster.

After creating the OSDs in PVE, the new node's OSDs do not show up on the Ceph > OSD tab, and at the CLI they are not mounted.

Any clues on how to get the OSDs in use?
 
There were no errors when I added the 8 disks as OSDs.

On the PVE web page for the new node, Ceph > Disks shows partitions after I created the OSDs; before, the disks did not show as having partitions. Prior to adding the OSDs I ran ceph-disk zap on each disk.
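
A few checks on the new node can narrow down where OSD creation stalled (an editorial sketch, not from the thread; device names are placeholders):

Code:
# how does ceph-disk classify each drive (prepared / active / journal)?
ceph-disk list
# are any OSD data directories actually mounted?
df -h | grep /var/lib/ceph/osd
ls -l /var/lib/ceph/osd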
 
What's the output when you run: # ceph osd tree
and # ceph health detail

Code:
ceph6-ib  ~ # ceph osd tree                                                                                  
# id    weight  type name       up/down reweight                                                             
-1      20.02   root default                                                                                 
-2      7.28            host ceph4-ib
0       1.82                    osd.0   up      1
1       1.82                    osd.1   up      1
2       1.82                    osd.2   up      1
3       1.82                    osd.3   up      1
-3      7.28            host ceph3-ib
4       1.82                    osd.4   up      1
5       1.82                    osd.5   up      1
6       1.82                    osd.6   up      1
7       1.82                    osd.7   up      1
-4      5.46            host ceph2-ib
8       1.82                    osd.8   up      1
9       1.82                    osd.9   up      1
10      1.82                    osd.10  up      1

Code:
ceph6-ib  ~ # ceph health detail 
HEALTH_ERR 2 pgs inconsistent; 4 scrub errors
pg 2.75 is active+clean+inconsistent, acting [7,10,1]
pg 2.12b is active+clean+inconsistent, acting [1,10,6]
4 scrub errors
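
Side note: the scrub errors above are separate from the missing OSDs, and they are what the `ceph pg repair` hint mentioned later in the thread resolves. A minimal sketch of that repair, assuming the two PG IDs reported above:

Code:
# show the inconsistent PGs again
ceph health detail
# ask the primary OSD of each inconsistent PG to repair it
ceph pg repair 2.75
ceph pg repair 2.12b
# watch the cluster work through the repair
ceph -w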
 
Which one is your new node? ceph1-ib, which is not showing? Or ceph4-ib?
 
The only thing I can suggest is to make sure the new node is also a monitor, then try to recreate the OSD. Also look for issues in syslog; that should give you some clue.
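
A minimal sketch of those two checks, assuming the PVE 3.x-era `pveceph createmon` subcommand and the default syslog location:

Code:
# on the new node: create a monitor if it does not exist yet
pveceph createmon
# confirm it joined the quorum
ceph mon stat
# look for OSD-related errors around the creation attempt
grep -iE 'ceph|osd' /var/log/syslog | tail -n 50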
 
OK, I'll do that. AFAIR I had created the monitor for the new node after the OSDs.

To remove the OSDs, should I just run ceph-disk zap on each disk?
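
For OSDs that never registered in the CRUSH map (as here), zapping the disks is usually enough. For an OSD that does appear in `ceph osd tree`, the usual removal sequence is roughly the following (a general Ceph procedure, not specific to this thread; the OSD id and device are placeholders):

Code:
# take the OSD out and stop its daemon
ceph osd out 11
service ceph stop osd.11
# remove it from the CRUSH map, delete its auth key, and drop the OSD entry
ceph osd crush remove osd.11
ceph auth del osd.11
ceph osd rm 11
# finally wipe the disk before reuse
ceph-disk zap /dev/sdX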
 
I zapped one of the disks and the 2 journal SSDs,

then created a new OSD from the PVE web page.

Result: the OSD does not show up in ceph osd tree or the web GUI.
 
Nothing in the syslog of ceph6-ib?

Also just wondering, did you do # pveceph install on the new node?

OK, I just tried zapping all the disks and recreated the OSDs.

In syslog:
Code:
Jun  1 11:10:03 ceph6-ib pvedaemon[3571]: <root@pam> starting task UPID:ceph6-ib:0000D3FB:001653B1:538B424B:cephcreateosd:sdd:root@pam:
Jun  1 11:10:04 ceph6-ib kernel: sdd: unknown partition table
Jun  1 11:10:05 ceph6-ib kernel: sdd:
Jun  1 11:10:06 ceph6-ib kernel: sdb: sdb1
Jun  1 11:10:08 ceph6-ib kernel: sdd: sdd1
Jun  1 11:10:09 ceph6-ib kernel: XFS (sdd1): Mounting Filesystem
Jun  1 11:10:11 ceph6-ib kernel: XFS (sdd1): Ending clean mount
Jun  1 11:10:13 ceph6-ib kernel: sdd: sdd1
Jun  1 11:10:13 ceph6-ib pvedaemon[3571]: <root@pam> end task UPID:ceph6-ib:0000D3FB:001653B1:538B424B:cephcreateosd:sdd:root@pam: OK
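
The task ends with OK and the XFS filesystem mounts, so a next check (an editorial suggestion, not from the thread) is whether a ceph-osd daemon actually started on ceph6-ib and whether the cluster registered a new OSD id at all:

Code:
# did a ceph-osd process start for the new disk? (bracket trick excludes the grep itself)
ps aux | grep '[c]eph-osd'
# was an auth key registered for a new OSD?
ceph auth list | grep '^osd'
# does the cluster map list any new OSD ids?
ceph osd dump | grep '^osd'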

I used this command to install:
Code:
pveceph install -version firefly

And the current tree:
Code:
ceph6-ib  /var/log # ceph osd tree
# id    weight  type name       up/down reweight
-1      20.02   root default
-2      7.28            host ceph4-ib
0       1.82                    osd.0   up      1
1       1.82                    osd.1   up      1
2       1.82                    osd.2   up      1
3       1.82                    osd.3   up      1
-3      7.28            host ceph3-ib
4       1.82                    osd.4   up      1
5       1.82                    osd.5   up      1
6       1.82                    osd.6   up      1
7       1.82                    osd.7   up      1
-4      5.46            host ceph2-ib
8       1.82                    osd.8   up      1
9       1.82                    osd.9   up      1
10      1.82                    osd.10  up      1
 
This may be related to our issue:
Code:
Jun  1 10:35:08 ceph6-ib kernel: ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
Jun  1 10:35:24 ceph6-ib kernel: ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
Jun  1 10:35:40 ceph6-ib kernel: ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22

I'll check that out later on; I have to leave this until tonight.
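
The multicast join failures did turn out to be a physical problem (a disconnected IB cable, per the follow-up below). A quick way to check the link state, assuming the infiniband-diags package is installed:

Code:
# physical and logical state of the IB ports
ibstat
# carrier/link state of the ib1 interface used here
ip link show ib1
cat /sys/class/net/ib1/carrier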
 
Udo,
thanks for the info on ceph pg repair.

Now we've got HEALTH_OK,
and the multicast issue was because one of the 2 IB cables was not connected... the cluster was always OK.


So I'm back to not being able to add OSDs from one node.

I can add an OSD on node ceph6-ib, but it does not show up in the OSD tree.

Any more suggestions?

If not, I'll try removing the node from the Ceph monitors and the cluster, then reinstall.
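
If it comes to that, a rough sketch of pulling the node back out before reinstalling (assuming the PVE 3.x pveceph subcommands; the monitor id is a placeholder):

Code:
# remove the node's monitor, either via pveceph or directly via ceph
pveceph destroymon <monid>
ceph mon remove ceph6-ib
# then purge the Ceph config on the node and reinstall
pveceph purge
pveceph install -version firefly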
 
Here is some more info from the CLI.

Node ceph4-ib, /var/lib/ceph/osd:
Code:
 # ls -l /var/lib/ceph/osd
total 0
drwxr-xr-x 3 root root 218 Jun  2 17:04 ceph-0
drwxr-xr-x 3 root root 218 Jun  2 17:04 ceph-1
drwxr-xr-x 3 root root 218 Jun  2 17:04 ceph-2
drwxr-xr-x 3 root root 218 Jun  2 17:04 ceph-3

And at ceph6-ib there is nothing:
Code:
ls -l /var/lib/ceph/osd
total 0
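
Since /var/lib/ceph/osd is empty on ceph6-ib, the partitions were apparently prepared but never activated. A sketch worth trying, assuming the data partition is /dev/sdd1 as in the syslog above:

Code:
# how does ceph-disk see the drives on ceph6-ib?
ceph-disk list
# if a partition shows as "ceph data, prepared", try activating it by hand
ceph-disk activate /dev/sdd1
# then check again
ls -l /var/lib/ceph/osd
ceph osd tree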
 
OK, let's try this differently. Try to create an OSD from the CLI instead of the Proxmox GUI. It should not really matter, but it is worth trying. Not sure if you know how to use the CLI to manage Ceph, but the following are some steps to create an OSD through the CLI (a consolidated sketch follows the list).

1. You are going to need the ceph-deploy tool for this. If it is not installed, simply install it with # apt-get install ceph-deploy

2. From the same node, take a disk list of your ceph6 node: # ceph-deploy disk list <node_name>

3. See what the status of the disk drives is, then zap a chosen disk drive: # ceph-deploy disk zap <node_name>:/dev/sdX

4. After successful zapping, create an OSD on the disk drive: # ceph-deploy osd create <node_name>:/dev/sdX

Let's see what happens now.
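
The same steps consolidated into one pass (node name and device are placeholders, matching the list above):

Code:
apt-get install ceph-deploy
ceph-deploy disk list ceph6-ib
ceph-deploy disk zap ceph6-ib:/dev/sdd
ceph-deploy osd create ceph6-ib:/dev/sdd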
 

Step 2, from 2 different nodes:
Code:
ceph6-ib  ~ # ceph-deploy disk list ceph6-ib
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy disk list ceph6-ib
[ceph_deploy][ERROR ] ConfigError: Cannot load config: [Errno 2] No such file or directory: 'ceph.conf'; has `ceph-deploy new` been run in this directory?

Code:
ceph4-ib  ~ # ceph-deploy disk list ceph4-ib
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.3): /usr/bin/ceph-deploy disk list ceph4-ib
[ceph_deploy][ERROR ] ConfigError: Cannot load config: [Errno 2] No such file or directory: 'ceph.conf'; has `ceph-deploy new` been run in this directory?

I do not want to run `ceph-deploy new` until I hear back...

PS: I strongly like the CLI, and thank you for the very good Ceph threads.
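
A note on the ConfigError above: ceph-deploy looks for ceph.conf in the current working directory, so running `ceph-deploy new` should not be necessary on an existing cluster. One workaround (an assumption, not a tested fix from this thread) is to run ceph-deploy from a directory that already contains the cluster's ceph.conf, e.g. /etc/ceph on a Proxmox node:

Code:
# /etc/ceph/ceph.conf points at the cluster config managed by Proxmox
cd /etc/ceph
ceph-deploy disk list ceph6-ib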