ceph software upgrade and ceph restart

RobFantini

Hello

On October 7 last week, our systems upgraded ceph and ceph-common from 0.80.5-1~bpo70+1 to 0.80.6-1~bpo70+1.

However, the ceph processes still show a start time of October 2.

My questions -

1- do we need to manually restart ceph after upgrades?

2- when restarting ceph on the nodes, is there a particular way it should be done to prevent OSD issues?
 
e100: thanks for the answer. I'm doing that now.

In addition, we use a script to automatically set the 'noout' flag; that was suggested by forum member Udo a couple of months ago. The noout flag prevents Ceph from marking the down node's OSDs out and rebalancing the cluster to exclude the down node. That happened to us back when we were using many disks for OSDs and I had to replace a bad one.
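For reference, the script basically boils down to the standard set/unset commands around the maintenance window (the wrapper script itself is ours and not shown here), roughly:
Code:
 # before taking a node down for maintenance or a reboot
 ceph osd set noout

 # ... upgrade / reboot the node and let it rejoin ...

 # once the node is back, allow normal out-marking again
 ceph osd unset noout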

Also, the last time I did not wait until ceph health was 100% OK [besides having the noout flag set]. That caused an issue. So your suggestion of making sure the rebooted node is 100% back before proceeding helps.
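A rough sketch of the check we do between nodes now, assuming a plain shell loop is acceptable (the 10-second interval is arbitrary):
Code:
 # do not proceed to the next node until the cluster is fully healthy again
 while ! ceph health | grep -q HEALTH_OK; do
     echo "waiting for HEALTH_OK..."
     sleep 10
 done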

I think a 'how to upgrade ceph cluster kernel / ceph software' section on the wiki would be useful.
 
The correct way is:
- upgrade packages on all nodes
- restart the mon on each node, one by one: /etc/init.d/ceph restart mon
- restart the osds on each node, one by one: /etc/init.d/ceph restart osd
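Put together, roughly something like this per node (the apt-get line is only an assumption about how the packages get upgraded on these Debian wheezy nodes; substitute whatever you normally use, and wait for ceph -s to report HEALTH_OK before moving to the next node):
Code:
 # upgrade the ceph packages on this node (assumed method, adjust as needed)
 apt-get update && apt-get install ceph ceph-common

 # restart the monitor on this node
 /etc/init.d/ceph restart mon

 # check cluster state; wait for HEALTH_OK before touching the next node
 ceph -s

 # after all mons are upgraded and healthy, restart the OSDs node by node
 /etc/init.d/ceph restart osd
 ceph -s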

Hello Spirit

I tried doing the above; however, when restarting the mon on each node I got this:
Code:
 #  /etc/init.d/ceph restart mon
 === mon.1 === 
 === mon.1 === 
 Stopping Ceph mon.1 on ceph2-ib...done
 === mon.1 === 
 Starting Ceph mon.1 on ceph2-ib...
 2014-10-16 04:45:24.985110 7ffa0e816780 -1 asok(0x26fed20) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-mon.1.asok': (17) File exists
 IO error: lock /var/lib/ceph/mon/ceph-1/store.db/LOCK: Resource temporarily unavailable
 IO error: lock /var/lib/ceph/mon/ceph-1/store.db/LOCK: Resource temporarily unavailable
 2014-10-16 04:45:24.990042 7ffa0e816780 -1 failed to create new leveldb store
 failed: 'ulimit -n 32768;  /usr/bin/ceph-mon -i 1 --pid-file /var/run/ceph/mon.1.pid -c /etc/ceph/ceph.conf --cluster ceph '
 Starting ceph-create-keys on ceph2-ib...

I assume that the restart of mon did not work OK?

Also, the process has an old start time:
Code:
 root        3012       1  0 Oct12 ?        00:25:38 /usr/bin/ceph-mon -i 2 --pid-file /var/run/ceph/mon.2.pid -c /etc/ceph/ceph.conf --cluster ceph
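For what it's worth, the running mon can also be asked directly which version it is through its admin socket (assuming the socket name follows the mon id, as in the error above):
Code:
 # ask the running monitor for its version via the admin socket
 ceph --admin-daemon /var/run/ceph/ceph-mon.2.asok version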

So I restarted each node one by one.

My question: did the mon restart above fail, or are those just warnings?

thanks!