This is how we're dealing with rebooting a Ceph cluster to use an upgraded kernel.
Please suggest improvements.
* Set the noout flag (we do this automatically, but it does not hurt to run it):
Code:
/usr/bin/ceph osd set noout
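If you want to double-check that the flag took effect before rebooting anything, the osdmap flags line should show it:
Code:
# optional sanity check: the flags line should now include noout
ceph osd dump | grep flags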
* Do the following for each node:
1. Pick a node to restart.
2. Migrate any must-be-available VMs/services to another node if needed (see the migration sketch below).
3. Restart the node.
4. From another node, run this:
Code:
watch ceph health
Wait until the result looks like this before proceeding to the next node's restart [1]:
Code:
# ceph health
HEALTH_WARN noout flag(s) set
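For step 2, if the nodes are Proxmox hosts, a live migration of a VM that must stay up might look like this; the VM ID 101 and the target node name pve2 are placeholders for your own values:
Code:
# live-migrate VM 101 to pve2 before rebooting the current node
qm migrate 101 pve2 --online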
* When done with all nodes, unset noout:
Code:
ceph osd unset noout
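After unsetting the flag, ceph health should return to a clean state once any remaining recovery finishes:
Code:
# ceph health
HEALTH_OK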
[1] Note: on 2014-10-12 it took 10-15 minutes for this to complete on a node. While recovery was still running, ceph health looked like this:
Code:
# ceph health
HEALTH_WARN 16 pgs recovering; 13 pgs recovery_wait; 29 pgs stuck unclean; 137 requests are blocked > 32 sec; recovery 211/560733 objects degraded (0.038%); noout flag(s) set
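If you'd rather not watch the output by hand, a small shell loop can wait until the recovery-related messages from [1] are gone and only the noout warning remains; the keyword list below is an assumption based on that sample output:
Code:
# block until ceph health no longer reports recovery activity
while ceph health | grep -qE 'recover|degraded|stuck|blocked'; do
    sleep 30
done
echo "recovery finished - safe to restart the next node"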