[SOLVED] OSD outdated and OUT?

After a recent PVE upgrade (non-enterprise) taking Ceph from 16.2.5 to 16.2.7, the converged storage went offline on my test cluster.

Following the advice to get my monitors and managers up again, 2 out of 4 hosts gained an upgraded OSD.

After a couple of reboots many OSDs (12 out of 16) went into "out" status without any desire to come back online.

Any idea how to get this fixed?


Thanks,
Martijn
 
Are they up?

Can you post the output of ceph -s and ceph versions?
 
This is what ceph -s looks like

root@utr-tst-vh03:~# ceph -s
  cluster:
    id:     67b4dbb5-1d5e-4b62-89b0-46ff1ec560fd
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            7 osds down
            2 hosts (8 osds) down
            Reduced data availability: 193 pgs inactive
            5 daemons have recently crashed

  services:
    mon: 4 daemons, quorum utr-tst-vh02,utr-tst-vh03,utr-tst-hv04,utr-tst-vh01 (age 17h)
    mgr: utr-tst-hv04(active, since 17h), standbys: utr-tst-vh01
    mds: 1/1 daemons up, 1 standby
    osd: 16 osds: 4 up (since 25h), 11 in (since 16h)

  data:
    volumes: 0/1 healthy, 1 recovering
    pools:   4 pools, 193 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             193 unknown

And ceph versions

root@utr-tst-vh03:~# ceph versions
{
    "mon": {
        "ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)": 4
    },
    "mgr": {
        "ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)": 2
    },
    "osd": {
        "ceph version 16.2.5 (9b9dd76e12f1907fe5dcc0c1fadadbb784022a42) pacific (stable)": 1,
        "ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)": 3
    },
    "mds": {
        "ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)": 2
    },
    "overall": {
        "ceph version 16.2.5 (9b9dd76e12f1907fe5dcc0c1fadadbb784022a42) pacific (stable)": 1,
        "ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)": 11
    }
}
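On a larger cluster it can be handy to pull the outdated daemons out of the ceph versions JSON programmatically instead of eyeballing it. A minimal sketch, assuming the JSON above is fed in as a string; the helper name and the abbreviated sample are mine, not a Ceph tool:

```python
import json

def outdated(versions_json, target="16.2.7"):
    """Return {daemon_type: count} of daemons not yet on the target version."""
    data = json.loads(versions_json)
    result = {}
    for daemon_type, versions in data.items():
        if daemon_type == "overall":  # aggregate view, skip to avoid double counting
            continue
        stale = sum(count for ver, count in versions.items() if target not in ver)
        if stale:
            result[daemon_type] = stale
    return result

# Abbreviated copy of the `ceph versions` output above (hashes elided):
sample = json.dumps({
    "mon": {"ceph version 16.2.7 pacific (stable)": 4},
    "mgr": {"ceph version 16.2.7 pacific (stable)": 2},
    "osd": {
        "ceph version 16.2.5 pacific (stable)": 1,
        "ceph version 16.2.7 pacific (stable)": 3,
    },
    "mds": {"ceph version 16.2.7 pacific (stable)": 2},
    "overall": {
        "ceph version 16.2.5 pacific (stable)": 1,
        "ceph version 16.2.7 pacific (stable)": 11,
    },
})
print(outdated(sample))  # only the single 16.2.5 OSD is flagged: {'osd': 1}
```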



I see there are a few old ones in between....
 
Well, it would be interesting to see why the OSDs won't start up. Try to start them and check the logs in /var/log/ceph/ceph-osd.X.log for any hints as to why they don't want to start.
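Those logs can run to thousands of lines per start attempt, so a quick filter for the usual failure markers helps narrow them down. A minimal sketch; the marker list and the sample log lines are assumptions of mine, not an official Ceph error catalogue:

```python
# Hypothetical helper: pull out log lines that typically explain a failed OSD start.
MARKERS = ("ERR", "error", "abort", "assert", "failed", "Corruption")

def suspicious_lines(log_text):
    """Return the log lines containing any of the common failure markers."""
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in MARKERS)]

# Made-up example lines roughly in ceph-osd log format:
sample_log = """\
2021-12-20 10:00:01 osd.7 ... start interval
2021-12-20 10:00:02 osd.7 ... bluefs _replay 0x0: stop: uuid mismatch
2021-12-20 10:00:02 osd.7 ... ERR : OSD failed to load OSD map
"""
for line in suspicious_lines(sample_log):
    print(line)
```

In practice you would read the log file instead of a string, e.g. `suspicious_lines(open("/var/log/ceph/ceph-osd.7.log").read())`, and widen or shrink the marker list as needed.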