Hi all. I'm (VERY) new to PVE and Ceph. Let's put it this way: my knowledge of Ceph is "I've heard of it once." We've been having issues for the past few months, and I'm not entirely sure where to start.
Long story short: I can't start some VMs because of this mess, and I can't back anything up (this Ceph cluster was also housing our backups). I didn't set this up; I was just thrown into it. The person who did set it up no longer works here, so I'm floundering.
Code:
ceph osd tree
# id  weight  type name        up/down  reweight
-1    113.9   root default
-3    2.7       host ceph02
1     0.54        osd.1        down     0
2     0.54        osd.2        down     0
3     0.54        osd.3        down     0
4     0.54        osd.4        down     0
5     0.54        osd.5        down     0
-4    2.16      host ceph03
6     0.54        osd.6        up       1
7     0.54        osd.7        down     0
8     0.54        osd.8        up       1
9     0.54        osd.9        up       1
-2    27.25     host ceph01
10    5.45        osd.10       down     0
11    5.45        osd.11       down     0
14    5.45        osd.14       down     0
15    5.45        osd.15       down     0
0     5.45        osd.0        down     0
-6    27.25     host ceph05
19    5.45        osd.19       up       1
20    5.45        osd.20       up       1
21    5.45        osd.21       up       1
22    5.45        osd.22       up       1
23    5.45        osd.23       up       1
-7    27.25     host ceph06
24    5.45        osd.24       down     0
26    5.45        osd.26       up       1
27    5.45        osd.27       up       1
29    5.45        osd.29       up       1
25    5.45        osd.25       down     0
-5    27.25     host ceph4
12    5.45        osd.12       down     0
16    5.45        osd.16       down     0
17    5.45        osd.17       down     0
18    5.45        osd.18       down     0
13    5.45        osd.13       down     0
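For my own sanity I tallied the up/down counts per host from the tree above with a quick Python sketch. This only parses the pasted text, nothing here talks to the cluster:

```python
from collections import defaultdict

# `ceph osd tree` output pasted from above (header and root line omitted).
OSD_TREE = """\
-3 2.7 host ceph02
1 0.54 osd.1 down 0
2 0.54 osd.2 down 0
3 0.54 osd.3 down 0
4 0.54 osd.4 down 0
5 0.54 osd.5 down 0
-4 2.16 host ceph03
6 0.54 osd.6 up 1
7 0.54 osd.7 down 0
8 0.54 osd.8 up 1
9 0.54 osd.9 up 1
-2 27.25 host ceph01
10 5.45 osd.10 down 0
11 5.45 osd.11 down 0
14 5.45 osd.14 down 0
15 5.45 osd.15 down 0
0 5.45 osd.0 down 0
-6 27.25 host ceph05
19 5.45 osd.19 up 1
20 5.45 osd.20 up 1
21 5.45 osd.21 up 1
22 5.45 osd.22 up 1
23 5.45 osd.23 up 1
-7 27.25 host ceph06
24 5.45 osd.24 down 0
26 5.45 osd.26 up 1
27 5.45 osd.27 up 1
29 5.45 osd.29 up 1
25 5.45 osd.25 down 0
-5 27.25 host ceph4
12 5.45 osd.12 down 0
16 5.45 osd.16 down 0
17 5.45 osd.17 down 0
18 5.45 osd.18 down 0
13 5.45 osd.13 down 0
"""

counts = defaultdict(lambda: {"up": 0, "down": 0})
host = None
for line in OSD_TREE.splitlines():
    parts = line.split()
    if "host" in parts:                         # e.g. "-3 2.7 host ceph02"
        host = parts[parts.index("host") + 1]
    elif host and parts[2].startswith("osd."):  # e.g. "1 0.54 osd.1 down 0"
        counts[host][parts[3]] += 1

total_up = sum(c["up"] for c in counts.values())
total_down = sum(c["down"] for c in counts.values())
for h, c in sorted(counts.items()):
    print(f"{h}: {c['up']} up, {c['down']} down")
print(f"total: {total_up} up, {total_down} down")  # total: 11 up, 18 down
```

So 18 of 29 OSDs are down, and ceph02, ceph01, and ceph4 have no OSDs up at all, which lines up with the "29 osds: 11 up, 11 in" in the status output below.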
Code:
ceph health
HEALTH_WARN 1992 pgs backfill; 14 pgs backfilling; 2760 pgs degraded; 2283 pgs down; 2506 pgs peering; 687 pgs stale; 2611 pgs stuck inactive; 687 pgs stuck stale; 5612 pgs stuck unclean; recovery 2498299/10101735 objects degraded (24.731%); clock skew detected on mon.1, mon.2
Code:
ceph status
cluster a62c1605-2026-44e8-8496-696a8d070b2f
health HEALTH_WARN 1992 pgs backfill; 14 pgs backfilling; 2760 pgs degraded; 2283 pgs down; 2506 pgs peering; 687 pgs stale; 2611 pgs stuck inactive; 687 pgs stuck stale; 5612 pgs stuck unclean; recovery 2497013/10101735 objects degraded (24.719%); clock skew detected on mon.1, mon.2
monmap e21: 5 mons at {0=10.254.253.100:6789/0,1=10.254.253.101:6789/0,2=10.254.253.102:6789/0,4=10.254.253.104:6789/0,5=10.254.253.105:6789/0}, election epoch 568, quorum 0,1,2,3,4 0,1,2,4,5
mdsmap e54: 0/0/1 up
osdmap e119016: 29 osds: 11 up, 11 in
pgmap v115919348: 6522 pgs, 5 pools, 13004 GB data, 3264 kobjects
25469 GB used, 20841 GB / 46311 GB avail
2497013/10101735 objects degraded (24.719%)
   3  down+remapped+peering
 695  active+clean
 139  stale+down+peering
  13  stale+remapped+peering
  14  active+degraded+remapped+backfilling
2138  down+peering
1990  active+degraded+remapped+wait_backfill
 239  active+remapped
   4  stale+active+degraded
   2  active+clean+scrubbing+deep
   2  active+remapped+wait_backfill
 105  stale
   3  stale+down+remapped+peering
 752  active+degraded
 213  stale+active+clean
 210  stale+peering
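To sanity-check that the pgmap breakdown really matches the HEALTH_WARN counters, I did the same kind of arithmetic on the pasted numbers (again, just a sketch over the text above, not a live query):

```python
# PG state breakdown pasted from the `ceph status` pgmap above.
PG_STATES = {
    "down+remapped+peering": 3,
    "active+clean": 695,
    "stale+down+peering": 139,
    "stale+remapped+peering": 13,
    "active+degraded+remapped+backfilling": 14,
    "down+peering": 2138,
    "active+degraded+remapped+wait_backfill": 1990,
    "active+remapped": 239,
    "stale+active+degraded": 4,
    "active+clean+scrubbing+deep": 2,
    "active+remapped+wait_backfill": 2,
    "stale": 105,
    "stale+down+remapped+peering": 3,
    "active+degraded": 752,
    "stale+active+clean": 213,
    "stale+peering": 210,
}

def with_flag(flag):
    """Sum the PG counts for every state that includes the given flag."""
    return sum(n for state, n in PG_STATES.items() if flag in state.split("+"))

total = sum(PG_STATES.values())
print(total)                  # 6522, matches "6522 pgs"
print(with_flag("down"))      # 2283, matches "2283 pgs down"
print(with_flag("stale"))     # 687, matches "687 pgs stale"
print(with_flag("degraded"))  # 2760, matches "2760 pgs degraded"
```

The totals are internally consistent, so the worrying part is the ~2283 down+peering PGs, which presumably can't recover while most of the OSDs are down.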