Cannot see the contents of my CEPH storage

Bimlesh Singh · Oct 1, 2022

Last week my Proxmox rack had a power supply problem that damaged the mainboards of 5 servers. I repaired 3 of them, but 2 servers are still offline. Ceph storage shows the warning below:

ceph -s

  cluster:
    id:     17fc003a-208b-4c20-82e2-c59307bd8334
    health: HEALTH_WARN
            Reduced data availability: 137 pgs inactive, 100 pgs down
            100 pgs not deep-scrubbed in time
            100 pgs not scrubbed in time
            3 slow ops, oldest one blocked for 108705 sec, osd.13 has slow ops
            1/5 mons down, quorum pve-tr100,pve-tr102,pve-tr106,pve-tr112

  services:
    mon: 5 daemons, quorum pve-tr100,pve-tr102,pve-tr106,pve-tr112 (age 29h), out of quorum: pve-tr104
    mgr: pve-tr100(active, since 29h), standbys: pve-tr102, pve-tr106, pve-tr112
    osd: 33 osds: 27 up (since 29h), 27 in (since 31h)

  data:
    pools:   1 pools, 2048 pgs
    objects: 556.20k objects, 2.1 TiB
    usage:   6.3 TiB used, 7.8 TiB / 14 TiB avail
    pgs:     1.807% pgs unknown
             4.883% pgs not active
             1911 active+clean
             100 down
             37 unknown
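
The percentages in this status output follow directly from the PG counts. A minimal Python sketch of that arithmetic, using only the numbers reported above (the variable and function names are illustrative, not part of any Ceph tool):

```python
# PG counts copied from the `ceph -s` output above.
total_pgs = 2048
pg_states = {"active+clean": 1911, "down": 100, "unknown": 37}

# Sanity check: the per-state counts should add up to the pool's PG total.
assert sum(pg_states.values()) == total_pgs


def percent(count, total):
    """Return count as a percentage of total, rounded to three decimals."""
    return round(100 * count / total, 3)


unknown_pct = percent(pg_states["unknown"], total_pgs)  # shown as "pgs unknown"
down_pct = percent(pg_states["down"], total_pgs)        # shown as "pgs not active"
inactive = pg_states["down"] + pg_states["unknown"]     # the "137 pgs inactive" warning

# OSD counts from the services section: 33 known, 27 up.
osds_missing = 33 - 27

print(unknown_pct, down_pct, inactive, osds_missing)
```

So roughly 6.7% of the pool's PGs (137 of 2048) are not serving I/O, and 6 OSDs are missing, even though most PGs are still active+clean.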

Deepen Dhulla · Oct 1, 2022

Self-healing will take time. You have 6 OSDs down, which is a large amount of data, and two hosts are down as well. If you had 3/2 replication, for example, Ceph would have to rebuild from the one copy left. Hopefully Ceph will become healthy again; keep a close watch.
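
The "3/2" reasoning can be sketched in a few lines of Python. This is only an illustration under an assumption: the thread never shows the pool's real size and min_size, so the values 3 and 2 below are taken from the "if you had 3/2 replication" remark, and the function name is hypothetical.

```python
def pg_serves_io(replicas_up, min_size):
    """A replicated PG accepts client I/O only while at least min_size replicas are up."""
    return replicas_up >= min_size


# Assumed pool settings ("3/2"): 3 replicas per PG, at least 2 needed for I/O.
SIZE, MIN_SIZE = 3, 2

for lost in range(SIZE + 1):
    up = SIZE - lost
    print(f"{lost} replica(s) lost -> {up} up, serves I/O: {pg_serves_io(up, MIN_SIZE)}")
```

Under these assumed settings, a PG that had two of its three replicas on the two failed hosts is left with a single copy, which is below min_size, so it stops serving I/O until another replica comes back.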

Bimlesh Singh · Oct 1, 2022

Thank you for your reply. Is there any way to speed up this process? Many VM guests were running on this cluster.

Bimlesh Singh · Oct 4, 2022

Yesterday the Ceph status changed from warning to error, as shown in the attached image. Do I just have to wait?

ceph -s

  cluster:
    id:     17fc003a-208b-4c20-82e2-c59307bd8334
    health: HEALTH_ERR
            1 scrub errors
            Reduced data availability: 137 pgs inactive, 100 pgs down
            Possible data damage: 1 pg inconsistent
            100 pgs not deep-scrubbed in time
            100 pgs not scrubbed in time
            1/5 mons down, quorum pve-tr100,pve-tr102,pve-tr106,pve-tr112

  services:
    mon: 5 daemons, quorum pve-tr100,pve-tr102,pve-tr106,pve-tr112 (age 27h), out of quorum: pve-tr104
    mgr: pve-tr100(active, since 38h), standbys: pve-tr112, pve-tr106, pve-tr102
    osd: 33 osds: 27 up (since 27h), 27 in (since 27h)

  data:
    pools:   1 pools, 2048 pgs
    objects: 556.20k objects, 2.1 TiB
    usage:   6.3 TiB used, 7.8 TiB / 14 TiB avail
    pgs:     1.807% pgs unknown
             4.883% pgs not active
             1910 active+clean
             100 down
             37 unknown
             1 active+clean+inconsistent

  progress:
    Rebalancing after osd.32 marked in
      [..............................]

Bimlesh Singh · Oct 5, 2022

The waiting time is too long. Is there any way to solve this problem ASAP?

Bimlesh Singh · Oct 14, 2022

It has now been 3 weeks, but the condition is the same. I still cannot access the Ceph storage.

pille99 · Oct 14, 2022

A timeout usually appears when something is wrong with the network, and I don't mean replication taking up the whole bandwidth. At least, that is how I know this issue. For some reason the server, service, or network isn't reachable.
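
One way to test basic reachability is a plain TCP connect to each monitor. The Python sketch below is a rough check, not a Ceph tool: the host names come from the status output earlier in the thread, while ports 3300 and 6789 are Ceph's default monitor ports and are an assumption for this cluster. The function names are made up for illustration.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Monitor host names taken from the `ceph -s` output in this thread.
MON_HOSTS = ["pve-tr100", "pve-tr102", "pve-tr104", "pve-tr106", "pve-tr112"]
# Assumed default monitor ports: 3300 (msgr2) and 6789 (msgr1).
MON_PORTS = [3300, 6789]


def check_monitors(hosts=MON_HOSTS, ports=MON_PORTS, timeout=3.0):
    """Map each (host, port) pair to True or False depending on reachability."""
    return {(h, p): port_reachable(h, p, timeout) for h in hosts for p in ports}


# Usage, run from a node on the Ceph public network:
#   for (host, port), ok in check_monitors().items():
#       print(host, port, "reachable" if ok else "UNREACHABLE")
```

A successful connect only shows that something is listening on that port; it says nothing about the state of the OSDs or the cluster network between them.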

Bimlesh Singh · Oct 18, 2022

Thank you for your reply. I checked all services and Ceph network access on all running servers, and everything is running normally. If you can give me more details about which services may be down, I can track it down more accurately.
