Cannot see the contents of my CEPH storage

Bimlesh Singh

Last week my Proxmox rack had a problem with its power supply and the mainboards of 5 servers were damaged. I repaired 3 of the servers, but 2 servers are still not back online. Ceph shows the warning below:

ceph -s
cluster:
id: 17fc003a-208b-4c20-82e2-c59307bd8334
health: HEALTH_WARN
Reduced data availability: 137 pgs inactive, 100 pgs down
100 pgs not deep-scrubbed in time
100 pgs not scrubbed in time
3 slow ops, oldest one blocked for 108705 sec, osd.13 has slow ops
1/5 mons down, quorum pve-tr100,pve-tr102,pve-tr106,pve-tr112

services:
mon: 5 daemons, quorum pve-tr100,pve-tr102,pve-tr106,pve-tr112 (age 29h), out of quorum: pve-tr104
mgr: pve-tr100(active, since 29h), standbys: pve-tr102, pve-tr106, pve-tr112
osd: 33 osds: 27 up (since 29h), 27 in (since 31h)

data:
pools: 1 pools, 2048 pgs
objects: 556.20k objects, 2.1 TiB
usage: 6.3 TiB used, 7.8 TiB / 14 TiB avail
pgs: 1.807% pgs unknown
4.883% pgs not active
1911 active+clean
100 down
37 unknown
 

Attachments: Ceph Storage.jpg, Ceph Storage1.jpg, proxmox.jpg, proxmox1.jpg
Self-healing will take time. You have 6 OSDs down, which is a large amount of data, and on top of that two hosts are down. So with 3/2 replication it has to rebuild from the one copy that is left... hopefully Ceph will become healthy again. Keep a close watch.
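For what it's worth, a few standard commands show which PGs are actually stuck and which OSDs they depend on while you watch the recovery (a minimal sketch; the PG id in the last command is a placeholder, take real ids from the health output):

# show the exact PGs and OSDs behind each warning
ceph health detail
# list PGs that are stuck inactive or stale and the OSDs they map to
ceph pg dump_stuck inactive
ceph pg dump_stuck stale
# query one of the reported PGs to see why it is down (id is a placeholder)
ceph pg <pgid> query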
 
Thank you for your reply. Is there any way to speed up this process? Many VM guests were running in this cluster.
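For reference, the knobs usually adjusted to speed up recovery/backfill look roughly like this (a sketch, assuming a Ceph release where ceph config set is available, e.g. Octopus or later; the values are only examples and higher values cost client I/O):

# allow more parallel backfills per OSD
ceph config set osd osd_max_backfills 4
# allow more concurrent recovery operations per OSD
ceph config set osd osd_recovery_max_active 4
# remove the artificial pause between recovery ops on HDD-backed OSDs
ceph config set osd osd_recovery_sleep_hdd 0

Note that PGs in the "down" state cannot recover at all until the OSDs holding their last copy come back (or are explicitly marked lost), so these settings only speed up the PGs that are still able to peer.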
 
Yesterday the Ceph status changed from warning to error, as shown in the attached image. Do I just have to wait?

ceph -s
cluster:
id: 17fc003a-208b-4c20-82e2-c59307bd8334
health: HEALTH_ERR
1 scrub errors
Reduced data availability: 137 pgs inactive, 100 pgs down
Possible data damage: 1 pg inconsistent
100 pgs not deep-scrubbed in time
100 pgs not scrubbed in time
1/5 mons down, quorum pve-tr100,pve-tr102,pve-tr106,pve-tr112

services:
mon: 5 daemons, quorum pve-tr100,pve-tr102,pve-tr106,pve-tr112 (age 27h), out of quorum: pve-tr104
mgr: pve-tr100(active, since 38h), standbys: pve-tr112, pve-tr106, pve-tr102
osd: 33 osds: 27 up (since 27h), 27 in (since 27h)

data:
pools: 1 pools, 2048 pgs
objects: 556.20k objects, 2.1 TiB
usage: 6.3 TiB used, 7.8 TiB / 14 TiB avail
pgs: 1.807% pgs unknown
4.883% pgs not active
1910 active+clean
100 down
37 unknown
1 active+clean+inconsistent

progress:
Rebalancing after osd.32 marked in
[..............................]
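As a side note, the "1 scrub errors / 1 pg inconsistent" part is a separate issue from the down PGs. The usual first steps look something like this (the PG id is a placeholder taken from the health output):

# find out which PG is inconsistent
ceph health detail
# inspect the objects the scrub complained about
rados list-inconsistent-obj <pgid> --format=json-pretty
# repair the PG once the cause is understood
ceph pg repair <pgid>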
 

Attachments: 2022-10-04_14-48-07.png
A timeout usually appears when something is wrong with the network, and I don't mean the replication traffic taking up the whole bandwidth. At least I know this behaviour from similar cases: for some reason the server, service, or network isn't reachable.
 
Thank you for your reply. I checked all services and Ceph network access on all running servers and everything is running normally. If you can give me more details about which services might be down, I can track it down more accurately.
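In case it helps, the basic checks for whether the monitor and OSD daemons on the affected nodes are running and reachable look something like this (host names are taken from the ceph -s output above; the exact mon unit name and the IP are placeholders):

# on pve-tr104, the monitor that is out of quorum
systemctl status ceph-mon@pve-tr104
systemctl list-units 'ceph-osd@*'
# ask the cluster which OSDs/hosts it considers down
ceph osd tree down
# check reachability on the Ceph public/cluster network from another node
ping -c 3 <ip-of-pve-tr104-on-the-ceph-network>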
 
