CEPH: 4 OSDs on 4 different nodes went down at once

p.jahn

Hello everyone,

I'm running a hyperconverged Proxmox/Ceph cluster with 2/4 replication (min_size 2, size 4).

Everything was fine, but yesterday more and more PGs became laggy and 4 OSDs on 4 different nodes went down at once.
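
For completeness, the pool's replication settings can be verified like this (<poolname> is a placeholder for the actual pool name):

Code:
ceph osd pool get <poolname> size
ceph osd pool get <poolname> min_size

This is the cluster log from when the PGs started going laggy and the OSDs were marked down: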

Code:
2022-09-18T20:26:46.661199+0200 mgr.pve03 (mgr.171599314) 3545439 : cluster [DBG] pgmap v3548926: 1025 pgs: 1025 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 1.2 MiB/s rd, 1.6 MiB/s wr, 127 op/s
2022-09-18T20:26:48.664416+0200 mgr.pve03 (mgr.171599314) 3545440 : cluster [DBG] pgmap v3548927: 1025 pgs: 1 active+clean+laggy, 1024 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 1.5 MiB/s rd, 2.5 MiB/s wr, 139 op/s
2022-09-18T20:26:50.666507+0200 mgr.pve03 (mgr.171599314) 3545441 : cluster [DBG] pgmap v3548928: 1025 pgs: 1 active+clean+laggy, 1024 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 1.1 MiB/s rd, 2.9 MiB/s wr, 129 op/s
2022-09-18T20:26:52.669651+0200 mgr.pve03 (mgr.171599314) 3545442 : cluster [DBG] pgmap v3548929: 1025 pgs: 1 active+clean+laggy, 1024 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 1.4 MiB/s rd, 3.3 MiB/s wr, 131 op/s
2022-09-18T20:26:54.672367+0200 mgr.pve03 (mgr.171599314) 3545443 : cluster [DBG] pgmap v3548930: 1025 pgs: 1 active+clean+laggy, 1024 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 1.4 MiB/s rd, 3.2 MiB/s wr, 120 op/s
2022-09-18T20:26:56.674663+0200 mgr.pve03 (mgr.171599314) 3545444 : cluster [DBG] pgmap v3548931: 1025 pgs: 1 active+clean+laggy, 1024 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 1.3 MiB/s rd, 2.4 MiB/s wr, 87 op/s
2022-09-18T20:26:58.677747+0200 mgr.pve03 (mgr.171599314) 3545445 : cluster [DBG] pgmap v3548932: 1025 pgs: 8 active+clean+laggy, 1017 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 1.6 MiB/s rd, 2.3 MiB/s wr, 80 op/s
2022-09-18T20:27:00.679882+0200 mgr.pve03 (mgr.171599314) 3545446 : cluster [DBG] pgmap v3548933: 1025 pgs: 13 active+clean+laggy, 1012 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 1.1 MiB/s rd, 1.1 MiB/s wr, 42 op/s
2022-09-18T20:27:02.683052+0200 mgr.pve03 (mgr.171599314) 3545447 : cluster [DBG] pgmap v3548934: 1025 pgs: 23 active+clean+laggy, 1002 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 1.1 MiB/s rd, 586 KiB/s wr, 33 op/s
2022-09-18T20:27:04.685768+0200 mgr.pve03 (mgr.171599314) 3545448 : cluster [DBG] pgmap v3548935: 1025 pgs: 29 active+clean+laggy, 996 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 592 KiB/s rd, 157 KiB/s wr, 20 op/s
2022-09-18T20:27:06.688655+0200 mgr.pve03 (mgr.171599314) 3545449 : cluster [DBG] pgmap v3548936: 1025 pgs: 33 active+clean+laggy, 992 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 572 KiB/s rd, 134 KiB/s wr, 17 op/s
2022-09-18T20:27:08.691850+0200 mgr.pve03 (mgr.171599314) 3545450 : cluster [DBG] pgmap v3548937: 1025 pgs: 39 active+clean+laggy, 986 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 375 KiB/s rd, 122 KiB/s wr, 16 op/s
2022-09-18T20:27:10.694008+0200 mgr.pve03 (mgr.171599314) 3545451 : cluster [DBG] pgmap v3548938: 1025 pgs: 45 active+clean+laggy, 980 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 103 KiB/s rd, 126 KiB/s wr, 13 op/s
2022-09-18T20:27:12.697203+0200 mgr.pve03 (mgr.171599314) 3545452 : cluster [DBG] pgmap v3548939: 1025 pgs: 1 active+clean+scrubbing, 54 active+clean+laggy, 970 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 24 KiB/s rd, 130 KiB/s wr, 15 op/s
2022-09-18T20:27:14.699965+0200 mgr.pve03 (mgr.171599314) 3545453 : cluster [DBG] pgmap v3548940: 1025 pgs: 1 active+clean+scrubbing, 56 active+clean+laggy, 968 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 12 KiB/s rd, 116 KiB/s wr, 12 op/s
2022-09-18T20:27:16.702304+0200 mgr.pve03 (mgr.171599314) 3545454 : cluster [DBG] pgmap v3548941: 1025 pgs: 1 active+clean+scrubbing, 56 active+clean+laggy, 968 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 9.7 KiB/s rd, 101 KiB/s wr, 11 op/s
2022-09-18T20:27:18.705425+0200 mgr.pve03 (mgr.171599314) 3545455 : cluster [DBG] pgmap v3548942: 1025 pgs: 1 active+clean+scrubbing, 57 active+clean+laggy, 967 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 12 KiB/s rd, 91 KiB/s wr, 11 op/s
2022-09-18T20:27:19.321616+0200 mon.pve02 (mon.0) 3081414 : cluster [DBG] osd.20 reported failed by osd.32
2022-09-18T20:27:19.529986+0200 mon.pve02 (mon.0) 3081415 : cluster [DBG] osd.20 reported failed by osd.13
2022-09-18T20:27:19.530286+0200 mon.pve02 (mon.0) 3081416 : cluster [INF] osd.20 failed (root=default,host=pve03) (2 reporters from different host after 23.208814 >= grace 20.000000)
2022-09-18T20:27:19.920523+0200 mon.pve02 (mon.0) 3081417 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2022-09-18T20:27:19.958093+0200 mon.pve02 (mon.0) 3081418 : cluster [DBG] osdmap e65602: 40 total, 39 up, 40 in
2022-09-18T20:27:20.268591+0200 mon.pve02 (mon.0) 3081419 : cluster [DBG] osd.17 reported failed by osd.3
2022-09-18T20:27:20.268675+0200 mon.pve02 (mon.0) 3081420 : cluster [DBG] osd.34 reported failed by osd.3
2022-09-18T20:27:20.527443+0200 mon.pve02 (mon.0) 3081421 : cluster [DBG] osd.17 reported failed by osd.27
2022-09-18T20:27:20.527528+0200 mon.pve02 (mon.0) 3081422 : cluster [DBG] osd.34 reported failed by osd.27
2022-09-18T20:27:20.527553+0200 mon.pve02 (mon.0) 3081423 : cluster [INF] osd.34 failed (root=default,host=pve04) (2 reporters from different host after 21.000160 >= grace 20.000000)
2022-09-18T20:27:20.528874+0200 mon.pve02 (mon.0) 3081424 : cluster [DBG] osd.17 reported failed by osd.24
2022-09-18T20:27:20.528917+0200 mon.pve02 (mon.0) 3081425 : cluster [DBG] osd.34 reported failed by osd.24
2022-09-18T20:27:20.583362+0200 mon.pve02 (mon.0) 3081426 : cluster [DBG] osd.17 reported failed by osd.6
2022-09-18T20:27:20.583435+0200 mon.pve02 (mon.0) 3081427 : cluster [DBG] osd.34 reported failed by osd.6
2022-09-18T20:27:20.616782+0200 mon.pve02 (mon.0) 3081428 : cluster [DBG] osd.34 reported failed by osd.40
2022-09-18T20:27:20.667307+0200 mon.pve02 (mon.0) 3081429 : cluster [DBG] osd.17 reported failed by osd.29
2022-09-18T20:27:20.667380+0200 mon.pve02 (mon.0) 3081430 : cluster [DBG] osd.34 reported failed by osd.29
2022-09-18T20:27:20.837266+0200 mon.pve02 (mon.0) 3081431 : cluster [DBG] osd.17 reported failed by osd.16
2022-09-18T20:27:20.837373+0200 mon.pve02 (mon.0) 3081432 : cluster [DBG] osd.34 reported failed by osd.16
2022-09-18T20:27:20.875328+0200 mon.pve02 (mon.0) 3081433 : cluster [DBG] osdmap e65603: 40 total, 38 up, 40 in
2022-09-18T20:27:20.706708+0200 mgr.pve03 (mgr.171599314) 3545456 : cluster [DBG] pgmap v3548944: 1025 pgs: 10 peering, 14 stale+active+clean, 1 active+clean+scrubbing, 56 active+clean+laggy, 944 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 12 KiB/s rd, 64 KiB/s wr, 8 op/s
2022-09-18T20:27:20.945995+0200 mon.pve02 (mon.0) 3081434 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs peering (PG_AVAILABILITY)
2022-09-18T20:27:21.306264+0200 mon.pve02 (mon.0) 3081435 : cluster [DBG] osd.17 reported failed by osd.30
2022-09-18T20:27:21.877075+0200 mon.pve02 (mon.0) 3081436 : cluster [DBG] osdmap e65604: 40 total, 38 up, 40 in
2022-09-18T20:27:22.048194+0200 mon.pve02 (mon.0) 3081437 : cluster [DBG] osd.17 reported failed by osd.39
2022-09-18T20:27:22.050539+0200 mon.pve02 (mon.0) 3081438 : cluster [DBG] osd.5 reported failed by osd.36
2022-09-18T20:27:22.050584+0200 mon.pve02 (mon.0) 3081439 : cluster [DBG] osd.17 reported failed by osd.36
2022-09-18T20:27:22.386269+0200 mon.pve02 (mon.0) 3081440 : cluster [DBG] osd.17 reported failed by osd.0
2022-09-18T20:27:22.464868+0200 mon.pve02 (mon.0) 3081441 : cluster [DBG] osd.5 reported failed by osd.19
2022-09-18T20:27:22.464918+0200 mon.pve02 (mon.0) 3081442 : cluster [INF] osd.5 failed (root=default,host=pve01) (2 reporters from different host after 21.000054 >= grace 20.000000)
2022-09-18T20:27:22.868047+0200 mon.pve02 (mon.0) 3081443 : cluster [DBG] osd.17 reported failed by osd.26
2022-09-18T20:27:22.708880+0200 mgr.pve03 (mgr.171599314) 3545457 : cluster [DBG] pgmap v3548947: 1025 pgs: 60 peering, 19 stale+active+clean, 43 active+clean+laggy, 41 active+undersized+degraded, 862 active+clean; 10 TiB data, 40 TiB used, 30 TiB / 70 TiB avail; 20 KiB/s rd, 643 KiB/s wr, 33 op/s; 107592/10737386 objects degraded (1.002%)
2022-09-18T20:27:22.928537+0200 mon.pve02 (mon.0) 3081444 : cluster [WRN] Health check failed: Degraded data redundancy: 107592/10737386 objects degraded (1.002%), 41 pgs degraded (PG_DEGRADED)
2022-09-18T20:27:22.949195+0200 mon.pve02 (mon.0) 3081445 : cluster [DBG] osdmap e65605: 40 total, 37 up, 40 in
2022-09-18T20:27:23.196650+0200 mon.pve02 (mon.0) 3081446 : cluster [DBG] osd.17 reported failed by osd.23
2022-09-18T20:27:23.273881+0200 mon.pve02 (mon.0) 3081447 : cluster [DBG] osd.17 reported failed by osd.25
2022-09-18T20:27:23.643619+0200 mon.pve02 (mon.0) 3081448 : cluster [DBG] osd.17 reported failed by osd.9
2022-09-18T20:27:23.643695+0200 mon.pve02 (mon.0) 3081449 : cluster [INF] osd.17 failed (root=default,host=pve02) (4 reporters from different host after 23.593099 >= grace 23.285043)
2022-09-18T20:27:23.852823+0200 mon.pve02 (mon.0) 3081451 : cluster [DBG] osd.17 reported failed by osd.28
2022-09-18T20:27:23.957937+0200 mon.pve02 (mon.0) 3081452 : cluster [DBG] osdmap e65606: 40 total, 36 up, 40 in
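
For context, my understanding is that the monitor marks an OSD down once at least mon_osd_min_down_reporters OSDs from other hosts report missed heartbeats for longer than osd_heartbeat_grace, which is what the "2 reporters from different host ... >= grace 20.000000" lines show. The current thresholds can be checked like this (I assume mine are still the defaults, 2 reporters and 20 s grace):

Code:
ceph config get mon mon_osd_min_down_reporters
ceph config get osd osd_heartbeat_grace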

Eventually OSD 5 was marked OUT by Ceph, and the cluster recovered and is now healthy again.

I've checked the logs of every OSD that went down and found slow-operation (log_latency) warnings in each of them.

Code:
2022-09-18T20:27:31.043+0200 7f8744b47700  0 bluestore(/var/lib/ceph/osd/ceph-5) log_latency slow operation observed for submit_transact, latency = 44.209278107s
2022-09-18T20:27:31.043+0200 7f8744b47700  0 bluestore(/var/lib/ceph/osd/ceph-5) log_latency slow operation observed for throttle_transact, latency = 44.209030151s
2022-09-18T20:27:31.043+0200 7f874834e700  0 bluestore(/var/lib/ceph/osd/ceph-5) log_latency slow operation observed for submit_transact, latency = 44.205947876s
2022-09-18T20:27:31.043+0200 7f874834e700  0 bluestore(/var/lib/ceph/osd/ceph-5) log_latency slow operation observed for throttle_transact, latency = 44.205734253s
2022-09-18T20:27:31.043+0200 7f874a352700  0 bluestore(/var/lib/ceph/osd/ceph-5) log_latency slow operation observed for submit_transact, latency = 16.303409576s
2022-09-18T20:27:31.043+0200 7f874a352700  0 bluestore(/var/lib/ceph/osd/ceph-5) log_latency slow operation observed for throttle_transact, latency = 16.303327560s
2022-09-18T20:27:31.043+0200 7f8751b61700  0 bluestore(/var/lib/ceph/osd/ceph-5) log_latency slow operation observed for kv_commit, latency = 44.216323853s
2022-09-18T20:27:31.043+0200 7f8751b61700  0 bluestore(/var/lib/ceph/osd/ceph-5) log_latency slow operation observed for kv_sync, latency = 44.216327667s
2022-09-18T20:27:31.043+0200 7f8756b6b700  0 bluestore(/var/lib/ceph/osd/ceph-5) log_latency_fn slow operation observed for _txc_committed_kv, latency = 47.877239227s, txc = 0x563e71fa6000
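
In case it helps, this is roughly how I'd check whether any OSD still shows elevated commit/apply latency and whether the disk behind it looks overloaded (iostat comes from the sysstat package; /dev/sdh is just the example device behind OSD 5):

Code:
# per-OSD commit/apply latency as reported by Ceph
ceph osd perf
# extended device statistics on the affected node (await, %util)
iostat -x /dev/sdh 5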

Then I ran a SMART self-test on OSD 5 (/dev/sdh on node PVE01), and it completed without error.
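
If it helps for diagnosis, I can also post the full SMART attribute output (reallocated/pending sectors, CRC errors), e.g.:

Code:
smartctl -a /dev/sdh
# or only the attribute table
smartctl -A /dev/sdh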

Can someone please advise whether it is safe to start OSD 5 and set it back IN, or whether the disk should be replaced?
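
For reference, I assume bringing it back would look roughly like this on PVE01 (please correct me if that's not the right way to do it):

Code:
# start the OSD daemon again
systemctl start ceph-osd@5
# mark it back IN once it is up
ceph osd in 5
# watch recovery/backfill
ceph -w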

Many thanks.

Patrick