Ceph OSD down/out even though disk looks okay

Hi.

We have a Proxmox cluster (version 7.3-1) with Ceph storage. This morning some OSDs were marked down/out and the pool backed by those drives stopped serving I/O. However, if I run "smartctl -a" or HDSentinel against the drives, they appear to be in good health.
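
For reference, the OSD service and the underlying device can be checked roughly like this (osd.36 and /dev/nvme0n1 below are only examples; substitute the real OSD id and device path):

Code:
# status of the systemd unit for the affected OSD
systemctl status ceph-osd@36
# SMART data for the NVMe device backing that OSD
smartctl -a /dev/nvme0n1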

Ceph health says this:

Code:
root@jarn24:~# ceph health
HEALTH_WARN 1 nearfull osd(s); Reduced data availability: 6 pgs inactive; Low space hindering backfill (add storage if this doesn't resolve itself): 6 pgs backfill_toofull; Degraded data redundancy: 147733/7884885 objects degraded (1.874%), 61 pgs degraded, 64 pgs undersized; 64 pgs not deep-scrubbed in time; 64 pgs not scrubbed in time; 4 pool(s) nearfull; 25 daemons have recently crashed
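
Since the health output also mentions a nearfull OSD and backfill_toofull PGs, the per-OSD utilisation and the configured ratios can be checked with the standard commands (nothing below is specific to our cluster):

Code:
# which OSD is nearfull, and other details behind the warnings
ceph health detail
# per-OSD utilisation laid out along the CRUSH tree
ceph osd df tree
# overall cluster/pool usage
ceph df
# configured nearfull / backfillfull / full ratios
ceph osd dump | grep -i ratio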

Here is the output from ceph osd tree:

Code:
root@jarn24:~# ceph osd tree
ID  CLASS  WEIGHT     TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         395.52695  root default                            
-5         131.84232      host jarn24                        
12    hdd   10.91409          osd.12       up   1.00000  1.00000
13    hdd   10.91409          osd.13       up   1.00000  1.00000
14    hdd   10.91409          osd.14       up   1.00000  1.00000
15    hdd   10.91409          osd.15       up   1.00000  1.00000
16    hdd   10.91409          osd.16       up   1.00000  1.00000
17    hdd   10.91409          osd.17       up   1.00000  1.00000
18    hdd   10.91409          osd.18       up   1.00000  1.00000
19    hdd   10.91409          osd.19       up   1.00000  1.00000
20    hdd   10.91409          osd.20       up   1.00000  1.00000
21    hdd   10.91409          osd.21       up   1.00000  1.00000
22    hdd   10.91409          osd.22       up   1.00000  1.00000
23    hdd   10.91409          osd.23       up   1.00000  1.00000
36   nvme    0.43660          osd.36     down         0  1.00000
37   nvme    0.43660          osd.37       up   1.00000  1.00000
-3         131.84232      host jarn25                        
 0    hdd   10.91409          osd.0        up   1.00000  1.00000
 1    hdd   10.91409          osd.1        up   1.00000  1.00000
 2    hdd   10.91409          osd.2        up   1.00000  1.00000
 3    hdd   10.91409          osd.3        up   1.00000  1.00000
 4    hdd   10.91409          osd.4        up   1.00000  1.00000
 5    hdd   10.91409          osd.5        up   1.00000  1.00000
 6    hdd   10.91409          osd.6        up   1.00000  1.00000
 7    hdd   10.91409          osd.7        up   1.00000  1.00000
 8    hdd   10.91409          osd.8        up   1.00000  1.00000
 9    hdd   10.91409          osd.9        up   1.00000  1.00000
10    hdd   10.91409          osd.10       up   1.00000  1.00000
11    hdd   10.91409          osd.11       up   1.00000  1.00000
38   nvme    0.43660          osd.38     down         0  1.00000
39   nvme    0.43660          osd.39     down         0  1.00000
-7         131.84232      host jarn26                        
24    hdd   10.91409          osd.24       up   1.00000  1.00000
25    hdd   10.91409          osd.25       up   1.00000  1.00000
26    hdd   10.91409          osd.26       up   1.00000  1.00000
27    hdd   10.91409          osd.27       up   1.00000  1.00000
28    hdd   10.91409          osd.28       up   1.00000  1.00000
29    hdd   10.91409          osd.29       up   1.00000  1.00000
30    hdd   10.91409          osd.30       up   1.00000  1.00000
31    hdd   10.91409          osd.31       up   1.00000  1.00000
32    hdd   10.91409          osd.32       up   1.00000  1.00000
33    hdd   10.91409          osd.33       up   1.00000  1.00000
34    hdd   10.91409          osd.34       up   1.00000  1.00000
35    hdd   10.91409          osd.35       up   1.00000  1.00000
40   nvme    0.43660          osd.40       up   1.00000  1.00000
41   nvme    0.43660          osd.41       up   1.00000  1.00000
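
To make sure smartctl/HDSentinel are being pointed at the right disks, the down OSDs can be mapped back to their physical devices roughly like this (run on the host that owns the OSD):

Code:
# show which LVs / physical devices each OSD on this host uses
ceph-volume lvm list
# block devices as the kernel sees them, with model and serial
lsblk -o NAME,SIZE,MODEL,SERIAL,MOUNTPOINT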

The log file /var/log/ceph/ceph-osd.36.log (and the corresponding log files for the other NVMe OSDs that are down/out) is full of stack traces that I have no idea how to decipher:

Code:
   -92> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:        Options.write_buffer_size: 268435456
   -91> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:  Options.max_write_buffer_number: 4
   -90> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:          Options.compression: NoCompression
   -89> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                  Options.bottommost_compression: Disabled
   -88> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:       Options.prefix_extractor: nullptr
   -87> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:   Options.memtable_insert_with_hint_prefix_extractor: nullptr
   -86> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:             Options.num_levels: 7
   -85> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:        Options.min_write_buffer_number_to_merge: 1
   -84> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:     Options.max_write_buffer_number_to_maintain: 0
   -83> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:     Options.max_write_buffer_size_to_maintain: 0
   -82> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:            Options.bottommost_compression_opts.window_bits: -14
   -81> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                  Options.bottommost_compression_opts.level: 32767
   -80> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:               Options.bottommost_compression_opts.strategy: 0
   -79> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:         Options.bottommost_compression_opts.max_dict_bytes: 0
   -78> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:         Options.bottommost_compression_opts.zstd_max_train_bytes: 0
   -77> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                  Options.bottommost_compression_opts.enabled: false
   -76> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:            Options.compression_opts.window_bits: -14
   -75> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                  Options.compression_opts.level: 32767
   -74> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:               Options.compression_opts.strategy: 0
   -73> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:         Options.compression_opts.max_dict_bytes: 0
   -72> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:         Options.compression_opts.zstd_max_train_bytes: 0
   -71> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                  Options.compression_opts.enabled: false
   -70> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:      Options.level0_file_num_compaction_trigger: 4
   -69> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:          Options.level0_slowdown_writes_trigger: 20
   -68> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:              Options.level0_stop_writes_trigger: 36
   -67> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                   Options.target_file_size_base: 67108864
   -66> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:             Options.target_file_size_multiplier: 1
   -65> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                Options.max_bytes_for_level_base: 268435456
   -64> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.level_compaction_dynamic_level_bytes: 0
   -63> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:          Options.max_bytes_for_level_multiplier: 10.000000
   -62> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[0]: 1
   -61> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[1]: 1
   -60> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[2]: 1
   -59> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[3]: 1
   -58> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[4]: 1
   -57> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[5]: 1
   -56> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.max_bytes_for_level_multiplier_addtl[6]: 1
   -55> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:       Options.max_sequential_skip_in_iterations: 8
   -54> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                    Options.max_compaction_bytes: 1677721600
   -53> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                        Options.arena_block_size: 33554432
   -52> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:   Options.soft_pending_compaction_bytes_limit: 68719476736
   -51> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:   Options.hard_pending_compaction_bytes_limit: 274877906944
   -50> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:       Options.rate_limit_delay_max_milliseconds: 100
   -49> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                Options.disable_auto_compactions: 0
   -48> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                        Options.compaction_style: kCompactionStyleLevel
   -47> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                          Options.compaction_pri: kMinOverlappingRatio
   -46> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.compaction_options_universal.size_ratio: 1
   -45> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.compaction_options_universal.min_merge_width: 2
   -44> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.compaction_options_universal.max_merge_width: 4294967295
   -43> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.compaction_options_universal.max_size_amplification_percent: 200
   -42> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.compaction_options_universal.compression_size_percent: -1
   -41> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.compaction_options_universal.stop_style: kCompactionStopStyleTotalSize
   -40> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.compaction_options_fifo.max_table_files_size: 1073741824
   -39> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: Options.compaction_options_fifo.allow_compaction: 0
   -38> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                   Options.table_properties_collectors:
   -37> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                   Options.inplace_update_support: 0
   -36> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                 Options.inplace_update_num_locks: 10000
   -35> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:               Options.memtable_prefix_bloom_size_ratio: 0.000000
   -34> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:               Options.memtable_whole_key_filtering: 0
   -33> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:   Options.memtable_huge_page_size: 0
   -32> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                           Options.bloom_locality: 0
   -31> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                    Options.max_successive_merges: 0
   -30> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                Options.optimize_filters_for_hits: 0
   -29> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                Options.paranoid_file_checks: 0
   -28> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                Options.force_consistency_checks: 0
   -27> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                Options.report_bg_io_stats: 0
   -26> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:                               Options.ttl: 2592000
   -25> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb:          Options.periodic_compaction_seconds: 0
   -24> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: [column_family.cc:555] (skipping printing options)
   -23> 2022-12-26T14:05:19.583+0000 7f76921c4080  4 rocksdb: [column_family.cc:555] (skipping printing options)
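
The excerpt above is mostly the rocksdb options dump that the OSD prints before the crash report; the actual backtraces for the crashed daemons should be retrievable from the crash module and the journal. A generic sketch (the crash ids and OSD id will differ):

Code:
# list the crashes recorded by the cluster (health mentions 25 of them)
ceph crash ls
# full backtrace and metadata for one crash; take the id from the list above
ceph crash info <crash-id>
# the tail of the systemd journal for the failing OSD usually shows the same trace
journalctl -u ceph-osd@36 -n 200 --no-pager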
 