Ceph OSD crash

innerhippy

My 4-node cluster has been rock solid for 6 months, until now.
Bash:
root@pve02:~# ceph -s
  cluster:
    id:     3e788c55-0a22-4edc-af28-94b8e4ff1cac
    health: HEALTH_WARN
            Degraded data redundancy: 1105460/3316380 objects degraded (33.333%), 193 pgs degraded, 193 pgs undersized
            12 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum pve01,pve03,pve02 (age 80m)
    mgr: pve01(active, since 45h), standbys: pve03, pve02
    mds: 1/1 daemons up, 2 standby
    osd: 4 osds: 2 up (since 9h), 2 in (since 22m)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 1.11M objects, 1.5 TiB
    usage:   3.0 TiB used, 696 GiB / 3.6 TiB avail
    pgs:     1105460/3316380 objects degraded (33.333%)
             193 active+undersized+degraded
 
  io:
    client:   9.7 KiB/s wr, 0 op/s rd, 1 op/s wr

Bash:
$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         7.27759  root default                             
-3         1.81940      host pve01                           
 0   nvme  1.81940          osd.0       up   1.00000  1.00000
-5         1.81940      host pve02                           
 1   nvme  1.81940          osd.1     down         0  1.00000
-7         1.81940      host pve03                           
 2   nvme  1.81940          osd.2     down         0  1.00000
-9         1.81940      host pve04                           
 3   nvme  1.81940          osd.3       up   1.00000  1.00000
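
The health warning also mentions 12 recently crashed daemons; for reference, the individual crash ids can be listed and then inspected one by one:
Bash:
# list recent crashes, then look at a specific one with "ceph crash info <id>"
ceph crash ls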

The crash reports are not that revealing:

Bash:
ceph crash info 2024-11-07T10:37:23.263067Z_3e208c58-2cd5-409f-ab96-e13ea861a27a
{
    "assert_condition": "false",
    "assert_file": "./src/os/bluestore/HybridAllocator.cc",
    "assert_func": "HybridAllocator::init_rm_free(uint64_t, uint64_t)::<lambda(uint64_t, uint64_t, bool)>",
    "assert_line": 178,
    "assert_msg": "./src/os/bluestore/HybridAllocator.cc: In function 'HybridAllocator::init_rm_free(uint64_t, uint64_t)::<lambda(uint64_t, uint64_t, bool)>' thread 78e76870d840 time 2024-11-07T10:37:23.250154+0000\n./src/os/bluestore/HybridAllocator.cc: 178: FAILED ceph_assert(false)\n",
    "assert_thread_name": "ceph-osd",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x78e76925b050]",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x8ae3c) [0x78e7692a9e3c]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x185) [0x5b4b3ec42362]",
        "/usr/bin/ceph-osd(+0x6334a2) [0x5b4b3ec424a2]",
        "/usr/bin/ceph-osd(+0xd70dd7) [0x5b4b3f37fdd7]",
        "(AvlAllocator::_try_remove_from_tree(unsigned long, unsigned long, std::function<void (unsigned long, unsigned long, bool)>)+0x230) [0x5b4b3f3724e0]",
        "(HybridAllocator::init_rm_free(unsigned long, unsigned long)+0xc9) [0x5b4b3f3800a9]",
        "(BlueFS::mount()+0x1e9) [0x5b4b3f353269]",
        "(BlueStore::_open_bluefs(bool, bool)+0x2dd) [0x5b4b3f2551fd]",
        "(BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x27c) [0x5b4b3f255d1c]",
        "(BlueStore::_open_db(bool, bool, bool)+0x37c) [0x5b4b3f26e71c]",
        "(BlueStore::_open_db_and_around(bool, bool)+0x48e) [0x5b4b3f2d6c4e]",
        "(BlueStore::_mount()+0x347) [0x5b4b3f2d9017]",
        "(OSD::init()+0x4b1) [0x5b4b3ed9e3d1]",
        "main()",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x78e76924624a]",
        "__libc_start_main()",
        "_start()"
    ],
    "ceph_version": "18.2.4",
    "crash_id": "2024-11-07T10:37:23.263067Z_3e208c58-2cd5-409f-ab96-e13ea861a27a",
    "entity_name": "osd.1",
    "os_id": "12",
    "os_name": "Debian GNU/Linux 12 (bookworm)",
    "os_version": "12 (bookworm)",
    "os_version_id": "12",
    "process_name": "ceph-osd",
    "stack_sig": "5b64dfaf5da18c15ac63e445825a3bfa2cfaab78c531a1ef2cd84f73ebfad950",
    "timestamp": "2024-11-07T10:37:23.263067Z",
    "utsname_hostname": "pve02",
    "utsname_machine": "x86_64",
    "utsname_release": "6.8.12-3-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC PMX 6.8.12-3 (2024-10-23T11:41Z)"
}

All the backtrace really shows is that the OSD aborts on a BlueStore assert in HybridAllocator while BlueFS is mounting, so it never gets past startup. This is happening on 2 nodes, pve02 and pve03. Any idea how to recover from this?
 
Is there anything in the syslogs around that time regarding the physical disk? Any I/O errors, for example?

Have you tried (re)starting the OSD services?
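
Something along these lines, for example (osd.1 on pve02 is just the example id, adjust per node):
Bash:
# check the kernel log for disk/NVMe errors
journalctl -k | grep -iE 'nvme|i/o error'
# restart the OSD and follow its log
systemctl restart ceph-osd@1.service
journalctl -u ceph-osd@1.service -f
# if it keeps aborting, a read-only check of the store (with the OSD stopped) may tell more
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1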
 
A service restart ends up in the failed state after a minute:
Bash:
systemctl status ceph-osd@1.service
× ceph-osd@1.service - Ceph object storage daemon osd.1
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: failed (Result: signal) since Thu 2024-11-07 11:17:55 GMT; 2min 33s ago
   Duration: 23.563s
    Process: 49073 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 1 (code=exited, status=0/SUCCESS)
    Process: 49077 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 1 --setuser ceph --setgroup ceph (code=killed, signal=ABR>
   Main PID: 49077 (code=killed, signal=ABRT)
        CPU: 17.355s

Nov 07 11:17:55 pve02 systemd[1]: ceph-osd@1.service: Scheduled restart job, restart counter is at 3.
Nov 07 11:17:55 pve02 systemd[1]: Stopped ceph-osd@1.service - Ceph object storage daemon osd.1.
Nov 07 11:17:55 pve02 systemd[1]: ceph-osd@1.service: Consumed 17.355s CPU time.
Nov 07 11:17:55 pve02 systemd[1]: ceph-osd@1.service: Start request repeated too quickly.
Nov 07 11:17:55 pve02 systemd[1]: ceph-osd@1.service: Failed with result 'signal'.
Nov 07 11:17:55 pve02 systemd[1]: Failed to start ceph-osd@1.service - Ceph object storage daemon osd.1.
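
(The "Start request repeated too quickly" line is just systemd's restart rate limit kicking in after the repeated aborts; it can be cleared for another manual attempt, e.g.:)
Bash:
# reset systemd's failure/restart counter, then try one more start by hand
systemctl reset-failed ceph-osd@1.service
systemctl start ceph-osd@1.service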

Nothing obvious in the syslogs:
Bash:
journalctl -p err -b
Nov 07 09:38:06 pve02 kernel: x86/cpu: SGX disabled by BIOS.
Nov 07 09:38:11 pve02 kernel: ipmi_si hardcode-ipmi-si.0: Interface detection failed
Nov 07 09:38:11 pve02 pmxcfs[1748]: [quorum] crit: quorum_initialize failed: 2
Nov 07 09:38:11 pve02 pmxcfs[1748]: [quorum] crit: can't initialize service
Nov 07 09:38:11 pve02 pmxcfs[1748]: [confdb] crit: cmap_initialize failed: 2
Nov 07 09:38:11 pve02 pmxcfs[1748]: [confdb] crit: can't initialize service
Nov 07 09:38:11 pve02 pmxcfs[1748]: [dcdb] crit: cpg_initialize failed: 2
Nov 07 09:38:11 pve02 pmxcfs[1748]: [dcdb] crit: can't initialize service
Nov 07 09:38:11 pve02 pmxcfs[1748]: [status] crit: cpg_initialize failed: 2
Nov 07 09:38:11 pve02 pmxcfs[1748]: [status] crit: can't initialize service
Nov 07 09:39:58 pve02 systemd[1]: Failed to start ceph-osd@1.service - Ceph object storage daemon osd.1.
Nov 07 10:37:33 pve02 systemd[1]: Failed to start ceph-osd@1.service - Ceph object storage daemon osd.1.
Nov 07 11:17:55 pve02 systemd[1]: Failed to start ceph-osd@1.service - Ceph object storage daemon osd.1.
 
The only thing vaguely interesting is this:

systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit uses KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update the service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
 
Maybe this indicates something?
Bash:
Nov 07 01:44:40 pve03 kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Nov 07 01:45:00 pve03 kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Nov 07 01:45:00 pve03 kernel: Buffer I/O error on dev dm-0, logical block 488378352, async page read
Nov 07 01:45:00 pve03 kernel: Buffer I/O error on dev dm-0, logical block 488378352, async page read
Nov 07 01:45:00 pve03 kernel: Buffer I/O error on dev dm-0, logical block 488378352, async page read
Nov 07 01:45:11 pve03 kernel: Buffer I/O error on dev dm-0, logical block 0, async page read
 
Getting closer
Code:
smartctl -c /dev/nvme0n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-3-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error
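
If the controller no longer even answers an Identify command, the drive itself has most likely died, and the usual way out is to replace it and rebuild the OSD on the new disk. A rough sketch, assuming the dead NVMe is /dev/nvme0n1 backing osd.2 on pve03 (double-check the id and device first, and do one node at a time so the remaining copies stay available):
Bash:
# the OSD is already down/out, so remove it from the cluster
pveceph osd destroy 2
# physically replace the NVMe, then create a fresh OSD on the new device and let Ceph backfill
pveceph osd create /dev/nvme0n1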
 
