Ceph OSD crash

innerhippy

My 4-node cluster has been rock solid for 6 months, until now.
Bash:
root@pve02:~# ceph -s
  cluster:
    id:     3e788c55-0a22-4edc-af28-94b8e4ff1cac
    health: HEALTH_WARN
            Degraded data redundancy: 1105460/3316380 objects degraded (33.333%), 193 pgs degraded, 193 pgs undersized
            12 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum pve01,pve03,pve02 (age 80m)
    mgr: pve01(active, since 45h), standbys: pve03, pve02
    mds: 1/1 daemons up, 2 standby
    osd: 4 osds: 2 up (since 9h), 2 in (since 22m)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 193 pgs
    objects: 1.11M objects, 1.5 TiB
    usage:   3.0 TiB used, 696 GiB / 3.6 TiB avail
    pgs:     1105460/3316380 objects degraded (33.333%)
             193 active+undersized+degraded
 
  io:
    client:   9.7 KiB/s wr, 0 op/s rd, 1 op/s wr

Bash:
$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         7.27759  root default                             
-3         1.81940      host pve01                           
 0   nvme  1.81940          osd.0       up   1.00000  1.00000
-5         1.81940      host pve02                           
 1   nvme  1.81940          osd.1     down         0  1.00000
-7         1.81940      host pve03                           
 2   nvme  1.81940          osd.2     down         0  1.00000
-9         1.81940      host pve04                           
 3   nvme  1.81940          osd.3       up   1.00000  1.00000
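
The health warning also mentions 12 recently crashed daemons; for reference, the individual crash ids can be listed and then inspected one by one:
Bash:
# list recent crashes, then look at a specific one with "ceph crash info <id>"
ceph crash ls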

The crash reports are not that revealing:

Bash:
ceph crash info 2024-11-07T10:37:23.263067Z_3e208c58-2cd5-409f-ab96-e13ea861a27a
{
    "assert_condition": "false",
    "assert_file": "./src/os/bluestore/HybridAllocator.cc",
    "assert_func": "HybridAllocator::init_rm_free(uint64_t, uint64_t)::<lambda(uint64_t, uint64_t, bool)>",
    "assert_line": 178,
    "assert_msg": "./src/os/bluestore/HybridAllocator.cc: In function 'HybridAllocator::init_rm_free(uint64_t, uint64_t)::<lambda(uint64_t, uint64_t, bool)>' thread 78e76870d840 time 2024-11-07T10:37:23.250154+0000\n./src/os/bluestore/HybridAllocator.cc: 178: FAILED ceph_assert(false)\n",
    "assert_thread_name": "ceph-osd",
    "backtrace": [
        "/lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x78e76925b050]",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x8ae3c) [0x78e7692a9e3c]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x185) [0x5b4b3ec42362]",
        "/usr/bin/ceph-osd(+0x6334a2) [0x5b4b3ec424a2]",
        "/usr/bin/ceph-osd(+0xd70dd7) [0x5b4b3f37fdd7]",
        "(AvlAllocator::_try_remove_from_tree(unsigned long, unsigned long, std::function<void (unsigned long, unsigned long, bool)>)+0x230) [0x5b4b3f3724e0]",
        "(HybridAllocator::init_rm_free(unsigned long, unsigned long)+0xc9) [0x5b4b3f3800a9]",
        "(BlueFS::mount()+0x1e9) [0x5b4b3f353269]",
        "(BlueStore::_open_bluefs(bool, bool)+0x2dd) [0x5b4b3f2551fd]",
        "(BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x27c) [0x5b4b3f255d1c]",
        "(BlueStore::_open_db(bool, bool, bool)+0x37c) [0x5b4b3f26e71c]",
        "(BlueStore::_open_db_and_around(bool, bool)+0x48e) [0x5b4b3f2d6c4e]",
        "(BlueStore::_mount()+0x347) [0x5b4b3f2d9017]",
        "(OSD::init()+0x4b1) [0x5b4b3ed9e3d1]",
        "main()",
        "/lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x78e76924624a]",
        "__libc_start_main()",
        "_start()"
    ],
    "ceph_version": "18.2.4",
    "crash_id": "2024-11-07T10:37:23.263067Z_3e208c58-2cd5-409f-ab96-e13ea861a27a",
    "entity_name": "osd.1",
    "os_id": "12",
    "os_name": "Debian GNU/Linux 12 (bookworm)",
    "os_version": "12 (bookworm)",
    "os_version_id": "12",
    "process_name": "ceph-osd",
    "stack_sig": "5b64dfaf5da18c15ac63e445825a3bfa2cfaab78c531a1ef2cd84f73ebfad950",
    "timestamp": "2024-11-07T10:37:23.263067Z",
    "utsname_hostname": "pve02",
    "utsname_machine": "x86_64",
    "utsname_release": "6.8.12-3-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PREEMPT_DYNAMIC PMX 6.8.12-3 (2024-10-23T11:41Z)"
}

All the backtrace really shows is that the OSD aborts on a BlueStore assert in HybridAllocator while BlueFS is mounting, so it never gets past startup. This is happening on 2 nodes, pve02 and pve03. Any idea how to recover from this?
 
Is there anything in the syslogs around that time regarding the physical disk? Any I/O errors, for example?

Have you tried (re)starting the OSD services?
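
Something along these lines, for example (osd.1 on pve02 is just the example id, adjust per node):
Bash:
# check the kernel log for disk/NVMe errors
journalctl -k | grep -iE 'nvme|i/o error'
# restart the OSD and follow its log
systemctl restart ceph-osd@1.service
journalctl -u ceph-osd@1.service -f
# if it keeps aborting, a read-only check of the store (with the OSD stopped) may tell more
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1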
 
A service restart ends up in the failed state after a minute:
Bash:
systemctl status ceph-osd@1.service
× ceph-osd@1.service - Ceph object storage daemon osd.1
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: failed (Result: signal) since Thu 2024-11-07 11:17:55 GMT; 2min 33s ago
   Duration: 23.563s
    Process: 49073 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 1 (code=exited, status=0/SUCCESS)
    Process: 49077 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 1 --setuser ceph --setgroup ceph (code=killed, signal=ABR>
   Main PID: 49077 (code=killed, signal=ABRT)
        CPU: 17.355s

Nov 07 11:17:55 pve02 systemd[1]: ceph-osd@1.service: Scheduled restart job, restart counter is at 3.
Nov 07 11:17:55 pve02 systemd[1]: Stopped ceph-osd@1.service - Ceph object storage daemon osd.1.
Nov 07 11:17:55 pve02 systemd[1]: ceph-osd@1.service: Consumed 17.355s CPU time.
Nov 07 11:17:55 pve02 systemd[1]: ceph-osd@1.service: Start request repeated too quickly.
Nov 07 11:17:55 pve02 systemd[1]: ceph-osd@1.service: Failed with result 'signal'.
Nov 07 11:17:55 pve02 systemd[1]: Failed to start ceph-osd@1.service - Ceph object storage daemon osd.1.
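
(The "Start request repeated too quickly" line is just systemd's restart rate limit kicking in after the repeated aborts; it can be cleared for another manual attempt, e.g.:)
Bash:
# reset systemd's failure/restart counter, then try one more start by hand
systemctl reset-failed ceph-osd@1.service
systemctl start ceph-osd@1.service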

Nothing obvious in the syslogs:
Bash:
journalctl -p err -b
Nov 07 09:38:06 pve02 kernel: x86/cpu: SGX disabled by BIOS.
Nov 07 09:38:11 pve02 kernel: ipmi_si hardcode-ipmi-si.0: Interface detection failed
Nov 07 09:38:11 pve02 pmxcfs[1748]: [quorum] crit: quorum_initialize failed: 2
Nov 07 09:38:11 pve02 pmxcfs[1748]: [quorum] crit: can't initialize service
Nov 07 09:38:11 pve02 pmxcfs[1748]: [confdb] crit: cmap_initialize failed: 2
Nov 07 09:38:11 pve02 pmxcfs[1748]: [confdb] crit: can't initialize service
Nov 07 09:38:11 pve02 pmxcfs[1748]: [dcdb] crit: cpg_initialize failed: 2
Nov 07 09:38:11 pve02 pmxcfs[1748]: [dcdb] crit: can't initialize service
Nov 07 09:38:11 pve02 pmxcfs[1748]: [status] crit: cpg_initialize failed: 2
Nov 07 09:38:11 pve02 pmxcfs[1748]: [status] crit: can't initialize service
Nov 07 09:39:58 pve02 systemd[1]: Failed to start ceph-osd@1.service - Ceph object storage daemon osd.1.
Nov 07 10:37:33 pve02 systemd[1]: Failed to start ceph-osd@1.service - Ceph object storage daemon osd.1.
Nov 07 11:17:55 pve02 systemd[1]: Failed to start ceph-osd@1.service - Ceph object storage daemon osd.1.
 
The only thing vaguely interesting is this:

systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit uses KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update the service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
 
Maybe this indicates something?
Bash:
Nov 07 01:44:40 pve03 kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Nov 07 01:45:00 pve03 kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Nov 07 01:45:00 pve03 kernel: Buffer I/O error on dev dm-0, logical block 488378352, async page read
Nov 07 01:45:00 pve03 kernel: Buffer I/O error on dev dm-0, logical block 488378352, async page read
Nov 07 01:45:00 pve03 kernel: Buffer I/O error on dev dm-0, logical block 488378352, async page read
Nov 07 01:45:11 pve03 kernel: Buffer I/O error on dev dm-0, logical block 0, async page read
 
Getting closer
Code:
smartctl -c /dev/nvme0n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-3-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error
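
If the controller no longer even answers an Identify command, the drive itself has most likely died, and the usual way out is to replace it and rebuild the OSD on the new disk. A rough sketch, assuming the dead NVMe is /dev/nvme0n1 backing osd.2 on pve03 (double-check the id and device first, and do one node at a time so the remaining copies stay available):
Bash:
# the OSD is already down/out, so remove it from the cluster
pveceph osd destroy 2
# physically replace the NVMe, then create a fresh OSD on the new device and let Ceph backfill
pveceph osd create /dev/nvme0n1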
 
