OSD Segmentation faults (safe_timer)

43n12y

Member
May 26, 2023
Since the last minor upgrade at the end of January, we have had an OSD crash every few days. The OSDs recover on their own.
The journalctl output looks very similar every time. It started a few days after the last minor update, so my assumption is that it may be related.
Do you have any ideas, or tips on which information/logs I should check/provide?
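One thing that might help narrow it down is comparing the crash signatures across all affected OSDs and days. A small sketch of that idea (the `sample` variable stands in for real `journalctl -u 'ceph-osd@*'` output; the grep pattern is just an assumption matching the log format shown below):

```shell
# Extract only the crashing thread name and the top OSD frame from the
# journal, so crashes on different OSDs/days can be compared at a glance.
# A sample line stands in for the real journalctl output here.
sample='Mar 12 16:12:14  ceph-osd[5480]:  in thread 77ec6e5796c0 thread_name:safe_timer
Mar 12 16:12:14  ceph-osd[5480]:  3: (OSD::tick_without_osd_lock()+0x4ac) [0x581a2f04047c]'

printf '%s\n' "$sample" \
  | grep -oE 'thread_name:[A-Za-z_]+|OSD::[A-Za-z_]+' \
  | sort | uniq -c
```

If every crash shows the same thread name and top frame, that points at one code path rather than failing hardware on individual disks.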

journalctl:
Code:
Mar 12 16:12:14  ceph-osd[5480]: *** Caught signal (Segmentation fault) **
Mar 12 16:12:14  ceph-osd[5480]:  in thread 77ec6e5796c0 thread_name:safe_timer
Mar 12 16:12:14  ceph-osd[5480]:  ceph version 19.2.3 (2f03f1cd83e5d40cdf1393cb64a662a8e8bb07c6) squid (stable)
Mar 12 16:12:14  ceph-osd[5480]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3fdf0) [0x77ec78a4adf0]
Mar 12 16:12:14  ceph-osd[5480]:  2: (std::_Rb_tree_decrement(std::_Rb_tree_node_base const*)+0xe) [0x77ec78cca53e]
Mar 12 16:12:14  ceph-osd[5480]:  3: (OSD::tick_without_osd_lock()+0x4ac) [0x581a2f04047c]
Mar 12 16:12:14  ceph-osd[5480]:  4: (Context::complete(int)+0xd) [0x581a2f04f15d]
Mar 12 16:12:14  ceph-osd[5480]:  5: (CommonSafeTimer<std::mutex>::timer_thread()+0x129) [0x581a2f74ed39]
Mar 12 16:12:14  ceph-osd[5480]:  6: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x581a2f74feb1]
Mar 12 16:12:14  ceph-osd[5480]:  7: /lib/x86_64-linux-gnu/libc.so.6(+0x92b7b) [0x77ec78a9db7b]
Mar 12 16:12:14  ceph-osd[5480]:  8: /lib/x86_64-linux-gnu/libc.so.6(+0x1107b8) [0x77ec78b1b7b8]
Mar 12 16:12:14  ceph-osd[5480]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
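For a fuller picture than the journal excerpt, Ceph's crash module archives each crash with metadata and the complete backtrace. A sketch, assuming the mgr crash module is enabled on the cluster (the `command -v` guard is only so the snippet degrades gracefully off-cluster; on a node, the two `ceph` commands are what matters):

```shell
# List crashes Ceph has recorded, then dump one entry in full.
if command -v ceph >/dev/null 2>&1; then
    ceph crash ls
    # ceph crash info <crash-id>   # full backtrace + metadata for one entry
else
    echo "ceph CLI not installed on this host"
fi
```

The `ceph crash info` output includes the exact ceph version, entity name, and stack for each event, which is handy when attaching details to a bug report.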
pveversion -v
Code:
proxmox-ve: 9.1.0 (running kernel: 6.17.4-2-pve)
pve-manager: 9.1.4 (running version: 9.1.4/5ac30304265fbd8e)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.4-2-pve-signed: 6.17.4-2
proxmox-kernel-6.17: 6.17.4-2
proxmox-kernel-6.17.2-1-pve-signed: 6.17.2-1
proxmox-kernel-6.8: 6.8.12-15
proxmox-kernel-6.8.12-15-pve-signed: 6.8.12-15
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph: 19.2.3-pve2
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.5
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.1.4
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.4
libpve-rs-perl: 0.11.4
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-3
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
openvswitch-switch: 3.5.0-1+b1
proxmox-backup-client: 4.1.1-1
proxmox-backup-file-restore: 4.1.1-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.5
pve-cluster: 9.0.7
pve-container: 6.0.18
pve-docs: 9.1.2
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.1.0
pve-i18n: 3.6.6
pve-qemu-kvm: 10.1.2-5
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.3
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1

Thanks in advance
 
The crashed OSDs vary widely in age: some are just a few days old, while some are older than 4 years.
So at first glance it does not seem related to creation time or version.