ceph-osd segfaults after update to 8.2.2 (segfault at 7d69a5a00990 error 4 in libc.so.6)

Dec 6, 2022
I updated my Proxmox cluster yesterday.
It is now running kernel 6.8.4-2-pve and Ceph 18.2.2.

Shortly after the update I got a segfault.

Code:
[ +21.098693] ceph-osd[40129]: segfault at 7d69a5a00990 ip 00007d69b6ca97f2 sp 00007ffd3b048c80 error 4 in libc.so.6[7d69b6c45000+155000] likely on CPU 8 (core 16, socket 0)
[  +0.000009] Code: 00 48 89 44 24 28 31 c0 48 83 fa 0f 0f 86 9e 00 00 00 48 89 f5 64 48 39 3c 25 10 00 00 00 0f 84 c4 00 00 00 49 89 e4 48 89 d3 <8b> 97 d0 02 00 00 31 c0 4c 89 e7 48 8d 35 c8 c6 10 00 e8 b7 e4 fc
[May 7 18:51] libceph (4dd8285e-348f-461e-9d6b-53a2ad516449 e106217): osd0 weight 0x0 (out)
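
The "error 4" in the kernel segfault line is the x86 page-fault error code. The small decoder below is a sketch: the bit meanings follow the Linux `X86_PF_*` flags, and `decode_pf_error` is a hypothetical helper. Applied to 4, it shows a user-mode read of a page that was not mapped, i.e. a bad pointer dereference inside the process. On its own, that code says nothing about the state of the disk.

```python
# Bit meanings of the x86 page-fault error code, following the X86_PF_*
# flags in the Linux kernel (arch/x86/include/asm/trap_pf.h).
# Each entry: bit -> (flag name, meaning if set, meaning if clear or None).
PF_BITS = {
    0: ("PROT", "protection violation", "page not present"),
    1: ("WRITE", "write access", "read access"),
    2: ("USER", "user-mode access", "kernel-mode access"),
    3: ("RSVD", "reserved bit set in a page-table entry", None),
    4: ("INSTR", "instruction fetch", None),
    5: ("PK", "protection-key violation", None),
}


def decode_pf_error(code: int) -> list[str]:
    """Return human-readable descriptions for a page-fault error code."""
    parts = []
    for bit, (_name, if_set, if_clear) in PF_BITS.items():
        if code & (1 << bit):
            parts.append(if_set)
        elif if_clear is not None:
            # Only bits 0-2 have a meaningful "clear" interpretation.
            parts.append(if_clear)
    return parts


if __name__ == "__main__":
    # "error 4" from the ceph-osd segfault line
    print(decode_pf_error(4))
```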

Only 1 of our 11 OSDs has this problem so far.
So I don't know whether this is a bug in Ceph or Proxmox, or whether the disk is failing.
But if the disk were failing, I should see I/O errors in dmesg, right?
(The disk's S.M.A.R.T. status is "passed", with 7% wearout.)
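One way to test the "I/O errors in dmesg" assumption is to filter the kernel log for typical disk-error messages. This is a minimal sketch: `find_io_errors` is a hypothetical helper, and the patterns are illustrative assumptions based on common block-layer and libata wording, not an exhaustive list.

```python
import re

# Illustrative patterns only (assumption): common wording of block-layer and
# libata error messages in the kernel log. Extend for your hardware/driver.
IO_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"critical medium error", re.IGNORECASE),
    re.compile(r"\bfailed command\b", re.IGNORECASE),
    re.compile(r"\bexception Emask\b"),
]


def find_io_errors(lines, device=None):
    """Yield kernel-log lines that look like disk I/O errors.

    If `device` (e.g. "sdb") is given, only lines containing that string are
    considered. This is a plain substring match, so "sdb" also matches "sdb1".
    """
    for line in lines:
        if device is not None and device not in line:
            continue
        if any(p.search(line) for p in IO_ERROR_PATTERNS):
            yield line
```

For example, pass it the lines of `dmesg` or `journalctl -k` output for the OSD's disk (the device name is whatever backs osd.0 on that host). An empty result only means none of these patterns appeared. It does not prove the disk is healthy.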

I postponed updating our big cluster for now.


Here is the full log of the last OSD restart attempt:

Code:
May 07 18:42:42 proxh2 systemd[1]: Starting ceph-osd@0.service - Ceph object storage daemon osd.0...
May 07 18:42:42 proxh2 systemd[1]: Started ceph-osd@0.service - Ceph object storage daemon osd.0.
May 07 18:42:53 proxh2 ceph-osd[40129]: ./src/os/bluestore/BlueStore.cc: In function 'void BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, uint64_t, uint64_t)' thread 7d69b5e516c0 time 2024-05-07T18:42>
May 07 18:42:53 proxh2 ceph-osd[40129]: ./src/os/bluestore/BlueStore.cc: 18880: FAILED ceph_assert((length & min_alloc_size_mask) == 0)
May 07 18:42:53 proxh2 ceph-osd[40129]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12a) [0x579395f110f1]
May 07 18:42:53 proxh2 ceph-osd[40129]:  2: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  3: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]:  4: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]:  5: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]:  6: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]:  7: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [0>
May 07 18:42:53 proxh2 ceph-osd[40129]:  8: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]:  9: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]:  10: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]:  11: main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  12: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]:  13: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  14: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]: *** Caught signal (Aborted) **
May 07 18:42:53 proxh2 ceph-osd[40129]:  in thread 7d69b5e516c0 thread_name:ceph-osd
May 07 18:42:53 proxh2 ceph-osd[40129]: 2024-05-07T18:42:53.326+0200 7d69b5e516c0 -1 ./src/os/bluestore/BlueStore.cc: In function 'void BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, uint64_t, uint64_>
May 07 18:42:53 proxh2 ceph-osd[40129]: ./src/os/bluestore/BlueStore.cc: 18880: FAILED ceph_assert((length & min_alloc_size_mask) == 0)
May 07 18:42:53 proxh2 ceph-osd[40129]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12a) [0x579395f110f1]
May 07 18:42:53 proxh2 ceph-osd[40129]:  2: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  3: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]:  4: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]:  5: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]:  6: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]:  7: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [0>
May 07 18:42:53 proxh2 ceph-osd[40129]:  8: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]:  9: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]:  10: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]:  11: main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  12: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]:  13: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  14: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7d69b6c5b050]
May 07 18:42:53 proxh2 ceph-osd[40129]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7d69b6ca9e2c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  3: gsignal()
May 07 18:42:53 proxh2 ceph-osd[40129]:  4: abort()
May 07 18:42:53 proxh2 ceph-osd[40129]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x185) [0x579395f1114c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  6: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  7: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]:  8: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]:  9: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]:  10: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]:  11: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [>
May 07 18:42:53 proxh2 ceph-osd[40129]:  12: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]:  13: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]:  14: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]:  15: main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  16: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]:  17: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  18: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]: 2024-05-07T18:42:53.329+0200 7d69b5e516c0 -1 *** Caught signal (Aborted) **
May 07 18:42:53 proxh2 ceph-osd[40129]:  in thread 7d69b5e516c0 thread_name:ceph-osd
May 07 18:42:53 proxh2 ceph-osd[40129]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7d69b6c5b050]
May 07 18:42:53 proxh2 ceph-osd[40129]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7d69b6ca9e2c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  3: gsignal()
May 07 18:42:53 proxh2 ceph-osd[40129]:  4: abort()
May 07 18:42:53 proxh2 ceph-osd[40129]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x185) [0x579395f1114c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  6: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  7: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]:  8: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]:  9: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]:  10: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]:  11: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [>
May 07 18:42:53 proxh2 ceph-osd[40129]:  12: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]:  13: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]:  14: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]:  15: main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  16: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]:  17: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  18: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 07 18:42:53 proxh2 ceph-osd[40129]:     -1> 2024-05-07T18:42:53.326+0200 7d69b5e516c0 -1 ./src/os/bluestore/BlueStore.cc: In function 'void BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, uint64_t,>
May 07 18:42:53 proxh2 ceph-osd[40129]: ./src/os/bluestore/BlueStore.cc: 18880: FAILED ceph_assert((length & min_alloc_size_mask) == 0)
May 07 18:42:53 proxh2 ceph-osd[40129]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12a) [0x579395f110f1]
May 07 18:42:53 proxh2 ceph-osd[40129]:  2: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  3: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]:  4: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]:  5: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]:  6: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]:  7: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [0>
May 07 18:42:53 proxh2 ceph-osd[40129]:  8: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]:  9: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]:  10: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]:  11: main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  12: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]:  13: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  14: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]:      0> 2024-05-07T18:42:53.329+0200 7d69b5e516c0 -1 *** Caught signal (Aborted) **
May 07 18:42:53 proxh2 ceph-osd[40129]:  in thread 7d69b5e516c0 thread_name:ceph-osd
May 07 18:42:53 proxh2 ceph-osd[40129]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7d69b6c5b050]
May 07 18:42:53 proxh2 ceph-osd[40129]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7d69b6ca9e2c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  3: gsignal()
May 07 18:42:53 proxh2 ceph-osd[40129]:  4: abort()
May 07 18:42:53 proxh2 ceph-osd[40129]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x185) [0x579395f1114c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  6: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]:  7: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]:  8: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]:  9: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]:  10: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]:  11: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [>
May 07 18:42:53 proxh2 ceph-osd[40129]:  12: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]:  13: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]:  14: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]:  15: main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  16: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]:  17: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]:  18: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 07 18:42:53 proxh2 kernel: ceph-osd[40129]: segfault at 7d69a5a00990 ip 00007d69b6ca97f2 sp 00007ffd3b048c80 error 4 in libc.so.6[7d69b6c45000+155000] likely on CPU 8 (core 16, socket 0)
May 07 18:42:53 proxh2 kernel: Code: 00 48 89 44 24 28 31 c0 48 83 fa 0f 0f 86 9e 00 00 00 48 89 f5 64 48 39 3c 25 10 00 00 00 0f 84 c4 00 00 00 49 89 e4 48 89 d3 <8b> 97 d0 02 00 00 31 c0 4c 89 e7 48 8d 35 c8>
May 07 18:42:53 proxh2 systemd[1]: ceph-osd@0.service: Main process exited, code=killed, status=11/SEGV
May 07 18:42:53 proxh2 systemd[1]: ceph-osd@0.service: Failed with result 'signal'.
May 07 18:42:53 proxh2 systemd[1]: ceph-osd@0.service: Consumed 6.024s CPU time.
May 07 18:43:01 proxh2 snmpd[1031]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
May 07 18:43:03 proxh2 systemd[1]: ceph-osd@0.service: Scheduled restart job, restart counter is at 4.
May 07 18:43:03 proxh2 systemd[1]: Stopped ceph-osd@0.service - Ceph object storage daemon osd.0.
May 07 18:43:03 proxh2 systemd[1]: ceph-osd@0.service: Consumed 6.024s CPU time.
May 07 18:43:03 proxh2 systemd[1]: ceph-osd@0.service: Start request repeated too quickly.
May 07 18:43:03 proxh2 systemd[1]: ceph-osd@0.service: Failed with result 'signal'.
May 07 18:43:03 proxh2 systemd[1]: Failed to start ceph-osd@0.service - Ceph object storage daemon osd.0.
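
In the log above, ceph-osd first aborts on the failed assertion `(length & min_alloc_size_mask) == 0` in `BlueStore::set_allocation_in_simple_bmap`, reached from `read_allocation_from_onodes` while the allocation map is rebuilt at startup. The kernel's segfault report appears only after that abort. For a power-of-two allocation unit, `x & (size - 1) == 0` is the usual test for "x is a multiple of size". The assertion therefore indicates that an extent length was not a multiple of the OSD's `min_alloc_size`. The sketch below only illustrates that arithmetic. It assumes `min_alloc_size_mask` equals `min_alloc_size - 1`, which is inferred from the name and not checked against the BlueStore source. `is_aligned` is a hypothetical helper and the sizes are example values.

```python
def is_aligned(length: int, min_alloc_size: int) -> bool:
    """Mirror of the check (length & min_alloc_size_mask) == 0.

    Assumption: min_alloc_size is a power of two and
    min_alloc_size_mask == min_alloc_size - 1.
    """
    # A power of two has exactly one bit set, so n & (n - 1) == 0.
    if min_alloc_size <= 0 or min_alloc_size & (min_alloc_size - 1):
        raise ValueError("min_alloc_size must be a positive power of two")
    min_alloc_size_mask = min_alloc_size - 1
    # The low bits of length must all be zero for it to be a multiple.
    return (length & min_alloc_size_mask) == 0


if __name__ == "__main__":
    print(is_aligned(8192, 4096))    # True: 8 KiB is a multiple of 4 KiB
    print(is_aligned(4096, 65536))   # False: 4 KiB is not a multiple of 64 KiB
```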
 
