I updated my Proxmox cluster yesterday.
It is now running kernel 6.8.4-2pve and Ceph 18.2.2.
Shortly after the update, I got a segfault.
Code:
[ +21.098693] ceph-osd[40129]: segfault at 7d69a5a00990 ip 00007d69b6ca97f2 sp 00007ffd3b048c80 error 4 in libc.so.6[7d69b6c45000+155000] likely on CPU 8 (core 16, socket 0)
[ +0.000009] Code: 00 48 89 44 24 28 31 c0 48 83 fa 0f 0f 86 9e 00 00 00 48 89 f5 64 48 39 3c 25 10 00 00 00 0f 84 c4 00 00 00 49 89 e4 48 89 d3 <8b> 97 d0 02 00 00 31 c0 4c 89 e7 48 8d 35 c8 c6 10 00 e8 b7 e4 fc
[May 7 18:51] libceph (4dd8285e-348f-461e-9d6b-53a2ad516449 e106217): osd0 weight 0x0 (out)
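The `error 4` in the kernel's segfault line is the x86 page-fault error code. The minimal Python sketch below assumes the standard x86 bit layout (bit 0 protection violation vs. page not present, bit 1 write vs. read, bit 2 user vs. kernel mode, bit 3 reserved bit set, bit 4 instruction fetch). It decodes that code and computes the faulting `ip` as an offset into the `libc.so.6` mapping the kernel printed. The helper names are hypothetical. The offset is relative to the start of that printed mapping, not necessarily to the library's load base.

```python
# Assumed x86 page-fault error code bits: bit -> (meaning if set, meaning if clear)
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write access", "read access"),
    2: ("user mode", "kernel mode"),
    3: ("reserved bit set", None),
    4: ("instruction fetch", None),
}


def decode_pf_error(code: int) -> list:
    """Hypothetical helper: translate the 'error N' value into readable flags."""
    flags = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        if code & (1 << bit):
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


def offset_in_mapping(ip: int, vma_start: int, vma_size: int) -> int:
    """Hypothetical helper: offset of ip inside the mapping 'lib[start+size]'."""
    off = ip - vma_start
    if not 0 <= off < vma_size:
        raise ValueError("ip lies outside the printed mapping")
    return off


# Values taken from the dmesg line above
print(decode_pf_error(4))  # ['page not present', 'read access', 'user mode']
print(hex(offset_in_mapping(0x7D69B6CA97F2, 0x7D69B6C45000, 0x155000)))  # 0x647f2
```

Read this way, the crash was a user-mode read of an unmapped address inside libc.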
Only 1 of our 11 OSDs is having this problem so far.
So I don't know whether this is a bug in Ceph or Proxmox, or whether the disk is faulty.
But if the disk were faulty, I should see I/O errors in dmesg, right?
(The disk's S.M.A.R.T. status is "passed", with 7% wearout.)
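A quick first check for the dmesg question is to filter the kernel log for block-layer error messages. The sketch below is a minimal Python filter under stated assumptions: the patterns are hypothetical examples of common kernel wording ("I/O error", "medium error", `blk_update_request`), and the exact text varies by driver and kernel version, so finding no match is suggestive rather than conclusive.

```python
import re

# Hypothetical patterns; real kernel messages differ between drivers and versions.
IO_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"medium error", re.IGNORECASE),
    re.compile(r"blk_update_request"),
]


def find_io_errors(lines):
    """Return only the log lines that match one of the assumed I/O error patterns."""
    return [line for line in lines if any(p.search(line) for p in IO_ERROR_PATTERNS)]


sample = [
    # Lines from the dmesg excerpt above: neither is a disk error.
    "[ +21.098693] ceph-osd[40129]: segfault at 7d69a5a00990 ip 00007d69b6ca97f2 error 4 in libc.so.6",
    "[May 7 18:51] libceph (4dd8285e-348f-461e-9d6b-53a2ad516449 e106217): osd0 weight 0x0 (out)",
    # Made-up example of what a disk error line could look like.
    "[ +0.000001] I/O error, dev sdb, sector 12345 op 0x0:(READ)",
]

for hit in find_io_errors(sample):
    print(hit)
```

In practice the input would be the output of `dmesg` or `journalctl -k` saved to a file and read line by line.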
I have postponed updating our big cluster for now.
Here is the full log of the last OSD restart attempt:
Code:
May 07 18:42:42 proxh2 systemd[1]: Starting ceph-osd@0.service - Ceph object storage daemon osd.0...
May 07 18:42:42 proxh2 systemd[1]: Started ceph-osd@0.service - Ceph object storage daemon osd.0.
May 07 18:42:53 proxh2 ceph-osd[40129]: ./src/os/bluestore/BlueStore.cc: In function 'void BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, uint64_t, uint64_t)' thread 7d69b5e516c0 time 2024-05-07T18:42>
May 07 18:42:53 proxh2 ceph-osd[40129]: ./src/os/bluestore/BlueStore.cc: 18880: FAILED ceph_assert((length & min_alloc_size_mask) == 0)
May 07 18:42:53 proxh2 ceph-osd[40129]: ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12a) [0x579395f110f1]
May 07 18:42:53 proxh2 ceph-osd[40129]: 2: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 3: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]: 4: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]: 5: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]: 6: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]: 7: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [0>
May 07 18:42:53 proxh2 ceph-osd[40129]: 8: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]: 9: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]: 10: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]: 11: main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 12: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]: 13: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 14: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]: *** Caught signal (Aborted) **
May 07 18:42:53 proxh2 ceph-osd[40129]: in thread 7d69b5e516c0 thread_name:ceph-osd
May 07 18:42:53 proxh2 ceph-osd[40129]: 2024-05-07T18:42:53.326+0200 7d69b5e516c0 -1 ./src/os/bluestore/BlueStore.cc: In function 'void BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, uint64_t, uint64_>
May 07 18:42:53 proxh2 ceph-osd[40129]: ./src/os/bluestore/BlueStore.cc: 18880: FAILED ceph_assert((length & min_alloc_size_mask) == 0)
May 07 18:42:53 proxh2 ceph-osd[40129]: ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12a) [0x579395f110f1]
May 07 18:42:53 proxh2 ceph-osd[40129]: 2: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 3: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]: 4: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]: 5: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]: 6: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]: 7: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [0>
May 07 18:42:53 proxh2 ceph-osd[40129]: 8: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]: 9: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]: 10: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]: 11: main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 12: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]: 13: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 14: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]: ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7d69b6c5b050]
May 07 18:42:53 proxh2 ceph-osd[40129]: 2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7d69b6ca9e2c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 3: gsignal()
May 07 18:42:53 proxh2 ceph-osd[40129]: 4: abort()
May 07 18:42:53 proxh2 ceph-osd[40129]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x185) [0x579395f1114c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 6: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 7: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]: 8: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]: 9: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]: 10: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]: 11: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [>
May 07 18:42:53 proxh2 ceph-osd[40129]: 12: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]: 13: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]: 14: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]: 15: main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 16: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]: 17: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 18: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]: 2024-05-07T18:42:53.329+0200 7d69b5e516c0 -1 *** Caught signal (Aborted) **
May 07 18:42:53 proxh2 ceph-osd[40129]: in thread 7d69b5e516c0 thread_name:ceph-osd
May 07 18:42:53 proxh2 ceph-osd[40129]: ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7d69b6c5b050]
May 07 18:42:53 proxh2 ceph-osd[40129]: 2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7d69b6ca9e2c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 3: gsignal()
May 07 18:42:53 proxh2 ceph-osd[40129]: 4: abort()
May 07 18:42:53 proxh2 ceph-osd[40129]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x185) [0x579395f1114c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 6: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 7: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]: 8: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]: 9: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]: 10: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]: 11: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [>
May 07 18:42:53 proxh2 ceph-osd[40129]: 12: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]: 13: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]: 14: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]: 15: main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 16: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]: 17: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 18: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 07 18:42:53 proxh2 ceph-osd[40129]: -1> 2024-05-07T18:42:53.326+0200 7d69b5e516c0 -1 ./src/os/bluestore/BlueStore.cc: In function 'void BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, uint64_t,>
May 07 18:42:53 proxh2 ceph-osd[40129]: ./src/os/bluestore/BlueStore.cc: 18880: FAILED ceph_assert((length & min_alloc_size_mask) == 0)
May 07 18:42:53 proxh2 ceph-osd[40129]: ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12a) [0x579395f110f1]
May 07 18:42:53 proxh2 ceph-osd[40129]: 2: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 3: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]: 4: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]: 5: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]: 6: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]: 7: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [0>
May 07 18:42:53 proxh2 ceph-osd[40129]: 8: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]: 9: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]: 10: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]: 11: main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 12: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]: 13: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 14: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]: 0> 2024-05-07T18:42:53.329+0200 7d69b5e516c0 -1 *** Caught signal (Aborted) **
May 07 18:42:53 proxh2 ceph-osd[40129]: in thread 7d69b5e516c0 thread_name:ceph-osd
May 07 18:42:53 proxh2 ceph-osd[40129]: ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
May 07 18:42:53 proxh2 ceph-osd[40129]: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7d69b6c5b050]
May 07 18:42:53 proxh2 ceph-osd[40129]: 2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ae2c) [0x7d69b6ca9e2c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 3: gsignal()
May 07 18:42:53 proxh2 ceph-osd[40129]: 4: abort()
May 07 18:42:53 proxh2 ceph-osd[40129]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x185) [0x579395f1114c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 6: /usr/bin/ceph-osd(+0x62628c) [0x579395f1128c]
May 07 18:42:53 proxh2 ceph-osd[40129]: 7: (BlueStore::set_allocation_in_simple_bmap(SimpleBitmap*, unsigned long, unsigned long)+0x1e0) [0x579396534150]
May 07 18:42:53 proxh2 ceph-osd[40129]: 8: (BlueStore::read_allocation_from_onodes(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x378) [0x57939658f528]
May 07 18:42:53 proxh2 ceph-osd[40129]: 9: (BlueStore::reconstruct_allocations(SimpleBitmap*, BlueStore::read_alloc_stats_t&)+0x54) [0x579396591884]
May 07 18:42:53 proxh2 ceph-osd[40129]: 10: (BlueStore::read_allocation_from_drive_on_startup()+0x101) [0x579396591aa1]
May 07 18:42:53 proxh2 ceph-osd[40129]: 11: (BlueStore::_init_alloc(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >*)+0x78b) [>
May 07 18:42:53 proxh2 ceph-osd[40129]: 12: (BlueStore::_open_db_and_around(bool, bool)+0x469) [0x5793965a3809]
May 07 18:42:53 proxh2 ceph-osd[40129]: 13: (BlueStore::_mount()+0x347) [0x5793965a5bf7]
May 07 18:42:53 proxh2 ceph-osd[40129]: 14: (OSD::init()+0x4b1) [0x57939606bb61]
May 07 18:42:53 proxh2 ceph-osd[40129]: 15: main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 16: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7d69b6c4624a]
May 07 18:42:53 proxh2 ceph-osd[40129]: 17: __libc_start_main()
May 07 18:42:53 proxh2 ceph-osd[40129]: 18: _start()
May 07 18:42:53 proxh2 ceph-osd[40129]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 07 18:42:53 proxh2 kernel: ceph-osd[40129]: segfault at 7d69a5a00990 ip 00007d69b6ca97f2 sp 00007ffd3b048c80 error 4 in libc.so.6[7d69b6c45000+155000] likely on CPU 8 (core 16, socket 0)
May 07 18:42:53 proxh2 kernel: Code: 00 48 89 44 24 28 31 c0 48 83 fa 0f 0f 86 9e 00 00 00 48 89 f5 64 48 39 3c 25 10 00 00 00 0f 84 c4 00 00 00 49 89 e4 48 89 d3 <8b> 97 d0 02 00 00 31 c0 4c 89 e7 48 8d 35 c8>
May 07 18:42:53 proxh2 systemd[1]: ceph-osd@0.service: Main process exited, code=killed, status=11/SEGV
May 07 18:42:53 proxh2 systemd[1]: ceph-osd@0.service: Failed with result 'signal'.
May 07 18:42:53 proxh2 systemd[1]: ceph-osd@0.service: Consumed 6.024s CPU time.
May 07 18:43:01 proxh2 snmpd[1031]: systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
May 07 18:43:03 proxh2 systemd[1]: ceph-osd@0.service: Scheduled restart job, restart counter is at 4.
May 07 18:43:03 proxh2 systemd[1]: Stopped ceph-osd@0.service - Ceph object storage daemon osd.0.
May 07 18:43:03 proxh2 systemd[1]: ceph-osd@0.service: Consumed 6.024s CPU time.
May 07 18:43:03 proxh2 systemd[1]: ceph-osd@0.service: Start request repeated too quickly.
May 07 18:43:03 proxh2 systemd[1]: ceph-osd@0.service: Failed with result 'signal'.
May 07 18:43:03 proxh2 systemd[1]: Failed to start ceph-osd@0.service - Ceph object storage daemon osd.0.
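The failing assertion, `ceph_assert((length & min_alloc_size_mask) == 0)` in `BlueStore::set_allocation_in_simple_bmap`, checks that an extent length is a multiple of the OSD's allocation unit. The Python sketch below mirrors that condition under the assumption that `min_alloc_size` is a power of two and `min_alloc_size_mask` is `min_alloc_size - 1`. The function names and the 4 KiB / 64 KiB sizes are illustrative only, not values read from this OSD. Since the backtrace goes through `read_allocation_from_onodes`, a failure here suggests that some object metadata references an extent whose length is not aligned to this OSD's `min_alloc_size`.

```python
def alloc_mask(min_alloc_size: int) -> int:
    """Hypothetical helper: mask form of min_alloc_size (assumed power of two)."""
    if min_alloc_size <= 0 or min_alloc_size & (min_alloc_size - 1):
        raise ValueError("min_alloc_size must be a power of two")
    return min_alloc_size - 1


def is_aligned(length: int, min_alloc_size: int) -> bool:
    """Mirrors the asserted condition: (length & min_alloc_size_mask) == 0."""
    return (length & alloc_mask(min_alloc_size)) == 0


print(is_aligned(8192, 4096))    # True: 8 KiB is a multiple of 4 KiB
print(is_aligned(4096, 65536))   # False: 4 KiB is not a multiple of 64 KiB
```

The mask form only equals a modulo check for power-of-two sizes, which is why the sketch rejects other values instead of silently giving a wrong answer.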