Ceph version: 14.2.11
There is a PG that causes the OSDs in its acting set to crash whenever it enters the backfilling state. I have had to set nobackfill for now so that the OSDs don't flap.
Here is the OSD log:
Code:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/osd/osd_types.cc: 5450: FAILED ceph_assert(clone_overlap.count(clone))
ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x5564805f80e5]
2: (()+0x4d72ad) [0x5564805f82ad]
3: (SnapSet::get_clone_bytes(snapid_t) const+0xc2) [0x5564809120e2]
4: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>, pg_stat_t*)+0x28c) [0x556480843aac]
5: (PrimaryLogPG::recover_backfill(unsigned long, ThreadPool::TPHandle&, bool*)+0xf65) [0x556480872985]
6: (PrimaryLogPG::start_recovery_ops(unsigned long, ThreadPool::TPHandle&, unsigned long*)+0x114c) [0x5564808767ac]
7: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x2ff) [0x5564806d74ef]
8: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x19) [0x556480966529]
9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x5564806f2d3f]
10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x556480ca6c46]
11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x556480ca9760]
12: (()+0x7dd5) [0x7f6d9c6a1dd5]
13: (clone()+0x6d) [0x7f6d9b567ead]
2023-01-04 08:10:36.559 7f6d79c37700 -1 *** Caught signal (Aborted) **
in thread 7f6d79c37700 thread_name:tp_osd_tp
ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
1: (()+0xf5d0) [0x7f6d9c6a95d0]
2: (gsignal()+0x37) [0x7f6d9b4a0207]
3: (abort()+0x148) [0x7f6d9b4a18f8]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x5564805f8134]
5: (()+0x4d72ad) [0x5564805f82ad]
6: (SnapSet::get_clone_bytes(snapid_t) const+0xc2) [0x5564809120e2]
7: (PrimaryLogPG::add_object_context_to_pg_stat(std::shared_ptr<ObjectContext>, pg_stat_t*)+0x28c) [0x556480843aac]
8: (PrimaryLogPG::recover_backfill(unsigned long, ThreadPool::TPHandle&, bool*)+0xf65) [0x556480872985]
9: (PrimaryLogPG::start_recovery_ops(unsigned long, ThreadPool::TPHandle&, unsigned long*)+0x114c) [0x5564808767ac]
10: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x2ff) [0x5564806d74ef]
11: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x19) [0x556480966529]
12: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x5564806f2d3f]
13: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x556480ca6c46]
14: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x556480ca9760]
15: (()+0x7dd5) [0x7f6d9c6a1dd5]
16: (clone()+0x6d) [0x7f6d9b567ead]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
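The assert that fires is `ceph_assert(clone_overlap.count(clone))` in `SnapSet::get_clone_bytes`, reached from `add_object_context_to_pg_stat` during `recover_backfill`. As I understand it, a SnapSet keeps per-clone maps `clone_size` and `clone_overlap`, and `get_clone_bytes` returns the clone's size minus the bytes it shares with the newer version, so it asserts that the clone id has an entry in both maps. The Python model below is a simplified, hypothetical sketch of that invariant (`SnapSetModel` and `find_inconsistent_clones` are illustrative names, not Ceph code, and the exact checks in the real function are assumed); it only shows why a clone id missing from `clone_overlap` aborts the OSD instead of being skipped.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SnapSetModel:
    """Hypothetical, simplified model of the clone metadata in a Ceph SnapSet.

    clone_size:    clone snap id -> clone size in bytes
    clone_overlap: clone snap id -> list of (offset, length) extents the clone
                   shares with the next newer version of the object
    """
    clones: list[int] = field(default_factory=list)
    clone_size: dict[int, int] = field(default_factory=dict)
    clone_overlap: dict[int, list[tuple[int, int]]] = field(default_factory=dict)

    def get_clone_bytes(self, clone: int) -> int:
        # Assumed shape of SnapSet::get_clone_bytes: every lookup is asserted,
        # so inconsistent metadata aborts rather than returning an error.
        assert clone in self.clone_size, "clone_size.count(clone)"
        size = self.clone_size[clone]
        assert clone in self.clone_overlap, "clone_overlap.count(clone)"
        overlap = sum(length for _, length in self.clone_overlap[clone])
        assert size >= overlap, "size >= overlap.size()"
        return size - overlap


def find_inconsistent_clones(ss: SnapSetModel) -> list[int]:
    """Return clone ids listed in ss.clones that have no clone_overlap entry."""
    return [c for c in ss.clones if c not in ss.clone_overlap]
```

For example, a model with `clones=[4, 7]` where only clone 4 has an overlap entry reports `[7]` as inconsistent, and calling `get_clone_bytes(7)` on it raises the same assertion that appears in the log.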
The PG query has this info (full output attached):
"last_backfill_started": "6:33f11f71:::rbd_data.4ce87586136379.0000000000004e07:head",
I thought this object was causing the issue, so I deleted the volume that contained it. But that didn't help at all :-(
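To double-check which RBD image the object in `last_backfill_started` belongs to, the string can be split into its fields. Below is a minimal Python sketch, assuming the usual hobject text layout `pool:hash:namespace:key:name:snap` and the RBD data object naming `rbd_data.<image id>.<object number in hex>`; `parse_hobject` and `parse_rbd_data_name` are hypothetical helpers, not Ceph APIs, and escaped `:` characters inside names are not handled (they do not occur in `rbd_data` names).

```python
from __future__ import annotations

from typing import NamedTuple


class HObject(NamedTuple):
    pool: int
    hash_hex: str   # 8-hex-digit hash field, exactly as printed
    namespace: str
    key: str
    name: str
    snap: str       # "head", "snapdir", or a snap id


def parse_hobject(s: str) -> HObject:
    # Assumes no escaped ':' inside any field.
    parts = s.split(":")
    if len(parts) != 6:
        raise ValueError(f"unexpected hobject format: {s!r}")
    pool, hash_hex, ns, key, name, snap = parts
    return HObject(int(pool), hash_hex, ns, key, name, snap)


def parse_rbd_data_name(name: str) -> tuple[str, int]:
    """Split 'rbd_data.<image id>.<objno hex>' into (block name prefix, objno)."""
    prefix, _, objno_hex = name.rpartition(".")
    if not prefix.startswith("rbd_data."):
        raise ValueError(f"not an rbd_data object: {name!r}")
    return prefix, int(objno_hex, 16)


obj = parse_hobject("6:33f11f71:::rbd_data.4ce87586136379.0000000000004e07:head")
prefix, objno = parse_rbd_data_name(obj.name)
print(obj.pool, prefix, objno)
```

For the string above this gives pool 6, prefix `rbd_data.4ce87586136379` and object number 0x4e07 (19975). The prefix can be compared against the `block_name_prefix` line printed by `rbd info` for the images in that pool.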