I've been trying to figure this out for over a week and I'm getting nowhere. I have 3 machines with identical hardware, each with 3 enterprise NVMe drives: 2x 4 TB Samsung M.2 PM983 and 1x 8 TB Samsung U.2 PM983a (I think this is an OEM drive for Amazon).
For some reason PVE2 keeps getting Ceph crashes and errors. Earlier today I destroyed all of the OSDs on PVE2, destroyed ceph-mon, ceph-mgr, and ceph-mds, deleted pve2 from the CRUSH map, did a secure erase on all the drives, and then re-added everything to the cluster. I don't see anything wrong in the SMART data, and nothing in the OS indicates hardware or drive issues. I've run memtest86+ and various benchmarks on the machine as a stress test, but I can't replicate any of the issues I've been having on PVE2.
Despite all of that, PVE2 had ceph-mon and ceph-mgr crashes a couple of hours AFTER the recovery finished...
OSDs 0, 3, and 8 are all on PVE2. I'm not getting crashes on PVE3 or PVE4 (there is no PVE1; it's retired).
Code:
root@pve2:/var/log/ceph# ceph crash ls
ID ENTITY NEW
2025-03-08T09:02:04.422973Z_0a1ef0d0-a2e1-409b-b906-dbdf60e99b42 osd.3
2025-04-06T12:21:39.334260Z_eac93866-64e1-42fd-8684-5e6bd6b78ed8 osd.3
2025-04-30T22:22:23.606109Z_6f8a53d4-2a79-4a15-beeb-f4b47933a728 osd.8
2025-05-01T23:54:28.590957Z_662ba25a-4d1f-40b0-8535-e5669cde47cb osd.8
2025-05-04T03:20:37.659471Z_10e04b3e-d68a-4c1b-bc49-853a795b44ba osd.0
2025-05-04T03:20:38.764210Z_10a46c7a-00fd-40d3-8700-911e1483d5ee osd.8
2025-05-04T06:17:33.229329Z_136203d3-690b-475f-a540-c4a9b330b771 osd.0
2025-05-07T12:55:45.875588Z_4b98d3b1-7d78-498b-a5e9-988a60d568fd mgr.pve2
2025-05-08T22:59:06.340977Z_3977e900-d87b-45ea-872e-f737804f78e3 osd.0
2025-05-09T00:40:51.579106Z_d849e41e-889e-485c-a8e2-8f7a6c6dd03e mgr.pve2
2025-05-09T03:30:05.519605Z_1cf6f13e-0de2-4faa-8856-e1b9ff313555 osd.8
2025-05-09T15:18:02.068671Z_3ceff48b-e47e-4fa4-b90b-3beb6df219ac mon.pve2
2025-05-09T15:19:42.533056Z_d3d60b71-dec2-46be-8843-70e10202f576 mon.pve2
2025-05-09T15:21:07.883912Z_bf737bea-f9df-4d21-9669-9279b8169c6e mon.pve2
2025-05-09T15:23:25.558031Z_fe4671ca-a628-41d1-a116-8dc7f5eff2db mon.pve2
2025-05-09T15:23:41.865995Z_4f6e50e9-f441-47d6-afdf-281e5917d19f mon.pve2
2025-05-09T15:23:55.375545Z_0c30f2bc-0861-4226-888c-5a51acd0bb91 mon.pve2
2025-05-09T15:30:01.815767Z_b66aaf1d-781f-4195-bcdc-6434008c35f6 osd.0
2025-05-10T00:20:16.062190Z_87482866-80ec-4ff9-99bb-3b5c9825eb0c osd.0
2025-05-10T05:17:52.227734Z_f71eec52-dc6c-4f0c-ae40-238841694c1b osd.8
2025-05-10T17:22:49.941053Z_4119cb0a-9443-4bff-8ee6-5aebac3a56db osd.0
2025-05-10T19:25:38.840688Z_ac59de83-5e5f-4341-964e-705baf3e8e0b osd.8
2025-05-11T10:22:16.202949Z_23e0457e-43b8-4620-85d3-e4b358e1b387 mon.pve2
2025-05-11T23:34:38.100949Z_3d496dce-4f19-44f9-8112-ba2e5ab4ac5c osd.8
2025-05-14T01:28:48.230247Z_dfdf3317-1c63-496a-a2fe-fc218ab5e81f osd.0
2025-05-15T03:53:57.244401Z_45ca5c58-f23f-4453-b30c-19ce41716b86 mgr.pve2
2025-05-15T08:46:37.737160Z_bcaf4a75-bfa3-459a-b07d-cba600308f52 osd.0
2025-05-15T08:46:56.291875Z_b5214405-c5a5-4d14-82e5-63bcbf6465f7 osd.0
2025-05-15T08:47:16.315073Z_750f8f25-8b2b-4f4f-8241-4e60b603772a osd.0
2025-05-15T08:47:37.188887Z_1efa19e5-0eed-4544-b56c-96be6ca2ac43 osd.0
2025-05-15T09:28:15.079537Z_71d39be8-d494-4b60-8185-b15c89a51099 osd.0
2025-05-15T09:28:35.087848Z_3d9128f1-21b3-4536-b471-98bacac9d35b osd.0
2025-05-15T09:28:55.474514Z_ec7455c3-1194-4284-8591-06e032256218 osd.0
2025-05-15T09:33:11.154229Z_59dd26a9-9b88-49c0-95a2-7058da251494 osd.0
2025-05-15T09:33:32.172022Z_fbb9eca4-0d31-4c42-8dda-293c8ddba991 osd.0
2025-05-15T09:33:52.204986Z_92606717-45b4-44ba-a2b6-9a6457efd1f7 osd.0
2025-05-15T09:35:07.614505Z_466e90ae-2166-441c-a6db-2e4dc4d2ed3b osd.0
2025-05-15T09:35:22.775400Z_f4c915c8-6ed8-42bc-929b-01ccbc38f2be osd.0
2025-05-15T09:35:43.260798Z_b818083e-3772-470f-8101-383fe5d59d71 osd.0
2025-05-15T09:44:31.592511Z_c61766b5-5794-46dd-835d-0c243470e2f0 osd.0
2025-05-15T09:44:52.002511Z_f7955f15-56c9-4ac1-a9ce-50f72e1e06d9 osd.0
2025-05-15T09:45:08.332280Z_327c13fd-34ff-4ccd-93d6-f1881b042d69 osd.0
2025-05-16T18:49:03.356772Z_211e8571-471e-44fd-93e4-9d5f84af145a mon.pve2
2025-05-16T19:02:08.462218Z_d1f5fd9e-1eca-4f0a-881b-33b971570f25 mon.pve2
2025-05-17T04:50:50.408230Z_89a406f1-6f21-4586-97ca-d8ec9a262f0f osd.8
2025-05-17T14:47:31.097463Z_85173b19-3f5b-4699-a01a-54c3fcdcc60f osd.3
2025-05-18T15:04:22.321243Z_231c0c05-426d-4db0-aa95-62ebd5bd2b94 mgr.pve2
2025-05-19T00:52:56.602501Z_e8882ff1-3d42-476e-bf7a-1b14078da96e osd.3
2025-05-19T02:07:45.252358Z_bee8ebde-4b7c-43f9-af13-fcc9d831de30 osd.8
2025-05-19T04:11:09.443414Z_a6934bcf-6473-49cc-a65f-fb3d5d4156d9 mon.pve2
2025-05-19T04:12:04.076445Z_b6746f4f-b3d8-4cce-a4dd-9ce251ef8a73 mon.pve2
2025-05-19T04:15:38.313004Z_03415c9e-9890-4251-aa79-40db1ecbc420 mon.pve2
2025-05-19T04:16:29.763155Z_29f7a38f-7eea-4cd3-8f72-147231e8eeef mon.pve2
2025-05-19T04:16:46.161168Z_b3ec4ac8-15a3-462d-bd1a-e25394ffe0c1 mon.pve2
2025-05-19T04:17:09.074646Z_8189c791-b471-4c98-80d4-4c63fc6a5001 mon.pve2
2025-05-19T08:22:19.989183Z_0a41b287-b7a3-4d5b-a3f5-2ef0bd53519f mgr.pve2
2025-05-19T15:45:30.822715Z_037ba8de-c0d9-4dc8-9870-948b3f7eeb9e osd.0
2025-05-19T20:18:55.358298Z_90b4ecaf-846a-42ee-9b13-8e2cb3533e77 osd.3
2025-05-19T20:19:18.552775Z_ad18c3a4-493c-48f6-85e8-e7fc44062ab5 osd.3
2025-05-20T15:23:33.755591Z_237d9241-8d67-4512-a1eb-2eb2f48c9fe5 mon.pve2
2025-05-20T15:23:45.345998Z_93ca8888-7824-4de2-bb54-35ac4e66c276 mon.pve2
2025-05-20T15:23:56.857746Z_0b1c47d2-9319-4ce9-9dfa-bcdafd26884d mon.pve2
2025-05-20T15:24:08.347289Z_74daf345-6d40-466b-88bd-a71126ddd6e5 mon.pve2
2025-05-20T15:24:21.449721Z_7ac2e9f2-6f39-48ff-96b2-3d7c6f860633 mon.pve2
2025-05-21T00:17:06.437811Z_5cbe3490-baf7-4ca6-bb93-9c9d7d4cdd52 mgr.pve2 *
2025-05-21T01:40:27.574018Z_4da27986-0c78-472e-aab0-0c8951043693 mon.pve2 *
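To see at a glance which daemons crash most often, the listing above can be tallied per entity. This is only a minimal sketch in Python, assuming the `ceph crash ls` output has been saved as text in the same whitespace-separated layout shown above; `count_crashes` is a hypothetical helper name, not part of Ceph.

```python
from collections import Counter

def count_crashes(listing: str) -> Counter:
    """Count crash reports per daemon from saved `ceph crash ls` text."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Crash IDs look like "<timestamp>Z_<uuid>"; anything else
        # (blank lines, the header, a shell prompt) is skipped.
        if len(fields) < 2 or "Z_" not in fields[0]:
            continue
        counts[fields[1]] += 1
    return counts

# Three lines copied from the listing above, plus the header.
sample = """\
ID ENTITY NEW
2025-05-20T15:24:21.449721Z_7ac2e9f2-6f39-48ff-96b2-3d7c6f860633 mon.pve2
2025-05-21T00:17:06.437811Z_5cbe3490-baf7-4ca6-bb93-9c9d7d4cdd52 mgr.pve2 *
2025-05-21T01:40:27.574018Z_4da27986-0c78-472e-aab0-0c8951043693 mon.pve2 *
"""

for entity, n in count_crashes(sample).most_common():
    print(entity, n)
```

Feeding it the full listing would show how the crashes split between the OSDs, mon.pve2, and mgr.pve2.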
Code:
May 20 21:40:27 pve2 ceph-mon[2005]: *** Caught signal (Segmentation fault) **
May 20 21:40:27 pve2 ceph-mon[2005]: in thread 78b0c46946c0 thread_name:rocksdb:low
May 20 21:40:27 pve2 ceph-mon[2005]: ceph version 18.2.7 (4cac8341a72477c60a6f153f3ed344b49870c932) reef (stable)
May 20 21:40:27 pve2 ceph-mon[2005]: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x78b0c7bf5050]
May 20 21:40:27 pve2 ceph-mon[2005]: 2: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)+0x1b) [0x634fedb6584b]
May 20 21:40:27 pve2 ceph-mon[2005]: 3: (rocksdb::BlockBuilder::AddWithLastKeyImpl(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const*, unsigned long)+0x15a) [0x634fedafa61a]
May 20 21:40:27 pve2 ceph-mon[2005]: 4: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::SubcompactionState*)+0x14c6) [0x634feda5f776]
May 20 21:40:27 pve2 ceph-mon[2005]: 5: (rocksdb::CompactionJob::Run()+0x338) [0x634feda61878]
May 20 21:40:27 pve2 ceph-mon[2005]: 6: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xda7) [0x634fed751027]
May 20 21:40:27 pve2 ceph-mon[2005]: 7: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0x141) [0x634fed753261]
May 20 21:40:27 pve2 ceph-mon[2005]: 8: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x87) [0x634fed753b67]
May 20 21:40:27 pve2 ceph-mon[2005]: 9: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x539) [0x634fedb0b7b9]
May 20 21:40:27 pve2 ceph-mon[2005]: 10: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x64) [0x634fedb0bd64]
May 20 21:40:27 pve2 ceph-mon[2005]: 11: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xd44a3) [0x78b0c7f6e4a3]
May 20 21:40:27 pve2 ceph-mon[2005]: 12: /lib/x86_64-linux-gnu/libc.so.6(+0x891f5) [0x78b0c7c421f5]
May 20 21:40:27 pve2 ceph-mon[2005]: 13: /lib/x86_64-linux-gnu/libc.so.6(+0x10989c) [0x78b0c7cc289c]
May 20 21:40:27 pve2 ceph-mon[2005]: 2025-05-20T21:40:27.572-0400 78b0c46946c0 -1 *** Caught signal (Segmentation fault) **
May 20 21:40:27 pve2 ceph-mon[2005]: in thread 78b0c46946c0 thread_name:rocksdb:low
May 20 21:40:27 pve2 ceph-mon[2005]: ceph version 18.2.7 (4cac8341a72477c60a6f153f3ed344b49870c932) reef (stable)
May 20 21:40:27 pve2 ceph-mon[2005]: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x78b0c7bf5050]
May 20 21:40:27 pve2 ceph-mon[2005]: 2: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)+0x1b) [0x634fedb6584b]
May 20 21:40:27 pve2 ceph-mon[2005]: 3: (rocksdb::BlockBuilder::AddWithLastKeyImpl(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const*, unsigned long)+0x15a) [0x634fedafa61a]
May 20 21:40:27 pve2 ceph-mon[2005]: 4: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::SubcompactionState*)+0x14c6) [0x634feda5f776]
May 20 21:40:27 pve2 ceph-mon[2005]: 5: (rocksdb::CompactionJob::Run()+0x338) [0x634feda61878]
May 20 21:40:27 pve2 ceph-mon[2005]: 6: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xda7) [0x634fed751027]
May 20 21:40:27 pve2 ceph-mon[2005]: 7: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0x141) [0x634fed753261]
May 20 21:40:27 pve2 ceph-mon[2005]: 8: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x87) [0x634fed753b67]
May 20 21:40:27 pve2 ceph-mon[2005]: 9: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x539) [0x634fedb0b7b9]
May 20 21:40:27 pve2 ceph-mon[2005]: 10: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x64) [0x634fedb0bd64]
May 20 21:40:27 pve2 ceph-mon[2005]: 11: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xd44a3) [0x78b0c7f6e4a3]
May 20 21:40:27 pve2 ceph-mon[2005]: 12: /lib/x86_64-linux-gnu/libc.so.6(+0x891f5) [0x78b0c7c421f5]
May 20 21:40:27 pve2 ceph-mon[2005]: 13: /lib/x86_64-linux-gnu/libc.so.6(+0x10989c) [0x78b0c7cc289c]
May 20 21:40:27 pve2 ceph-mon[2005]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 20 21:40:27 pve2 ceph-mon[2005]: 0> 2025-05-20T21:40:27.572-0400 78b0c46946c0 -1 *** Caught signal (Segmentation fault) **
May 20 21:40:27 pve2 ceph-mon[2005]: in thread 78b0c46946c0 thread_name:rocksdb:low
May 20 21:40:27 pve2 ceph-mon[2005]: ceph version 18.2.7 (4cac8341a72477c60a6f153f3ed344b49870c932) reef (stable)
May 20 21:40:27 pve2 ceph-mon[2005]: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x78b0c7bf5050]
May 20 21:40:27 pve2 ceph-mon[2005]: 2: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)+0x1b) [0x634fedb6584b]
May 20 21:40:27 pve2 ceph-mon[2005]: 3: (rocksdb::BlockBuilder::AddWithLastKeyImpl(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const*, unsigned long)+0x15a) [0x634fedafa61a]
May 20 21:40:27 pve2 ceph-mon[2005]: 4: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::SubcompactionState*)+0x14c6) [0x634feda5f776]
May 20 21:40:27 pve2 ceph-mon[2005]: 5: (rocksdb::CompactionJob::Run()+0x338) [0x634feda61878]
May 20 21:40:27 pve2 ceph-mon[2005]: 6: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xda7) [0x634fed751027]
May 20 21:40:27 pve2 ceph-mon[2005]: 7: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0x141) [0x634fed753261]
May 20 21:40:27 pve2 ceph-mon[2005]: 8: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x87) [0x634fed753b67]
May 20 21:40:27 pve2 ceph-mon[2005]: 9: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x539) [0x634fedb0b7b9]
May 20 21:40:27 pve2 ceph-mon[2005]: 10: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x64) [0x634fedb0bd64]
May 20 21:40:27 pve2 ceph-mon[2005]: 11: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xd44a3) [0x78b0c7f6e4a3]
May 20 21:40:27 pve2 ceph-mon[2005]: 12: /lib/x86_64-linux-gnu/libc.so.6(+0x891f5) [0x78b0c7c421f5]
May 20 21:40:27 pve2 ceph-mon[2005]: 13: /lib/x86_64-linux-gnu/libc.so.6(+0x10989c) [0x78b0c7cc289c]
May 20 21:40:27 pve2 ceph-mon[2005]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 20 21:40:27 pve2 ceph-mon[2005]: 0> 2025-05-20T21:40:27.572-0400 78b0c46946c0 -1 *** Caught signal (Segmentation fault) **
May 20 21:40:27 pve2 ceph-mon[2005]: in thread 78b0c46946c0 thread_name:rocksdb:low
May 20 21:40:27 pve2 ceph-mon[2005]: ceph version 18.2.7 (4cac8341a72477c60a6f153f3ed344b49870c932) reef (stable)
May 20 21:40:27 pve2 ceph-mon[2005]: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x78b0c7bf5050]
May 20 21:40:27 pve2 ceph-mon[2005]: 2: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_append(char const*, unsigned long)+0x1b) [0x634fedb6584b]
May 20 21:40:27 pve2 ceph-mon[2005]: 3: (rocksdb::BlockBuilder::AddWithLastKeyImpl(rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const&, rocksdb::Slice const*, unsigned long)+0x15a) [0x634fedafa61a]
May 20 21:40:27 pve2 ceph-mon[2005]: 4: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::SubcompactionState*)+0x14c6) [0x634feda5f776]
May 20 21:40:27 pve2 ceph-mon[2005]: 5: (rocksdb::CompactionJob::Run()+0x338) [0x634feda61878]
May 20 21:40:27 pve2 ceph-mon[2005]: 6: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xda7) [0x634fed751027]
May 20 21:40:27 pve2 ceph-mon[2005]: 7: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0x141) [0x634fed753261]
May 20 21:40:27 pve2 ceph-mon[2005]: 8: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x87) [0x634fed753b67]
May 20 21:40:27 pve2 ceph-mon[2005]: 9: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x539) [0x634fedb0b7b9]
May 20 21:40:27 pve2 ceph-mon[2005]: 10: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x64) [0x634fedb0bd64]
May 20 21:40:27 pve2 ceph-mon[2005]: 11: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xd44a3) [0x78b0c7f6e4a3]
May 20 21:40:27 pve2 ceph-mon[2005]: 12: /lib/x86_64-linux-gnu/libc.so.6(+0x891f5) [0x78b0c7c421f5]
May 20 21:40:27 pve2 ceph-mon[2005]: 13: /lib/x86_64-linux-gnu/libc.so.6(+0x10989c) [0x78b0c7cc289c]
May 20 21:40:27 pve2 ceph-mon[2005]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 20 21:40:27 pve2 systemd[1]: ceph-mon@pve2.service: Main process exited, code=killed, status=11/SEGV
May 20 21:40:27 pve2 systemd[1]: ceph-mon@pve2.service: Failed with result 'signal'.
May 20 21:40:27 pve2 systemd[1]: ceph-mon@pve2.service: Consumed 1min 5.580s CPU time.
May 20 21:40:37 pve2 systemd[1]: ceph-mon@pve2.service: Scheduled restart job, restart counter is at 1.
May 20 21:40:37 pve2 systemd[1]: Stopped ceph-mon@pve2.service - Ceph cluster monitor daemon.
May 20 21:40:37 pve2 systemd[1]: ceph-mon@pve2.service: Consumed 1min 5.580s CPU time.
May 20 21:40:37 pve2 systemd[1]: Started ceph-mon@pve2.service - Ceph cluster monitor daemon.
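The journal prints the same backtrace several times for this one segfault, which makes it harder to read. A rough sketch in Python that keeps each numbered frame only once, assuming the journal text was saved with lines in the `ceph-mon[PID]: N: frame` form shown above; `unique_frames` is a hypothetical helper name.

```python
import re

# Matches numbered backtrace frames such as
# "... ceph-mon[2005]: 5: (rocksdb::CompactionJob::Run()+0x338) [0x...]"
FRAME_RE = re.compile(r"ceph-\w+\[\d+\]:\s+(\d+): (.*)$")

def unique_frames(journal: str) -> list:
    """Return each distinct backtrace frame once, in first-seen order."""
    seen = set()
    frames = []
    for line in journal.splitlines():
        match = FRAME_RE.search(line)
        if not match:
            continue  # not a numbered frame (signal banner, systemd line, ...)
        key = (int(match.group(1)), match.group(2).strip())
        if key not in seen:
            seen.add(key)
            frames.append(f"{key[0]}: {key[1]}")
    return frames

# Two frames from the log above, repeated as they are in the journal.
sample = """\
May 20 21:40:27 pve2 ceph-mon[2005]: *** Caught signal (Segmentation fault) **
May 20 21:40:27 pve2 ceph-mon[2005]: 4: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::SubcompactionState*)+0x14c6) [0x634feda5f776]
May 20 21:40:27 pve2 ceph-mon[2005]: 5: (rocksdb::CompactionJob::Run()+0x338) [0x634feda61878]
May 20 21:40:27 pve2 ceph-mon[2005]: 2025-05-20T21:40:27.572-0400 78b0c46946c0 -1 *** Caught signal (Segmentation fault) **
May 20 21:40:27 pve2 ceph-mon[2005]: 4: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::SubcompactionState*)+0x14c6) [0x634feda5f776]
May 20 21:40:27 pve2 ceph-mon[2005]: 5: (rocksdb::CompactionJob::Run()+0x338) [0x634feda61878]
"""

for frame in unique_frames(sample):
    print(frame)
```

Comparing the deduplicated frames across several crashes would show whether they all die in the same RocksDB compaction path or in unrelated places.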