We updated our test cluster to the latest PVE version yesterday. The upgrade itself finished without incident, but while rebooting the systems, all OSDs on every node crashed:
** File Read Latency Histogram By Level [default] **
2023-01-30T10:21:52.827+0100 7f5f16fd1700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
2023-01-30T10:21:52.827+0100 7f5f16fd1700 -1 osd.12 1500643 *** Got signal Terminated ***
2023-01-30T10:21:52.827+0100 7f5f16fd1700 0 osd.12 1500643 Fast Shutdown: - cct->_conf->osd_fast_shutdown = 1, null-fm = 1
2023-01-30T10:21:52.827+0100 7f5f16fd1700 -1 osd.12 1500643 *** Immediate shutdown (osd_fast_shutdown=true) ***
2023-01-30T10:21:52.827+0100 7f5f16fd1700 0 osd.12 1500643 prepare_to_stop telling mon we are shutting down and dead
2023-01-30T10:21:57.827+0100 7f5f16fd1700 0 osd.12 1500643 prepare_to_stop starting shutdown
2023-01-30T10:21:57.835+0100 7f5f0ad12700 0 bluestore(/var/lib/ceph/osd/ceph-12) allocation stats probe 19: cnt: 467772 frags: 490381 size: 3361779712
2023-01-30T10:21:57.835+0100 7f5f0ad12700 0 bluestore(/var/lib/ceph/osd/ceph-12) probe -1: 3894856, 4409481, 24518307840
2023-01-30T10:21:57.835+0100 7f5f0ad12700 0 bluestore(/var/lib/ceph/osd/ceph-12) probe -3: 3473756, 3764265, 22888685568
2023-01-30T10:21:57.835+0100 7f5f0ad12700 0 bluestore(/var/lib/ceph/osd/ceph-12) probe -7: 3478652, 3500519, 24553218048
2023-01-30T10:21:57.835+0100 7f5f0ad12700 0 bluestore(/var/lib/ceph/osd/ceph-12) probe -11: 3447472, 3566519, 23551668224
2023-01-30T10:21:57.835+0100 7f5f0ad12700 0 bluestore(/var/lib/ceph/osd/ceph-12) probe -19: 3310749, 3468410, 22454980608
2023-01-30T10:21:57.835+0100 7f5f0ad12700 0 bluestore(/var/lib/ceph/osd/ceph-12) ------------
2023-01-30T10:21:57.835+0100 7f5f16fd1700 4 rocksdb: [db/db_impl/db_impl.cc:446] Shutdown: canceling all background work
2023-01-30T10:21:57.835+0100 7f5f16fd1700 4 rocksdb: [db/db_impl/db_impl.cc:625] Shutdown complete
2023-01-30T10:21:58.575+0100 7f5f16fd1700 1 bluefs umount
2023-01-30T10:21:58.575+0100 7f5f16fd1700 1 bdev(0x56408272dc00 /var/lib/ceph/osd/ceph-12/block) close
2023-01-30T10:21:58.859+0100 7f5f16fd1700 1 freelist shutdown
2023-01-30T10:21:59.083+0100 7f5f16fd1700 1 fbmap_alloc 0x564081c61440 shutdown
2023-01-30T10:21:59.083+0100 7f5f16fd1700 1 bdev(0x56408272c000 /var/lib/ceph/osd/ceph-12/block) close
2023-01-30T10:22:15.983+0100 7f5f16fd1700 -1 ./src/osd/OSD.cc: In function 'int OSD::shutdown()' thread 7f5f16fd1700 time 2023-01-30T10:22:15.981875+0100
./src/osd/OSD.cc: 4340: FAILED ceph_assert(end_time - start_time_func < cct->_conf->osd_fast_shutdown_timeout)
ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x56407edccf70]
2: /usr/bin/ceph-osd(+0xc2310e) [0x56407edcd10e]
3: (OSD::shutdown()+0x135d) [0x56407eec287d]
4: (SignalHandler::entry()+0x648) [0x56407f548408]
5: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f5f1ae83ea7]
6: clone()
2023-01-30T10:22:15.987+0100 7f5f16fd1700 -1 *** Caught signal (Aborted) **
in thread 7f5f16fd1700 thread_name:signal_handler
ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)
1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f5f1ae8f140]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17e) [0x56407edccfca]
5: /usr/bin/ceph-osd(+0xc2310e) [0x56407edcd10e]
6: (OSD::shutdown()+0x135d) [0x56407eec287d]
7: (SignalHandler::entry()+0x648) [0x56407f548408]
8: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f5f1ae83ea7]
9: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
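For context, the assert that fires here is the one visible in the backtrace: OSD::shutdown() checks that the measured shutdown duration stays below cct->_conf->osd_fast_shutdown_timeout, and the log shows the abort happens only after bluefs umount and both bdev closes completed, i.e. after BlueStore had already shut down. A possible stopgap (my assumption, not something confirmed in this thread) would be to give that timeout more headroom on the affected nodes, e.g.:

```
[osd]
# Hypothetical workaround sketch: raise the fast-shutdown timeout so the
# ceph_assert(end_time - start_time_func < osd_fast_shutdown_timeout)
# in OSD::shutdown() has more headroom during reboot. The value 60 is an
# arbitrary guess; the option name is taken from the assert in the log above.
osd_fast_shutdown_timeout = 60
```

Whether raising the timeout merely hides a slow device close or addresses the actual regression is unclear to me; treat it as a way to get clean reboots while the root cause is investigated.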
-8> 2023-01-30T10:22:12.663+0100 7f5f08fd1700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-01-30T10:21:42.665682+0100)
-7> 2023-01-30T10:22:13.663+0100 7f5f08fd1700 10 monclient: tick
-6> 2023-01-30T10:22:13.663+0100 7f5f08fd1700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-01-30T10:21:43.665865+0100)
-5> 2023-01-30T10:22:14.663+0100 7f5f08fd1700 10 monclient: tick
-4> 2023-01-30T10:22:14.663+0100 7f5f08fd1700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-01-30T10:21:44.666015+0100)
-3> 2023-01-30T10:22:15.663+0100 7f5f08fd1700 10 monclient: tick
-2> 2023-01-30T10:22:15.663+0100 7f5f08fd1700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2023-01-30T10:21:45.666166+0100)
-1> 2023-01-30T10:22:15.983+0100 7f5f16fd1700 -1 ./src/osd/OSD.cc: In function 'int OSD::shutdown()' thread 7f5f16fd1700 time 2023-01-30T10:22:15.981875+0100
./src/osd/OSD.cc: 4340: FAILED ceph_assert(end_time - start_time_func < cct->_conf->osd_fast_shutdown_timeout)
ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x56407edccf70]
2: /usr/bin/ceph-osd(+0xc2310e) [0x56407edcd10e]
3: (OSD::shutdown()+0x135d) [0x56407eec287d]
4: (SignalHandler::entry()+0x648) [0x56407f548408]
5: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f5f1ae83ea7]
6: clone()
0> 2023-01-30T10:22:15.987+0100 7f5f16fd1700 -1 *** Caught signal (Aborted) **
in thread 7f5f16fd1700 thread_name:signal_handler
ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)
1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f5f1ae8f140]
2: gsignal()
3: abort()
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17e) [0x56407edccfca]
5: /usr/bin/ceph-osd(+0xc2310e) [0x56407edcd10e]
6: (OSD::shutdown()+0x135d) [0x56407eec287d]
7: (SignalHandler::entry()+0x648) [0x56407f548408]
8: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f5f1ae83ea7]
9: clone()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
I'd be happy to provide full log if necessary.