Ceph 20.2 Tentacle Release Available as test preview and Ceph 18.2 Reef soon to be fully EOL

Hello,

Here is a temporary workaround for the dashboard issue:

ModuleNotFoundError: No module named 'smb'


Simply disable the dashboard's smb controller on all MGR nodes by renaming the file, and then restart the MGRs.

sudo mv /usr/share/ceph/mgr/dashboard/controllers/smb.py /usr/share/ceph/mgr/dashboard/controllers/smb.py.disabled

A package update that reinstalls /usr/share/ceph/mgr/dashboard/controllers/smb.py will undo this workaround.
Hopefully, there will be an official fix by then.

If not, the step above must be repeated.
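
Because the rename may have to be repeated after an update, it could be wrapped in a small helper that is safe to run more than once. This is only a sketch, not an official fix: the function name disable_smb_controller and its optional directory argument are made up for illustration, and only the default path comes from the post above.

```shell
# Hypothetical helper: rename smb.py to smb.py.disabled if it is present.
# The default directory is the path quoted in the post; another directory
# can be passed as the first argument, e.g. for a dry run on a copy.
disable_smb_controller() {
    dir="${1:-/usr/share/ceph/mgr/dashboard/controllers}"
    if [ -f "$dir/smb.py" ]; then
        mv -f "$dir/smb.py" "$dir/smb.py.disabled"
        echo "disabled"
    else
        echo "nothing to do"
    fi
}
```

Run it as root on every MGR node and then restart the local manager, for example with systemctl restart ceph-mgr@<id>, where <id> is whatever your MGR instance is called.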

Best regards
 
I'm having an issue with the telemetry module: when telemetry is enabled, my MGR crashes.

Force-disabling telemetry stops the Ceph MGR from crashing.

Code:
May 22 02:30:12 stark ceph-mgr[122655]: *** Caught signal (Aborted) **
May 22 02:30:12 stark ceph-mgr[122655]:  in thread 7ee31369e6c0 thread_name:telemetry
May 22 02:30:12 stark ceph-mgr[122655]: 2026-05-22T02:30:12.625+0100 7ee31369e6c0 -1 ./src/mgr/PyFormatter.h: In function 'virtual void PyFormatter::close_section()' thread 7ee31369e6c0 time 2026-05-22T02:30:12.626178+0100
May 22 02:30:12 stark ceph-mgr[122655]: ./src/mgr/PyFormatter.h: 84: FAILED ceph_assert(cursor != root)
May 22 02:30:12 stark ceph-mgr[122655]:  ceph version 20.2.1 (1846e8e84cd244e621f1395ea824e304691b5a58) tentacle (stable - None)
May 22 02:30:12 stark ceph-mgr[122655]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x137) [0x7ee34a1d6aa6]
May 22 02:30:12 stark ceph-mgr[122655]:  2: (PyFormatter::close_section()+0x90) [0x5ddd8c1cefc0]
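
To check quickly whether a crashing MGR shows the same pattern, the thread name and the failed assertion can be pulled out of the journal. A minimal sketch, assuming the log lines look like the excerpt above; mgr_crash_summary is a hypothetical helper name, not a Ceph tool.

```shell
# Hypothetical helper: read journal text on stdin and print the distinct
# crashing thread names and failed assertions found in it.
mgr_crash_summary() {
    grep -E -o 'thread_name:[A-Za-z0-9_-]+|FAILED ceph_assert\(.*\)' | sort -u
}
```

Typical use would be journalctl -u ceph-mgr@<id> --no-pager | mgr_crash_summary. For the excerpt above it should report thread_name:telemetry and FAILED ceph_assert(cursor != root).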
 
I followed the instructions in "Ceph Squid to Tentacle - Proxmox VE".

I restarted the OSD on pve1 and all was good: health came back with just warnings for noout and telemetry.
I restarted the OSD on pve2 only once health had come back, and then all my managers dropped offline and cannot be started.


Here is the log on pve2:


Bash:
-- Boot bf48fd9c45ef44ca8a89f0dfbc16ea97 --
Jun 28 16:45:01 pve2 systemd[1]: Started ceph-mgr@pve2.service - Ceph cluster manager daemon.
Jun 29 00:14:16 pve2 ceph-mgr[1501]: 2026-06-29T00:14:16.695-0700 77b3a0bf86c0 -1 received  signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror cephfs-mirror  (PID: 382062) UID: 0
Jun 29 00:14:16 pve2 ceph-mgr[1501]: 2026-06-29T00:14:16.718-0700 77b3a0bf86c0 -1 received  signal: Hangup from  (PID: 382063) UID: 0
Jun 29 14:04:46 pve2 systemd[1]: ceph-mgr@pve2.service: Main process exited, code=killed, status=8/FPE
Jun 29 14:04:46 pve2 systemd[1]: ceph-mgr@pve2.service: Failed with result 'signal'.
Jun 29 14:04:46 pve2 systemd[1]: ceph-mgr@pve2.service: Consumed 1min 14.590s CPU time, 344.6M memory peak.
Jun 29 14:04:56 pve2 systemd[1]: ceph-mgr@pve2.service: Scheduled restart job, restart counter is at 1.
Jun 29 14:04:56 pve2 systemd[1]: Started ceph-mgr@pve2.service - Ceph cluster manager daemon.
Jun 29 14:04:59 pve2 ceph-mgr[1079024]: did not load config file, using default settings.
Jun 29 14:04:59 pve2 ceph-mgr[1079024]: ignoring --setuser ceph since I am not root
Jun 29 14:04:59 pve2 ceph-mgr[1079024]: ignoring --setgroup ceph since I am not root
Jun 29 14:04:59 pve2 ceph-mgr[1079024]: 2026-06-29T14:04:59.333-0700 7c73c9c4b0c0 -1 Errors while parsing config file!
Jun 29 14:04:59 pve2 ceph-mgr[1079024]: 2026-06-29T14:04:59.333-0700 7c73c9c4b0c0 -1 can't open ceph.conf: (2) No such file or directory
Jun 29 14:04:59 pve2 ceph-mgr[1079024]: unable to get monitor info from DNS SRV with service name: ceph-mon
Jun 29 14:04:59 pve2 ceph-mgr[1079024]: 2026-06-29T14:04:59.336-0700 7c73c9c4b0c0 -1 failed for service _ceph-mon._tcp
Jun 29 14:04:59 pve2 ceph-mgr[1079024]: 2026-06-29T14:04:59.336-0700 7c73c9c4b0c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
Jun 29 14:04:59 pve2 ceph-mgr[1079024]: failed to fetch mon config (--no-mon-config to skip)
Jun 29 14:04:59 pve2 systemd[1]: ceph-mgr@pve2.service: Main process exited, code=exited, status=1/FAILURE
Jun 29 14:04:59 pve2 systemd[1]: ceph-mgr@pve2.service: Failed with result 'exit-code'.
Jun 29 14:04:59 pve2 systemd[1]: ceph-mgr@pve2.service: Consumed 2.387s CPU time, 299.4M memory peak.
Jun 29 14:05:09 pve2 systemd[1]: ceph-mgr@pve2.service: Scheduled restart job, restart counter is at 2.
Jun 29 14:05:09 pve2 systemd[1]: Started ceph-mgr@pve2.service - Ceph cluster manager daemon.
Jun 29 14:05:59 pve2 ceph-mgr[1079225]: did not load config file, using default settings.
Jun 29 14:05:59 pve2 ceph-mgr[1079225]: ignoring --setuser ceph since I am not root
Jun 29 14:05:59 pve2 ceph-mgr[1079225]: ignoring --setgroup ceph since I am not root
Jun 29 14:05:59 pve2 ceph-mgr[1079225]: 2026-06-29T14:05:59.650-0700 7f833b8ad0c0 -1 Errors while parsing config file!
Jun 29 14:05:59 pve2 ceph-mgr[1079225]: 2026-06-29T14:05:59.650-0700 7f833b8ad0c0 -1 can't open ceph.conf: (2) No such file or directory
Jun 29 14:05:59 pve2 ceph-mgr[1079225]: unable to get monitor info from DNS SRV with service name: ceph-mon
Jun 29 14:05:59 pve2 ceph-mgr[1079225]: 2026-06-29T14:05:59.653-0700 7f833b8ad0c0 -1 failed for service _ceph-mon._tcp
Jun 29 14:05:59 pve2 ceph-mgr[1079225]: 2026-06-29T14:05:59.653-0700 7f833b8ad0c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
Jun 29 14:05:59 pve2 ceph-mgr[1079225]: failed to fetch mon config (--no-mon-config to skip)
Jun 29 14:05:59 pve2 systemd[1]: ceph-mgr@pve2.service: Main process exited, code=exited, status=1/FAILURE
Jun 29 14:05:59 pve2 systemd[1]: ceph-mgr@pve2.service: Failed with result 'exit-code'.
Jun 29 14:05:59 pve2 systemd[1]: ceph-mgr@pve2.service: Consumed 2.490s CPU time, 296.2M memory peak.
Jun 29 14:06:09 pve2 systemd[1]: ceph-mgr@pve2.service: Scheduled restart job, restart counter is at 3.
Jun 29 14:06:09 pve2 systemd[1]: Started ceph-mgr@pve2.service - Ceph cluster manager daemon.
Jun 29 14:07:31 pve2 systemd[1]: Stopping ceph-mgr@pve2.service - Ceph cluster manager daemon...
Jun 29 14:07:31 pve2 systemd[1]: ceph-mgr@pve2.service: Deactivated successfully.
Jun 29 14:07:31 pve2 systemd[1]: Stopped ceph-mgr@pve2.service - Ceph cluster manager daemon.
Jun 29 14:07:31 pve2 systemd[1]: ceph-mgr@pve2.service: Consumed 2.486s CPU time, 294.6M memory peak.
Jun 29 14:07:31 pve2 systemd[1]: ceph-mgr@pve2.service: Start request repeated too quickly.
Jun 29 14:07:31 pve2 systemd[1]: ceph-mgr@pve2.service: Failed with result 'start-limit-hit'.
Jun 29 14:07:31 pve2 systemd[1]: Failed to start ceph-mgr@pve2.service - Ceph cluster manager daemon.
Jun 29 14:11:10 pve2 systemd[1]: ceph-mgr@pve2.service: Start request repeated too quickly.
Jun 29 14:11:10 pve2 systemd[1]: ceph-mgr@pve2.service: Failed with result 'start-limit-hit'.
Jun 29 14:11:10 pve2 systemd[1]: Failed to start ceph-mgr@pve2.service - Ceph cluster manager daemon.

I tried this, to no avail:

Code:
root@pve2 14:21:09 ~ # systemctl reset-failed ceph-mgr@pve2
root@pve2 14:21:15 ~ # systemctl start ceph-mgr@pve2
root@pve2 14:21:23 ~ # systemctl status ceph-mgr@pve2
● ceph-mgr@pve2.service - Ceph cluster manager daemon
     Loaded: loaded (/usr/lib/systemd/system/ceph-mgr@.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mgr@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Mon 2026-06-29 14:21:23 PDT; 5s ago
 Invocation: c833888cfa9c46909b652c13caaa003d
   Main PID: 1094818 (ceph-mgr)
      Tasks: 144 (limit: 70525)
     Memory: 321.9M (peak: 322.3M)
        CPU: 2.670s
     CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@pve2.service
             └─1094818 /usr/bin/ceph-mgr -f --id pve2 --setuser ceph --setgroup ceph

Jun 29 14:21:25 pve2 ceph-mgr[1094818]: 2026-06-29T14:21:25.688-0700 70c6a89b56c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:21:25 pve2 ceph-mgr[1094818]: 2026-06-29T14:21:25.688-0700 70c6a89b56c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:21:25 pve2 ceph-mgr[1094818]: 2026-06-29T14:21:25.688-0700 70c6a89b56c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:21:25 pve2 ceph-mgr[1094818]: 2026-06-29T14:21:25.688-0700 70c6a89b56c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:21:25 pve2 ceph-mgr[1094818]: 2026-06-29T14:21:25.688-0700 70c6a89b56c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:21:25 pve2 ceph-mgr[1094818]: 2026-06-29T14:21:25.689-0700 70c6a599b6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:21:25 pve2 ceph-mgr[1094818]: 2026-06-29T14:21:25.689-0700 70c6a599b6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:21:25 pve2 ceph-mgr[1094818]: 2026-06-29T14:21:25.689-0700 70c6a599b6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:21:25 pve2 ceph-mgr[1094818]: 2026-06-29T14:21:25.689-0700 70c6a599b6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:21:25 pve2 ceph-mgr[1094818]: 2026-06-29T14:21:25.689-0700 70c6a599b6c0 -1 client.0 error registering admin socket command: (17) File exists
root@pve2 14:21:28 ~ #

My OSDs seem to be in a healthy state, just not upgraded:


Code:
root@pve1 14:08:55 ~ # ceph -s
  cluster:
    id:     5e55fd50-d135-413d-bffe-9d0fae0ef5fa
    health: HEALTH_WARN
            no active mgr
            noout flag(s) set
            1 daemons have recently crashed
            Telemetry requires re-opt-in
 
  services:
    mon: 3 daemons, quorum pve2,pve1,pve3 (age 16m) [leader: pve2]
    mgr: no daemons active (since 20s)
    mds: 3/3 daemons up, 3 standby
    osd: 6 osds: 6 up (since 14m), 6 in (since 2w)
         flags noout
 
  data:
    volumes: 3/3 healthy
    pools:   9 pools, 353 pgs
    objects: 108.36k objects, 215 GiB
    usage:   473 GiB used, 5.0 TiB / 5.4 TiB avail
    pgs:     353 active+clean
 
  io:
    client:   20 KiB/s rd, 859 KiB/s wr, 6 op/s rd, 79 op/s wr


Destroying the pve3 manager (as a test) and recreating it didn't change anything.
 
And here is the MGR journal from pve1:

Code:
root@pve1 14:38:18 ~ # journalctl -u ceph-mgr@pve1 -n 120 --no-pager
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  4: /usr/bin/ceph-mgr(+0x18aff4) [0x5bf9164d4ff4]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  5: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x11a75b) [0x7223ba71a75b]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  6: PyObject_Vectorcall()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  7: _PyEval_EvalFrameDefault()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  8: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x110af3) [0x7223ba710af3]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  9: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x10f769) [0x7223ba70f769]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  10: PyObject_CallMethod()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  11: (PyModuleRunner::serve()+0x66) [0x5bf9165a4d46]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  12: (PyModuleRunner::PyModuleRunnerThread::entry()+0x130) [0x5bf9165a54c0]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  13: /lib/x86_64-linux-gnu/libc.so.6(+0x92b7b) [0x7223b909eb7b]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  14: /lib/x86_64-linux-gnu/libc.so.6(+0x1107f8) [0x7223b911c7f8]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:      0> 2026-06-29T14:05:42.535-0700 72238c9596c0 -1 *** Caught signal (Aborted) **
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  in thread 72238c9596c0 thread_name:telemetry
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  ceph version 20.2.1 (1846e8e84cd244e621f1395ea824e304691b5a58) tentacle (stable - None)
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3fdf0) [0x7223b904bdf0]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x9495c) [0x7223b90a095c]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  3: gsignal()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  4: abort()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x196) [0x7223b988cb05]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  6: (PyFormatter::close_section()+0x90) [0x5bf9164bafc0]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  7: (ActivePyModules::get_perf_schema_python(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1239) [0x5bf9164b1559]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  8: /usr/bin/ceph-mgr(+0x18aff4) [0x5bf9164d4ff4]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  9: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x11a75b) [0x7223ba71a75b]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  10: PyObject_Vectorcall()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  11: _PyEval_EvalFrameDefault()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  12: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x110af3) [0x7223ba710af3]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  13: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x10f769) [0x7223ba70f769]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  14: PyObject_CallMethod()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  15: (PyModuleRunner::serve()+0x66) [0x5bf9165a4d46]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  16: (PyModuleRunner::PyModuleRunnerThread::entry()+0x130) [0x5bf9165a54c0]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  17: /lib/x86_64-linux-gnu/libc.so.6(+0x92b7b) [0x7223b909eb7b]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  18: /lib/x86_64-linux-gnu/libc.so.6(+0x1107f8) [0x7223b911c7f8]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1963> 2026-06-29T14:05:32.532-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1913> 2026-06-29T14:05:32.533-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1911> 2026-06-29T14:05:32.533-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1909> 2026-06-29T14:05:32.533-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1907> 2026-06-29T14:05:32.533-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1905> 2026-06-29T14:05:32.533-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1526> 2026-06-29T14:05:32.561-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1524> 2026-06-29T14:05:32.561-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1522> 2026-06-29T14:05:32.561-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1520> 2026-06-29T14:05:32.561-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1518> 2026-06-29T14:05:32.561-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1464> 2026-06-29T14:05:32.562-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1462> 2026-06-29T14:05:32.562-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1460> 2026-06-29T14:05:32.562-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1458> 2026-06-29T14:05:32.562-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1456> 2026-06-29T14:05:32.562-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1118> 2026-06-29T14:05:32.590-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1116> 2026-06-29T14:05:32.590-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1114> 2026-06-29T14:05:32.590-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1112> 2026-06-29T14:05:32.590-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1109> 2026-06-29T14:05:32.590-0700 72238311a6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1070> 2026-06-29T14:05:32.591-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1068> 2026-06-29T14:05:32.591-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1066> 2026-06-29T14:05:32.591-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1064> 2026-06-29T14:05:32.591-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  -1062> 2026-06-29T14:05:32.591-0700 72238693d6c0 -1 client.0 error registering admin socket command: (17) File exists
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:     -8> 2026-06-29T14:05:42.534-0700 72238c9596c0 -1 ./src/mgr/PyFormatter.h: In function 'virtual void PyFormatter::close_section()' thread 72238c9596c0 time 2026-06-29T14:05:42.534705-0700
Jun 29 14:05:42 pve1 ceph-mgr[1014782]: ./src/mgr/PyFormatter.h: 84: FAILED ceph_assert(cursor != root)
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  ceph version 20.2.1 (1846e8e84cd244e621f1395ea824e304691b5a58) tentacle (stable - None)
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x137) [0x7223b988caa6]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  2: (PyFormatter::close_section()+0x90) [0x5bf9164bafc0]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  3: (ActivePyModules::get_perf_schema_python(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1239) [0x5bf9164b1559]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  4: /usr/bin/ceph-mgr(+0x18aff4) [0x5bf9164d4ff4]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  5: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x11a75b) [0x7223ba71a75b]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  6: PyObject_Vectorcall()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  7: _PyEval_EvalFrameDefault()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  8: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x110af3) [0x7223ba710af3]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  9: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x10f769) [0x7223ba70f769]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  10: PyObject_CallMethod()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  11: (PyModuleRunner::serve()+0x66) [0x5bf9165a4d46]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  12: (PyModuleRunner::PyModuleRunnerThread::entry()+0x130) [0x5bf9165a54c0]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  13: /lib/x86_64-linux-gnu/libc.so.6(+0x92b7b) [0x7223b909eb7b]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  14: /lib/x86_64-linux-gnu/libc.so.6(+0x1107f8) [0x7223b911c7f8]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:     -3> 2026-06-29T14:05:42.535-0700 72238c9596c0 -1 *** Caught signal (Aborted) **
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  in thread 72238c9596c0 thread_name:telemetry
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  ceph version 20.2.1 (1846e8e84cd244e621f1395ea824e304691b5a58) tentacle (stable - None)
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3fdf0) [0x7223b904bdf0]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x9495c) [0x7223b90a095c]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  3: gsignal()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  4: abort()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x196) [0x7223b988cb05]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  6: (PyFormatter::close_section()+0x90) [0x5bf9164bafc0]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  7: (ActivePyModules::get_perf_schema_python(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1239) [0x5bf9164b1559]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  8: /usr/bin/ceph-mgr(+0x18aff4) [0x5bf9164d4ff4]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  9: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x11a75b) [0x7223ba71a75b]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  10: PyObject_Vectorcall()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  11: _PyEval_EvalFrameDefault()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  12: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x110af3) [0x7223ba710af3]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  13: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x10f769) [0x7223ba70f769]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  14: PyObject_CallMethod()
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  15: (PyModuleRunner::serve()+0x66) [0x5bf9165a4d46]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  16: (PyModuleRunner::PyModuleRunnerThread::entry()+0x130) [0x5bf9165a54c0]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  17: /lib/x86_64-linux-gnu/libc.so.6(+0x92b7b) [0x7223b909eb7b]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  18: /lib/x86_64-linux-gnu/libc.so.6(+0x1107f8) [0x7223b911c7f8]
Jun 29 14:05:42 pve1 ceph-mgr[1014782]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jun 29 14:05:42 pve1 systemd[1]: ceph-mgr@pve1.service: Main process exited, code=killed, status=6/ABRT
Jun 29 14:05:42 pve1 systemd[1]: ceph-mgr@pve1.service: Failed with result 'signal'.
Jun 29 14:05:42 pve1 systemd[1]: ceph-mgr@pve1.service: Consumed 2.855s CPU time, 335.3M memory peak.
Jun 29 14:05:52 pve1 systemd[1]: ceph-mgr@pve1.service: Scheduled restart job, restart counter is at 4.
Jun 29 14:05:52 pve1 systemd[1]: ceph-mgr@pve1.service: Start request repeated too quickly.
Jun 29 14:05:52 pve1 systemd[1]: ceph-mgr@pve1.service: Failed with result 'signal'.
Jun 29 14:05:52 pve1 systemd[1]: Failed to start ceph-mgr@pve1.service - Ceph cluster manager daemon.
Jun 29 14:07:35 pve1 systemd[1]: ceph-mgr@pve1.service: Start request repeated too quickly.
Jun 29 14:07:35 pve1 systemd[1]: ceph-mgr@pve1.service: Failed with result 'signal'.
Jun 29 14:07:35 pve1 systemd[1]: Failed to start ceph-mgr@pve1.service - Ceph cluster manager daemon.
Jun 29 14:09:39 pve1 systemd[1]: ceph-mgr@pve1.service: Start request repeated too quickly.
Jun 29 14:09:39 pve1 systemd[1]: ceph-mgr@pve1.service: Failed with result 'signal'.
Jun 29 14:09:39 pve1 systemd[1]: Failed to start ceph-mgr@pve1.service - Ceph cluster manager daemon.
Jun 29 14:20:41 pve1 systemd[1]: ceph-mgr@pve1.service: Start request repeated too quickly.
Jun 29 14:20:41 pve1 systemd[1]: ceph-mgr@pve1.service: Failed with result 'signal'.
Jun 29 14:20:41 pve1 systemd[1]: Failed to start ceph-mgr@pve1.service - Ceph cluster manager daemon.
Jun 29 14:27:40 pve1 systemd[1]: ceph-mgr@pve1.service: Start request repeated too quickly.
Jun 29 14:27:40 pve1 systemd[1]: ceph-mgr@pve1.service: Failed with result 'signal'.
Jun 29 14:27:40 pve1 systemd[1]: Failed to start ceph-mgr@pve1.service - Ceph cluster manager daemon.
Jun 29 14:30:40 pve1 systemd[1]: ceph-mgr@pve1.service: Start request repeated too quickly.
Jun 29 14:30:40 pve1 systemd[1]: ceph-mgr@pve1.service: Failed with result 'signal'.
Jun 29 14:30:40 pve1 systemd[1]: Failed to start ceph-mgr@pve1.service - Ceph cluster manager daemon.
 
OK, the quick fix was:

Code:
ceph mgr module force disable telemetry --yes-i-really-mean-it

This allowed me to finish the upgrade and get things generally healthy. I won't archive the crashes for a few days in case someone wants them. I'm not sure whether I should re-enable the telemetry module.
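
For anyone in the same state (telemetry crashing the MGRs and systemd refusing further restarts), the steps from this thread can be collected into one reviewable list. This is a sketch only: print_mgr_recovery is a hypothetical helper that just prints the commands quoted in this thread for the MGR ids you pass in, and it runs nothing itself.

```shell
# Hypothetical helper: print (not run) the recovery commands from this
# thread for the given MGR ids, in the order they were used here.
print_mgr_recovery() {
    # Stop the telemetry module from being loaded by any MGR.
    echo "ceph mgr module force disable telemetry --yes-i-really-mean-it"
    for id in "$@"; do
        # Clear systemd's start-limit state, then start the MGR again.
        echo "systemctl reset-failed ceph-mgr@$id"
        echo "systemctl start ceph-mgr@$id"
    done
    # Finally, look at the cluster state.
    echo "ceph -s"
}
```

For example, print_mgr_recovery pve1 pve2 pve3 lists the commands for the three nodes in this cluster. Note that the systemctl commands have to be run on the node that hosts the respective MGR.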

Code:
root@pve1 14:48:12 ~ # ceph -s
  cluster:
    id:     5e55fd50-d135-413d-bffe-9d0fae0ef5fa
    health: HEALTH_ERR
            Module 'telemetry' has failed: Not found or unloadable
            3 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum pve2,pve1,pve3 (age 24m) [leader: pve2]
    mgr: pve3(active, since 4m), standbys: pve1, pve2
    mds: 3/3 daemons up, 3 standby
    osd: 6 osds: 6 up (since 115s), 6 in (since 2w)
 
  data:
    volumes: 3/3 healthy
    pools:   9 pools, 353 pgs
    objects: 108.36k objects, 216 GiB
    usage:   472 GiB used, 5.0 TiB / 5.4 TiB avail
    pgs:     353 active+clean
 
  io:
    client:   28 KiB/s rd, 57 KiB/s wr, 1 op/s rd, 8 op/s wr
 
root@pve1 14:48:15 ~ #

--edit--
Telemetry is now enabled. Looking back through my command history, it looks like I may have run ceph telemetry preview-all before the last OSDs were upgraded on node 3. I assumed this was a passive command; I guess not. The instructions were clear that it should be run at the end, so my bad.

--edit2-- Actually, it seems others have hit this on a homogeneous cluster too, so I might have just hit the same issue as this person:

[ceph-users] Re: Repeated crashes of mgr module in Tentacle
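
Regarding the first edit: one way to confirm that no daemon is still on the old release before touching telemetry is to count the distinct version strings in the JSON that ceph versions prints. A minimal sketch, assuming the usual output in which every version appears as a key starting with "ceph version"; count_ceph_versions is a hypothetical helper name. As the second edit says, a uniform cluster does not rule out the crash, so this only checks the precondition from the upgrade guide.

```shell
# Hypothetical helper: read the JSON from "ceph versions" on stdin and print
# how many distinct "ceph version ..." strings it contains.
count_ceph_versions() {
    grep -o '"ceph version [^"]*"' | sort -u | wc -l | tr -d ' '
}
```

Used as ceph versions | count_ceph_versions, a result of 1 means all reporting daemons run the same build, and anything higher means the upgrade is not finished.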
 