I upgraded my Proxmox cluster from the latest 8.4 release to 9.0, and most things went well post-upgrade. The Ceph cluster, however, has not fared as well. All monitors, OSDs, and metadata servers upgraded to Ceph 19.2.3, but all of my manager services have failed. They ran fine for a while and then started crashing. When I looked in the logs, I first saw this error:
Code:
Aug 05 21:00:43 pve-02 ceph-mgr[32601]: ERROR:root:Module 'xmltodict' is not installed.
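Installing python3-xmltodict fixed that first error; the command was roughly the following (assuming the stock Debian package):
Code:
apt update
apt install python3-xmltodict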
That resolved one issue, but not the main one. My manager service is still failing with the following log:
Code:
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: -1022> 2025-08-06T21:11:20.227-0500 7457d18586c0 -1 client.0 error registering admin socket command: (17) File exists
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: -203> 2025-08-06T21:11:21.165-0500 7457ce83e6c0 -1 client.0 error registering admin socket command: (17) File exists
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 0> 2025-08-06T21:11:22.819-0500 7457e10a36c0 -1 *** Caught signal (Segmentation fault) **
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: in thread 7457e10a36c0 thread_name:io_context_pool
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: ceph version 19.2.3 (ad1eecf4042e0ce72f382f60c97b709fd6f16a51) squid (stable)
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x3fdf0) [0x745809249df0]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 2: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1598b0) [0x74580a9598b0]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 3: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1a1843) [0x74580a9a1843]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 4: _PyType_LookupRef()
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 5: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1a216b) [0x74580a9a216b]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 6: PyObject_GetAttr()
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 7: _PyEval_EvalFrameDefault()
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 8: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x1109dd) [0x74580a9109dd]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 9: /lib/x86_64-linux-gnu/libpython3.13.so.1.0(+0x3d3442) [0x74580abd3442]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 10: /lib/python3/dist-packages/rbd.cpython-313-x86_64-linux-gnu.so(+0xacfed) [0x7457f864ffed]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 11: /lib/librbd.so.1(+0x3cc8ea) [0x7457f7dcc8ea]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 12: /lib/librbd.so.1(+0x3ccfed) [0x7457f7dccfed]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 13: /lib/librbd.so.1(+0x3afec6) [0x7457f7dafec6]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 14: /lib/librbd.so.1(+0x3b0560) [0x7457f7db0560]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 15: /lib/librbd.so.1(+0x2cac93) [0x7457f7ccac93]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 16: /lib/librbd.so.1(+0x12e7bd) [0x7457f7b2e7bd]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 17: /lib/librbd.so.1(+0x2b1c9e) [0x7457f7cb1c9e]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 18: /lib/librbd.so.1(+0x2b4379) [0x7457f7cb4379]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 19: /lib/librados.so.2(+0xd2716) [0x7458090e4716]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 20: /lib/librados.so.2(+0xd3705) [0x7458090e5705]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 21: /lib/librados.so.2(+0xd3f8a) [0x7458090e5f8a]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 22: /lib/librados.so.2(+0xea598) [0x7458090fc598]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 23: /lib/librados.so.2(+0xd7a71) [0x7458090e9a71]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 24: /lib/librados.so.2(+0xedf63) [0x7458090fff63]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 25: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xe1224) [0x7458094e1224]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 26: /lib/x86_64-linux-gnu/libc.so.6(+0x92b7b) [0x74580929cb7b]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: 27: /lib/x86_64-linux-gnu/libc.so.6(+0x1107b8) [0x74580931a7b8]
Aug 06 21:11:22 pve-01 ceph-mgr[617162]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aug 06 21:11:22 pve-01 systemd[1]: ceph-mgr@pve-01.service: Main process exited, code=killed, status=11/SEGV
Aug 06 21:11:22 pve-01 systemd[1]: ceph-mgr@pve-01.service: Failed with result 'signal'.
Aug 06 21:11:22 pve-01 systemd[1]: ceph-mgr@pve-01.service: Consumed 4.515s CPU time, 368.7M memory peak.
Aug 06 21:11:33 pve-01 systemd[1]: ceph-mgr@pve-01.service: Scheduled restart job, restart counter is at 3.
Aug 06 21:11:33 pve-01 systemd[1]: ceph-mgr@pve-01.service: Start request repeated too quickly.
Aug 06 21:11:33 pve-01 systemd[1]: ceph-mgr@pve-01.service: Failed with result 'signal'.
Aug 06 21:11:33 pve-01 systemd[1]: Failed to start ceph-mgr@pve-01.service - Ceph cluster manager daemon.
Aug 06 21:12:59 pve-01 systemd[1]: ceph-mgr@pve-01.service: Start request repeated too quickly.
Aug 06 21:12:59 pve-01 systemd[1]: ceph-mgr@pve-01.service: Failed with result 'signal'.
Aug 06 21:12:59 pve-01 systemd[1]: Failed to start ceph-mgr@pve-01.service - Ceph cluster manager daemon.
Aug 06 21:21:45 pve-01 systemd[1]: ceph-mgr@pve-01.service: Start request repeated too quickly.
Aug 06 21:21:45 pve-01 systemd[1]: ceph-mgr@pve-01.service: Failed with result 'signal'.
Aug 06 21:21:45 pve-01 systemd[1]: Failed to start ceph-mgr@pve-01.service - Ceph cluster manager daemon.
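For completeness, this is roughly how I have been retrying the daemon between changes (the result is always the same segfault):
Code:
systemctl reset-failed ceph-mgr@pve-01.service
systemctl restart ceph-mgr@pve-01.service
journalctl -u ceph-mgr@pve-01.service -n 50 --no-pager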
I admit I am stumped, and I have now started running into strange behavior: attempting to delete a VM that has no storage on Ceph at all fails, because Proxmox cannot access Ceph:
Code:
Logical volume "snap_vm-118-disk-0_no-os" successfully removed.
Logical volume "snap_vm-118-disk-0_minimal-config" successfully removed.
Logical volume "vm-118-disk-0" successfully removed.
Logical volume "snap_vm-118-disk-1_minimal-config" successfully removed.
Logical volume "snap_vm-118-disk-1_no-os" successfully removed.
Logical volume "vm-118-disk-1" successfully removed.
TASK ERROR: rbd error: rbd: listing images failed: (2) No such file or directory
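As far as I can tell, that last line is Proxmox's RBD storage plugin failing to enumerate images while it scans all storages for leftover disks of that VMID, i.e. roughly the equivalent of this returning ENOENT (the pool name is a placeholder for whatever my RBD storage points at):
Code:
rbd -p <rbd-pool> ls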
I tried to purge a single Proxmox node of all Ceph material (NOT including `/etc/pve/`) and then reinstall, but the seg fault remains.
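For reference, the purge/reinstall on that node was roughly the following (from memory, so treat the exact package list as an approximation):
Code:
# stop the local Ceph daemons on this node
systemctl stop ceph.target
# remove the packages and local state, deliberately leaving /etc/pve/ alone
apt purge --autoremove ceph ceph-base ceph-mgr ceph-mon ceph-osd ceph-mds
rm -rf /var/lib/ceph /etc/ceph
# reinstall the 19.2.3 packages and bring the daemons back up
pveceph install
systemctl start ceph.target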