So, I've got an update. Things are looking rather bleak as of right now, unfortunately.
I've managed to find a Python traceback in the systemd journal, which occurs if you try to enable the dashboard via ceph mgr module enable dashboard:
Code:
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.438+0200 7fecdc91e000 -1 mgr[py] Traceback (most recent call last):
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/usr/share/ceph/mgr/dashboard/__init__.py", line 60, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from .module import Module, StandbyModule # noqa: F401
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/usr/share/ceph/mgr/dashboard/module.py", line 30, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from .controllers import Router, json_error_page
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 1, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from ._api_router import APIRouter
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/usr/share/ceph/mgr/dashboard/controllers/_api_router.py", line 1, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from ._router import Router
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/usr/share/ceph/mgr/dashboard/controllers/_router.py", line 7, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from ._base_controller import BaseController
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 11, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from ..services.auth import AuthManager, JwtManager
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/usr/share/ceph/mgr/dashboard/services/auth.py", line 12, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: import jwt
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/lib/python3/dist-packages/jwt/__init__.py", line 1, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from .api_jwk import PyJWK, PyJWKSet
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/lib/python3/dist-packages/jwt/api_jwk.py", line 6, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from .algorithms import get_default_algorithms
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/lib/python3/dist-packages/jwt/algorithms.py", line 6, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from .utils import (
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/lib/python3/dist-packages/jwt/utils.py", line 7, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from cryptography.hazmat.primitives.asymmetric.ec import EllipticCurve
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/lib/python3/dist-packages/cryptography/hazmat/primitives/asymmetric/ec.py", line 11, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from cryptography.hazmat._oid import ObjectIdentifier
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: File "/lib/python3/dist-packages/cryptography/hazmat/_oid.py", line 7, in <module>
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: from cryptography.hazmat.bindings._rust import (
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: ImportError: PyO3 modules may only be initialized once per interpreter process
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.438+0200 7fecdc91e000 -1 mgr[py] Class not found in module 'dashboard'
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.438+0200 7fecdc91e000 -1 mgr[py] Error loading module 'dashboard': (2) No such file or directory
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.470+0200 7fecdc91e000 -1 mgr[py] Module progress has missing NOTIFY_TYPES member
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.502+0200 7fecdc91e000 -1 mgr[py] Module iostat has missing NOTIFY_TYPES member
Sep 04 18:39:51 ceph-01 ceph-mgr[15669]: 2023-09-04T18:39:51.502+0200 7fecdc91e000 -1 log_channel(cluster) log [ERR] : Failed to load ceph-mgr modules: dashboard
This traceback reveals the dependency chain between the Ceph dashboard and PyO3.
To sum the dependencies up:
- The Ceph dashboard uses PyJWT, a Python library for JSON Web Tokens, for authentication. PyJWT in turn uses cryptography, a Python library for cryptographic primitives.
- A part of cryptography's source is written in Rust, and in order to interoperate with that Rust code, it makes use of PyO3.
- (That Rust code calls OpenSSL among other things, which is written in C, so it really is turtles all the way down.)
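If you want to read that chain off a traceback without squinting at it, here's a minimal sketch that collapses the `File "..."` frames into top-level packages. The helper names (`top_level_package`, `import_chain`) and the two path prefixes are my own assumptions, taken from the paths in the traceback above; other distributions may lay things out differently.

```python
import re

# Matches traceback frames such as:
#   File "/lib/python3/dist-packages/jwt/utils.py", line 7, in <module>
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line \d+, in ')

# Assumption: the path component right after one of these prefixes is the
# top-level package. Both prefixes come from the traceback above.
PACKAGE_ROOTS = ("/usr/share/ceph/mgr/", "/dist-packages/")


def top_level_package(path):
    """Return the top-level package a file path belongs to, or None."""
    for root in PACKAGE_ROOTS:
        _, found, rest = path.partition(root)
        if found:
            return rest.split("/", 1)[0]
    return None


def import_chain(journal_lines):
    """Collapse traceback frames into an ordered list of distinct packages."""
    chain = []
    for line in journal_lines:
        match = FRAME_RE.search(line)
        if not match:
            continue
        package = top_level_package(match.group("path"))
        # Only record a package when it differs from the previous frame's.
        if package and (not chain or chain[-1] != package):
            chain.append(package)
    return chain


# A few frames copied from the journal output above.
sample = [
    'ceph-mgr[15669]: File "/usr/share/ceph/mgr/dashboard/__init__.py", line 60, in <module>',
    'ceph-mgr[15669]: File "/usr/share/ceph/mgr/dashboard/services/auth.py", line 12, in <module>',
    'ceph-mgr[15669]: File "/lib/python3/dist-packages/jwt/__init__.py", line 1, in <module>',
    'ceph-mgr[15669]: File "/lib/python3/dist-packages/jwt/utils.py", line 7, in <module>',
    'ceph-mgr[15669]: File "/lib/python3/dist-packages/cryptography/hazmat/_oid.py", line 7, in <module>',
]

print(" -> ".join(import_chain(sample)))  # dashboard -> jwt -> cryptography
```

Note that PyO3 itself never shows up as a Python file in the traceback; it only becomes visible through the ImportError message raised when cryptography's Rust bindings are imported.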
My initial question was whether the issue was because of PyO3 itself or how it was used, so I went down the dependency chain. I eventually found issue 9016 on cryptography's side, and it seems that the Ceph dashboard isn't the only thing that's affected by this.
Following issue 9016 all the way down, it seems that the AUR maintainer for Ceph has also stumbled across this while rebuilding the aur-ceph package for Ceph v18. The maintainer made a separate issue specifically for this problem. Check it out if you want to have all the details.
To summarize all the issues above, it seems like the change in PyO3 that @spirit had found is indeed the cause of all of this. Basically, if your Python application uses sub-interpreters, which Python supports in its C API, any module that uses bindings made via PyO3 blows up and throws an ImportError if it's loaded twice. The ceph mgr uses such a sub-interpreter model for running mgr modules. In the case of Ceph, this makes perfect sense (from my point of view) as it's better to keep every module isolated: if module A crashes, you don't want it to take down modules B, C, D, etc. as well, but you still want to share some data (e.g. libraries) between them. (I'm not yet entirely sure what Ceph does there under the hood, but that's my assumption at least.)
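To make the failure mode a bit more tangible, here's a toy model in plain Python. It is not PyO3's actual implementation and it doesn't use real sub-interpreters; `ToyInterpreter` and `init_native_module` are hypothetical names I made up. The only thing it tries to show is the shape of the problem as I understand it: the native module keeps process-wide state, each interpreter has its own module cache, and so a second interpreter triggers a second initialization that the guard rejects.

```python
import itertools

# Process-wide state shared by every simulated interpreter, similar in
# spirit to a static variable inside a native extension module.
# Maps module name -> id of the interpreter that initialized it first.
_owners = {}
_interpreter_ids = itertools.count()


def init_native_module(name, interpreter_id):
    """Hypothetical init hook that refuses a second interpreter."""
    owner = _owners.setdefault(name, interpreter_id)
    if owner != interpreter_id:
        raise ImportError(
            "PyO3 modules may only be initialized once per interpreter process"
        )
    return {"name": name}


class ToyInterpreter:
    """Stand-in for a (sub-)interpreter with its own module cache."""

    def __init__(self):
        self.interpreter_id = next(_interpreter_ids)
        self.modules = {}

    def import_native(self, name):
        # A repeated import inside the same interpreter hits the cache
        # and never reaches the init hook again.
        if name not in self.modules:
            self.modules[name] = init_native_module(name, self.interpreter_id)
        return self.modules[name]


first = ToyInterpreter()
second = ToyInterpreter()

first.import_native("_rust")   # first initialization succeeds
first.import_native("_rust")   # cached, still fine

try:
    second.import_native("_rust")  # separate cache, so init runs again
except ImportError as exc:
    print(f"second interpreter failed: {exc}")
```

In this model, whichever interpreter imports the module first "wins", and every other interpreter gets the ImportError, which matches the symptom in the journal above.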
However, to cut the PyO3 developers some slack, their decision does make sense on their end, because they'd otherwise have to redesign PyO3. Redesigning an entire library is quite a herculean effort, as you can imagine. Again, all the details can be found in the AUR maintainer's tracking issue.
So, what happens now? There are a couple of possible options.
I'm going to keep looking into this to see if I can at least fix it on our side, but such a fix probably wouldn't be permanent, or would have to be maintained for future versions of Ceph. It also wouldn't fix the root cause, which is PyO3 not allowing sub-interpreters.
So, another idea is to contribute to PyO3 upstream, as that wouldn't just fix it in our case, but would "trickle down" to all libraries that use PyO3. Since I'm experienced in both Python and Rust, I could probably bridge the gap there. That being said, I'm not too familiar with all the nitty-gritty details of the Python C API and the internals of the CPython implementation, but it's nothing that I can't teach myself. I'm not yet sure what the PyO3 maintainers' stance on fixing this is, but we'll see.
I'm confident that a fix is possible, but I cannot yet say whether it will come from our side (specifically for PVE), from PyO3's side, or from Ceph working it out on their end, nor when this will be fixed. For what it's worth, I've also reported my findings to the Ceph mailing list as a response to the existing thread, so maybe it will also gain some traction there. Maybe Ceph already does have a fix for this, but since the AUR maintainer is struggling with the same issue on Ceph 18.2, I doubt they do.