Ceph warning after upgrading to v8

I ran into this today and my fix was very simple: `ceph mgr module disable restful`. I can't see anything that this breaks, the PyO3 issues are gone for me, and I didn't have to force-enable the dashboard plugin.
 
An interesting finding, for those who are curious.

By analysing the logs, I can see that the PyO3-related bug does not always affect the dashboard; it can also affect the restful module instead (or, I suppose, any other module that depends on PyO3).

- the dashboard runs on 2 nodes of my 3-node cluster (and I cannot get it working on the third node, even with destroy/create, reinstalling the package, etc.)
- the dashboard works when it is loaded before restful

@Max Carrara, is there a way to set (or change) the order in which modules are loaded, as a workaround?


Here is the ceph-mgr log from a node where the dashboard is working:
Code:
2023-10-26T23:24:08.833+0200 7f73e1032000  1 mgr[py] Loading python module 'stats'
2023-10-26T23:24:09.005+0200 7f73e1032000  1 mgr[py] Loading python module 'volumes'
2023-10-26T23:24:09.497+0200 7f73e1032000 -1 mgr[py] Module volumes has missing NOTIFY_TYPES member
2023-10-26T23:24:09.497+0200 7f73e1032000  1 mgr[py] Loading python module 'pg_autoscaler'
2023-10-26T23:24:09.633+0200 7f73e1032000 -1 mgr[py] Module pg_autoscaler has missing NOTIFY_TYPES member
2023-10-26T23:24:09.633+0200 7f73e1032000  1 mgr[py] Loading python module 'mirroring'
2023-10-26T23:24:09.761+0200 7f73e1032000  1 mgr[py] Loading python module 'iostat'
2023-10-26T23:24:09.857+0200 7f73e1032000 -1 mgr[py] Module iostat has missing NOTIFY_TYPES member
2023-10-26T23:24:09.857+0200 7f73e1032000  1 mgr[py] Loading python module 'alerts'
2023-10-26T23:24:09.973+0200 7f73e1032000 -1 mgr[py] Module alerts has missing NOTIFY_TYPES member
2023-10-26T23:24:09.973+0200 7f73e1032000  1 mgr[py] Loading python module 'rbd_support'
2023-10-26T23:24:10.133+0200 7f73e1032000 -1 mgr[py] Module rbd_support has missing NOTIFY_TYPES member
2023-10-26T23:24:10.133+0200 7f73e1032000  1 mgr[py] Loading python module 'test_orchestrator'
2023-10-26T23:24:10.369+0200 7f73e1032000 -1 mgr[py] Module test_orchestrator has missing NOTIFY_TYPES member
2023-10-26T23:24:10.369+0200 7f73e1032000  1 mgr[py] Loading python module 'nfs'
2023-10-26T23:24:10.637+0200 7f73e1032000 -1 mgr[py] Module nfs has missing NOTIFY_TYPES member
2023-10-26T23:24:10.637+0200 7f73e1032000  1 mgr[py] Loading python module 'dashboard'
2023-10-26T23:24:11.465+0200 7f73e1032000  1 mgr[py] Loading python module 'progress'
2023-10-26T23:24:11.565+0200 7f73e1032000 -1 mgr[py] Module progress has missing NOTIFY_TYPES member
2023-10-26T23:24:11.565+0200 7f73e1032000  1 mgr[py] Loading python module 'osd_perf_query'
2023-10-26T23:24:11.693+0200 7f73e1032000 -1 mgr[py] Module osd_perf_query has missing NOTIFY_TYPES member
2023-10-26T23:24:11.693+0200 7f73e1032000  1 mgr[py] Loading python module 'insights'
2023-10-26T23:24:11.789+0200 7f73e1032000  1 mgr[py] Loading python module 'orchestrator'
2023-10-26T23:24:12.005+0200 7f73e1032000 -1 mgr[py] Module orchestrator has missing NOTIFY_TYPES member
2023-10-26T23:24:12.005+0200 7f73e1032000  1 mgr[py] Loading python module 'osd_support'
2023-10-26T23:24:12.113+0200 7f73e1032000 -1 mgr[py] Module osd_support has missing NOTIFY_TYPES member
2023-10-26T23:24:12.113+0200 7f73e1032000  1 mgr[py] Loading python module 'telegraf'
2023-10-26T23:24:12.213+0200 7f73e1032000 -1 mgr[py] Module telegraf has missing NOTIFY_TYPES member
2023-10-26T23:24:12.213+0200 7f73e1032000  1 mgr[py] Loading python module 'influx'
2023-10-26T23:24:12.309+0200 7f73e1032000 -1 mgr[py] Module influx has missing NOTIFY_TYPES member
2023-10-26T23:24:12.309+0200 7f73e1032000  1 mgr[py] Loading python module 'restful'
2023-10-26T23:24:12.541+0200 7f73e1032000 -1 mgr[py] Module not found: 'restful'
2023-10-26T23:24:12.541+0200 7f73e1032000 -1 mgr[py] Traceback (most recent call last):
  File "/usr/share/ceph/mgr/restful/__init__.py", line 1, in <module>
    from .module import Module
  File "/usr/share/ceph/mgr/restful/module.py", line 21, in <module>
    from OpenSSL import crypto
  File "/lib/python3/dist-packages/OpenSSL/__init__.py", line 8, in <module>
    from OpenSSL import SSL, crypto
  File "/lib/python3/dist-packages/OpenSSL/SSL.py", line 19, in <module>
    from OpenSSL.crypto import (
  File "/lib/python3/dist-packages/OpenSSL/crypto.py", line 21, in <module>
    from cryptography import utils, x509
  File "/lib/python3/dist-packages/cryptography/x509/__init__.py", line 6, in <module>
    from cryptography.x509 import certificate_transparency
  File "/lib/python3/dist-packages/cryptography/x509/certificate_transparency.py", line 10, in <module>
    from cryptography.hazmat.bindings._rust import x509 as rust_x509
ImportError: PyO3 modules may only be initialized once per interpreter process

2023-10-26T23:24:12.545+0200 7f73e1032000 -1 mgr[py] Class not found in module 'restful'
2023-10-26T23:24:12.545+0200 7f73e1032000 -1 mgr[py] Error loading module 'restful': (2) No such file or directory


Here is the ceph-mgr log from another node where the dashboard is NOT working:
Code:
2023-10-27T00:01:24.360+0200 7f02a29b1000  1 mgr[py] Loading python module 'balancer'
2023-10-27T00:01:24.444+0200 7f02a29b1000 -1 mgr[py] Module balancer has missing NOTIFY_TYPES member
2023-10-27T00:01:24.444+0200 7f02a29b1000  1 mgr[py] Loading python module 'orchestrator'
2023-10-27T00:01:24.612+0200 7f02a29b1000 -1 mgr[py] Module orchestrator has missing NOTIFY_TYPES member
2023-10-27T00:01:24.612+0200 7f02a29b1000  1 mgr[py] Loading python module 'influx'
2023-10-27T00:01:24.692+0200 7f02a29b1000 -1 mgr[py] Module influx has missing NOTIFY_TYPES member
2023-10-27T00:01:24.692+0200 7f02a29b1000  1 mgr[py] Loading python module 'osd_support'
2023-10-27T00:01:24.760+0200 7f02a29b1000 -1 mgr[py] Module osd_support has missing NOTIFY_TYPES member
2023-10-27T00:01:24.760+0200 7f02a29b1000  1 mgr[py] Loading python module 'nfs'
2023-10-27T00:01:24.936+0200 7f02a29b1000 -1 mgr[py] Module nfs has missing NOTIFY_TYPES member
2023-10-27T00:01:24.936+0200 7f02a29b1000  1 mgr[py] Loading python module 'telegraf'
2023-10-27T00:01:25.008+0200 7f02a29b1000 -1 mgr[py] Module telegraf has missing NOTIFY_TYPES member
2023-10-27T00:01:25.008+0200 7f02a29b1000  1 mgr[py] Loading python module 'mirroring'
2023-10-27T00:01:25.092+0200 7f02a29b1000  1 mgr[py] Loading python module 'prometheus'
2023-10-27T00:01:25.412+0200 7f02a29b1000 -1 mgr[py] Module prometheus has missing NOTIFY_TYPES member
2023-10-27T00:01:25.412+0200 7f02a29b1000  1 mgr[py] Loading python module 'selftest'
2023-10-27T00:01:25.484+0200 7f02a29b1000 -1 mgr[py] Module selftest has missing NOTIFY_TYPES member
2023-10-27T00:01:25.484+0200 7f02a29b1000  1 mgr[py] Loading python module 'restful'
2023-10-27T00:01:25.740+0200 7f02a29b1000  1 mgr[py] Loading python module 'alerts'
2023-10-27T00:01:25.824+0200 7f02a29b1000 -1 mgr[py] Module alerts has missing NOTIFY_TYPES member
2023-10-27T00:01:25.824+0200 7f02a29b1000  1 mgr[py] Loading python module 'status'
2023-10-27T00:01:25.936+0200 7f02a29b1000 -1 mgr[py] Module status has missing NOTIFY_TYPES member
2023-10-27T00:01:25.936+0200 7f02a29b1000  1 mgr[py] Loading python module 'crash'
2023-10-27T00:01:26.036+0200 7f02a29b1000 -1 mgr[py] Module crash has missing NOTIFY_TYPES member
2023-10-27T00:01:26.036+0200 7f02a29b1000  1 mgr[py] Loading python module 'rbd_support'
2023-10-27T00:01:26.124+0200 7f02a29b1000 -1 mgr[py] Module rbd_support has missing NOTIFY_TYPES member
2023-10-27T00:01:26.124+0200 7f02a29b1000  1 mgr[py] Loading python module 'dashboard'
2023-10-27T00:01:26.408+0200 7f02a29b1000 -1 mgr[py] Module not found: 'dashboard'
  File "/usr/share/ceph/mgr/dashboard/__init__.py", line 60, in <module>
    from .module import Module, StandbyModule  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/ceph/mgr/dashboard/module.py", line 30, in <module>
    from .controllers import Router, json_error_page
  File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 1, in <module>
    from ._api_router import APIRouter
  File "/usr/share/ceph/mgr/dashboard/controllers/_api_router.py", line 1, in <module>
    from ._router import Router
  File "/usr/share/ceph/mgr/dashboard/controllers/_router.py", line 7, in <module>
    from ._base_controller import BaseController
  File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 11, in <module>
    from ..services.auth import AuthManager, JwtManager
  File "/usr/share/ceph/mgr/dashboard/services/auth.py", line 12, in <module>
    import jwt
  File "/lib/python3/dist-packages/jwt/__init__.py", line 1, in <module>
    from .api_jwk import PyJWK, PyJWKSet
  File "/lib/python3/dist-packages/jwt/api_jwk.py", line 6, in <module>
    from .algorithms import get_default_algorithms
  File "/lib/python3/dist-packages/jwt/algorithms.py", line 6, in <module>
    from .utils import (
  File "/lib/python3/dist-packages/jwt/utils.py", line 7, in <module>
    from cryptography.hazmat.primitives.asymmetric.ec import EllipticCurve
  File "/lib/python3/dist-packages/cryptography/hazmat/primitives/asymmetric/ec.py", line 11, in <module>
    from cryptography.hazmat._oid import ObjectIdentifier
  File "/lib/python3/dist-packages/cryptography/hazmat/_oid.py", line 7, in <module>
    from cryptography.hazmat.bindings._rust import (
ImportError: PyO3 modules may only be initialized once per interpreter process

2023-10-27T00:01:26.412+0200 7f02a29b1000 -1 mgr[py] Class not found in module 'dashboard'
2023-10-27T00:01:26.412+0200 7f02a29b1000 -1 mgr[py] Error loading module 'dashboard': (2) No such file or directory
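To repeat this comparison on other nodes, the load order and the modules that failed with the PyO3 error can be pulled out of the log automatically. Below is a minimal Python sketch that only relies on the line formats visible in the two logs above (`Loading python module`, the PyO3 `ImportError` line, and `Error loading module`). The function name `analyse_mgr_log` is made up for illustration, it assumes the lines come from a single ceph-mgr start, and other Ceph versions may log differently.

```python
import re
import sys
from typing import Dict, List, Tuple

# Marks the order in which ceph-mgr loads its Python modules, e.g.
# "... mgr[py] Loading python module 'restful'"
LOADING_RE = re.compile(r"mgr\[py\] Loading python module '([^']+)'")
# Final line logged for a module that failed to load, e.g.
# "... mgr[py] Error loading module 'dashboard': (2) No such file or directory"
ERROR_RE = re.compile(r"mgr\[py\] Error loading module '([^']+)'")
# The error raised by PyO3 in both tracebacks above
PYO3_MSG = "PyO3 modules may only be initialized once per interpreter process"


def analyse_mgr_log(lines: List[str]) -> Tuple[List[str], Dict[str, bool]]:
    """Return (load order, {failed module: True if a PyO3 ImportError was logged for it})."""
    order: List[str] = []
    failed: Dict[str, bool] = {}
    saw_pyo3 = False
    for line in lines:
        loading = LOADING_RE.search(line)
        if loading:
            order.append(loading.group(1))
            saw_pyo3 = False  # a new module starts loading, so reset the flag
        elif PYO3_MSG in line:
            saw_pyo3 = True
        else:
            error = ERROR_RE.search(line)
            if error:
                failed[error.group(1)] = saw_pyo3
    return order, failed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 <this script> <path to a ceph-mgr log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        order, failed = analyse_mgr_log(fh.readlines())
    print("load order:", ", ".join(order))
    for name, pyo3 in failed.items():
        print(f"failed: {name} (PyO3 ImportError: {'yes' if pyo3 else 'no'})")
        if name in order:
            # Shows whether e.g. dashboard or restful was loaded earlier
            print("  loaded before it:", ", ".join(order[: order.index(name)]))
```

On the first log above this would report restful as the PyO3 failure with dashboard among the modules loaded before it; on the second, the roles are reversed.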
 
@niko2 @mattkenn4545 Thanks for your insights!

Your suspicions are indeed correct: both dashboard and restful transitively depend on PyO3 down the line. So yes, if the dashboard is loaded before restful, then restful is the affected module. I'm not sure, however, that reordering those modules would make for a "proper" workaround.

To give you all another little update: I'm currently working on a workaround that should (hopefully) work for all ceph mgr modules. I'm on the right track I believe, I just need to get Python to play along. I'll keep you posted.

For the curious: the way Ceph uses sub-interpreters should actually be fine for PyO3; it's just that PyO3 can't guarantee that it'll be fine across all sub-interpreter setups. So I'm now trying to change the way ceph mgr looks up Python modules, so that we can provide our own versions of the Python modules that use PyO3, specifically for Ceph only.
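Changing where Python modules are looked up generally comes down to the order of `sys.path`; entries from PYTHONPATH, for example, are placed ahead of the standard library and site-packages. A minimal sketch of that general mechanism, assuming nothing about how ceph-mgr actually sets up its paths (the module name `demo_crypto_lookup` and both temporary directories are throwaway stand-ins, not real Ceph or Debian paths):

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_module(directory: str, name: str, version: str) -> None:
    # Create a trivial module file that records which copy it is
    (Path(directory) / f"{name}.py").write_text(f"VERSION = {version!r}\n")


def demo_lookup_order(name: str = "demo_crypto_lookup") -> str:
    with tempfile.TemporaryDirectory() as system_dir, tempfile.TemporaryDirectory() as override_dir:
        write_module(system_dir, name, "system")
        write_module(override_dir, name, "override")
        # Stand-in for the distribution's site-packages...
        sys.path.append(system_dir)
        # ...and a directory with our own copy placed in front of it
        sys.path.insert(0, override_dir)
        importlib.invalidate_caches()
        try:
            # The first matching entry on sys.path wins
            return importlib.import_module(name).VERSION
        finally:
            sys.path.remove(override_dir)
            sys.path.remove(system_dir)
            sys.modules.pop(name, None)


print(demo_lookup_order())  # → override
```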

Once I get a proof of concept working, it's just a matter of packaging it all. So, I hope to get some results rather soon. Fingers crossed.
 
@Max Carrara Hi,

Any news regarding this problem? According to this pull request, the PyO3 0.20 release seems to have fixed it: https://github.com/PyO3/pyo3/pull/3446

The PR does indeed fix things in cases where PyO3 modules are always imported by the same sub-interpreter, but there are two issues regarding this:
  1. It will take a while for this fix to trickle into the Debian ecosystem, because package versions are stable and not always on the latest version of upstream projects (in this case the latest PyO3 version).
  2. ceph-mgr spawns a sub-interpreter for each mgr module; this means that there's a sub-interpreter for restful and another one for dashboard, for example. I therefore highly doubt that this will work for Ceph, but I will test it at some point if possible. Even if it does end up working, we'd have to ensure that PyO3 0.20 is compatible with cryptography, which is yet another can of worms.
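The second point can be illustrated with a toy model. This is Python, not PyO3's Rust code: `OldCheck` models "any second initialization fails", while `PerInterpreterCheck` models the PR's behavior as described in this thread (the first interpreter to initialize the module owns it and gets the cached module back on re-import; any other interpreter still gets the ImportError). Class names and interpreter IDs are hypothetical. Under these assumptions, giving each mgr module its own sub-interpreter means the second mgr module that pulls in the same PyO3-based extension fails in both models:

```python
from typing import Dict, List, Optional

PYO3_ERROR = "PyO3 modules may only be initialized once per interpreter process"


class OldCheck:
    """Toy model: a second initialization fails, whichever interpreter asks."""

    def __init__(self) -> None:
        self.module: Optional[str] = None

    def init(self, interpreter_id: int) -> str:
        if self.module is not None:
            raise ImportError(PYO3_ERROR)
        self.module = f"module created in interpreter {interpreter_id}"
        return self.module


class PerInterpreterCheck:
    """Toy model of the PR as described here: the owning interpreter gets the
    cached module back, every other interpreter still fails."""

    def __init__(self) -> None:
        self.owner: Optional[int] = None
        self.module: Optional[str] = None

    def init(self, interpreter_id: int) -> str:
        if self.owner is None:
            self.owner = interpreter_id
            self.module = f"module created in interpreter {interpreter_id}"
        elif self.owner != interpreter_id:
            raise ImportError(PYO3_ERROR)
        assert self.module is not None
        return self.module


def load_mgr_modules(check, mgr_modules: List[str]) -> Dict[str, str]:
    """Each mgr module gets its own (hypothetical) sub-interpreter ID and
    initializes the same PyO3-based extension module."""
    results: Dict[str, str] = {}
    for interpreter_id, name in enumerate(mgr_modules, start=1):
        try:
            check.init(interpreter_id)
            results[name] = "ok"
        except ImportError:
            results[name] = "ImportError"
    return results


print(load_mgr_modules(OldCheck(), ["dashboard", "restful"]))
# → {'dashboard': 'ok', 'restful': 'ImportError'}
print(load_mgr_modules(PerInterpreterCheck(), ["dashboard", "restful"]))
# → {'dashboard': 'ok', 'restful': 'ImportError'}
```

The model only helps in the case the PR targets: repeated imports from the same interpreter, as in `check.init(1)` called twice on one `PerInterpreterCheck`.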
In either case, so far I've managed to tell Ceph which alternative Python modules to use, now I just need to get it to play along (it still finds the "old" cryptography in certain cases for god knows what reason and then complains).

If everything goes according to plan, you might see a workaround being shipped in the coming weeks.
 
@Max Carrara Hi,

Thank you. Waiting for your solution. Right now we:

1. Downgraded python3-cryptography and python3-openssl to these versions:
https://snapshot.debian.org/archive.../p/pyopenssl/python3-openssl_21.0.0-1_all.deb https://snapshot.debian.org/archive...graphy/python3-cryptography_3.4.8-2_amd64.deb
Now Ceph 18.2.0 works well. However, it's not as safe as it used to be.

2. Filed a bug report with the Debian team and discussed it with them on the IRC channels. They are going to backport this pull request to PyO3 0.17 in Bookworm, so that we will not have to test cryptography with PyO3 0.20.
 
Thank you very much! This is exciting news.

I'll go ahead and test the PR - maybe backport it for PyO3 0.17. If this ends up working for our issue here we will package it all up for PVE and see if we can also get it into Debian stable.

My coworker had just mentioned this to me as well - I think he might have even been the one you were talking to. ;)

Thanks a lot for your help! I'll keep you all posted again, as always.
 
@Max Carrara
Hi,

Yes, I had some conversations on the IRC channels. I actually even tried to:
1. Apply patches from the PR to the 0.17 - not even applicable, too many differences
2. Apply patches from the PR to 0.19: they applied, but don't build on Bookworm because of type mismatches (and possibly other problems): https://paste.debian.net/1297423/ This requires examining more deeply, with Rust knowledge.
3. Even with this patch backported, it's not guaranteed that Ceph will work correctly. This PR just returns the same interpreter ID if someone tries to load the module again, which could cause problems such as deadlocks, I suppose.

So right now we are going to rebuild the old PyO3 0.16 and python-cryptography 3.4.8 on Bookworm and use them with Ceph.
 
1. Apply patches from the PR to the 0.17 - not even applicable, too many differences
I'm doing that as we speak, resolving any other errors and conflicts I come across. It's looking pretty good so far: most tests (via `cargo test`) pass, except a few that require a couple more changes.

The second commit of the PR requires some changes that were added in later versions of PyO3; luckily, most of those only require minimal adaptations and can otherwise be taken as-is.
https://paste.debian.net/1297423/ Requires examining this more deeply with rust knowledge.
This issue, for example, can be solved by using the newer error_on_minusone function, which has simply been made generic. Some other things are more complicated, however.

Once that's done I'll see if I can build python3-cryptography and python3-openssl.

I'll also be in touch on the Debian bugtracker and IRC channels.
 
Should we expect that, when the fix/workaround drops, it'll just be part of a normal apt upgrade cycle?

I also noticed that a Ceph upgrade from 17.2.6 to 17.2.7 is available, but I'm guessing it's unrelated...?
 
3. Even with backporting this patch it's not guaranteed ceph will work correctly. This PR just returns the same interpreter ID if someone tries to load the module again. And this can cause problems, i suppose. Like deadlocks.
AFAICT the changes should be fine in Ceph's case, as every mgr module is isolated in its own sub-interpreter and, as far as I know, its Python modules aren't shared between sub-interpreters. As long as the (sub-)interpreter stays the same, PyO3 modules won't raise an ImportError and will continue to function, at least according to the changes in the PR.

I'll do some thorough testing once I'm done backporting the PR and building the packages to make sure that Ceph really doesn't share any data between its sub-interpreters. PyO3 should still raise an ImportError if that's the case - if it still does, then that means Ceph's sub-interpreter model isn't sound, which would be a completely different can of worms. So let's hope for the best.
 
If you upgrade to 17.2.7, the dashboard's look will change completely and metrics will disappear.
To restore the previous (working) dashboard, execute:
Code:
ceph dashboard feature disable dashboard

FYI, the new landing page depends on prometheus metrics.
 
That's unrelated though, and in any case it does not help on Bookworm (or Bookworm-based distros like PVE) ;)
 
I've got another update. Unfortunately, we're back to square one.

I had attempted two workarounds, both of which unfortunately didn't work.

The first consisted of supplying another version of cryptography specifically only for Ceph (modifying the PYTHONPATH environment variable wherever possible) - that didn't work, because cryptography refuses to be imported if a different version of itself is already in use (and I think it's obvious that any more meddling with a cryptographic library in that regard might introduce security issues).
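The thread doesn't say how cryptography detects another copy of itself, so the following only sketches a related, generic part of Python's import machinery: once a module is in `sys.modules`, later imports return that cached object, and changes to `sys.path` (or PYTHONPATH) no longer matter until the entry is evicted. The module name `demo_crypto_cache` and the temporary directories are throwaway stand-ins; evicting a real package such as cryptography from `sys.modules` is exactly the kind of meddling the paragraph above advises against.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def demo_module_cache(name: str = "demo_crypto_cache") -> Tuple[str, str, str]:
    with tempfile.TemporaryDirectory() as old_dir, tempfile.TemporaryDirectory() as new_dir:
        (Path(old_dir) / f"{name}.py").write_text("VERSION = 'old'\n")
        (Path(new_dir) / f"{name}.py").write_text("VERSION = 'new'\n")
        try:
            # First import: only the "old" copy is reachable
            sys.path.insert(0, old_dir)
            importlib.invalidate_caches()
            first = importlib.import_module(name).VERSION
            # Later, a directory with a "new" copy is put in front of the path
            sys.path.insert(0, new_dir)
            importlib.invalidate_caches()
            # The cached module in sys.modules is returned; sys.path is not searched again
            second = importlib.import_module(name).VERSION
            # Only after evicting the cache entry is sys.path searched again
            del sys.modules[name]
            third = importlib.import_module(name).VERSION
            return first, second, third
        finally:
            for d in (old_dir, new_dir):
                if d in sys.path:
                    sys.path.remove(d)
            sys.modules.pop(name, None)


print(demo_module_cache())  # → ('old', 'old', 'new')
```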

The second consisted of backporting the Pull Request @hiddenman had posted here, then rebuilding and repackaging PyO3 and cryptography with the help of @fabian. That unfortunately didn't work either, as Ceph's sub-interpreter model still isn't compatible with PyO3; enabling the dashboard still fails.

So, what happens now?

I'll raise the issue over at Ceph's bugtracker and see if I can make some more noise there upstream. (I have to wait until they approve my account registration though.) There's an existing issue but I'm not sure if the folks over at Ceph are aware of what's going on (or maybe the AUR maintainer let them know already?). Either way, sooner or later more people will be affected by this, including Ceph's own containerized distribution, so it's best to make sure upstream is aware of this. One way I can see this being fixed on Ceph's side is a rewrite of their Python interpreter embedding that doesn't use sub-interpreters, but e.g. instead spawns a thread plus a separate interpreter for each mgr module. This is not trivial to implement at all and will result in Ceph eating a couple more resources. One way or another Ceph would have to get rid of Python sub-interpreters.

At least we now know that there's no feasible workaround for this (and I would rather not mess with the internals of a cryptographic library). I will still be working on contributing to the PyO3 situation upstream, but that's most likely going to take a really long time, and it probably won't be backportable either (because the project's entire threading model has to be rewritten from scratch).

This is honestly a quite unsatisfying outcome, but hey, sometimes things just don't work out. Unless there's some other possible workaround that I'm not yet aware of, the dashboard will remain broken for quite some time.
 
@Max Carrara Hi,

Thank you for your attempts to fix this. Yesterday I built everything with your backported patches and yes, it didn't help.
So the only workaround for now is to use the old python-cryptography (3.4.8). It works fine (because it does not use Rust and Rust bindings at all).
 
Thank you for your attempts to fix this.
Happy to help! Thanks for pointing me to that pull request; even if it didn't work out in the end, there was a slight chance that it would.

It works fine (because do not use rust and rust bindings at all).
cryptography has used Rust since version 3.4; it's just that the version of PyO3 it used back then did not include the check for the presence of Python sub-interpreters.

Note that downgrading packages, especially those that ship cryptographic libraries, is not recommended at all.
 
