ceph warning post upgrade to v8

Vasilisc · Jan 18, 2024

I updated my cluster and the problem was fixed. Thanks for the great job.
pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-7-pve)

Max Carrara · Jan 18, 2024

arubenstein said:
Is this available as part of the regular update process?

Yes.

Regular users will find that updates for Ceph Quincy are already out; Reef should follow soon, too.

pigpen · Jan 18, 2024

Max Carrara said:
Hello again everybody! This time I've got fantastic news.

In my previous post I had mentioned that the dashboard could only be used with TLS turned off. This is no longer the case; the dashboard will work again as intended. So, no reverse proxy or other workarounds needed. The patch series was recently applied, which means that you should eventually see updates trickle in.

Some more details: Besides the backport of the PyJWT replacement, I've found that there are only a couple usages of another module that uses PyO3. That module was PyOpenSSL. All other SSL/TLS-related functions use Python's built-in ssl module from the standard library. This module however doesn't expose everything OpenSSL can do, which is probably why PyOpenSSL helper functions were brought in.

One of those usages was a check during the dashboard's startup that made sure the TLS certificate and key match. In my opinion, it's very unlikely for such a misconfiguration to happen, and if it does, your browser will warn you anyway.

The only other caveat is that the ceph dashboard create-self-signed-cert command will no longer work. Instead, you'll have to manually provide a self-signed certificate and key - when you try to use the command, you will be shown a little help message on how to achieve that. It's almost frictionless. Just make sure the cert and key match up, or your browser will complain (due to the removal of the aforementioned check). You will only need this command during setup of the dashboard anyway, so for existing users, you should see your dashboard come up again once updates are out and installed. If it doesn't come up or there's some other problem, please ping me!

Hello Max,

This is great news! Thanks for digging into this!
What are the names of the updated packages we'll need to look out for? Is there a mention of the dashboard problem in their changelog?

devedse · Jan 18, 2024

Hi all,

I just did a full upgrade of my whole Proxmox cluster and tried enabling the dashboard. I still run into the same error, though:

Code:

root@proxmox1:~# pveversion
pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-7-pve)
root@proxmox1:~# ceph version
ceph version 18.2.1 (850293cdaae6621945e1191aa8c28ea2918269c3) reef (stable)
root@proxmox1:~# ceph mgr module enable dashboard
Error ENOENT: module 'dashboard' reports that it cannot run on the active manager daemon: PyO3 modules may only be initialized once per interpreter process (pass --force to force enablement)

Could it be that the latest functionality is not yet available through apt-get?

Edit:
I didn't read carefully; apparently only the Quincy update has been released so far, so I'll wait a bit.

NovemberHotel · Jan 22, 2024

Our use case is mostly monitoring via the RESTful API. Thanks to the Proxmox team and community contributors for following up on this issue and caring, even though it sits deep within a sub-dependency. Thumbs up. We'll be patient while the updates roll out.

caseyjmorton · Jan 23, 2024

Is there any estimate on when the update will be released for Reef?

Max Carrara · Jan 24, 2024

caseyjmorton said:
Is there any estimate on when the update will be released for Reef?

Most likely fairly soon. We usually wait for an upstream release before we push out a new update; if upstream takes too long, we'll roll out an update with the current patches ourselves.

devedse · Jan 31, 2024

@Max Carrara, has anything been released already? I'm waiting quite eagerly for the update.

Max Carrara · Feb 1, 2024

Hi again everybody! I've got some more news.

What follows is a bunch of details around what's been going on in the meantime - thank you all for your patience! I hope that this post will clarify a couple more things.

First of all, I was apparently mistaken - I had thought that my series was already applied for Ceph Quincy as well, and therefore assumed it was included in the last update, which it unfortunately wasn't - my apologies. (This January has been quite wonky for me, to say the least.) I quickly realized this while talking to @BDub38 regarding the update and pushed out a separate series for our Quincy branch.

Additionally, I also sent out a fix that's relevant for both Quincy and Reef that prevents the dashboard from crashing if no TLS keypair was configured at all yet, which relieves some administrative friction, as I call it. In detail, if the dashboard crashed (because of the missing keypair), the ceph dashboard subcommand would cease to work as well - even if a sub-subcommand doesn't talk to the dashboard's API at all.

This means that users that had not set up the dashboard at all yet still have access to the altered ceph dashboard create-self-signed-cert command, which will then instruct them how to set up their TLS keypair (as mentioned before in this post). That was kind of the point of altering the command that way anyways; it was just a test case I had missed.
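As a rough sketch of how one might verify a keypair by hand now that the startup check is gone: Python's built-in ssl module refuses to load a certificate together with a private key that doesn't belong to it, so that alone can serve as a sanity check. The function name cert_and_key_match is made up for illustration, and an unencrypted PEM key is assumed.

```python
import ssl


def cert_and_key_match(cert_path: str, key_path: str) -> bool:
    """Return True if the PEM certificate and private key load as a pair."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # load_cert_chain raises ssl.SSLError when the key does not belong
        # to the certificate. ssl.SSLError is a subclass of OSError, so
        # missing or unreadable files are reported as "no match" as well.
        ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except OSError:
        return False
    return True
```

Calling cert_and_key_match("dashboard.crt", "dashboard.key") (hypothetical file names) before handing the files to the dashboard should catch a mismatched pair early, instead of leaving it to the browser to complain.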

Furthermore, I'm also currently working on a fix for bug #4759. In short, Ceph's crash reporting service ceph-crash isn't able to actually send the crash logs and reports it gathers to the cluster (as of this upstream pull request) due to the way we handle Ceph's configuration for our hyper-converged clusters. While the patch series I sent in to address this bug still needs some more polishing here and there, the overall fix works.

I'm sharing this because fixing this bug is actually indirectly related to the dashboard: Due to that upstream change (among other things), ceph-crash cannot authenticate with the cluster and also move its reports to a certain directory anymore. So, if you've been prudently checking your systemd journal, you might've found that there are quite a lot of messages from ceph-crash not being able to post crash reports. Because the dashboard crashing produces such reports, your logs will end up being flooded sooner or later. The more often it crashes, the more messages in your logs. Every 10 minutes. Quite a nuisance.
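To get a feel for how badly a journal is being flooded, the entries can be counted per hour. The following is only a sketch: it assumes the service runs as the systemd unit ceph-crash.service and that the journal was exported with journalctl's JSON output, where each line carries a __REALTIME_TIMESTAMP field in microseconds since the epoch. The function name entries_per_hour is made up for illustration.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def entries_per_hour(json_lines: Iterable[str]) -> Counter:
    """Count journal entries per UTC hour from `journalctl -o json` lines."""
    counts: Counter = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # __REALTIME_TIMESTAMP is microseconds since the Unix epoch (a string).
        micros = int(entry["__REALTIME_TIMESTAMP"])
        stamp = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
        counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Possible usage (assumed unit name), feeding in the journal of the last day:
#   journalctl -u ceph-crash.service -o json --no-pager --since "-1day" > crash.json
#   with open("crash.json") as fh:
#       print(entries_per_hour(fh).most_common(5))
```

With one failed attempt every 10 minutes, roughly six entries per hour would be the baseline to expect; the exact message text is not parsed here, since it may differ between versions.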

So, because there are several things that still need to be done, expect that we'll update our packages as soon as upstream releases new updates. This means that we won't have to rebuild, package, and distribute our two branches as often, which saves you (and us) quite some bandwidth and processing power - and what's more important, much less can go wrong (e.g. too hastily shipping fixes that introduce new bugs). On top of that, we can also ensure our patches still apply and work as expected when upstream releases an update (which again means less can go wrong).

In summary, all fixes will most likely be packaged and shipped once upstream releases updates. This means it's easier for us to test if patches apply and everything is working as expected, while also guaranteeing that things stay stable and that nothing breaks. (In case upstream still takes a little longer, we might release an update on our side in the meantime though.)

So, things are looking very good so far! You just gotta be a little bit more patient!

devedse · Feb 1, 2024

Thanks @Max Carrara, keep us posted.

avacado · Feb 11, 2024

devedse said:
Thanks @Max Carrara, keep us posted.

I was able to install the dashboard on v8.1.4. Thank you!

avacado · Feb 14, 2024

I guess I spoke too soon. I was able to install it only once; now I'm hitting the same problem again.

djtobyy · Feb 19, 2024

@Max Carrara
Is there any update on bug #4759?
Are you guys still working on it? The bug report hasn't been updated in more than a month.

Max Carrara · Feb 19, 2024

djtobyy said:
@Max Carrara
Is there any update on bug #4759?
Are you guys still working on it? The bug report hasn't been updated in more than a month.

Yes, it's still cooking! You can find a recent version of my patch series over here. It's on v3 right now, so I've updated it twice already.

The patches regarding the dashboard have already been merged to both our Reef and our Quincy branch, so they will definitely be included in the next update - whenever that is.

devedse · Mar 22, 2024

@Max Carrara, I saw that Proxmox version 8.1.5 has now been released. Does it include the changes required for this?

Madscientist · Mar 27, 2024

Any update from the Proxmox team on the solution? We've been waiting a long time now...

Max Carrara · Mar 27, 2024

Madscientist said:
Any update from the Proxmox team on the solution? We've been waiting a long time now...

We've been cooking! Because Ceph Reef recently got an update, we applied all patches related to our Ceph mirrors - Ceph Reef v18.2.2 was recently rebuilt and is on the testing repositories right now. If you have the testing repositories enabled (e.g. in a virtualized cluster) you can already take a look at the dashboard. Goes without saying that you shouldn't run the testing repos in production, of course.

I expect Ceph Quincy to follow soon - upstream has got a milestone for v17.2.8, so I estimate there will be an update coming rather soon-ish (fingers crossed). Once that happens, we'll update our mirror, rebuild and test everything - then it'll land on the testing repositories as well.

I know it's been taking a little while, but it's all making progress - keeping things safe and stable just takes some time.

In the meantime I'm still working on version 5 of the ceph-crash saga (#4759) - that won't take too long anymore either, but fortunately is not a blocker for the dashboard. That will just clear up any of the PyO3-related crash reports that still might sit around on some people's systems.

Max Carrara · Apr 3, 2024

The changes for Ceph Reef have been packaged for the no-subscription repository now - which means that you should be able to see updates coming in! As always, the enterprise repository will receive those changes a little later.

Ceph Quincy is still on its way - but if you're on the no-sub repo and really cannot wait any longer, maybe now is a good time to upgrade from Quincy to Reef.
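For anyone who wants to script a quick check of whether a node has picked up the newer Reef release, the output of ceph version (in the format shown earlier in this thread) can be parsed and compared against 18.2.2. Treat this as a sketch under assumptions: the threshold comes from the 18.2.2 rebuild mentioned in this thread, the helper names are made up, and a matching upstream version is necessary but may not be sufficient, since the package revision decides whether the dashboard patches are actually included.

```python
import re
from typing import Tuple

# Assumption taken from this thread: the patched Reef build is based on 18.2.2.
PATCHED_REEF_BASE = (18, 2, 2)


def parse_ceph_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the output of `ceph version`."""
    match = re.search(r"ceph version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_reef_at_patched_base(output: str) -> bool:
    """True if the output names a Reef (18.x) release at or above 18.2.2."""
    version = parse_ceph_version(output)
    return version[0] == 18 and version >= PATCHED_REEF_BASE
```

Fed the 18.2.1 line posted above, is_reef_at_patched_base returns False; Quincy (17.x) output also returns False by design, because its fixed release had not been announced at the time of writing.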

Meowcat285 · Apr 3, 2024

Max Carrara said:
The changes for Ceph Reef have been packaged for the no-subscription repository now - which means that you should be able to see updates coming in! As always, the enterprise repository will receive those changes a little later.

Ceph Quincy is still on its way - but if you're on the no-sub repo and really cannot wait any longer, maybe now is a good time to upgrade from Quincy to Reef.

Awesome! I am installing the update now and will report how it goes.

Meowcat285 · Apr 3, 2024

Meowcat285 said:
Awesome! I am installing the update now and will report how it goes.

Yep, the dashboard works perfectly now. Thanks!
