Critical Ceph 19.2.2 update not yet in repo?

sseidel

Hi,

Ceph 19.2.2 was released in April (!) and it contains a critical bugfix. My cluster was affected by the bug (which can only be fixed by wiping and re-creating all affected OSDs).

Why is it not in the repo yet? Could someone from the team take a look and update the repo?

Thanks,

Stefan
 
Ceph 19.2.2 was released in April (!) and it contains a critical bugfix. My cluster was affected by the bug (which can only be fixed by wiping and re-creating all affected OSDs).

This one?:
Known Issues & Breaking Changes
OSDs deployed on Ceph Squid crash

Ceph Squid currently has an issue where newly created OSDs are crashing. This seems to affect EC pools in particular and only OSDs that were newly created using Ceph 19.2 Squid.

We already published a patched Ceph version (19.2.1-pve3) that works around this issue for newly created OSDs by changing the faulty default setting. Updating your Squid cluster is advised.

Alternatively, you can also manually change the problematic bluestore_elastic_shared_blobs setting to 0 using the following command: ceph config set osd bluestore_elastic_shared_blobs 0.

If you have deployed new OSDs using a Ceph Squid version prior to 19.2.1-pve3, i.e. any version including and between 19.2.0-pve1 and 19.2.1-pve2, you should destroy and recreate each OSD after either upgrading to 19.2.1-pve3 or later, or manually changing this setting as described above. You can do so one at a time, waiting for the cluster to recover to a healthy state in between.
https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_8.4
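
For reference, a minimal sketch of how the workaround and the destroy/recreate cycle described above might look on a Proxmox node. The OSD id (3), the device (/dev/sdX) and the exact recovery checks are made-up placeholders, so verify them against your own cluster before running anything:

Code:
# Hypothetical example: OSD id 3 on /dev/sdX -- substitute your own values.

# 1) Work around the faulty default (only needed before 19.2.1-pve3):
ceph config set osd bluestore_elastic_shared_blobs 0
ceph config get osd bluestore_elastic_shared_blobs   # should now report 0/false

# 2) Recreate one affected OSD at a time:
ceph osd out 3                    # take the OSD out so data rebalances away
watch ceph -s                     # wait until recovery has finished
systemctl stop ceph-osd@3         # stop the OSD daemon on its node
pveceph osd destroy 3 --cleanup   # destroy the OSD and wipe the disk
pveceph osd create /dev/sdX       # recreate it with the fixed setting in effect
ceph -s                           # wait for HEALTH_OK before touching the next OSD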
 
I see. So instead of including the patch that fixes a critical data corruption issue, the Proxmox team decided to just change a setting. Great.
 
I only asked whether you meant this bug, since you did not provide any reference to the bug you are actually talking about.

I have only now checked the Ceph 19.2.2 changelog, and it fixes only one other bug [1]:
Notable Changes
  • This hotfix release resolves an RGW data loss bug when CopyObject is used to copy an object onto itself. S3 clients typically do this when they want to change the metadata of an existing object. Due to a regression caused by an earlier fix for https://tracker.ceph.com/issues/66286, any tail objects associated with such objects are erroneously marked for garbage collection. RGW deployments on Squid are encouraged to upgrade as soon as possible to minimize the damage. The experimental rgw-gap-list tool can help to identify damaged objects.
Changelog
  • squid: rgw: keep the tails when copying object to itself (pr#62711, cbodley)
https://docs.ceph.com/en/latest/releases/squid/#v19-2-2-squid
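
For context, the "CopyObject onto itself" pattern mentioned in those release notes is just a metadata-only rewrite of an existing object. Something like the following against an RGW S3 endpoint would trigger it; the bucket, key, endpoint and metadata here are made-up placeholders:

Code:
# Hypothetical bucket/key/endpoint. This is the kind of request S3 clients issue
# to change metadata in place; on affected RGW versions it could orphan the
# object's tail segments via garbage collection.
aws s3api copy-object \
  --endpoint-url https://rgw.example.com \
  --bucket mybucket \
  --key backups/db.dump \
  --copy-source mybucket/backups/db.dump \
  --metadata-directive REPLACE \
  --metadata retention=90d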

The bug I quoted in my other post is still marked as "Fix Under Review" [2], and the associated PR is still open [3].

[1] https://tracker.ceph.com/issues/70783
[2] https://tracker.ceph.com/issues/70390
[3] https://github.com/ceph/ceph/pull/62816