Ceph 17.2.8 BlueStore bug

DEZERTIR

New Member
Aug 28, 2025
Hello everyone, I need your support.
We recently updated our cluster to PVE 8.4.16 and Ceph 17.2.8, and only after the update did we read an article saying that you should urgently upgrade off this version because of a critical bug in BlueStore. Can you tell me whether the bug is still relevant? And should we panic?

Link to the article: https://docs.clyso.com/blog/critical-bugs-ceph-reef-squid/
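For anyone wanting to check their own cluster, the running versions can be confirmed with something like this (a rough sketch; output will differ):
Code:
ceph -v       # version of the locally installed Ceph binaries, e.g. "ceph version 17.2.8 (...) quincy (stable)"
pveversion    # the installed Proxmox VE release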
 
Sorry, I didn't specify: it's the Quincy release, and 17.2.8 is the latest version of that branch.
 
Quincy is already EOL; upgrade to Reef and then maybe even to Squid. The bug you mentioned is only listed for Ceph 18, at least on Clyso's side, but there are others for Ceph 19. I personally don't have any issues with 19.x so far.
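Before planning that jump, it can help to confirm what the cluster currently runs and enforces; a minimal sketch, run from any node with the admin keyring:
Code:
ceph versions                               # which release every mon/mgr/osd daemon is running
ceph osd dump | grep require_osd_release    # the minimum OSD release the cluster currently enforces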
 
I am on Proxmox 8.4.16 with a paid subscription. Proxmox VE 8.4 should still be supported until August 2026: https://forum.proxmox.com/threads/proxmox-ve-support-lifecycle.35755/

I just upgraded a cluster, and to my surprise, after rebooting the first server its Ceph monitor failed to start! So I held off on restarting the other servers and started investigating.
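The investigation was mostly just the monitor unit and its journal on the affected node, roughly along these lines (a sketch; the unit instance is the node name):
Code:
systemctl status ceph-mon@pve4               # shows the unit failed (killed with SIGABRT)
journalctl -u ceph-mon@pve4 -b --no-pager    # the crash output from the current boot (pasted further below)
ceph -s                                      # confirm the remaining monitors still have quorum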

I just read "Upgrade to version 17.2.9 or 18.2.7. Do not deploy or remain on version 17.2.8." https://docs.clyso.com/blog/critical-bugs-ceph-reef-squid/

However, 17.2.8 is the current and latest version in the Proxmox 8.4 enterprise (paid subscription) repository?

Edit: I guess I have to urgently follow https://pve.proxmox.com/wiki/Ceph_Quincy_to_Reef and then make sure that a Ceph major release reaching end of support is not missed again.
 
Fix: the fix [1] is included in 17.2.8-pve2 [2].

But: as has already been said multiple times, Ceph 17/Quincy is already EOL [3][5], and Ceph 18/Reef will soon be fully EOL too [4][5].

PS: if not already obvious, PVE and Ceph have different support lifecycles (even from Proxmox)...

[1] https://git.proxmox.com/?p=ceph.git;a=commit;h=71ce71edd912bf31ab1ce63723f43911814c6e3a
[2] https://git.proxmox.com/?p=ceph.git;a=commitdiff;h=dff5d121918807afac5d101ea65f4b00ad7b56d8
[3] https://forum.proxmox.com/threads/c...se-and-ceph-17-2-quincy-soon-to-be-eol.156433
[4] https://forum.proxmox.com/threads/c...nd-ceph-18-2-reef-soon-to-be-fully-eol.178960
[5] https://docs.ceph.com/en/latest/releases/
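To check whether a node already has the patched build installed, something like the following should be enough (a sketch; the exact package selection varies):
Code:
apt-cache policy ceph-common    # installed and candidate build, e.g. 17.2.8-pve2
dpkg -l | grep ceph             # the same -pve revision should show up for the daemon packages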
 
Thanks everyone! It's a relief that Proxmox backported the data corruption fix to 17.2.8-pve2.

On my end, I have now successfully dealt with my issue by:
1. Following https://pve.proxmox.com/wiki/Ceph_Quincy_to_Reef (18.x)
2. At this point a Ceph monitor on one node (pve4) still refused to start even after rebooting and trying various things, preventing me from completing the upgrade to Reef with ceph osd require-osd-release reef (finishing that step is sketched after this list):
Code:
Feb 15 11:06:43 pve4 ceph-mon[2072]: *** Caught signal (Aborted) **
Feb 15 11:06:43 pve4 ceph-mon[2072]:  in thread 747c4a490e40 thread_name:ceph-mon
Feb 15 11:06:43 pve4 ceph-mon[2072]:  ceph version 18.2.7 (4cac8341a72477c60a6f153f3ed344b49870c932) reef (stable)
Feb 15 11:06:43 pve4 ceph-mon[2072]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x747c4b85a050]
Feb 15 11:06:43 pve4 ceph-mon[2072]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8aeec) [0x747c4b8a8eec]
Feb 15 11:06:43 pve4 ceph-mon[2072]:  3: gsignal()
Feb 15 11:06:43 pve4 ceph-mon[2072]:  4: abort()
Feb 15 11:06:43 pve4 ceph-mon[2072]:  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x178) [0x747c4c0aa881]
Feb 15 11:06:43 pve4 ceph-mon[2072]:  6: /usr/lib/ceph/libceph-common.so.2(+0x2aa9c4) [0x747c4c0aa9c4]
Feb 15 11:06:43 pve4 ceph-mon[2072]:  7: (LogMonitor::update_from_paxos(bool*)+0x22d1) [0x5cd668c0db61]
Feb 15 11:06:43 pve4 ceph-mon[2072]:  8: (Monitor::refresh_from_paxos(bool*)+0x10c) [0x5cd668b7e36c]
Feb 15 11:06:43 pve4 ceph-mon[2072]:  9: (Monitor::preinit()+0x95d) [0x5cd668baad8d]
Feb 15 11:06:43 pve4 ceph-mon[2072]:  10: main()
Feb 15 11:06:43 pve4 ceph-mon[2072]:  11: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x747c4b84524a]
Feb 15 11:06:43 pve4 ceph-mon[2072]:  12: __libc_start_main()
Feb 15 11:06:43 pve4 ceph-mon[2072]:  13: _start()
[...]
Feb 15 11:06:43 pve4 systemd[1]: ceph-mon@pve4.service: Main process exited, code=killed, status=6/ABRT
Feb 15 11:06:43 pve4 systemd[1]: ceph-mon@pve4.service: Failed with result 'signal'.
[...]
Feb 15 11:07:24 pve4 systemd[1]: Failed to start ceph-mon@pve4.service - Ceph cluster monitor daemon.
I successfully resolved this issue on this one node by destroying the broken monitor and recreating it:
Code:
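# the monitor on pve4 keeps crashing on start, so remove it (and its store) and create a fresh one that resyncs from the remaining monitors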
pveceph mon destroy pve4
pveceph mon create
3. After that, I continued with the further Ceph upgrade following https://pve.proxmox.com/wiki/Ceph_Reef_to_Squid (19.x)
4. And finally, I successfully upgraded the whole cluster, node by node, from PVE 8 to PVE 9 following https://pve.proxmox.com/wiki/Upgrade_from_8_to_9
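With the recreated monitor back in quorum, finishing the step that was blocked in point 2 and sanity-checking afterwards looked roughly like this (a sketch, not the full checklist from the wiki):
Code:
ceph osd require-osd-release reef           # the step the broken monitor had been blocking
ceph -s                                     # overall health and monitor quorum
ceph osd dump | grep require_osd_release    # confirm the flag is now set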
 
Hi,
@kwinz, as I am thinking about building a PVE and Ceph cluster with 3 nodes, I wanted to ask you whether you had any outages while fixing all your issues?

Issues can always happen, but a full data outage would be really bad.
Knowing your experience would be really nice. :)
 

No outages: with the one Ceph monitor down I still had full read and write functionality. And besides this hiccup last week, it has been solid for years.
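For what it's worth, with three or more monitors the cluster keeps quorum with a single one down, which is easy to verify while a mon is broken (sketch):
Code:
ceph mon stat         # short summary: known monitors and which ones are currently in quorum
ceph quorum_status    # more detail, including the current leader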
 