Ceph 17.2 Quincy Available as Stable Release

t.lamprecht

Proxmox Staff Member
Staff member
Jul 28, 2015
5,511
1,760
164
South Tyrol/Italy
shop.proxmox.com
Hi Community!

After the overall good feedback in the Ceph Quincy preview thread, and with no new issues popping up when testing the first point release (17.2.1), we are confident in marking the Proxmox VE integration and build of Ceph Quincy as stable and supported when used with pve-manager version 7.2-7 or later.

Upgrades from Pacific to Quincy:
You can find the upgrade how-to here: https://pve.proxmox.com/wiki/Ceph_Pacific_to_Quincy

New Installation of Quincy:
Use the updated Ceph installation wizard available with the recently released pve-manager version 7.2-7.

Please also remember that Ceph 15.2 Octopus will reach end of life (EOL) after 2022-07 (the end of this month), so you should upgrade any Ceph Octopus setups to Ceph Pacific sooner rather than later. See the respective upgrade how-to: https://pve.proxmox.com/wiki/Ceph_Octopus_to_Pacific
 

jasonsansone

Active Member
May 17, 2021
121
28
28
Oklahoma City, OK
www.sansonehowell.com
One (possibly) minor bug:

The GUI throws a health warning that "Telemetry requires re-opt-in", even though I have opted in multiple times. This occurs on multiple nodes. A reboot doesn't clear the flag.



I needed to re-enable specific channels. "ceph telemetry enable channel all" cleared the flag, but "ceph telemetry on --license sharing-1-0" did not.
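
The sequence described above can be sketched roughly as follows. This is only an illustration, not an official procedure: it assumes the warning text from the GUI also appears in the output of `ceph health detail`, and the helper names `needs_reopt_in` and `reopt_in` are made up for this example. The two `ceph telemetry` commands are the ones quoted in the post.

```python
import subprocess

# Commands quoted in the post above. In that report, the warning only
# cleared after the second one.
OPT_IN = ["ceph", "telemetry", "on", "--license", "sharing-1-0"]
ENABLE_ALL_CHANNELS = ["ceph", "telemetry", "enable", "channel", "all"]


def needs_reopt_in(health_detail_text: str) -> bool:
    """Return True if the health output contains the re-opt-in warning."""
    return "Telemetry requires re-opt-in" in health_detail_text


def reopt_in(run=subprocess.run) -> bool:
    """Opt in again and enable all channels, but only if the warning is present.

    Returns True if the telemetry commands were issued. The `run` parameter
    exists so the logic can be exercised without a cluster.
    """
    detail = run(
        ["ceph", "health", "detail"],
        capture_output=True, text=True, check=True,
    ).stdout
    if not needs_reopt_in(detail):
        return False
    run(OPT_IN, check=True)
    run(ENABLE_ALL_CHANNELS, check=True)
    return True
```

Calling `reopt_in()` on a node with a working `ceph` CLI would run the commands for real, so review what telemetry shares before enabling every channel.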
 

itNGO

Well-Known Member
Jun 12, 2020
573
126
48
44
Germany
it-ngo.com
Following the guide, I upgraded a 3-node cluster running PVE 7.2-7.
We lost all VMs and the Ceph pool was shown as offline. We had to shut off all VMs and reboot all nodes to get them running again.
Now on Ceph 17.2.1... not that smooth at all.
 

t.lamprecht

Which version did you upgrade from? And are you sure you followed the guide and its order 1:1?
It sounds a bit odd to me that a plain reboot would resolve something as serious as the loss of pools.
 

itNGO

This guide: https://pve.proxmox.com/wiki/Ceph_Pacific_to_Quincy
Step by step. After restarting the first "ceph-mon.target", it went very dark in the cluster.
We waited 10 minutes and nothing changed, so we decided to continue the upgrade and, as a last resort, rebooted all 3 nodes at once.

 

Attachment: syslog.txt (97.4 KB)

t.lamprecht

Did you upgrade all cluster nodes before doing so? And did you restart on every node, or stop after the first one?

Also, can you please check the journalctl output from that timeframe? /var/log/apt/history.log and the contents of /var/log/apt/term.log from that time could be interesting too.
 
Oct 17, 2019
92
24
13
39
We upgraded our 6-node production cluster to Quincy shortly after this thread was posted.

As always, carefully follow the well written instructions provided by the Proxmox team and you will be rewarded with continuity of operations.

No issues. Upgrade was quick and easy. Thank you!

-Eric

PS: I use the GUI where applicable (where there is an equivalent GUI "button" to use in place of a command). Works the same either way but it's easier for me to keep track of what I'm doing from the GUI when restarting a whole bunch of monitors, managers, OSDs, etc.
 

vesalius

Active Member
Aug 19, 2020
249
59
33
Similar to @AllanM, I updated a 3-node cluster to Quincy following the guide. It has been running well for days now.

The only minor hiccup, completely my fault and thankfully without any consequences, was being impatient at the end when restarting the OSDs. I restarted the OSDs on node 2 before ceph status on node 1 returned
Code:
HEALTH_OK

or

HEALTH_WARN
noout flag(s) set

which went against what the guide instructed me to do.
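
The wait-before-continuing check described above could be automated roughly like this. It is a sketch under assumptions: the layout of the JSON returned by `ceph status --format json` (a "health" object with "status" and a "checks" map whose entries carry "summary" and "message") is assumed here, and the function names are made up for this example.

```python
import json
import subprocess


def safe_to_restart_next_node(status: dict) -> bool:
    """True for HEALTH_OK, or for HEALTH_WARN caused only by the noout flag."""
    health = status.get("health", {})
    state = health.get("status")
    if state == "HEALTH_OK":
        return True
    if state != "HEALTH_WARN":
        return False
    # Collect the summary message of every active health check.
    messages = [
        check.get("summary", {}).get("message", "")
        for check in health.get("checks", {}).values()
    ]
    # Any warning besides the noout flag means: keep waiting.
    return messages == ["noout flag(s) set"]


def current_status() -> dict:
    """Fetch the cluster status as a dict (requires a working ceph CLI)."""
    out = subprocess.run(
        ["ceph", "status", "--format", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

A loop that polls `safe_to_restart_next_node(current_status())` every few seconds before moving on to the next node's OSDs would remove the temptation to be impatient.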
 
Jun 4, 2019
7
2
8
47
Hi!
I'd like to ask about the parameter osd_memory_target_autotune.
Is it turned on?
If yes, what is the default value of autotune_memory_target_ratio?
 

aaron

Proxmox Staff Member
Staff member
Jun 3, 2019
3,100
512
118
Do you see it enabled somewhere?

https://docs.ceph.com/en/latest/releases/quincy/#general

It is part of cephadm and defines how much memory the OSD services should use. Since cephadm is not used when Ceph is installed on Proxmox VE, I would expect that setting it to anything will not have any impact.
 

mijanek

Active Member
Sep 3, 2013
88
7
28
Thanks for the nice guide. My upgrade also went through without any issue. Upgraded from version 16.2.9 to 17.2.1 (PVE 7.2-7).
 

Alexksn

New Member
May 11, 2021
1
1
3
30
Thanks for the guide. My upgrade went through without any issue (except for the "Telemetry requires re-opt-in" warning). Upgraded from version 16.2.9 to 17.2.1 (PVE 7.2-7).
 
Dec 10, 2014
88
4
28
Hi All,

I have a cluster of 5 nodes with Proxmox 7.1-12 and Ceph 16.2.7. This weekend I would like to upgrade Proxmox to 7.2 and Ceph to 17.2.1. My Ceph cluster consists of 3 pools:
  • device_health_metrics with 1 placement group
  • Ceph-1-NVMe-Pool with 1024 placement groups
  • Ceph-1-SSD-Pool with 1024 placement groups
My questions are:
  1. With the 2 pools, one for NVMe disks and one for SSD disks, could I run into issues during the upgrade procedure?
  2. Do I have to upgrade the 3 monitor nodes first and then the others, or the other nodes first and then the monitor nodes?
  3. After upgrading Ceph, do I have to restart the nodes?
Thank you
 

t.lamprecht

Hi,
With the 2 pools, one for NVMe disks and one for SSD disks, could I run into issues during the upgrade procedure?
The number of pools and their device classes do not really have an impact on the upgrade, which is cluster-wide.
Do I have to upgrade the 3 monitor nodes first and then the others, or the other nodes first and then the monitor nodes?
Well, the monitors need to be done before you restart the manager services and OSDs. You can upgrade the packages on all nodes beforehand and then continue following the upgrade how-to, whose steps are meant to be executed strictly in the order listed:
https://pve.proxmox.com/wiki/Ceph_Pacific_to_Quincy
After upgrading Ceph, do I have to restart the nodes?
No, for a pure Ceph upgrade that's not required.
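
One way to confirm that a stage is complete before moving on to the next is to look at the JSON printed by `ceph versions`. The following is a minimal sketch, assuming that output maps each daemon type ("mon", "mgr", "osd", ...) to a dict of version string to daemon count, plus an "overall" summary; the function names are made up for this example.

```python
import json
import subprocess


def mixed_version_daemons(versions: dict) -> dict:
    """Return the daemon types that still report more than one version."""
    return {
        daemon: sorted(by_version)
        for daemon, by_version in versions.items()
        if daemon != "overall" and len(by_version) > 1
    }


def current_versions() -> dict:
    """Fetch the running daemon versions (requires a working ceph CLI)."""
    out = subprocess.run(
        ["ceph", "versions"], capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

An empty result for "mon" after the monitor restarts would suggest that stage is done. While the OSDs are being restarted node by node, "osd" is expected to show up here until the last one is running the new version.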
 

hybrid512

Active Member
Jun 6, 2013
76
4
28
Hi,

I upgraded to Quincy and everything went fine, except that I have this issue with the devicehealth module:

Code:
2022-08-14T14:06:33.206+0200 7f16e2549700  0 [devicehealth INFO root] creating main.db for devicehealth
2022-08-14T14:06:33.206+0200 7f16e2549700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.proxmox: table Device already exists
2022-08-14T14:06:33.206+0200 7f16e2549700 -1 devicehealth.serve:
2022-08-14T14:06:33.206+0200 7f16e2549700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 338, in serve
    if self.db_ready() and self.enable_monitoring:
  File "/usr/share/ceph/mgr/mgr_module.py", line 1218, in db_ready
    return self.db is not None
  File "/usr/share/ceph/mgr/mgr_module.py", line 1230, in db
    self._db = self.open_db()
  File "/usr/share/ceph/mgr/mgr_module.py", line 1211, in open_db
    self.configure_db(db)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1201, in configure_db
    self.load_schema(db)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1191, in load_schema
    self.maybe_upgrade(db, int(row['value']))
  File "/usr/share/ceph/mgr/mgr_module.py", line 1168, in maybe_upgrade
    cur = db.executescript(self.SCHEMA)
sqlite3.OperationalError: table Device already exists


I tried deleting and re-creating the Ceph mgr, but that didn't fix anything.
Apparently there was an issue updating the SQLite DB for this module, but I couldn't find any related SQLite DB, so I couldn't even try to force a rebuild from scratch.
The Ceph cluster is operating normally; the error just repeats each time the mgr is restarted.
How can I force this DB to be recreated from scratch? I don't care about the previous data, I just want to fix this issue.

Best regards.
 

aaron

Hmm... searching for that error returns barely any results. You could try the following:
  1. Stop all MGRs.
  2. Rename the pool .mgr to something else: ceph osd pool rename .mgr .mgr-old
  3. Start the MGRs again.
A new .mgr pool should be created, and hopefully the error won't show up again. Feel free to remove the old, renamed .mgr pool afterwards.
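
Spelled out per node, the three steps above might look like the plan produced by this sketch. It only prints commands and does not run anything. The node names are hypothetical, and using `systemctl stop/start ceph-mgr.target` to stop and start the MGRs is an assumption about how the services are managed on the nodes; the rename command is the one given above.

```python
def mgr_pool_reset_plan(mgr_nodes: list[str], old_name: str = ".mgr-old") -> list[tuple[str, str]]:
    """Return (node, command) pairs in the order they would be executed."""
    if not mgr_nodes:
        raise ValueError("at least one MGR node is required")
    # 1. Stop the MGR on every node that runs one.
    stop = [(node, "systemctl stop ceph-mgr.target") for node in mgr_nodes]
    # 2. Rename the pool once, from any node with a working ceph CLI.
    rename = [(mgr_nodes[0], f"ceph osd pool rename .mgr {old_name}")]
    # 3. Start the MGRs again; a fresh .mgr pool should then be created.
    start = [(node, "systemctl start ceph-mgr.target") for node in mgr_nodes]
    return stop + rename + start


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for node, command in mgr_pool_reset_plan(["pve1", "pve2", "pve3"]):
        print(f"{node}: {command}")
```

Keeping the rename strictly between the last stop and the first start matters, since a running MGR would otherwise keep using the old pool.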
 

hybrid512

Hi @aaron ,

Worked like a charm, thank you so much!
One last question: in the Proxmox UI, I have a warning about crashed mgr modules. It is related to the error I had, but now everything is fine. How can I clear this log?

Best regards
 

aaron

ETA on 17.2.3?
As this version only addresses a problem with how it was built on the RH/CentOS platform (see the release notes and further links there), and because we compile the versions we ship ourselves, I don't think we need to rush that release, as we would not gain anything.
One last question: in the Proxmox UI, I have a warning about crashed mgr modules. It is related to the error I had, but now everything is fine. How can I clear this log?
It should go away by itself soon. If it is the only crash warning that you see, you can archive it easily by running
Code:
ceph crash archive-all
See the docs on the Ceph crash module.
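
If other crash reports are pending that should stay visible, archiving selectively may be preferable to archive-all. The sketch below assumes that `ceph crash ls-new --format json` returns a list of objects with "crash_id" and "entity_name" fields; the function names are made up for this example.

```python
import json
import subprocess


def crash_ids_for(crashes: list[dict], entity_prefix: str = "mgr.") -> list[str]:
    """Return the IDs of new crashes reported by matching daemons."""
    return [
        crash["crash_id"]
        for crash in crashes
        if crash.get("entity_name", "").startswith(entity_prefix)
    ]


def archive_commands(crash_ids: list[str]) -> list[list[str]]:
    """Build one `ceph crash archive <id>` command per crash ID."""
    return [["ceph", "crash", "archive", crash_id] for crash_id in crash_ids]


def new_crashes() -> list[dict]:
    """Fetch the not-yet-archived crashes (requires a working ceph CLI)."""
    out = subprocess.run(
        ["ceph", "crash", "ls-new", "--format", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

Each command from `archive_commands(crash_ids_for(new_crashes()))` could then be reviewed and run by hand, leaving crashes from other daemons in the warning list.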
 
