Ceph 17.2 Quincy Available as Stable Release

t.lamprecht · Jul 7, 2022

Hi Community!

After the overall good feedback in the Ceph Quincy preview thread, and with no new issues popping up while testing the first point release (17.2.1), we are confident in marking the Proxmox VE integration and build of Ceph Quincy as stable and supported when used with pve-manager version 7.2-7 or later.

Upgrades from Pacific to Quincy:
You can find the upgrade how-to here: https://pve.proxmox.com/wiki/Ceph_Pacific_to_Quincy

New Installation of Quincy:
Use the updated Ceph installation wizard available with the recently released pve-manager version 7.2-7.

Please also remember that Ceph 15.2 Octopus will reach end of life (EOL) after 2022-07 (the end of this month), so you should upgrade any Ceph Octopus setups to Ceph Pacific sooner rather than later. See the respective upgrade how-to: https://pve.proxmox.com/wiki/Ceph_Octopus_to_Pacific

jasonsansone · Jul 7, 2022

One (possibly) minor bug:

The GUI throws a health warning that "Telemetry requires re-opt-in", even though I have opted in multiple times. This occurs on multiple nodes. A reboot doesn't clear the flag.

[Attachment: Screen Shot 2022-07-07 at 7.35.28 AM.png]

jasonsansone · Jul 7, 2022

jasonsansone said:
One (possibly) minor bug:

The GUI throws a health warning that "Telemetry requires re-opt-in", even though I have opted in multiple times. This occurs on multiple nodes. A reboot doesn't clear the flag.

[Attachment 38744]

I needed to re-enable specific channels. "ceph telemetry enable channel all" cleared the flag, but "ceph telemetry on --license sharing-1-0" did not.

itNGO · Jul 7, 2022

Following the guide, I upgraded a 3-node cluster with PVE 7.2-7.
Lost all VMs, and the Ceph pool was shown as offline. Had to shut off all VMs and reboot all nodes to get them running again.
Now on Ceph 17.2.1... not that smooth at all...

t.lamprecht · Jul 7, 2022

itNGO said:
Lost all VMs, and the Ceph pool was shown as offline. Had to shut off all VMs and reboot all nodes to get them running again.
Now on Ceph 17.2.1... not that smooth at all...

Which version did you upgrade from? And are you sure you followed the guide and its order 1:1?
It sounds a bit odd to me that a mere reboot would resolve something as serious as the loss of pools.

itNGO · Jul 7, 2022

t.lamprecht said:
Which version did you upgrade from? And are you sure you followed the guide and its order 1:1?
It sounds a bit odd to me that a mere reboot would resolve something as serious as the loss of pools.

This guide: https://pve.proxmox.com/wiki/Ceph_Pacific_to_Quincy
I followed it step by step. After restarting the first "ceph-mon.target", the cluster went very dark.
We waited 10 minutes and nothing changed, so we decided to continue the upgrade and, as a last resort, rebooted all 3 nodes at once.

t.lamprecht · Jul 7, 2022

Did you upgrade all cluster nodes before doing so? And did you restart the monitors on every node, or did you stop after the first one?

Also, can you please check the journalctl output from that timeframe? The entries in /var/log/apt/history.log and /var/log/apt/term.log from the upgrade could be interesting too.

AllanM · Jul 19, 2022

We upgraded our 6-node production cluster to Quincy shortly after this thread was posted.

As always, carefully follow the well-written instructions provided by the Proxmox team and you will be rewarded with continuity of operations.

No issues. Upgrade was quick and easy. Thank you!

-Eric

PS: I use the GUI where applicable (where there is an equivalent GUI "button" to use in place of a command). Works the same either way but it's easier for me to keep track of what I'm doing from the GUI when restarting a whole bunch of monitors, managers, OSDs, etc.

vesalius · Jul 19, 2022

Similar to @AllanM, I updated a 3-node cluster to Quincy following the guide. It has been running well for days now.

The only minor hiccup, completely my fault and thankfully without any consequences, was being impatient at the end when restarting the OSDs. I restarted the OSDs on node 2 before ceph status on node 1 returned

Code:

HEALTH_OK
    
    or
    
HEALTH_WARN
noout flag(s) set

which went against what the guide instructed me to do.
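The wait condition described above can be written down as a small check. This is only a sketch: it assumes the JSON layout of `ceph status --format json` (a "health" object with "status" and "checks", where the noout flag shows up under the check code OSDMAP_FLAGS), and the function name is made up for illustration.

```python
def safe_to_restart_next_osds(status):
    """Decide whether the next node's OSDs may be restarted.

    `status` is the parsed JSON of `ceph status --format json`
    (assumed layout: status["health"]["status"] and
    status["health"]["checks"][CODE]["summary"]["message"]).
    """
    health = status.get("health", {})
    state = health.get("status")
    if state == "HEALTH_OK":
        return True
    if state == "HEALTH_WARN":
        checks = health.get("checks", {})
        # The only acceptable warning is the noout flag set for the upgrade.
        if set(checks) != {"OSDMAP_FLAGS"}:
            return False
        message = checks["OSDMAP_FLAGS"].get("summary", {}).get("message", "")
        return message == "noout flag(s) set"
    # HEALTH_ERR or an unknown state: do not continue.
    return False
```

Any additional warning makes the check return False, so it errs on the side of waiting.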

Andrzej Zawadzki · Jul 28, 2022

Hi!
I'd like to ask about the parameter osd_memory_target_autotune.
Is it turned on?
If yes, what is the default value of autotune_memory_target_ratio?

aaron · Jul 29, 2022

Do you see it enabled somewhere?

https://docs.ceph.com/en/latest/releases/quincy/#general

It is part of cephadm and defines how much memory the OSD services should use. Since cephadm is not used when Ceph is installed on Proxmox VE, I would expect that setting it to any value will not have an impact.

mijanek · Jul 30, 2022

Thanks for the nice guide. My upgrade also went through without any issue. Upgraded from the current version 16.2.9 to 17.2.1 (PVE 7.2-7)

Alexksn · Aug 5, 2022

Thanks for the guide. My upgrade went through without any issue (except for the "Telemetry requires re-opt-in" warning). Upgraded from the current version 16.2.9 to 17.2.1 (PVE 7.2-7)

zeuxprox · Aug 10, 2022

Hi All,

I have a cluster of 5 nodes with Proxmox 7.1-12 and Ceph 16.2.7. This weekend I would like to upgrade Proxmox to 7.2 and Ceph to 17.2.1. My Ceph Cluster is made of 3 pools:

device_health_metrics with 1 Placement Group
Ceph-1-NVMe-Pool with 1024 Placement Groups
Ceph-1-SSD-Pool with 1024 Placement Groups

The questions are:

With the 2 pools, one for NVMe disks and one for SSD disks, could I run into issues during the upgrade procedure?
Do I have to upgrade the 3 monitor nodes first and then the others, or the other nodes first and then the monitor nodes?
After upgrading Ceph, do I have to restart the nodes?

Thank you

t.lamprecht · Aug 10, 2022

Hi,

zeuxprox said:
With the 2 pools, one for NVMe disks and one for SSD disks, could I run into issues during the upgrade procedure?

The number of pools and their device classes do not really have an impact on the upgrade, which is cluster-wide.

zeuxprox said:
Do I have to upgrade the 3 monitor nodes first and then the others, or the other nodes first and then the monitor nodes?

Well, the monitors need to be upgraded before you restart the manager services and OSDs, so you can simply upgrade the packages on all nodes first and then continue following the upgrade how-to, whose steps are meant to be executed strictly in the order listed:
https://pve.proxmox.com/wiki/Ceph_Pacific_to_Quincy
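While working through the restarts, it can help to see which daemon types still run a mix of old and new versions. The sketch below assumes the JSON that `ceph versions` prints (one object per daemon type mapping version strings to daemon counts, plus an "overall" summary); the function name is made up for illustration.

```python
def mixed_version_daemons(versions):
    """Return {daemon_type: [version strings]} for every daemon type
    that currently runs more than one Ceph version.

    `versions` is the parsed JSON of `ceph versions` (assumed layout:
    {"mon": {"<version string>": count, ...}, ..., "overall": {...}}).
    """
    return {
        daemon_type: sorted(by_version)
        for daemon_type, by_version in versions.items()
        # "overall" is a summary across all types, not a daemon type.
        if daemon_type != "overall" and len(by_version) > 1
    }
```

An empty result means every daemon type reports a single version, although it does not say which one.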

zeuxprox said:
After upgrading Ceph, do I have to restart the nodes?

No, for a pure Ceph upgrade that's not required.

hybrid512 · Aug 14, 2022

Hi,

I upgraded to Quincy and everything went fine, except that I have this issue with the devicehealth module:

Code:

2022-08-14T14:06:33.206+0200 7f16e2549700  0 [devicehealth INFO root] creating main.db for devicehealth
2022-08-14T14:06:33.206+0200 7f16e2549700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'devicehealth' while running on mgr.proxmox: table Device already exists
2022-08-14T14:06:33.206+0200 7f16e2549700 -1 devicehealth.serve:
2022-08-14T14:06:33.206+0200 7f16e2549700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/devicehealth/module.py", line 338, in serve
    if self.db_ready() and self.enable_monitoring:
  File "/usr/share/ceph/mgr/mgr_module.py", line 1218, in db_ready
    return self.db is not None
  File "/usr/share/ceph/mgr/mgr_module.py", line 1230, in db
    self._db = self.open_db()
  File "/usr/share/ceph/mgr/mgr_module.py", line 1211, in open_db
    self.configure_db(db)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1201, in configure_db
    self.load_schema(db)
  File "/usr/share/ceph/mgr/mgr_module.py", line 1191, in load_schema
    self.maybe_upgrade(db, int(row['value']))
  File "/usr/share/ceph/mgr/mgr_module.py", line 1168, in maybe_upgrade
    cur = db.executescript(self.SCHEMA)
sqlite3.OperationalError: table Device already exists

I tried deleting and re-creating the Ceph mgr, but that didn't fix anything.
Apparently, there was an issue updating the SQLite database for this module, but I couldn't find any related SQLite database, so I couldn't even try to force a rebuild from scratch.
The Ceph cluster is operating normally; the error just repeats each time the mgr is restarted.
How can I force this database to be recreated from scratch? I don't care about the previous data, I just want to fix this issue.
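For readers unfamiliar with the message in the traceback: SQLite raises exactly this kind of error when a schema script with a plain CREATE TABLE is applied to a database that already contains the table. The sketch below only reproduces that mechanism; the one-column SCHEMA is a hypothetical stand-in and not the module's real schema, and it says nothing about why the mgr re-applied it.

```python
import sqlite3

# Hypothetical stand-in for a schema script; the real devicehealth
# schema is different.
SCHEMA = "CREATE TABLE Device (devid TEXT PRIMARY KEY);"


def apply_schema_twice():
    """Apply the same schema script twice and return the error text."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)  # first run creates the table
    try:
        db.executescript(SCHEMA)  # second run hits the existing table
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        db.close()
    return None
```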

Best regards.

aaron · Aug 16, 2022

hybrid512 said:
I tried deleting and re-creating the Ceph mgr, but that didn't fix anything.
Apparently, there was an issue updating the SQLite database for this module, but I couldn't find any related SQLite database, so I couldn't even try to force a rebuild from scratch.
The Ceph cluster is operating normally; the error just repeats each time the mgr is restarted.
How can I force this database to be recreated from scratch? I don't care about the previous data, I just want to fix this issue.

Hmm... searching for that error barely gives any results. You could try the following:

1. Stop all MGRs.
2. Rename the pool .mgr to something else: ceph osd pool rename .mgr .mgr-old
3. Start the MGRs again.

A new .mgr pool should be created, and hopefully the error won't show up again. Once that is the case, feel free to remove the old, renamed .mgr pool.
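The ordering matters here: every MGR has to be stopped before the rename, and only then started again. A minimal sketch that lays out the steps per node, assuming the managers are controlled through the systemd target ceph-mgr.target; the node names and the function name are made up for illustration, and the sketch only builds the list without running anything.

```python
def mgr_pool_reset_plan(nodes, old_name=".mgr-old"):
    """Return ordered (node, command) steps for renaming the .mgr pool.

    A node of None means "run once, on any node with Ceph admin access".
    Assumes ceph-mgr.target is the unit that stops/starts the managers.
    """
    # 1. Stop the managers on every node first.
    steps = [(node, "systemctl stop ceph-mgr.target") for node in nodes]
    # 2. Rename the pool while no MGR is running.
    steps.append((None, f"ceph osd pool rename .mgr {old_name}"))
    # 3. Start the managers again; a new .mgr pool should then be created.
    steps += [(node, "systemctl start ceph-mgr.target") for node in nodes]
    return steps
```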

jasonsansone · Aug 16, 2022

ETA on 17.2.3?

hybrid512 · Aug 16, 2022

aaron said:
Hmm... searching for that error barely gives any results. You could try the following:

1. Stop all MGRs.

2. Rename the pool .mgr to something else: ceph osd pool rename .mgr .mgr-old

3. Start the MGRs again.

A new .mgr pool should be created, and hopefully the error won't show up again. Once that is the case, feel free to remove the old, renamed .mgr pool.

Hi @aaron ,

Worked like a charm, thank you so much!
One last question: in the Proxmox UI, I have a warning about crashed mgr modules. It is related to the error I had, but now everything is fine. How can I clear this log?

Best regards

aaron · Aug 17, 2022

jasonsansone said:
ETA on 17.2.3?

As this version only addresses a problem with how it was built for the RH/CentOS platform (see the release notes and further links there), and because we compile the versions we ship ourselves, I don't think we need to rush that release, as we would not gain anything from it.

hybrid512 said:
One last question: in the Proxmox UI, I have a warning about crashed mgr modules. It is related to the error I had, but now everything is fine. How can I clear this log?

It should go away soon by itself. If it is the only crash warning that you see, you can archive it easily by running

Code:

ceph crash archive-all

See the docs on the Ceph crash module.
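If other crash reports are present and should stay visible, the entries can be archived selectively with `ceph crash archive <id>` instead. The following sketch builds those commands from the crash list; it assumes that `ceph crash ls-new --format json` returns a list of objects with "crash_id" and "entity_name" fields, and the function name is made up for illustration.

```python
def archive_commands(crashes, entity_prefix="mgr."):
    """Build `ceph crash archive <id>` commands for crashes reported by
    daemons whose entity name starts with `entity_prefix`.

    `crashes` is the parsed JSON crash list (assumed fields:
    "crash_id" and "entity_name").
    """
    return [
        f"ceph crash archive {crash['crash_id']}"
        for crash in crashes
        if crash.get("entity_name", "").startswith(entity_prefix)
    ]
```

The function only returns the command strings, so they can be reviewed before anything is archived.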
