Ceph 17.2 Quincy Available as Stable Release

I am seeing high CPU usage on some OSDs after the upgrade.

Bash:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                         
3797093 ceph      20   0 3188152   1,6g  46092 S 367,5   1,3   2608:51 /usr/bin/ceph-osd -f --cluster ceph --id 33 --setuser ceph --setgroup ceph                                                       
3797325 ceph      20   0 3415772   1,9g  46480 S 284,8   1,5   1983:16 /usr/bin/ceph-osd -f --cluster ceph --id 34 --setuser ceph --setgroup ceph                                                       
3797084 ceph      20   0 2232080   1,1g  46376 S 172,5   0,8   1701:45 /usr/bin/ceph-osd -f --cluster ceph --id 53 --setuser ceph --setgroup ceph                                                       
3796849 ceph      20   0 1819216 947488  46212 S  97,0   0,7   1192:46 /usr/bin/ceph-osd -f --cluster ceph --id 35 --setuser ceph --setgroup ceph                                                       
3796833 ceph      20   0 2236252 980692  46168 S  95,7   0,7   1236:45 /usr/bin/ceph-osd -f --cluster ceph --id 58 --setuser ceph --setgroup ceph                                                       
3796813 ceph      20   0 2238076   1,2g  46392 S  93,7   0,9   1456:35 /usr/bin/ceph-osd -f --cluster ceph --id 54 --setuser ceph --setgroup ceph                                                       
1541740 root      20   0  268024 109672  23696 S   3,3   0,1  34:58.89 pvestatd                                                                                                                         
3788004 ceph      20   0  846588 513556  34828 S   2,3   0,4  36:12.78 /usr/bin/ceph-mon -f --cluster ceph --id r2n0 --setuser ceph --setgroup ceph                                                     
1512979 root      rt   0  561984 167672  51368 S   1,7   0,1  69:40.46 /usr/sbin/corosync -f                                                                                                           
3788159 ceph      20   0 1389424 399908  40532 S   1,3   0,3  49:42.81 /usr/bin/ceph-mgr -f --cluster ceph --id r2n0 --setuser ceph --setgroup ceph                                                     
3796824 ceph      20   0 2244688   1,0g  46188 S   1,3   0,8   1341:19 /usr/bin/ceph-osd -f --cluster ceph --id 57 --setuser ceph --setgroup ceph                                                       
3796837 ceph      20   0 2269668 953992  45944 S   1,3   0,7 525:54.00 /usr/bin/ceph-osd -f --cluster ceph --id 50 --setuser ceph --setgroup ceph

Anyone else? The cluster is still recovering, as I added an extra node; that may be the cause.
I have tried restarting the OSDs, but the same ones go up to max CPU again.
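Not from the original post, but as a sanity check: since the cluster is still recovering, it may help to confirm that the busy OSDs are the ones actually doing recovery work. A small diagnostic sketch using only standard commands; OSD id 33 is just the example from the top output above, and the `timeout` guard is there so the sketch degrades gracefully on a host without a reachable monitor.

```shell
# Diagnostic sketch for the "some OSDs pinned at high CPU" report above.
if command -v ceph >/dev/null 2>&1; then
    # Is recovery/backfill still in flight? (timeout so we never hang)
    note=$(timeout 10 ceph -s 2>/dev/null | grep -Ei 'recover|backfill' \
        || echo "no recovery/backfill reported")
else
    note="ceph CLI not found; run this on a cluster node"
fi
echo "$note"
# Per-thread view of one hot OSD: shows whether e.g. rocksdb compaction,
# the messenger workers, or tp_osd_tp threads are the busy ones:
#   top -H -p "$(pgrep -f 'ceph-osd .*--id 33')"
# Per-OSD latency can also hint at a slow disk behind a spinning OSD:
#   ceph osd perf
```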
 
Any update on when a new release will come out? Proxmox still seems to be on 17.2.1, while Ceph released 17.2.4 recently.
 
17.2.4 has been available on the test repository since yesterday; the previous releases were not relevant for Proxmox VE.
I can see 17.2.4 is now live; however, Ceph released 17.2.5 because some patches missed 17.2.4.

Has Proxmox backported these into 17.2.4, or should we expect a 17.2.5 soon?
 
Stupid question - do we need to follow the hint from the Quincy release notes?

  • Cephadm: osd_memory_target_autotune is enabled by default, which sets mgr/cephadm/autotune_memory_target_ratio to 0.7 of total RAM. This is unsuitable for hyperconverged infrastructures. For hyperconverged Ceph, please refer to the documentation or set mgr/cephadm/autotune_memory_target_ratio to 0.2.
 
Proxmox does not (to my knowledge) bundle and deploy cephadm.
It is available in the Proxmox Ceph repository, but mainly because we do not actively take steps to exclude it ;) As a deployment tool, cephadm is incompatible with Proxmox VE, which deploys Ceph on a PVE cluster in its own way. So any cephadm-related notes can usually be ignored.
 
According to the upgrade how-to, the first step in the section "Upgrading all CephFS MDS daemons" is disabling allow_standby_replay.

However, there is no word about re-enabling it afterwards, and no reasoning about whether this should or should not be done.

We have traditionally had 1 MDS in standby_replay and would like to know whether enabling this is still fine in Quincy, or whether it would cause trouble.
 
I think the answer to that question (with an explanation) is here:
https://docs.ceph.com/en/latest/cephfs/upgrading/
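The procedure on that page treats standby-replay as an upgrade-time-only restriction, so re-enabling it afterwards should be fine. A sketch of the sequence; the filesystem name `cephfs` is an assumption (substitute yours from `ceph fs ls`), and the `DRY_RUN` wrapper only prints the commands so the sketch is safe to review before touching a live cluster.

```shell
# Sketch of the MDS upgrade toggle from the linked procedure.
# Assumptions: filesystem named "cephfs"; DRY_RUN=1 prints instead of runs.
FS_NAME="${FS_NAME:-cephfs}"
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Before upgrading the MDS daemons: single rank, no standby-replay.
run ceph fs set "$FS_NAME" max_mds 1
run ceph fs set "$FS_NAME" allow_standby_replay false
# ... upgrade and restart all MDS daemons here ...
# Afterwards, restore both settings:
run ceph fs set "$FS_NAME" allow_standby_replay true
run ceph fs set "$FS_NAME" max_mds 2   # or whatever your previous rank count was
```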
 
I am wondering why Proxmox does not enable osd_memory_target_autotune by default. It is advised if you are running Quincy > 17.2.0, and for hyperconverged setups like Proxmox the ratio can be scaled down to between 0.1 and 0.2 to be on the safe side. Do you see any way to take advantage of autoscaling in the future @aaron ?
 

Please correct me if I am wrong, but AFAIU it needs cephadm to work. cephadm is the current Ceph deployment tool, but Proxmox VE has its own way of deploying Ceph. Therefore, installing cephadm is not a good idea, as there will surely be conflicts or easy ways to shoot yourself in the foot.

Without it, enabling the osd_memory_target_autotune setting should not have any effect on the osd_memory_target value. At least it did not in my tests.

And then there is the following big warning in the cephadm documentation:
Warning
Cephadm sets osd_memory_target_autotune to true by default which is unsuitable for hyperconverged infrastructures.
 

You're indeed correct. So in the long run it could be an idea to develop a ceph-ansible/cephadm-inspired, proprietary Proxmox approach to automatically calculate and adjust osd_memory_target values. Wdyt?

And then there is the following big warning in the cephadm documentation:
That's why I've been referring to using values between 0.1 and 0.2 ;-)
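For what it's worth, since nothing applies such a ratio automatically on a PVE-deployed cluster, the arithmetic can be done by hand. A sketch under the assumptions from this thread: a 0.2 ratio, one `/var/lib/ceph/osd/ceph-*` directory per local OSD, and the final `ceph config set` left commented out for review.

```shell
# Sketch: compute a conservative per-OSD memory target on a hyperconverged
# node. Assumptions: ratio 0.2 (as discussed above), one ceph-* dir per OSD.
ratio_num=2; ratio_den=10                         # 0.2 as an integer fraction
total_kib=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
num_osds=$(ls -d /var/lib/ceph/osd/ceph-* 2>/dev/null | wc -l)
[ "$num_osds" -gt 0 ] || num_osds=1               # avoid dividing by zero off-node
target=$(( total_kib * 1024 * ratio_num / ratio_den / num_osds ))
echo "per-OSD osd_memory_target: $target bytes"
# To apply it cluster-wide (review first; 4 GiB is the upstream default):
# ceph config set osd osd_memory_target "$target"
```

If nodes differ in RAM, the value can also be set per OSD (`ceph config set osd.<id> osd_memory_target ...`) rather than globally.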
 
Not sure at this point, but please, feel free to open an enhancement request in our bugtracker. This way we can keep track of it and discuss technicalities there as well.
 
Today we did a Proxmox and Ceph update, and after rebooting 1 of the 3 servers its OSDs stopped working with this error:
Code:
Jan 30 13:38:49 pve3 systemd[1]: Started Ceph object storage daemon osd.4.
Jan 30 13:38:49 pve3 ceph-osd[18974]: 2023-01-30T13:38:49.422+0000 7ff335e73240 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
Jan 30 13:38:49 pve3 ceph-osd[18974]: 2023-01-30T13:38:49.422+0000 7ff335e73240 -1 AuthRegistry(0x559e15be0140) no keyring found at /var/lib/ceph/osd/ceph-4/keyring, disabling cephx
Jan 30 13:38:49 pve3 ceph-osd[18974]: 2023-01-30T13:38:49.422+0000 7ff335e73240 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-4/keyring: (2) No such file or directory
Jan 30 13:38:49 pve3 ceph-osd[18974]: 2023-01-30T13:38:49.422+0000 7ff335e73240 -1 AuthRegistry(0x7ffd6b1f75c0) no keyring found at /var/lib/ceph/osd/ceph-4/keyring, disabling cephx
Jan 30 13:38:49 pve3 ceph-osd[18974]: failed to fetch mon config (--no-mon-config to skip)
Jan 30 13:38:49 pve3 systemd[1]: ceph-osd@4.service: Main process exited, code=exited, status=1/FAILURE
Jan 30 13:38:49 pve3 systemd[1]: ceph-osd@4.service: Failed with result 'exit-code'.

However, I can see osd.4 and osd.2 in "ceph auth list". Is there a way to recreate the keyring files?
I tried to manually create one with
[osd.2]
key=...


but it did not work.
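In case it helps someone hitting the same thing: since the keys are still in the cluster's auth database, exporting them with `ceph auth get` is usually cleaner than writing the file by hand. The sketch below only demonstrates the expected file layout and permissions in a throwaway directory (the key value is a placeholder, not a real secret); the commented lines are what would run on the actual node.

```shell
# Demo of the expected keyring layout, written to a temp dir so it is
# safe to run anywhere; the key value is a placeholder.
dir=$(mktemp -d)                     # stands in for /var/lib/ceph/osd/ceph-4
cat > "$dir/keyring" <<'EOF'
[osd.4]
        key = AQD...placeholder...==
EOF
chmod 600 "$dir/keyring"
cat "$dir/keyring"
# On the real node, export the key from the cluster instead of typing it:
#   ceph auth get osd.4 -o /var/lib/ceph/osd/ceph-4/keyring
#   chown ceph:ceph /var/lib/ceph/osd/ceph-4/keyring
#   systemctl restart ceph-osd@4
```

If the whole OSD directory is empty after the reboot rather than just missing the keyring, the tmpfs it lives on may simply not have been re-activated; on ceph-volume based OSDs, `ceph-volume lvm activate --all` repopulates it, keyring included.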
 
