Ceph Nautilus and Octopus Security Update for "insecure global_id reclaim" CVE-2021-20288

t.lamprecht
We released updates for all of our currently supported Ceph releases to fix a security issue where Ceph was not ensuring that reconnecting/renewing clients were presenting an existing ticket when reclaiming their global_id value. An attacker that was able to authenticate could claim a global_id in use by a different client and potentially disrupt other cluster services.

Affected Versions:
  • for server: all previous versions
  • for clients:
    • kernel: none
    • user-space: all since (and including) Luminous 12.2.0
Attacker Requirements:
  • have a valid authentication key for the cluster
  • know or guess the global_id of another client
  • run a modified version of the Ceph client code to reclaim another client’s global_id
  • construct appropriate client messages or requests to disrupt service or exploit Ceph daemon assumptions about global_id uniqueness
This means that the risk on a default Proxmox VE managed Ceph setup is rather low; we still recommend upgrading in a timely manner.

Available Fixes:
  • Ceph Octopus: 15.2.11
  • Ceph Nautilus: 14.2.20
Applying the Fixes:

After you upgrade your Ceph server installation to the package versions including the fixes, you need to restart all monitors, managers, metadata servers (MDS) and OSDs!
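On a Proxmox VE managed Ceph setup this can be done per service type through their systemd targets; a rough sketch (run it node by node and wait for the cluster to recover in between):

Bash:
# restart monitor and manager instances on this node
systemctl restart ceph-mon.target ceph-mgr.target
# then any metadata servers
systemctl restart ceph-mds.target
# finally the OSDs on this node
systemctl restart ceph-osd.target
# verify everything is back up before moving on to the next node
ceph -s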

You will then still see two HEALTH warnings:
  1. client is using insecure global_id reclaim
  2. mons are allowing insecure global_id reclaim
To address those, first ensure that all VMs using Ceph on a storage without KRBD run the newer client library. For that, either fully restart the VMs (reboot over the API, or stop and start them), or migrate them to another node in the cluster that already has the Ceph update installed.
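As a sketch of the two options (VMID 100 and the target node name are placeholders for your setup):

Bash:
# live-migrate the VM to a node that already has the updated ceph packages
qm migrate 100 <target-node> --online
# or fully power-cycle it so the VM process starts fresh with the new librbd
qm stop 100 && qm start 100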
You also need to restart the pvestatd and pvedaemon Proxmox VE daemons, which periodically access the Ceph cluster to gather status data or to execute API calls. Either use the web-interface (Node -> System) or the command-line:
Bash:
systemctl try-reload-or-restart pvestatd.service pvedaemon.service

Next you can resolve the monitor warning by enforcing the stricter behavior that is now possible.
Execute the following command on one of the nodes in the Proxmox VE Ceph cluster:

Bash:
ceph config set mon auth_allow_insecure_global_id_reclaim false

Note: As said, this will cut off any old client once its ticket validity times out (72h by default).
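Should an overlooked old client get locked out after enabling the restriction, you can temporarily relax it again while you upgrade that client; this simply reverts the setting from above:

Bash:
# re-allow insecure global_id reclaim until the remaining old clients are updated
ceph config set mon auth_allow_insecure_global_id_reclaim true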

If you operate an external cluster and the Proxmox VE side only uses the client, you can still add our Ceph repository and run a normal upgrade process (apt update && apt dist-upgrade) to get the fixed client package versions.
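On such a client-only node that boils down to the usual commands, for example:

Bash:
apt update
apt dist-upgrade
# check which ceph client version is installed now
ceph --version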

See also:
https://docs.ceph.com/en/latest/security/CVE-2021-20288/
 
You're right, there we use our perl FFI to the ceph RADOS library, which needs to be reloaded to get the updated version.
I edited the post to reflect that. And yes, the two services you mentioned are correct, and should be enough.
 
Am I seeing this correctly? If the message "clients are using insecure global_id reclaim" has disappeared, is it safe to run this command: "ceph config set mon auth_allow_insecure_global_id_reclaim false"? (I have restarted all PVE/Ceph cluster nodes and all VMs were migrated during this procedure)
 
Am I seeing this correctly? If the message "clients are using insecure global_id reclaim" has disappeared, is it safe to run this command: "ceph config set mon auth_allow_insecure_global_id_reclaim false"? (I have restarted all PVE/Ceph cluster nodes and all VMs were migrated during this procedure)

Yes.
 
We run KRBD by default so my understanding is that we do not have to do anything besides updating and then restarting Ceph on each node, after which we can then cut off non-compliant clients.

running KRBD?
Code:
cat /etc/pve/storage.cfg
rbd: rbd_ssd
        content rootdir,images
        krbd 1
        pool rbd_ssd

[root@kvm1a ~]# rbd showmapped
id  pool     namespace  image          snap  device
0   rbd_ssd             vm-107-disk-0  -     /dev/rbd0
1   rbd_ssd             vm-106-disk-0  -     /dev/rbd1
2   rbd_ssd             vm-104-disk-0  -     /dev/rbd2
3   rbd_ssd             vm-107-disk-1  -     /dev/rbd3
4   rbd_ssd             vm-106-disk-1  -     /dev/rbd4

Code:
apt-get update; apt-get -y dist-upgrade; apt-get autoremove; apt-get autoclean;
ceph -s
systemctl restart ceph.target
  # do this on each node, waiting for mon/mgr/mds/osd/rgw to all fully recover, each time


Validate that the only warning relates to insecure clients being allowed:
Code:
[root@kvm1a ~]# ceph health detail
HEALTH_WARN mons are allowing insecure global_id reclaim; noout flag(s) set
[WRN] AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure global_id reclaim
    mon.kvm1a has auth_allow_insecure_global_id_reclaim set to true
    mon.kvm1b has auth_allow_insecure_global_id_reclaim set to true
    mon.kvm1c has auth_allow_insecure_global_id_reclaim set to true
[WRN] OSDMAP_FLAGS: noout flag(s) set

Bulk disable on all nodes:
Code:
for f in `ls -1A /etc/pve/nodes`; do ssh $f "ceph config set mon auth_allow_insecure_global_id_reclaim false"; done
 
We run KRBD by default so my understanding is that we do not have to do anything besides updating and then restarting Ceph on each node, after which we can then cut off non-compliant clients.
Yes, note the PVE daemons that need to get restarted too, though. Also, you can ensure that enabling the restriction is safe by checking if the client health warning is gone.

systemctl restart ceph.target
I'd in general recommend a more cautious approach; restarting all those daemons at once can lead to issues, especially if there are unexpected problems with the new version.
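For example, one could restart the OSDs one at a time and ask the cluster first whether it can cope with that OSD going down (a sketch; OSD ID 0 is a placeholder):

Bash:
# check that stopping this OSD does not make data unavailable
ceph osd ok-to-stop 0
# if it is OK, restart just this OSD and wait for the cluster to recover
systemctl restart ceph-osd@0.service
ceph -s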
Bulk disable on all nodes:
No, this only needs to be done once per Ceph server setup, as the monitors are clustered after all.
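You can verify that the restriction is active cluster-wide from any node with a single query:

Bash:
# should print "false" once the restriction is enforced
ceph config get mon auth_allow_insecure_global_id_reclaim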
 
How can I update 15.2.8 to 15.2.11? Are there any documents?
How did you install ceph server?
If you installed it over the PVE web-interface or using the pveceph CLI tool you should have our ceph repositories already set up.
Then it is enough to do a standard package update, either via the web-interface (Node -> Updates) or using the CLI:
Bash:
apt update
apt full-upgrade

If you do not have any repo set up (where did you get ceph from then?) you'd need to re-add that first; for the 15.2 release it's the Octopus repository: https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_package_repositories_ceph
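For reference, on Proxmox VE 6.x (Debian Buster) the repository entry described on that wiki page looks roughly like this; double-check the exact suite against the linked documentation for your setup:

Code:
# /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-octopus buster main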
 
Hi @t.lamprecht

Thank you for your clear instructions, the upgrade has worked well here, but I've realized using 'lsof -n | grep DEL' that even after restarting monitors, managers, metadata servers (MDS) and OSDs, I had processes using deleted / updated Ceph libraries.

In the end, since our cluster has the required redundancy, I decided to do a full reboot of each cluster node after all. After every node was updated and rebooted sequentially and auth_allow_insecure_global_id_reclaim was disabled, everything was fine.

Just as a hint: Posting this in the forums is good, but it will disappear in the forum history after some time unless pinned, and will not be of interest to most in a couple of months. Thus: what about posting the instructions in the wiki and then linking to them in the forum post? This way they will remain available in the wiki for those folks who only update their Ceph much later on (i.e. when upgrading from Ceph Luminous).

Nonetheless: Thank you very much for the proactive information!
 
Thank you for your clear instructions, the upgrade has worked well here, but I've realized using 'lsof -n | grep DEL' that even after restarting monitors, managers, metadata servers (MDS) and OSDs, I had processes using deleted / updated Ceph libraries.
How exactly did you restart the ceph services? If an in-place re-exec (reload) was used this could be normal, as having them still open does not necessarily mean that symbols from them are in use. Anyway, a full reboot certainly resulted in a fresh start.
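To narrow such a check down to the Ceph libraries, something along the lines of your lsof invocation can be used:

Code:
# list processes that still map deleted (replaced) ceph/rados/rbd libraries
lsof -n | grep DEL | grep -E 'ceph|rados|rbd'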

Just as a hint: Posting this in the forums is good, but it will disappear in the forum history after some time unless pinned, and will not be of interest to most in a couple of months. Thus: what about posting the instructions in the wiki and then linking to them in the forum post? This way they will remain available in the wiki for those folks who only update their Ceph much later on (i.e. when upgrading from Ceph Luminous).
Thanks for your input. My basic assumptions were the following:
* People updating frequently, as recommended, will get the warnings, search for them (potentially with the "proxmox" keyword added) and find this post.
* It is an issue to which the standard Proxmox VE managed Ceph server setup is normally not vulnerable; after all, the only clients are our daemons (which have access anyway), VMs over librbd/krbd (whose access is controlled by our stack too) and containers, which can only use the unaffected KRBD client anyway.
* The instructions share basically all but the "disable access for old clients" step with a common Ceph upgrade, where the web-interface already shows which services are still running an outdated version.

That was IMO reason enough that an un-stickied post is fine. But I pondered adding a more general sticky post, hinting at the upcoming EOL of Nautilus in ~July, where this issue and post could be mentioned too.

I'll link to this forum post in the Nautilus and Octopus upgrade sections, so that people coming from older releases get better visibility of this.
 
Hi

I've restarted the osd/mds etc. systemd targets (and I specifically did not use ceph.target, as discouraged in this thread).

Nonetheless, thank you very much; considering this release has only been out since early this week, that was very quick and I appreciate the instructions!
 
How did you install ceph server?
If you installed it over the PVE web-interface or using the pveceph CLI tool you should have our ceph repositories already set up.
Then it is enough to do a standard package update, either via the web-interface (Node -> Updates) or using the CLI:
Bash:
apt update
apt full-upgrade

If you do not have any repo set up (where did you get ceph from then?) you'd need to re-add that first; for the 15.2 release it's the Octopus repository: https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_package_repositories_ceph
Thanks, it's done.
 
Hello. About the upgrade from 14.2.16 to 14.2.20, should I take any special action or is there any gotcha with cephfs clients? I have several cephfs clients running stock Debian 9 and Debian 10 kernels with the in-kernel driver. Thanks,
 
Hi,
Hello. About the upgrade from 14.2.16 to 14.2.20, should I take any special action or is there any gotcha with cephfs clients? I have several cephfs clients running stock Debian 9 and Debian 10 kernels with the in-kernel driver. Thanks,
No, my post should apply to them too, and as you say you use the in-kernel client, you should be safe to restrict access after upgrading. But, as mentioned before, it's good to check that the "client is using insecure global_id reclaim" warning is gone before doing so, as only that gives a guarantee that all currently connected clients are already using the safer auth.
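For example, a quick way to confirm that before flipping the switch:

Bash:
# no client-side AUTH_INSECURE_GLOBAL_ID_RECLAIM entries should show up anymore
ceph health detail | grep -i global_id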
 
Note: As said, this will cut off any old client once its ticket validity times out (72h by default).
To be sure I understand this correctly, what does "cut off" mean exactly?
Does that mean the client is shut down or another problem shows up after 72 hours?
Or that the connection is renewed and keeps working automatically with KRBD enabled?
I just want to be sure to understand this right.
 
Does that mean the client is shut down or another problem shows up after 72 hours?
That means any client not yet upgraded to a version which fixed the problematic behavior will not be able to talk with the Ceph cluster any more, i.e., it is cut off from participating in that Ceph setup.

Or that the connection is renewed and keeps working automatically with KRBD enabled?
No, problematic clients can neither open a new connection nor renew existing ones from the time auth_allow_insecure_global_id_reclaim is set to false; only the existing ticket they got, or renewed, before the restriction was enabled is still valid for another 72 hours (by default).

Note, krbd (= kernel) clients never had the problematic behavior in the first place, but user-space clients cannot automatically switch to the in-kernel one transparently.
 
That means any client not yet upgraded to a version which fixed the problematic behavior will not be able to talk with the Ceph cluster any more, i.e., it is cut off from participating in that Ceph setup.


No, problematic clients can neither open a new connection nor renew existing ones from the time auth_allow_insecure_global_id_reclaim is set to false; only the existing ticket they got, or renewed, before the restriction was enabled is still valid for another 72 hours (by default).

Note, krbd (= kernel) clients never had the problematic behavior in the first place, but user-space clients cannot automatically switch to the in-kernel one transparently.
Executed all instructions as far as I know precisely.
Live-migrated the VMs to another node (5 total) and back.
Is that enough to be sure no problems will show up?
 
Is that enough to be sure no problems will show up?
See what I wrote already in this post:
But, as mentioned before, it's good to check that the "client is using insecure global_id reclaim" warning is gone before doing so, as only that gives a guarantee that all currently connected clients are already using the safer auth.
 
