Ceph went down after reinstalling 1 OSD:

elmacus

Well-Known Member
Mar 20, 2011
Ceph cluster with 4 nodes, 24 OSDs (mixed SSD and HDD), Ceph Nautilus 14.2.1 (via Proxmox 6, 7 nodes).
PG autoscale is ON, 5 pools, one big pool with all the VMs at 512 PGs (all SSD). That PG count did not change when I turned autoscale on for the SSD pool, only the smaller pools for HDD and test did.
All OSDs were installed under Luminous.
I took out and destroyed an NVMe P900 (device class set to SSD); it was GPT-partitioned and named osd.22.
Created it again as new; it comes up as LVM and stores about 22% of 447 GB, circa 100 GB.

After about 5 minutes, the reported size of the RBD pool dropped from 3.77 TB of data to 100 GB, and the PG count from 512 to 4.
Then the whole Ceph cluster went unresponsive; no VM responded for about a minute.

Then I turned autoscale off for this pool and set it back to 512 PGs manually, and BAM: the whole Proxmox cluster went down and restarted, which took 10 minutes. Probably HA failed because Ceph traffic was sky-high on the same network (it shouldn't be).

Now everything works again, but the SSD pool reports 0.52% used of 95.61 GB. It should be 3.77 TB of 10 TB.
All the data is still there, nothing is missing.

So is this a bug in Ceph where autoscale can go wrong after reinstalling a single disk?

Any hints on how to restore the reported size of the RBD pool?
 
All disks were created with ceph-disk, then I upgraded per:
https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus#Restart_the_OSD_daemon_on_all_nodes

According to Ceph:
https://docs.ceph.com/docs/master/ceph-volume/#migrating

Should I (and everyone who upgrades from Luminous to Nautilus) reinstall ALL OSDs with ceph-volume?
https://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#rados-replacing-an-osd
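For reference, the replacement procedure in that doc boils down to roughly the following commands. This is only a sketch: the OSD id 22 and device path /dev/nvme0n1 are placeholders for your own hardware, and it is dry-run by default (it just prints each command unless you set RUN=1):

```shell
#!/bin/sh
# Sketch of replacing one OSD with ceph-volume, keeping the same OSD id.
# Dry-run by default: only prints each command. Set RUN=1 to execute for real.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi; }

OSD_ID=22            # placeholder: the OSD being replaced
DEV=/dev/nvme0n1     # placeholder: the new device

run ceph osd destroy "$OSD_ID" --yes-i-really-mean-it     # mark destroyed, keep the id
run ceph-volume lvm zap "$DEV"                            # wipe old partitions/LVM metadata
run ceph-volume lvm create --osd-id "$OSD_ID" --data "$DEV"
```

Doing this disk by disk (waiting for the cluster to return to HEALTH_OK in between) would migrate everything off ceph-disk over time.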

I guess that ONE reinstalled OSD triggered (bug or not) something so that RBD only counts this new ceph-volume disk and not all the old ceph-disk ones, and therefore the PG autoscaler can now only see ceph-volume disks?
 
It could be that this is fixed in 14.2.2:
https://ceph.io/releases/v14-2-2-nautilus-released/

Earlier Nautilus releases (14.2.1 and 14.2.0) have an issue where deploying a single new (Nautilus) BlueStore OSD on an upgraded cluster (i.e. one that was originally deployed pre-Nautilus) breaks the pool utilization stats reported by ceph df.
Until all OSDs have been reprovisioned or updated (via ceph-bluestore-tool repair), the pool stats will show values that are lower than the true value.
This is resolved in 14.2.2, such that the cluster only switches to using the more accurate per-pool stats after all OSDs are 14.2.2 (or later), are BlueStore, and (if they were created prior to Nautilus) have been updated via the repair function.

So Proxmox team, when can we expect Ceph 14.2.2?
 
So the fix until 14.2.2 is installed: on every OSD in the system, stop one at a time, then run:
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-22 (change the number to the correct OSD.)

Watch disk usage go up with: ceph df
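Scripted, that per-OSD repair might look like this. A minimal sketch, assuming the standard systemd unit names and /var/lib/ceph/osd/ceph-<id> data paths; dry-run by default (it only prints the commands unless RUN=1 is set):

```shell
#!/bin/sh
# Pre-14.2.2 workaround: repair each OSD in turn so per-pool stats come back.
# Dry-run by default: only prints each command. Set RUN=1 to execute for real.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi; }

repair_osd() {
    id="$1"
    run systemctl stop "ceph-osd@${id}"      # stop only this one OSD
    run ceph-bluestore-tool repair --path "/var/lib/ceph/osd/ceph-${id}"
    run systemctl start "ceph-osd@${id}"     # bring it back before touching the next
}

# Example: repair osd.22; call repair_osd for each of your OSD ids, one at a time.
repair_osd 22
```

Waiting for the cluster to settle between OSDs keeps the impact to a single disk at a time.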

Or reinstall all disks: zap and create via the GUI. Ugh.

Proxmox, you should warn users about this in the wiki.
 
With Ceph 14.2.4.1, are there any gotchas when using PG autoscale? Is the correct method to enable it via Ceph commands, or is there a more Proxmox-friendly way of doing it?
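If you do go the CLI route, enabling it upstream-style is only a few Ceph commands. A sketch, not Proxmox-specific advice: the pool name rbd is a placeholder, and the script is dry-run by default (set RUN=1 to execute against a real cluster):

```shell
#!/bin/sh
# Enable the PG autoscaler module and turn it on for one pool (Nautilus CLI).
# Dry-run by default: only prints each command. Set RUN=1 to execute for real.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi; }

POOL=rbd   # placeholder: your pool name

run ceph mgr module enable pg_autoscaler           # module must be enabled on Nautilus
run ceph osd pool set "$POOL" pg_autoscale_mode on
run ceph osd pool autoscale-status                 # review its targets before trusting it
```

Checking autoscale-status first (it also works before switching the mode on) shows what PG counts the autoscaler would pick, which seems prudent given the behaviour described earlier in this thread.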
 
