Instructions to reallocate SSDs in Ceph

Gecko

I have a three node Proxmox cluster. All three nodes are participating in the Ceph cluster, which is operating as a 3-way mirror.

When you look at my configuration, know that I had some drives die and had to pull them out of the cluster. This is what led to the storage allocation imbalance:
Cluster01 = 28 TB
Cluster02 = 17.5 TB
Cluster03 = 28 TB

The replacement drives are 15 TB each. There is no clean way to slot the new drives into one or two of the servers, so I want to take most (or all) of the drives in Cluster01 and Cluster02 out of the Ceph cluster, wipe them, physically move them, and redistribute them across Cluster01, 02, and 03. The problem is that I don't know the correct series of steps to accomplish this. I'm hoping someone here has a good sequence for me to follow. I don't need steps that cover each and every drive; an overview with precise steps for handling one drive is enough, and I can extrapolate from there for the rest.

So, to remove a disk from a Ceph cluster and prepare it for use on a different Ceph cluster member, do I need to...
  • Set/unset Ceph's 'Manage Global Flags'?
  • Use the 'Stop' button?
  • Use the 'Out' button?
  • Use the 'Destroy' button?
  • Wipe specific sectors of the disk using the 'dd' command?
  • Anything else?
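
For context, "Manage Global Flags" in the GUI toggles Ceph's cluster-wide OSD flags, and the wipe question is about clearing old Ceph/LVM metadata from a drive. A hedged sketch of the CLI equivalents (/dev/sdX is a placeholder):

Code:
ceph osd set noout      # example global flag: don't auto-mark stopped OSDs "out"
ceph osd unset noout    # unset any flags again once maintenance is done
ceph-volume lvm zap /dev/sdX --destroy   # wipes Ceph/LVM metadata; usually no raw dd needed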
 
OUT.... wait until rebalance is done...
STOP ..... wait until rebalance is done....
DESTROY..... wait until rebalance is done....

Remove DISK
Insert new. Add to CEPH. Wait until rebalance is done.... and CEPH is "green" again!

AND! ONE AT A TIME!
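
As a rough CLI sketch of that one-OSD-at-a-time cycle (osd.3 and /dev/sdc are placeholders; the same steps exist as buttons in the Proxmox GUI):

Code:
ceph osd out 3                    # OUT, then watch "ceph -s" until rebalance is done
systemctl stop ceph-osd@3         # STOP the OSD daemon
pveceph osd destroy 3 --cleanup   # DESTROY and clean up the disk
# pull the old disk, insert the new one, then:
pveceph osd create /dev/sdc       # add the new disk as an OSD
ceph -s                           # wait until CEPH is "green" (HEALTH_OK) again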
 
Your suggested steps will involve a lot of rebalancing, which is a lot of wear and tear on the drives, not to mention the sheer amount of time. Is there a way to tackle a whole server at a time? Perhaps something like...
  1. Disable backfill, rebalancing, recovery (flag commands sketched after this list)
  2. OUT all drives on Cluster01
  3. STOP all drives on Cluster01
  4. DESTROY all drives on Cluster01
  5. Shut down Cluster01
  6. Remove all drives from Cluster01
  7. Install 2x 15 TB drives into Cluster01
  8. Boot Cluster01
  9. Import 15 TB drives on Cluster01 into Ceph
  10. Enable backfill, rebalancing, recovery
  11. Wait for rebalance to complete
...then with the pile of newly unused drives, I can shut down Cluster02 and install drives there.
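
For steps 1 and 10, the usual maintenance flags would be set and cleared roughly like this (a sketch; flag names as in stock Ceph):

Code:
# step 1: pause data movement before the mass changes
ceph osd set norebalance
ceph osd set nobackfill
ceph osd set norecover

# step 10: re-enable it once the new OSDs are in
ceph osd unset norebalance
ceph osd unset nobackfill
ceph osd unset norecover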
 
Your suggested steps will involve a lot of rebalancing,
There's no helping that. You will end up with the same amount of rebalancing in either scenario. Your method will leave you exposed (only two live nodes in a 3-node domain), which means any OSD fault in the interim can lock up the whole cluster and possibly cause data loss (PGs with only 2 OSDs disagreeing means there's no "truth"). This is why @itNGO mentioned, in CAPS, to do it one OSD at a time: to minimize that potential.

If the data is transient enough to make the risk worthwhile, you'd be better served by blowing away the cluster, remaking it, and restoring from backup.
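
To make that exposure concrete: with size=3/min_size=2 pools, a whole node down leaves every PG at two copies, and one more OSD failure drops below min_size and blocks I/O. A quick way to confirm the pool settings (the pool name "rbd" is a placeholder):

Code:
ceph osd pool get rbd size       # expect 3 (three replicas)
ceph osd pool get rbd min_size   # expect 2 -> one more failure blocks writes
ceph health detail               # should report HEALTH_OK before you start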
 
There is also the hard way: stop and destroy the OSD, all in one go.
This means the PGs go undersized and the data has to be copied again. The new drive should be installed and brought back in as an OSD as quickly as possible. If you can do all of this within a few minutes, Ceph recovers normally and only needs to restore a small amount of data to the new OSD.

This would basically be the path that creates the least amount of data movement. That's how I handle it in practice and haven't had any problems with it so far.
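
A sketch of that quick swap, assuming osd.5 on /dev/sdd; the whole point is to minimize the time the PGs stay undersized:

Code:
systemctl stop ceph-osd@5         # stop without marking out first
pveceph osd destroy 5 --cleanup   # destroy immediately, all in one go
# swap the physical disk now -- every minute counts
pveceph osd create /dev/sdd       # recreate the OSD right away
ceph -s                           # recovery refills this OSD instead of reshuffling the cluster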

BUT BE CAREFUL: This can have a significant impact on Ceph, although it should be less noticeable with SSDs/NVMe. With replica 3, data loss is also unlikely.

But I can't guarantee that nothing will go wrong. You should be aware of what you are doing and that you are potentially putting your data at risk.
 
It's your cluster, but in my opinion I would never follow a guide where data loss is possible without having a good, fresh backup.
The rebalancing is the least of your problems. Even if the SSDs/NVMe are rated at only 0.8 DWPD, they aren't that full at the moment. It won't be more than half a percent of wear, if not less...
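
As a back-of-the-envelope check (assuming 15 TB drives rated at 0.8 DWPD over a 5-year warranty, and at most ~17.5 TB rewritten per drive during the reshuffle):

Code:
echo "scale=4; 17.5 / (15 * 0.8 * 365 * 5) * 100" | bc
# prints .0799 -> about 0.08 % of rated endurance, well under half a percent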

Better safe than sorry... no one here will give you your data back if you follow the guide and lose it...
 
It's your cluster, but in my opinion I would never follow a guide where data loss is possible without having a good, fresh backup.
I have never experienced data loss on any Ceph cluster, no matter how hard I hit it. I've even brought a Ceph cluster back to life in which all mons were irretrievably destroyed. Anyone who has actually killed a Ceph cluster in such a way that data was lost has my utmost respect (fire, flood, physical destruction, etc. excluded... ;) ).

Basically, when something like this comes up, I always point out the possible loss of data in order to relieve myself of responsibility for it. It's less that I don't trust my approach and more that I know what my cluster can do, what hardware is installed, and what I'm doing.
 
Here's what I did...
  1. Shut down Cluster01
  2. Install 2x 15 TB disks
  3. Boot Cluster01
  4. Wait for rebalance to complete
  5. While waiting...
    1. Verify the 2x 15 TB disks show as available in Server > Disks
    2. Inspect SMART data for 2x 15 TB Disks
  6. Rebalance is complete
  7. Create New OSDs in rapid succession...
    1. Create OSD for 15 TB Drive #1 on Cluster01
    2. Create OSD for 15 TB Drive #2 on Cluster01
  8. Wait for rebalance to complete
  9. Rebalance is complete
  10. OUT all four SAS/SATA drives in rapid succession on Cluster01
  11. Wait for rebalance to complete
  12. Rebalance is complete
  13. STOP all four SAS/SATA drives on Cluster01
  14. No rebalance triggered
  15. DESTROY all four SAS/SATA drives on Cluster01
  16. No rebalance triggered
At this point the Ceph cluster is in good shape for me to shut down Cluster01, remove the SAS/SATA disks, then install them into Cluster02.
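
For reference, the same sequence from the shell might look roughly like this (OSD IDs 0-3 and the device names are placeholders for my drives):

Code:
pveceph osd create /dev/nvme0n1   # step 7.1: OSD for 15 TB drive #1
pveceph osd create /dev/nvme1n1   # step 7.2: OSD for 15 TB drive #2
# wait for the rebalance (steps 8-9), then:
ceph osd out 0 1 2 3              # step 10: OUT all four SAS/SATA OSDs at once
# wait for the rebalance (steps 11-12), then:
systemctl stop ceph-osd@0 ceph-osd@1 ceph-osd@2 ceph-osd@3    # step 13
for id in 0 1 2 3; do pveceph osd destroy $id --cleanup; done # step 15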
 