Instructions to reallocate SSDs in Ceph

Gecko

I have a three node Proxmox cluster. All three nodes are participating in the Ceph cluster, which is operating as a 3-way mirror.

When you look at my configuration, know that I had some drives die and had to pull them out of the cluster. That is what led to the storage allocation imbalance:
Cluster01 = 28 TB
Cluster02 = 17.5 TB
Cluster03 = 28 TB

The replacement drives are 15 TB each. There is no clean way to fit the new drives into one or two of the servers, so I want to take most (or all) of the drives in Cluster01 and Cluster02 out of the Ceph cluster, wipe them, physically move them, and add them back so they end up redistributed across Cluster01, 02, and 03. The problem is that I don't know the correct series of steps to accomplish this. I am hoping someone here has a good series of steps for me to follow. I don't need steps that cover each and every drive; an overview with precise steps for handling one drive is enough. From there I can extrapolate what to do for the other drives.

So, to remove a disk from a Ceph cluster and prepare it for use on a different Ceph cluster member, do I need to...
  • Set/unset Ceph's 'Manage Global Flags'?
  • Use the 'Stop' button?
  • Use the 'Out' button?
  • Use the 'Destroy' button?
  • Wipe specific sectors of the disk using the 'dd' command?
  • Anything else?
 
OUT.... wait until rebalance is done...
STOP ..... wait until rebalance is done....
DESTROY..... wait until rebalance is done....

Remove DISK
Insert new. Add to CEPH. Wait until rebalance is done.... and CEPH is "green" again!

AND! ONE AT A TIME!
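In CLI terms, that per-OSD sequence would be roughly the following (the Out/Stop/Destroy buttons in the Proxmox GUI correspond to these commands; OSD ID 12 is a placeholder):

Code:
# Mark the OSD out; Ceph drains its data onto the remaining OSDs
ceph osd out 12
# Wait until all PGs are active+clean again
ceph status
# Stop the OSD daemon on the node that hosts it
systemctl stop ceph-osd@12
# Remove the OSD from the cluster and clean up its disk (Proxmox wrapper)
pveceph osd destroy 12 --cleanup

The disk can then be pulled and re-used in another node.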
 
Your suggested steps will involve a lot of rebalancing, which is a lot of wear and tear on the drives, not to mention the sheer amount of time. Is there a way to tackle a whole server at a time? Perhaps something like...
  1. Disable backfill, rebalancing, recovery (see the flag commands sketched after this list)
  2. OUT all drives on Cluster01
  3. STOP all drives on Cluster01
  4. DESTROY all drives on Cluster01
  5. Shut down Cluster01
  6. Remove all drives from Cluster01
  7. Install 2x 15 TB drives into Cluster01
  8. Boot Cluster01
  9. Import 15 TB drives on Cluster01 into Ceph
  10. Enable backfill, rebalancing, recovery
  11. Wait for rebalance to complete
...then with the pile of newly unused drives, I can shut down Cluster02 and install drives there.
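For steps 1 and 10, I'm guessing the flag handling would look something like this (a sketch; these are the cluster-wide flags behind Proxmox's 'Manage Global Flags' dialog):

Code:
# Step 1: pause all data movement while the node is reworked
ceph osd set nobackfill
ceph osd set norebalance
ceph osd set norecover
# ... steps 2-9: out/stop/destroy the OSDs, swap disks, create new OSDs ...
# Step 10: let Ceph backfill onto the new OSDs
ceph osd unset nobackfill
ceph osd unset norebalance
ceph osd unset norecover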
 
Your suggested steps will involve a lot of rebalancing,
There's no helping that; you will end up with the same amount of rebalancing in either scenario. Your method will leave you exposed (only two nodes in a 3-node domain), which means any OSD fault in the interim can lock up the whole cluster and possibly cause data loss (PGs with only 2 OSDs disagreeing means there's no "truth"). This is why @itNGO mentioned, in CAPS, to do it one OSD at a time: to minimize that potential.

If the data is transient enough to make the risk worthwhile, you'd be better served by blowing away the cluster, remaking it, and restoring from backup.
 
There is also the hard way: stop and destroy OSD, all in one go.
This leaves the PGs undersized, and the data has to be copied again. The new drive should be reconnected and brought back in as an OSD as quickly as possible. If you can do all of this within a few minutes, Ceph recovers normally and only needs to restore a small amount of data to the new OSD.

This would basically be the path that creates the least amount of data movement. That's how I handle it in practice and haven't had any problems with it so far.
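On the CLI, that fast swap might look roughly like this (OSD ID 12 and /dev/sdX are placeholders):

Code:
# Stop and destroy in one go; the PGs go undersized immediately
systemctl stop ceph-osd@12
pveceph osd destroy 12 --cleanup
# Physically swap the disk, then recreate the OSD as quickly as possible
pveceph osd create /dev/sdX
# Watch recovery progress; the shorter the gap, the less data to re-copy
ceph -w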

BUT BE CAREFUL: This can have a significant impact on Ceph, although it should be less noticeable with SSDs/NVMe. With replica 3, data loss is also unlikely.

But I can't guarantee that nothing will go wrong. You should be aware of what you are doing and that you are potentially putting your data at risk.
 
It's your cluster, but in my opinion, I would never follow a guide where data loss is possible without having a good, fresh backup.
The rebalancing is the least of your problems. Even if the SSDs/NVMe have only 0.8 DWPD, they aren't that full at the moment. It won't be more than 1/2% wear, if not less...
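As a rough sanity check on that wear figure (assuming a typical five-year endurance rating): a 15 TB SSD at 0.8 DWPD is rated for about 15 TB × 0.8 × 365 × 5 ≈ 21,900 TB written, so even rewriting the entire drive once during a rebalance costs roughly 15 / 21,900 ≈ 0.07% of its rated life.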

Better safe than sorry... no one here will give you your data back if you follow the guide and lose it...
 
It's your cluster, but in my opinion, I would never follow a guide where data loss is possible without having a good, fresh backup.
I have never experienced data loss on any Ceph cluster, no matter how hard I've hit it. I've even brought a Ceph cluster back to life in which all mons were irretrievably destroyed. Anyone who has actually killed a Ceph cluster in such a way that data was lost has my utmost respect (fire, flood, physical destruction, etc. excluded... ;) ).

Basically, with operations like this, I always point out the possibility of data loss in order to relieve myself of responsibility for it. It's not that I don't trust my approach: I know what my cluster can do, what hardware is installed, and what I'm doing.
 
Here's what I did...
  1. Shut down Cluster01
  2. Install 2x 15 TB disks
  3. Boot Cluster01
  4. Wait for rebalance to complete
  5. While waiting...
    1. Verify 2x 15 TB disks show available in Server > Disks
    2. Inspect SMART data for 2x 15 TB Disks
  6. Rebalance is complete
  7. Create New OSDs in rapid succession...
    1. Create OSD for 15 TB Drive #1 on Cluster01
    2. Create OSD for 15 TB Drive #2 on Cluster01
  8. Wait for rebalance to complete
  9. Rebalance is complete
  10. OUT all four SAS/SATA drives in rapid succession on Cluster01
  11. Wait for rebalance to complete
  12. Rebalance is complete
  13. STOP all four SAS/SATA drives on Cluster01
  14. No rebalance triggered
  15. DESTROY all four SAS/SATA drives on Cluster01
  16. No rebalance triggered
At this point the Ceph cluster is in good shape for me to shut down Cluster01, remove the SAS/SATA disks, then install them into Cluster02.
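Before powering Cluster01 down, a quick check like this should confirm everything is clean (the device path is a placeholder, in case you want to wipe a pulled disk before re-use):

Code:
# Confirm health and verify the old OSDs are gone from the tree
ceph status
ceph osd df tree
# Optionally wipe a pulled disk before installing it in Cluster02
ceph-volume lvm zap /dev/sdX --destroy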
 