OSD migration from Filestore to Bluestore

guipve

Hello,

We have run tests of Proxmox/Ceph on fairly old HDDs (3 nodes, 3 HDDs per node, dedicated Gbit network for Ceph). We created the OSDs with Filestore, with the journal stored on each OSD. As expected, performance is not good.

Question 1:
What would be the best thing to do now to improve performance at minimum cost:
1. Keep Filestore and add an SSD for the journals (1 SSD per node, a regular SSD like a Samsung 860 Evo)
2. Upgrade all the OSDs to Bluestore
3. Upgrade the OSDs to Bluestore + add an SSD for DB/WAL (1 SSD per node, a regular SSD like a Samsung 860 Evo)

Question 2:
In order to do any of that, we will have to re-create each OS. Can we do it all from the GUI? If yes, what would the scenario be:
OSD 1:
- click on Stop
- wait for the cluster to get back to a healthy state
- click on Out
- wait for the cluster to get back to a healthy state
- click on Destroy
- click on add OSD
- ... ?

Or, to avoid touching the CRUSH map, would it make more sense to destroy each OSD and recreate it with the same ID (and if so, what would be the commands and steps to do it with minimum risk)?

If anyone has some hints, that would be great.

Thanks,
Gui
 
3. Upgrade the OSDs to Bluestore + add an SSD for DB/WAL
With "old" spinners that setup might help (depending on the model, maybe no improvement), but don't use EVOs, they have a bad sync rate.
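If you want to check what a candidate SSD actually delivers for that kind of workload before buying more of them, a quick sync-write run with fio gives a rough idea. A minimal sketch, assuming the fio package is installed and /dev/sdX is an empty test disk (this writes to the device, so never point it at a disk that holds data):

    # 4k synchronous writes at queue depth 1, roughly the journal/DB write pattern
    fio --name=sync-write-test --filename=/dev/sdX \
        --ioengine=libaio --direct=1 --fsync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 \
        --runtime=60 --time_based

Consumer SSDs often look fine without --fsync=1 and poor with it, which is what is meant by a bad sync rate.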

In order to do any of that, we will have to re-create each OS. Can we do it all from the GUI?
You don't need to recreate the OS. Depending on the release you are on, destroying the OSD and re-creating it with the new settings might be all that is needed. To say it in advance: Proxmox VE 4.x is EoL, and you need at least Ceph Jewel to start using Bluestore OSDs (better Luminous).
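For reference, on PVE 5.x the rough CLI equivalent of the GUI buttons looks something like the sketch below; osd.3 and /dev/sdX are only placeholders, and you should double-check the options against the pveceph man page of your version:

    # take the OSD out and wait for the cluster to rebalance (watch 'ceph -s')
    ceph osd out 3

    # stop the daemon, then destroy the Filestore OSD
    systemctl stop ceph-osd@3
    pveceph destroyosd 3

    # recreate it on the same disk; with Luminous, Bluestore should be the default
    pveceph createosd /dev/sdX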
 
Hi Alwin,

Thanks for the quick answer. I guess we will then only update the OSDs to Bluestore.

We have the latest releases:
PVE: pve-manager/5.2-12/ba196e4b (running kernel: 4.15.17-1-pve)
Ceph: ceph version 12.2.8

So, the only thing to do is:
- select the OSD in the GUI
- click on Destroy (not Stop or Out before?)
- wait for the cluster to be healthy again (or not?)
- create a new OSD, with Bluestore this time (do we have to change any IDs?)
- wait for the cluster to be healthy again (or not?)
- repeat that for each OSD

Thanks for your help,
gui
 
So, the only thing to do is:
- select the OSD in the GUI
- click on Destroy
- wait for the cluster to be healthy again (or not?)
- create a new OSD, with Bluestore this time (do we have to change any IDs?)
- wait for the cluster to be healthy again (or not?)
- repeat that for each OSD
Forget the things in the brackets. ;) Destroy OSD, wait till healthy, create OSD with bluestore, wait till healthy and repeat.
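If you prefer to script the waiting instead of watching the GUI, a small polling loop does the trick; just a sketch:

    # block until Ceph reports HEALTH_OK again, checking once a minute
    until ceph health | grep -q HEALTH_OK; do
        sleep 60
    done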
 
Got it :)
Yet, Destroy is not clickable. Shouldn't I use Stop or Out (or both) before?
 
Yet, Destroy is not clickable. Shouldn't I use Stop or Out (or both) before?
OFC, this needs to be done before. Also make sure you have enough disk space available on the other disks of the host, otherwise the recovery will not complete.
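To check whether the remaining OSDs can absorb the data, the utilization columns of the OSD listing are the quickest indicator; for example:

    # %USE should stay comfortably below the nearfull ratio (85% by default)
    # once one OSD of the node has been taken out
    ceph osd df tree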
 
Thanks, it worked like a charm :)

Yet, what I did was: take the OSD out, wait for the cluster to be healthy, then stop the OSD, destroy it, re-add it with Bluestore, and wait again for the cluster to get back to a healthy state (so I had to wait twice for a healthy state, which is quite long).

When I look at the Ceph docs, I can see that to replace an OSD (as opposed to removing it) it only needs to be destroyed (http://docs.ceph.com/docs/mimic/rados/operations/add-or-rm-osds/); what I did was mainly the procedure for removing an OSD.
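For reference, the replace procedure from that page keeps the OSD ID and looks roughly like this on the CLI (osd.3 and /dev/sdX are placeholders, and the PVE GUI does not necessarily run exactly these steps):

    # mark the OSD destroyed but keep its ID in the CRUSH map
    ceph osd destroy 3 --yes-i-really-mean-it

    # wipe the old disk and recreate a Bluestore OSD with the same ID
    ceph-volume lvm zap /dev/sdX
    ceph-volume lvm create --osd-id 3 --data /dev/sdX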

So, with the PVE GUI, can I just stop the OSD (so that the Destroy button becomes available), destroy it and re-add it? I am just wondering whether doing it that way in the GUI is equivalent to what is mentioned in the Ceph documentation for replacing an OSD (or switching it to Bluestore).

Thanks a lot !
 
So, with the PVE GUI, can I just stop the OSD (so that the Destroy button becomes available), destroy it and re-add it? I am just wondering whether doing it that way in the GUI is equivalent to what is mentioned in the Ceph documentation for replacing an OSD (or switching it to Bluestore).
When you destroy an OSD, you may recreate it in one go. Most admins like to wait, though. Once you're comfortable with replacing OSDs and with how the cluster behaves, you may even replace complete nodes (depending on your CRUSH and Ceph settings).
 
Thanks, we completed the migration without any problems. We played with the backfill settings to speed it up.
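For anyone else doing this: the kind of backfill settings meant here can also be changed at runtime on the CLI, something along these lines (the values are only an example; higher values speed up recovery but put more load on the spinners):

    # raise backfill/recovery parallelism on all OSDs at runtime
    ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'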

As expected, in a simple test (a dd copy), write performance improved by around 20%. However, read performance in the same test dropped by 70%. Before digging in, do you see any obvious reasons (we didn't change anything in the default Proxmox installation settings, so no tuning)?
 
The 20% improvement in writes comes from the fact that there are no double writes anymore. The 70% loss comes from the fact that Bluestore uses its own cache, while Filestore had the whole page cache available for reads. In the end, only faster disks will give you better performance (e.g. an SSD pool).

EDIT: try fiddling with the bluestore cache, it may help.
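The cache is configured per OSD in ceph.conf and only takes effect after an OSD restart. A minimal sketch with an example value (make sure the node has enough RAM for all its OSDs; there are also bluestore_cache_kv_ratio and bluestore_cache_meta_ratio if you want to shift the balance inside that cache):

    # in /etc/pve/ceph.conf (linked to /etc/ceph/ceph.conf)
    [osd]
         # example: 2 GiB bluestore cache per HDD-backed OSD
         bluestore_cache_size_hdd = 2147483648

Then restart the OSDs one at a time (systemctl restart ceph-osd@<id>) and keep an eye on memory usage.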
 
Thanks, it makes sense indeed.
Are there any specific settings to play with for some cache tuning tests (I don't fully get all these ratio settings yet)?
 
