Hello,
So the time has arrived to upgrade our Ceph cluster because of degrading I/O performance.
I believe we've stretched our 6 OSDs quite enough.
Huge problem!
When I added a new OSD into the mix, the cluster immediately started doing backfill and recovery on placement groups in order to populate the new OSD.
This caused disastrous I/O usage across the cluster, and the VMs became unmanageable.
At this point I decided to actually read the manual.
Here are some of my best practice recommendations:
1. Always use a separate physical network for recovery.
In the Ceph config file, specify the two networks like so:
Code:
public network = 172.16.1.0/24
cluster network = 172.16.2.0/24

If you have a replication factor of 3, the cluster network will use roughly 3x the bandwidth of the public network, so keep that in mind.
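To confirm that the OSDs actually registered on both networks, something like this should work (the exact output format varies by Ceph version; this is just a quick check, not an official procedure):
Code:
# each osd.N line in the dump should list both its public and cluster addresses
ceph osd dump | grep ^osd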
2. Don't spend money on SSDs for journaling. I've failed to see any improvement from doing this.
If you have money to spend, I'd suggest investing in LSI CacheCade or something similar.
3. This is how I solved the I/O crisis when adding/removing OSDs:
In the [osd] section of the Ceph config file, add these parameters:
Code:
osd max backfills = 1
osd recovery max active = 1
The defaults are 10 for backfills and, I believe, 15 for recovery.
I've compared these settings against the defaults on various setups, and I've seen no difference in recovery speed that would account for the huge I/O usage the defaults cause.
The only time I've seen a benefit in raising these was when I provisioned the new CacheCade storage nodes; in that case you only need to watch out for network bottlenecks.
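If you don't want to restart the OSDs for the change to take effect, you should also be able to inject the values into the running daemons (a sketch only; the syntax may differ slightly between Ceph versions):
Code:
# push the same throttles to all running OSDs without restarting them
ceph tell 'osd.*' injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'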
Another good practice when adding a new OSD is to start it with a weight of 0.2 and increase it after the rebuild is done.
This minimizes the impact in case you introduce a faulty or very slow drive into the cluster by mistake and decide to remove it shortly after.
It's also useful if you plan to move data onto an OSD node slowly and want to keep an eye on bandwidth consumption.
Here is an example:
When you add a 2 TB hard drive, Proxmox will see it as a 1.8 TB OSD and assign a default weight of 1.8.
Immediately after activating the new OSD, run this in the terminal:
Code:
ceph osd crush reweight osd.7 0.2
After the rebuild is done, you can increase the weight in steps until you reach 1.8.
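As a rough sketch of the ramp-up (assuming the new OSD is osd.7 as above; the exact step sizes are up to you), I raise the weight a bit at a time and wait for the cluster to settle back to HEALTH_OK between steps:
Code:
# raise the weight gradually; watch "ceph -s" and wait for backfill to finish after each step
ceph osd crush reweight osd.7 0.5
ceph osd crush reweight osd.7 1.0
ceph osd crush reweight osd.7 1.8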
It would be nice if Proxmox permitted setting a custom weight before starting the OSD, and it would also be nice to be able to modify the weight from the GUI.
Some drop-down or text-field tools for the Ceph config would also help a lot of unskilled admins.
It would also be nice to see a dedicated Ceph section in the forums.
Best Regards,
Marius