Ceph: Adding a new disk / node and mitigating load impacts

fitbrian

New Member
Jul 3, 2021
Hello guys,

It's not usual for me to ask for help publicly, but I need your help and real-world experience. I am pretty much a rookie with Ceph.

We have a Ceph cluster with the following setup:

- 3 nodes:
  - node1: 2 used disks, 1 unused (2 OSDs)
  - node2: 3 used disks (3 OSDs)
  - node3: 2 used disks (2 OSDs)

There are 3 pools:

  • vmdata:
    • RBD for VMs
    • Size/min: 3/2
    • # placement groups (PG): 16
  • cephfs_data:
    • CephFS
    • Size/min: 3/2
    • # placement groups (PG): 64
  • cephfs_metadata:
    • Size/min: 3/2
    • # placement groups (PG): 8

We are currently running out of space on CephFS and would like to add the unused drive on node1 to the Ceph cluster. The problem is that the services and traffic are quite heavy, and I am worried about the speed impact on the whole cluster during the data sync.

So my first question is: how do I smoothly add the unused drive to the Ceph cluster without a negative effect on cluster performance? What should I tune, and to what values, to keep the syncing of the newly added drive really calm and gentle? It is crucial to keep cluster performance as unaffected as possible. The other question is how to proceed when adding a whole new node (with a bunch of hard drives) to the Ceph cluster, with the same goal: unaffected cluster performance during the sync.

My last question is: what else needs to change in the cluster / PG configuration for the newly added node? Change size/min? Change the number of PGs?

Thank you so much, I appreciate your help.
 
So my first question is: how do I smoothly add the unused drive to the Ceph cluster without a negative effect on cluster performance?

Many tunables control recovery and backfill speed [1, 2]. Since you care more about load on the network and less about the speed of rebalancing, adjust them down from the defaults.
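As a rough sketch (assuming a Nautilus-or-newer cluster where the ceph config command is available; the values below are deliberately conservative examples, not recommendations for your specific hardware):

    # throttle recovery/backfill cluster-wide before adding the OSD
    ceph config set osd osd_max_backfills 1          # concurrent backfills per OSD
    ceph config set osd osd_recovery_max_active 1    # concurrent recovery ops per OSD
    ceph config set osd osd_recovery_sleep_hdd 0.2   # pause (seconds) between recovery ops on HDDs

    # watch the data movement
    ceph -s
    ceph osd pool stats

    # once the cluster is back to HEALTH_OK, drop the overrides again
    ceph config rm osd osd_max_backfills
    ceph config rm osd osd_recovery_max_active
    ceph config rm osd osd_recovery_sleep_hdd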

The other question is how to proceed when adding a whole new node (with a bunch of hard drives) to the Ceph cluster, with the same goal: unaffected cluster performance during the sync.

You can do the same thing. You can also set each newly added OSD to a very low weight; slowly raising the weight will gradually redistribute data around the cluster.
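A sketch of that approach (osd.7 and the intermediate weights are made-up examples; the final CRUSH weight is conventionally the disk's size in TiB):

    # optional: have brand-new OSDs come up with zero CRUSH weight,
    # so creating them does not move any data by itself
    ceph config set osd osd_crush_initial_weight 0

    # create the OSD on the new disk (via the GUI or pveceph osd create /dev/sdX),
    # then raise its CRUSH weight in small steps, waiting for HEALTH_OK in between
    ceph osd crush reweight osd.7 0.2
    ceph osd crush reweight osd.7 0.5
    ceph osd crush reweight osd.7 0.9   # e.g. ~1 TB disk; use your disk's size in TiB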

My last question is: what else needs to change in the cluster / PG configuration for the newly added node? Change size/min? Change the number of PGs?

If you have the PG autoscaler turned on, you don't need to do anything manually. If you do not, you may need to adjust the PG counts yourself.
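For example (cephfs_data and the pg_num of 128 are just placeholders for whatever the status output suggests for your pools):

    # see what the autoscaler recommends per pool
    ceph osd pool autoscale-status

    # enable the autoscaler on a pool if it is not on already
    ceph osd pool set cephfs_data pg_autoscale_mode on

    # or, without the autoscaler, raise the PG count manually
    ceph osd pool set cephfs_data pg_num 128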


However, you have a disaster waiting to happen. If you are running a hyper-converged infrastructure with compute and storage on the same hardware, you need the capacity to handle both. You are concerned that the network bandwidth can't handle your VM traffic plus the Ceph traffic. What happens when one or more OSDs inevitably fail, triggering a recovery? Sure, you can tune it to be slow... but that just leaves the cluster at risk for longer. I would consider either upgrading the entire network or adding a dedicated storage network for Ceph in the future. You don't want to be choking either your VM traffic or your storage traffic.
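For reference, a dedicated storage network usually just means pointing cluster_network at a separate subnet in /etc/pve/ceph.conf, roughly like this (the subnets are placeholders, and OSDs have to be restarted to pick up the change):

    [global]
        public_network  = 10.10.10.0/24   # clients, monitors, metadata servers
        cluster_network = 10.10.20.0/24   # OSD replication, backfill and recovery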
 
Well, thank you so much for your help and explanation. I tried adding the unused disk by following your steps and I can confirm everything is working correctly. Also, thanks for your thoughts about network capacity; we will consider and discuss it. I appreciate your help.
 