That's normal behaviour. Data distribution among Ceph OSDs can be adjusted manually using
ceph osd reweight
, but I find it easier to run
ceph osd reweight-by-utilization
from time to time, depending on how often data changes in your cluster.
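To see how unbalanced the OSDs currently are, and to nudge a single one by hand, something like the following should work (the OSD id 7 and the weight 0.95 are placeholders, pick values that fit your cluster):
ceph osd df                  # shows per-OSD utilization and the current REWEIGHT value
ceph osd reweight 7 0.95     # lower the override weight of OSD 7 so it receives fewer PGs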
Here's an extract from Ceph's documentation (https://docs.ceph.com/en/latest/rados/operations/control/):
Balance OSD fullness by reducing the override weight of OSDs which are overly utilized. Note that these override aka reweight values default to 1.00000 and are relative only to each other; they are not absolute. It is crucial to distinguish them from CRUSH weights, which reflect the absolute capacity of a bucket in TiB. By default this command adjusts override weight on OSDs which have + or - 20% of the average utilization, but if you include a threshold that percentage will be used instead.
ceph osd reweight-by-utilization [threshold [max_change [max_osds]]] [--no-increasing]
To limit the step by which any OSD’s reweight will be changed, specify max_change which defaults to 0.05. To limit the number of OSDs that will be adjusted, specify max_osds as well; the default is 4. Increasing these parameters can speed leveling of OSD utilization, at the potential cost of greater impact on client operations due to more data moving at once.
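For example, a run that allows slightly bigger steps across more OSDs could look like this (120 matches the default 20% threshold; 0.10 and 8 are values I've picked purely for illustration):
ceph osd reweight-by-utilization 120 0.10 8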
To determine which and how many PGs and OSDs will be affected by a given invocation you can test before executing.
ceph osd test-reweight-by-utilization [threshold [max_change max_osds]] [--no-increasing]
Adding --no-increasing to either command prevents increasing any override weights that are currently < 1.00000. This can be useful when you are balancing in a hurry to remedy full or nearful OSDs or when some OSDs are being evacuated or slowly brought into service.
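In practice I'd always do the dry run first and, if the proposed changes look sane, repeat the same arguments without the test- prefix (again, the numbers are only an example):
ceph osd test-reweight-by-utilization 120 0.05 10 --no-increasing   # dry run, shows what would change
ceph osd reweight-by-utilization 120 0.05 10 --no-increasing        # actually apply the reweights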
Deployments utilizing Nautilus (or later revisions of Luminous and Mimic) that have no pre-Luminous clients may instead wish to enable the balancer module for ceph-mgr.
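If you go that route, the rough sequence looks like this (on newer releases the balancer module is already on by default, so the enable step may be unnecessary, and upmap mode requires that all clients are at least Luminous):
ceph osd set-require-min-compat-client luminous   # needed for upmap mode
ceph mgr module enable balancer                    # only if the module isn't already enabled
ceph balancer mode upmap
ceph balancer on
ceph balancer status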
Keep in mind that doing this will cause data to move from fuller OSDs to emptier ones, which may impact your service.