Any tips on balancing osd data usage?

jsterr

Hello Proxmox Community :)

1642498240861.png

Usage ranges from 52% to 73% across OSDs (a spread of about 20 percentage points). Is there a way to keep it more balanced? I previously had a problem where one OSD reached 95% usage, which caused some serious problems. I was able to fix it, but I want to make sure it does not happen again.

1642498532920.png

Any tips? Thank you for your help. I would appreciate any links that point me in the right direction.
 
That's normal behaviour. Data distribution among Ceph OSDs can be adjusted manually using ceph osd reweight, but I find it easier to run ceph osd reweight-by-utilization from time to time, depending on how often data changes in your cluster.

Here's an extract from Ceph's documentation (https://docs.ceph.com/en/latest/rados/operations/control/):



Balance OSD fullness by reducing the override weight of OSDs which are overly utilized. Note that these override aka reweight values default to 1.00000 and are relative only to each other; they are not absolute. It is crucial to distinguish them from CRUSH weights, which reflect the absolute capacity of a bucket in TiB. By default this command adjusts override weight on OSDs which have + or - 20% of the average utilization, but if you include a threshold that percentage will be used instead.

ceph osd reweight-by-utilization [threshold [max_change [max_osds]]] [--no-increasing]

To limit the step by which any OSD’s reweight will be changed, specify max_change which defaults to 0.05. To limit the number of OSDs that will be adjusted, specify max_osds as well; the default is 4. Increasing these parameters can speed leveling of OSD utilization, at the potential cost of greater impact on client operations due to more data moving at once.

To determine which and how many PGs and OSDs will be affected by a given invocation you can test before executing.

ceph osd test-reweight-by-utilization [threshold [max_change max_osds]] [--no-increasing]

Adding --no-increasing to either command prevents increasing any override weights that are currently < 1.00000. This can be useful when you are balancing in a hurry to remedy full or nearful OSDs or when some OSDs are being evacuated or slowly brought into service.

Deployments utilizing Nautilus (or later revisions of Luminous and Mimic) that have no pre-Luminous clients may instead wish to enable the balancer module for ceph-mgr.



Keep in mind that running this moves data from fuller OSDs to emptier ones, which may impact your service while the data is moving.
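
To make the threshold a bit more concrete, here is a rough Python sketch of how I understand the "+ or - 20% of the average" rule from the extract. It is only an illustration, not Ceph's actual code: the function name osds_outside_threshold and the sample percentages are made up, and it assumes equally sized OSDs so that the plain mean of the per-OSD percentages can stand in for the cluster average.

```python
def osds_outside_threshold(utilization, threshold=120):
    """Return (average, overfull OSD ids, underfull OSD ids).

    utilization: dict of OSD id -> percent used (hypothetical sample data).
    threshold:   percent of the average, so 120 means 20% above average.
    Assumption: all OSDs have the same size, so a simple mean is used.
    """
    average = sum(utilization.values()) / len(utilization)
    high = average * threshold / 100      # upper bound, e.g. 120% of average
    low = average - (high - average)      # mirrored lower bound
    over = sorted(osd for osd, used in utilization.items() if used > high)
    under = sorted(osd for osd, used in utilization.items() if used < low)
    return average, over, under


# Made-up values in the same range as the first post (52% to 73%).
sample = {0: 52, 1: 60, 2: 65, 3: 73}

print(osds_outside_threshold(sample))        # → (62.5, [], [])
print(osds_outside_threshold(sample, 110))   # → (62.5, [3], [0])
```

With these sample numbers the default threshold of 120 flags nothing, while 110 flags the fullest and emptiest OSD. So if a run seems to do nothing on a cluster like the one in the first post, a lower threshold may be worth testing with the test-reweight-by-utilization variant first.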
 
I have the same problem: the lowest OSD is at 50% and the highest at 89%.
Running the command "ceph osd reweight-by-utilization" initiates some rebalancing, and I run it a few more times until it looks better.

Can it be automated?
 
How many OSDs (ceph osd tree) and pools (ceph osd pool ls detail) do you have?

Unless your data changes a lot (like deleting and rewriting, say, 50% of your capacity), chances are that once you reach a good set of reweight values, your data will stay evenly spread among your OSDs.

You can set up a cron job to run that command, but I prefer to monitor the whole process and control when, how and why I use a reweight.
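
If you do go the scripted route, one cautious pattern is to build the dry-run (test-reweight-by-utilization) command first and only switch to the real one after looking at what it would change. Below is a minimal Python sketch of that idea. The helper names build_reweight_command and run_reweight are hypothetical; the subcommands, argument order and --no-increasing flag are the ones from the documentation extract earlier in the thread, and the defaults 120 / 0.05 / 4 are the documented defaults.

```python
import subprocess


def build_reweight_command(threshold=120, max_change=0.05, max_osds=4,
                           no_increasing=True, dry_run=True):
    """Return the ceph CLI argument list for a (test-)reweight run."""
    subcommand = ("test-reweight-by-utilization" if dry_run
                  else "reweight-by-utilization")
    command = ["ceph", "osd", subcommand,
               str(threshold), str(max_change), str(max_osds)]
    if no_increasing:
        # Never raise override weights that are already below 1.0.
        command.append("--no-increasing")
    return command


def run_reweight(dry_run=True, **options):
    """Run the command and return its stdout.

    Assumption: the ceph CLI and an admin keyring are available on the
    host where this runs (for example one of the Proxmox nodes).
    """
    command = build_reweight_command(dry_run=dry_run, **options)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Only prints the command line; nothing is executed against the cluster here.
print(" ".join(build_reweight_command(threshold=110)))
# → ceph osd test-reweight-by-utilization 110 0.05 4 --no-increasing
```

A cron job could call run_reweight(dry_run=True) and mail the output, leaving the decision to call it with dry_run=False to a human, which keeps the "monitor and control" approach while still saving some typing.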
 
