That's normal behaviour. Data distribution among Ceph OSDs can be adjusted manually using
ceph osd reweight
, but I find it easier to run
ceph osd reweight-by-utilization
from time to time, depending on how often data changes in your cluster.
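To see how unbalanced the OSDs currently are, and to nudge a single one by hand, something like the following should work (the OSD id 7 and the weight 0.95 are placeholders, pick values that fit your cluster):
ceph osd df                  # shows per-OSD utilization and the current REWEIGHT value
ceph osd reweight 7 0.95     # lower the override weight of OSD 7 so it receives fewer PGs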
Here's an extract from Ceph's documentation (https://docs.ceph.com/en/latest/rados/operations/control/):
Balance OSD fullness by reducing the override weight of OSDs which are overly utilized. Note that these override aka reweight values default to 1.00000 and are relative only to each other; they are not absolute. It is crucial to distinguish them from CRUSH weights, which reflect the absolute capacity of a bucket in TiB. By default this command adjusts override weight on OSDs which have + or - 20% of the average utilization, but if you include a threshold that percentage will be used instead.
ceph osd reweight-by-utilization [threshold [max_change [max_osds]]] [--no-increasing]
To limit the step by which any OSD’s reweight will be changed, specify max_change which defaults to 0.05. To limit the number of OSDs that will be adjusted, specify max_osds as well; the default is 4. Increasing these parameters can speed leveling of OSD utilization, at the potential cost of greater impact on client operations due to more data moving at once.
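For example, a run that allows slightly bigger steps across more OSDs could look like this (120 matches the default 20% threshold; 0.10 and 8 are values I've picked purely for illustration):
ceph osd reweight-by-utilization 120 0.10 8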
To determine which and how many PGs and OSDs will be affected by a given invocation you can test before executing.
ceph osd test-reweight-by-utilization [threshold [max_change max_osds]] [--no-increasing]
Adding --no-increasing to either command prevents increasing any override weights that are currently < 1.00000. This can be useful when you are balancing in a hurry to remedy full or nearful OSDs or when some OSDs are being evacuated or slowly brought into service.
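In practice I'd always do the dry run first and, if the proposed changes look sane, repeat the same arguments without the test- prefix (again, the numbers are only an example):
ceph osd test-reweight-by-utilization 120 0.05 10 --no-increasing   # dry run, shows what would change
ceph osd reweight-by-utilization 120 0.05 10 --no-increasing        # actually apply the reweights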
Deployments utilizing Nautilus (or later revisions of Luminous and Mimic) that have no pre-Luminous clients may instead wish to enable the balancer module for ceph-mgr.
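If you go that route, the rough sequence looks like this (on newer releases the balancer module is already on by default, so the enable step may be unnecessary, and upmap mode requires that all clients are at least Luminous):
ceph osd set-require-min-compat-client luminous   # needed for upmap mode
ceph mgr module enable balancer                    # only if the module isn't already enabled
ceph balancer mode upmap
ceph balancer on
ceph balancer status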
Keep in mind that doing this will cause data to move from fuller OSDs to emptier ones, which may impact your service.