Scenario
I have a Proxmox VE 8 cluster with 6 nodes, using Ceph as distributed storage. The cluster has 48 OSDs in total, spread across 4 nodes with SSDs and 2 nodes with HDDs.
On Monday night, three OSDs reached 100% capacity and crashed:
- osd.16 (pve118)
- osd.23 (pve118)
- osd.24 (pve119)

Troubleshooting Steps Taken
- Removed and re-added the problematic OSDs (rough versions of the commands for all of these steps are sketched below).
- Re-enabled rebalancing and recovery (ceph osd unset norebalance and ceph osd unset norecover).
- Tried to force PG relocation with ceph osd pg-upmap-items, but the PGs did not move.
- Ran ceph osd reweight on the full OSDs, but the issue persists.
- Considered ceph-bluestore-tool bluefs-bdev-migrate, but I'm unsure whether it applies in this case.
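For reference, this is roughly what I ran for the steps above. The OSD numbers match my cluster, but the PG ID and device path are illustrative placeholders, not actual values from my pools:
Code:
# remove and re-create a crashed OSD (the device path is an example)
ceph osd out 16
pveceph osd destroy 16 --cleanup
pveceph osd create /dev/sdX

# re-enable data movement afterwards
ceph osd unset norebalance
ceph osd unset norecover

# manual upmap attempt: map a PG away from a full OSD (osd.17)
# onto an empty one (osd.16); the PG ID 2.1a is only an example
ceph osd pg-upmap-items 2.1a 17 16

# reduce the weight of a full OSD so CRUSH prefers others
ceph osd reweight 17 0.85
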
Current Cluster Status
Output of ceph -s:
Code:
root@pve118:~# ceph -s
  cluster:
    id:     52d10d07-2f32-41e7-b8cf-7d7282af69a2
    health: HEALTH_WARN
            2 nearfull osd(s)
            Degraded data redundancy: 1668878/14228676 objects degraded (11.729%), 45 pgs degraded, 45 pgs undersized
            23 pgs not deep-scrubbed in time
            23 pgs not scrubbed in time
            2 pool(s) nearfull

  services:
    mon: 6 daemons, quorum pve118,pve119,pve114,pve142,pve143,pve117 (age 5d)
    mgr: pve119(active, since 17h), standbys: pve118, pve117, pve143, pve142, pve114
    osd: 48 osds: 48 up (since 23h), 48 in (since 23h); 83 remapped pgs

  data:
    pools:   3 pools, 289 pgs
    objects: 4.74M objects, 17 TiB
    usage:   47 TiB used, 156 TiB / 203 TiB avail
    pgs:     1668878/14228676 objects degraded (11.729%)
             3073835/14228676 objects misplaced (21.603%)
             161 active+clean
             81  active+clean+remapped
             45  active+undersized+degraded
             2   active+clean+remapped+scrubbing+deep

  io:
    client:   85 KiB/s rd, 7.9 MiB/s wr, 6 op/s rd, 351 op/s wr

root@pve118:~#
Output of ceph osd df tree shows some OSDs above 90% usage while others sit below 20%:
Code:
root@pve118:~# ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 202.57080 - 203 TiB 47 TiB 47 TiB 858 KiB 134 GiB 156 TiB 23.07 1.00 - root default
-9 6.98633 - 7.0 TiB 1.4 TiB 1.4 TiB 77 KiB 5.4 GiB 5.5 TiB 20.65 0.90 - host pve114
0 ssd 0.87329 1.00000 894 GiB 193 GiB 192 GiB 20 KiB 683 MiB 701 GiB 21.56 0.93 25 up osd.0
1 ssd 0.87329 1.00000 894 GiB 194 GiB 193 GiB 12 KiB 853 MiB 701 GiB 21.64 0.94 21 up osd.1
2 ssd 0.87329 1.00000 894 GiB 128 GiB 128 GiB 3 KiB 505 MiB 766 GiB 14.37 0.62 5 up osd.2
3 ssd 0.87329 1.00000 894 GiB 193 GiB 192 GiB 10 KiB 745 MiB 701 GiB 21.56 0.93 35 up osd.3
4 ssd 0.87329 1.00000 894 GiB 129 GiB 128 GiB 12 KiB 576 MiB 766 GiB 14.39 0.62 10 up osd.4
5 ssd 0.87329 1.00000 894 GiB 193 GiB 192 GiB 9 KiB 576 MiB 701 GiB 21.58 0.94 10 up osd.5
6 ssd 0.87329 1.00000 894 GiB 320 GiB 319 GiB 6 KiB 736 MiB 574 GiB 35.80 1.55 5 up osd.6
7 ssd 0.87329 1.00000 894 GiB 128 GiB 127 GiB 5 KiB 821 MiB 766 GiB 14.31 0.62 20 up osd.7
-3 6.98633 - 7.0 TiB 4.3 TiB 4.3 TiB 166 KiB 15 GiB 2.7 TiB 61.28 2.66 - host pve117
8 ssd 0.87329 0.95000 894 GiB 686 GiB 683 GiB 28 KiB 2.5 GiB 209 GiB 76.67 3.32 20 up osd.8
9 ssd 0.87329 1.00000 894 GiB 137 GiB 136 GiB 24 KiB 939 MiB 757 GiB 15.36 0.67 11 up osd.9
10 ssd 0.87329 0.95000 894 GiB 684 GiB 682 GiB 15 KiB 2.3 GiB 210 GiB 76.52 3.32 15 up osd.10
11 ssd 0.87329 0.95001 894 GiB 687 GiB 685 GiB 27 KiB 1.9 GiB 207 GiB 76.87 3.33 15 up osd.11
12 ssd 0.87329 0.95000 894 GiB 681 GiB 680 GiB 26 KiB 1.4 GiB 213 GiB 76.20 3.30 10 up osd.12
13 ssd 0.87329 0.95001 894 GiB 685 GiB 683 GiB 19 KiB 2.1 GiB 209 GiB 76.64 3.32 30 up osd.13
14 ssd 0.87329 1.00000 894 GiB 138 GiB 136 GiB 11 KiB 1.8 GiB 756 GiB 15.42 0.67 26 up osd.14
15 ssd 0.87329 0.95000 894 GiB 685 GiB 683 GiB 16 KiB 2.0 GiB 209 GiB 76.59 3.32 15 up osd.15
-5 6.98633 - 7.0 TiB 2.7 TiB 2.7 TiB 105 KiB 10 GiB 4.3 TiB 38.25 1.66 - host pve118
16 ssd 0.87329 1.00000 894 GiB 217 MiB 169 MiB 1 KiB 48 MiB 894 GiB 0.02 0.00 25 up osd.16
17 ssd 0.87329 0.95001 894 GiB 818 GiB 816 GiB 16 KiB 1.9 GiB 77 GiB 91.44 3.96 11 up osd.17
18 ssd 0.87329 0.95001 894 GiB 819 GiB 817 GiB 18 KiB 2.1 GiB 75 GiB 91.61 3.97 16 up osd.18
19 ssd 0.87329 1.00000 894 GiB 274 GiB 272 GiB 12 KiB 1.6 GiB 620 GiB 30.64 1.33 12 up osd.19
20 ssd 0.87329 1.00000 894 GiB 139 GiB 137 GiB 14 KiB 1.6 GiB 755 GiB 15.54 0.67 46 up osd.20
21 ssd 0.87329 1.00000 894 GiB 138 GiB 137 GiB 16 KiB 677 MiB 757 GiB 15.38 0.67 16 up osd.21
22 ssd 0.87329 1.00000 894 GiB 549 GiB 546 GiB 27 KiB 2.5 GiB 346 GiB 61.34 2.66 14 up osd.22
23 ssd 0.87329 1.00000 894 GiB 174 MiB 130 MiB 1 KiB 44 MiB 894 GiB 0.02 0 20 up osd.23
-7 6.98633 - 7.0 TiB 3.1 TiB 3.1 TiB 135 KiB 14 GiB 3.9 TiB 44.04 1.91 - host pve119
24 ssd 0.87329 1.00000 894 GiB 125 MiB 98 MiB 1 KiB 26 MiB 894 GiB 0.01 0 10 up osd.24
25 ssd 0.87329 0.95000 894 GiB 686 GiB 684 GiB 20 KiB 2.5 GiB 208 GiB 76.77 3.33 10 up osd.25
26 ssd 0.87329 1.00000 894 GiB 409 GiB 407 GiB 10 KiB 2.2 GiB 485 GiB 45.75 1.98 8 up osd.26
27 ssd 0.87329 1.00000 894 GiB 408 GiB 406 GiB 13 KiB 2.2 GiB 486 GiB 45.67 1.98 23 up osd.27
28 ssd 0.87329 0.95000 894 GiB 684 GiB 682 GiB 32 KiB 2.1 GiB 210 GiB 76.52 3.32 20 up osd.28
29 ssd 0.87329 1.00000 894 GiB 413 GiB 411 GiB 20 KiB 1.9 GiB 481 GiB 46.17 2.00 8 up osd.29
30 ssd 0.87329 1.00000 894 GiB 412 GiB 410 GiB 23 KiB 2.1 GiB 482 GiB 46.10 2.00 33 up osd.30
31 ssd 0.87329 1.00000 894 GiB 137 GiB 136 GiB 16 KiB 895 MiB 757 GiB 15.36 0.67 11 up osd.31
-16 87.31274 - 87 TiB 18 TiB 18 TiB 184 KiB 47 GiB 70 TiB 20.35 0.88 - host pve142
32 hdd 10.91409 1.00000 11 TiB 2.4 TiB 2.4 TiB 26 KiB 5.9 GiB 8.5 TiB 22.03 0.95 19 up osd.32
33 hdd 10.91409 1.00000 11 TiB 1.7 TiB 1.7 TiB 25 KiB 4.6 GiB 9.2 TiB 15.91 0.69 13 up osd.33
34 hdd 10.91409 1.00000 11 TiB 2.0 TiB 2.0 TiB 16 KiB 4.4 GiB 8.9 TiB 18.37 0.80 15 up osd.34
35 hdd 10.91409 1.00000 11 TiB 2.7 TiB 2.7 TiB 23 KiB 7.1 GiB 8.2 TiB 24.49 1.06 20 up osd.35
36 hdd 10.91409 1.00000 11 TiB 1.9 TiB 1.9 TiB 22 KiB 4.9 GiB 9.0 TiB 17.14 0.74 14 up osd.36
37 hdd 10.91409 1.00000 11 TiB 1.2 TiB 1.2 TiB 13 KiB 3.3 GiB 9.7 TiB 10.99 0.48 9 up osd.37
38 hdd 10.91409 1.00000 11 TiB 3.5 TiB 3.5 TiB 33 KiB 11 GiB 7.4 TiB 31.84 1.38 26 up osd.38
39 hdd 10.91409 1.00000 11 TiB 2.4 TiB 2.4 TiB 26 KiB 6.0 GiB 8.5 TiB 22.01 0.95 18 up osd.39
-19 87.31274 - 87 TiB 17 TiB 17 TiB 191 KiB 43 GiB 70 TiB 20.04 0.87 - host pve143
40 hdd 10.91409 1.00000 11 TiB 2.7 TiB 2.7 TiB 20 KiB 7.1 GiB 8.2 TiB 24.46 1.06 20 up osd.40
41 hdd 10.91409 1.00000 11 TiB 1.9 TiB 1.9 TiB 12 KiB 4.4 GiB 9.0 TiB 17.18 0.74 15 up osd.41
42 hdd 10.91409 1.00000 11 TiB 1.9 TiB 1.9 TiB 36 KiB 4.7 GiB 9.0 TiB 17.09 0.74 14 up osd.42
43 hdd 10.91409 1.00000 11 TiB 3.2 TiB 3.2 TiB 27 KiB 8.2 GiB 7.7 TiB 29.38 1.27 24 up osd.43
44 hdd 10.91409 1.00000 11 TiB 2.1 TiB 2.1 TiB 20 KiB 4.8 GiB 8.8 TiB 19.57 0.85 16 up osd.44
45 hdd 10.91409 1.00000 11 TiB 1.6 TiB 1.6 TiB 27 KiB 4.1 GiB 9.3 TiB 14.71 0.64 12 up osd.45
46 hdd 10.91409 1.00000 11 TiB 2.1 TiB 2.1 TiB 20 KiB 4.7 GiB 8.8 TiB 19.58 0.85 16 up osd.46
47 hdd 10.91409 1.00000 11 TiB 2.0 TiB 2.0 TiB 29 KiB 4.7 GiB 8.9 TiB 18.33 0.79 15 up osd.47
TOTAL 203 TiB 47 TiB 47 TiB 882 KiB 134 GiB 156 TiB 23.07
MIN/MAX VAR: 0/3.97 STDDEV: 27.96
root@pve118:~#
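For what it's worth, a quick way to see the utilization spread, assuming jq is installed (this filter is just a convenience, not part of Ceph):
Code:
# OSD id, name and %USE, sorted most-full first (jq required)
ceph osd df -f json | jq -r '.nodes[] | "\(.id)\t\(.name)\t\(.utilization)"' | sort -k3 -rn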
Questions and Help Request
- How can I force PG reallocation to OSDs with available space?
- Why is pg-upmap-items not working for PG redistribution?
- Is there any other method to free up space on these OSDs and rebalance the cluster?
Any help from the community would be greatly appreciated!
I can provide logs or additional command outputs if needed.
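For example, I can post the output of the following (including the upmap prerequisites, in case that is relevant to the second question):
Code:
# is the balancer module active (and possibly undoing manual upmaps)?
ceph balancer status
# pg-upmap-items requires at least luminous clients
ceph osd dump | grep require_min_compat_client
# pool flags (e.g. nearfull) and a brief PG placement dump
ceph osd pool ls detail
ceph pg dump pgs_brief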
Thanks in advance for your support!