Use a different shared storage in the interim.
I have removed 3 of the OSDs to start migrating data over, and we're adding 6 more this afternoon so we have enough room to start moving the data.
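For anyone following along, the usual way to drain an OSD before removing it looks roughly like this (a minimal sketch, assuming a replicated pool with enough spare capacity; osd.7 is just an example ID):

# mark the OSD out so its PGs get remapped onto the remaining OSDs
ceph osd out 7
# watch progress and wait until all PGs are active+clean again
ceph -s
# only then stop the daemon and remove it for good
systemctl stop ceph-osd@7
ceph osd purge 7 --yes-i-really-mean-it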
Things to know would be the status of Ceph (ceph -s) and how the OSDs and PGs are distributed (ceph osd df tree).
ceph -s
  cluster:
    id:     248fab2c-bd08-43fb-a562-08144c019785
    health: HEALTH_WARN
            1 pool(s) have no replicas configured
            6 daemons have recently crashed

  services:
    mon: 3 daemons, quorum c4,c6,c5 (age 18h)
    mgr: c6(active, since 18h), standbys: c4
    osd: 35 osds: 35 up (since 12h), 32 in (since 12h); 145 remapped pgs

  data:
    pools:   1 pools, 1024 pgs
    objects: 3.48M objects, 13 TiB
    usage:   14 TiB used, 39 TiB / 54 TiB avail
    pgs:     420711/3479255 objects misplaced (12.092%)
             878 active+clean
             135 active+remapped+backfill_wait
             10  active+remapped+backfilling
             1   active+clean+scrubbing+deep

  io:
    client:   42 MiB/s rd, 4.9 MiB/s wr, 124 op/s rd, 204 op/s wr
    recovery: 37 MiB/s, 9 objects/s

  progress:
    Rebalancing after osd.8 marked out
      [===================...........]
    Rebalancing after osd.7 marked out
      [========......................]
    Rebalancing after osd.30 marked out
      [===================...........]
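Both HEALTH_WARN items can be dealt with from the CLI. A rough sketch, assuming a pool name of mypool (substitute whatever ceph osd pool ls actually reports); note that raising size triggers additional data movement, so it's best done once the new OSDs are in:

# check the current replication of the pool
ceph osd pool ls
ceph osd pool get mypool size
# give the pool real redundancy once capacity allows (3 copies is the usual default)
ceph osd pool set mypool size 3
ceph osd pool set mypool min_size 2
# list the recent daemon crashes and, once reviewed, acknowledge them
ceph crash ls
ceph crash archive-all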
ceph osd df tree
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME
-1 53.84706 - 48 TiB 14 TiB 14 TiB 9.7 MiB 41 GiB 35 TiB 28.55 1.00 - root default
-16 19.27872 - 17 TiB 5.8 TiB 5.8 TiB 3.1 MiB 17 GiB 12 TiB 33.34 1.17 - host c4
1 hdd 1.81940 1.00000 1.8 TiB 824 GiB 822 GiB 257 KiB 2.7 GiB 1.0 TiB 44.25 1.55 58 up osd.1
4 hdd 1.81940 1.00000 1.8 TiB 679 GiB 678 GiB 315 KiB 1.5 GiB 1.2 TiB 36.46 1.28 51 up osd.4
5 hdd 1.81940 1.00000 1.8 TiB 507 GiB 506 GiB 371 KiB 1.3 GiB 1.3 TiB 27.23 0.95 35 up osd.5
6 hdd 1.81940 1.00000 1.8 TiB 732 GiB 731 GiB 274 KiB 1.5 GiB 1.1 TiB 39.30 1.38 56 up osd.6
7 hdd 1.81940 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 38 up osd.7
9 hdd 1.81940 1.00000 1.8 TiB 584 GiB 582 GiB 303 KiB 1.4 GiB 1.2 TiB 31.33 1.10 41 up osd.9
11 hdd 1.09079 1.00000 1.1 TiB 299 GiB 298 GiB 245 KiB 1.0 GiB 818 GiB 26.80 0.94 22 up osd.11
14 hdd 1.81940 1.00000 1.8 TiB 567 GiB 566 GiB 251 KiB 1.3 GiB 1.3 TiB 30.46 1.07 43 up osd.14
15 hdd 0.90819 1.00000 930 GiB 289 GiB 288 GiB 121 KiB 1024 MiB 641 GiB 31.08 1.09 22 up osd.15
16 hdd 0.90819 1.00000 930 GiB 223 GiB 222 GiB 276 KiB 1024 MiB 707 GiB 23.97 0.84 17 up osd.16
21 hdd 0.90819 1.00000 930 GiB 224 GiB 223 GiB 229 KiB 1024 MiB 706 GiB 24.13 0.84 17 up osd.21
22 hdd 0.90819 1.00000 930 GiB 431 GiB 430 GiB 163 KiB 1.6 GiB 499 GiB 46.37 1.62 33 up osd.22
26 hdd 1.81940 1.00000 1.8 TiB 600 GiB 599 GiB 330 KiB 1.5 GiB 1.2 TiB 32.22 1.13 46 up osd.26
-7 18.19397 - 16 TiB 4.8 TiB 4.8 TiB 2.0 MiB 13 GiB 12 TiB 29.40 1.03 - host c5
0 hdd 0.90970 1.00000 932 GiB 257 GiB 256 GiB 129 KiB 1024 MiB 674 GiB 27.64 0.97 15 up osd.0
3 hdd 0.90970 1.00000 932 GiB 350 GiB 349 GiB 136 KiB 1.2 GiB 581 GiB 37.59 1.32 18 up osd.3
12 hdd 1.81940 1.00000 1.8 TiB 583 GiB 581 GiB 242 KiB 1.6 GiB 1.3 TiB 31.29 1.10 41 up osd.12
13 hdd 1.81940 1.00000 1.8 TiB 513 GiB 511 GiB 255 KiB 1.4 GiB 1.3 TiB 27.52 0.96 38 up osd.13
17 hdd 1.81940 1.00000 1.8 TiB 666 GiB 665 GiB 258 KiB 1.6 GiB 1.2 TiB 35.77 1.25 47 up osd.17
18 hdd 1.81940 1.00000 1.8 TiB 718 GiB 717 GiB 221 KiB 1.5 GiB 1.1 TiB 38.55 1.35 48 up osd.18
23 hdd 1.81940 1.00000 1.8 TiB 308 GiB 307 GiB 162 KiB 1024 MiB 1.5 TiB 16.53 0.58 22 up osd.23
27 hdd 1.81940 1.00000 1.8 TiB 548 GiB 546 GiB 207 KiB 1.5 GiB 1.3 TiB 29.40 1.03 37 up osd.27
28 hdd 1.81940 1.00000 1.8 TiB 725 GiB 724 GiB 302 KiB 1.5 GiB 1.1 TiB 38.93 1.36 50 up osd.28
29 hdd 1.81940 1.00000 1.8 TiB 261 GiB 260 GiB 114 KiB 1024 MiB 1.6 TiB 14.03 0.49 18 up osd.29
30 hdd 1.81940 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 1 up osd.30
-5 16.37437 - 15 TiB 3.2 TiB 3.2 TiB 4.7 MiB 11 GiB 11 TiB 21.85 0.77 - host c6
2 hdd 1.09160 1.00000 1.1 TiB 275 GiB 274 GiB 469 KiB 1.0 GiB 842 GiB 24.63 0.86 13 up osd.2
8 hdd 1.81940 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 1 up osd.8
10 hdd 1.81940 1.00000 1.8 TiB 385 GiB 384 GiB 554 KiB 1.0 GiB 1.4 TiB 20.66 0.72 27 up osd.10
19 hdd 1.09160 1.00000 1.1 TiB 236 GiB 235 GiB 109 KiB 1024 MiB 882 GiB 21.12 0.74 14 up osd.19
20 hdd 1.09160 1.00000 1.1 TiB 391 GiB 390 GiB 272 KiB 1.2 GiB 727 GiB 34.97 1.22 23 up osd.20
24 hdd 1.09160 1.00000 1.1 TiB 365 GiB 363 GiB 288 KiB 1.3 GiB 753 GiB 32.62 1.14 22 up osd.24
25 hdd 1.09160 1.00000 1.1 TiB 357 GiB 356 GiB 200 KiB 1.0 GiB 761 GiB 31.91 1.12 20 up osd.25
31 hdd 1.81940 1.00000 1.8 TiB 282 GiB 281 GiB 674 KiB 1023 MiB 1.5 TiB 15.16 0.53 20 up osd.31
32 hdd 1.81940 1.00000 1.8 TiB 335 GiB 333 GiB 133 KiB 1.2 GiB 1.5 TiB 17.96 0.63 25 up osd.32
33 hdd 1.81940 1.00000 1.8 TiB 282 GiB 281 GiB 967 KiB 1023 MiB 1.5 TiB 15.13 0.53 21 up osd.33
34 hdd 1.81940 1.00000 1.8 TiB 349 GiB 348 GiB 1.1 MiB 1023 MiB 1.5 TiB 18.75 0.66 24 up osd.34
TOTAL 54 TiB 14 TiB 14 TiB 10 MiB 44 GiB 39 TiB 28.55
MIN/MAX VAR: 0.49/1.62 STDDEV: 8.47
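With recovery running at only 37 MiB/s, the backfill throttles are worth a look. A hedged example, not a blanket recommendation: the options below exist in Nautilus, the values are illustrative, and higher settings steal bandwidth from client I/O.

# show the current per-OSD backfill limit (default is 1 in Nautilus)
ceph config get osd osd_max_backfills
# temporarily allow more concurrent backfill and recovery work
ceph config set osd osd_max_backfills 2
ceph config set osd osd_recovery_max_active 4
# revert to the defaults once the rebalance is done
ceph config rm osd osd_max_backfills
ceph config rm osd osd_recovery_max_active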
Further, you will definitely need to get an understanding of Ceph, especially for situations like this one.
https://docs.ceph.com/en/nautilus/
https://pve.proxmox.com/pve-docs/chapter-pveceph.html
Ok so how do I correct it?
The problem for me is most guides don't go into the smaller details of the network config...
So let's use this as a bit of a learning exercise if possible...
What would we need to change in this scenario?
And, more or less, how can I go about doing so? Even just rough details would help. I've been racking my brain trying to figure that part out D:
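For the network side specifically, the Ceph-relevant settings come down to a few lines in the cluster-wide config (/etc/pve/ceph.conf on Proxmox, which /etc/ceph/ceph.conf symlinks to). A rough sketch with placeholder subnets; the monitor addresses must sit inside the public network, and changing these on a running cluster needs care:

[global]
    # network that clients, mons and mgrs talk on (placeholder subnet)
    public_network = 10.10.10.0/24
    # optional dedicated network for replication/backfill traffic (placeholder)
    cluster_network = 10.10.20.0/24

If there is no second NIC or VLAN available, leaving cluster_network unset (so everything runs over the public network) is perfectly valid; the split mainly keeps heavy backfill traffic away from client I/O.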