Hi,
I am after some advice on the best way to expand our Ceph pool. Some steps have already been undertaken, but I need to pause until I understand what to do next.
Initially we had a Proxmox Ceph cluster of 4 nodes, each with 4 x 1TB SSD OSDs. I have since added a 5th node with 6 x 1TB SSD OSDs, and have now added 2 extra OSDs to each of the initial 4 nodes.
So now there are 5 nodes, each with 6 OSDs.
The autoscaler is enabled, but to me it looks like the number of PGs is too low?
>ceph osd pool autoscale-status
Code:
POOL                     SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
device_health_metrics  44139k                2.0        27787G  0.0000                                  1.0       1              on
ceph-vm                 6303G                2.0        27787G  0.4537                                  1.0     512              on
cephfs_data            18235M                2.0        27787G  0.0013                                  1.0      32              on
cephfs_metadata        193.7M                2.0        27787G  0.0000                                  4.0      32              on
>ceph osd df tree
Code:
 ID  CLASS    WEIGHT  REWEIGHT     SIZE  RAW USE     DATA     OMAP     META    AVAIL   %USE   VAR  PGS  STATUS  TYPE NAME
 -1         27.13626         -   27 TiB  8.7 TiB  8.7 TiB  1.3 GiB   44 GiB   18 TiB  32.12  1.00    -          root default
 -3          5.45517         -  5.5 TiB  1.8 TiB  1.8 TiB  257 MiB  9.1 GiB  3.6 TiB  33.47  1.04    -              host vhs0
  0  ssd     0.90919   1.00000  931 GiB  365 GiB  363 GiB   51 MiB  1.8 GiB  566 GiB  39.16  1.22   48      up          osd.0
  1  ssd     0.90919   1.00000  931 GiB  287 GiB  286 GiB   44 MiB  1.5 GiB  644 GiB  30.86  0.96   46      up          osd.1
  2  ssd     0.90919   1.00000  931 GiB  248 GiB  247 GiB   40 MiB  1.4 GiB  683 GiB  26.66  0.83   34      up          osd.2
  3  ssd     0.90919   1.00000  931 GiB  257 GiB  255 GiB   40 MiB  1.4 GiB  674 GiB  27.57  0.86   40      up          osd.3
 21  ssd     0.90919   1.00000  931 GiB  341 GiB  340 GiB   40 MiB  1.3 GiB  590 GiB  36.67  1.14   37      up          osd.21
 26  ssd     0.90919   1.00000  931 GiB  372 GiB  370 GiB   42 MiB  1.6 GiB  559 GiB  39.91  1.24   45      up          osd.26
 -5          5.45517         -  5.5 TiB  1.7 TiB  1.7 TiB  244 MiB  8.8 GiB  3.7 TiB  31.49  0.98    -              host vhs1
  4  ssd     0.90919   1.00000  931 GiB  234 GiB  233 GiB   36 MiB  1.5 GiB  697 GiB  25.14  0.78   40      up          osd.4
  5  ssd     0.90919   1.00000  931 GiB  290 GiB  288 GiB   44 MiB  1.6 GiB  641 GiB  31.14  0.97   43      up          osd.5
  6  ssd     0.90919   1.00000  931 GiB  213 GiB  212 GiB   34 MiB  1.3 GiB  718 GiB  22.92  0.71   30      up          osd.6
  7  ssd     0.90919   1.00000  931 GiB  317 GiB  315 GiB   48 MiB  1.6 GiB  614 GiB  34.00  1.06   45      up          osd.7
 20  ssd     0.90919   1.00000  931 GiB  331 GiB  329 GiB   39 MiB  1.4 GiB  600 GiB  35.54  1.11   37      up          osd.20
 25  ssd     0.90919   1.00000  931 GiB  374 GiB  373 GiB   42 MiB  1.4 GiB  557 GiB  40.20  1.25   42      up          osd.25
-11          5.45819         -  5.5 TiB  1.8 TiB  1.8 TiB  285 MiB  8.2 GiB  3.6 TiB  33.18  1.03    -              host vhs11
 16  ssd     0.90970   1.00000  932 GiB  310 GiB  309 GiB   40 MiB  1.3 GiB  622 GiB  33.27  1.04   32      up          osd.16
 17  ssd     0.90970   1.00000  932 GiB  271 GiB  270 GiB   44 MiB  1.4 GiB  660 GiB  29.14  0.91   33      up          osd.17
 18  ssd     0.90970   1.00000  932 GiB  320 GiB  318 GiB   59 MiB  1.4 GiB  612 GiB  34.33  1.07   38      up          osd.18
 19  ssd     0.90970   1.00000  932 GiB  338 GiB  337 GiB   53 MiB  1.5 GiB  593 GiB  36.29  1.13   38      up          osd.19
 23  ssd     0.90970   1.00000  932 GiB  314 GiB  312 GiB   47 MiB  1.3 GiB  618 GiB  33.69  1.05   37      up          osd.23
 28  ssd     0.90970   1.00000  932 GiB  302 GiB  300 GiB   41 MiB  1.2 GiB  630 GiB  32.37  1.01   39      up          osd.28
 -7          5.45517         -  5.5 TiB  1.7 TiB  1.7 TiB  262 MiB  9.0 GiB  3.7 TiB  31.48  0.98    -              host vhs2
  8  ssd     0.90919   1.00000  931 GiB  272 GiB  271 GiB   41 MiB  1.6 GiB  659 GiB  29.27  0.91   37      up          osd.8
  9  ssd     0.90919   1.00000  931 GiB  266 GiB  265 GiB   41 MiB  1.7 GiB  665 GiB  28.62  0.89   42      up          osd.9
 10  ssd     0.90919   1.00000  931 GiB  236 GiB  235 GiB   39 MiB  1.5 GiB  695 GiB  25.37  0.79   34      up          osd.10
 11  ssd     0.90919   1.00000  931 GiB  237 GiB  235 GiB   36 MiB  1.3 GiB  694 GiB  25.43  0.79   34      up          osd.11
 22  ssd     0.90919   1.00000  931 GiB  423 GiB  421 GiB   51 MiB  1.6 GiB  508 GiB  45.44  1.41   47      up          osd.22
 27  ssd     0.90919   1.00000  931 GiB  324 GiB  322 GiB   55 MiB  1.3 GiB  607 GiB  34.76  1.08   37      up          osd.27
 -9          5.31256         -  5.3 TiB  1.6 TiB  1.6 TiB  286 MiB  9.1 GiB  3.7 TiB  30.96  0.96    -              host vhs8
 12  ssd     0.87329   1.00000  894 GiB  278 GiB  276 GiB   52 MiB  2.0 GiB  616 GiB  31.12  0.97   37      up          osd.12
 13  ssd     0.87329   1.00000  894 GiB  266 GiB  264 GiB   66 MiB  1.5 GiB  629 GiB  29.69  0.92   35      up          osd.13
 14  ssd     0.87329   1.00000  894 GiB  266 GiB  265 GiB   54 MiB  1.7 GiB  628 GiB  29.78  0.93   37      up          osd.14
 15  ssd     0.87329   1.00000  894 GiB  254 GiB  253 GiB   34 MiB  1.4 GiB  640 GiB  28.43  0.89   32      up          osd.15
 24  ssd     0.90970   1.00000  932 GiB  291 GiB  290 GiB   36 MiB  1.3 GiB  640 GiB  31.26  0.97   37      up          osd.24
 29  ssd     0.90970   1.00000  932 GiB  328 GiB  327 GiB   44 MiB  1.3 GiB  603 GiB  35.26  1.10   41      up          osd.29
                         TOTAL   27 TiB  8.7 TiB  8.7 TiB  1.3 GiB   44 GiB   18 TiB  32.12
MIN/MAX VAR: 0.71/1.41  STDDEV: 5.05
The %USE varies quite a lot (VAR ranges from 0.71 to 1.41), which I guess is due to the low PG count per OSD?
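By my rough maths, (512 + 32 + 32 + 1) PGs x 2 replicas across 30 OSDs works out at about 38 PGs per OSD on average, which lines up with the PGS column above and is well short of the ~100 per OSD that is usually recommended.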
NOTE: The Ceph version is Octopus; an upgrade is planned but will be tackled at a later date.
What I am aiming for is to increase the replica count to 3 (the extra OSDs and node were put in place to accommodate the extra headroom needed for the higher replica count). Should I do that first, or adjust the PG count to something higher beforehand?
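For reference, the replica change I have in mind is just the following on the main pool, keeping min_size at 2 (please correct me if that is the wrong approach):
Code:
ceph osd pool set ceph-vm size 3
ceph osd pool set ceph-vm min_size 2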
I know Octopus doesn't support the bulk flag. Should I instead set the target ratio of, say, the 'ceph-vm' pool to something like 0.8 before I do anything, in the hope that the autoscaler corrects the PG count? That pool does host the majority of our data; the other pools are relatively unused.
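If that is the right lever, I assume it would just be the following (0.8 being my guess at a sensible ratio for a pool holding most of the data), then re-checking with ceph osd pool autoscale-status:
Code:
ceph osd pool set ceph-vm target_size_ratio 0.8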
If it is the case that the PG count is too low at the moment, I understand that manually increasing it to something like 1024 will trigger an intensive process of splitting existing PGs into smaller chunks. Is it better to do this before there are 3 copies of each PG? I am trying to minimise the amount of time that IO will be stressed.
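In that case I gather the manual route would be something like the line below; my understanding is that since Nautilus the cluster raises pgp_num gradually by itself once pg_num is changed, so no separate pgp_num step should be needed (happy to be corrected):
Code:
ceph osd pool set ceph-vm pg_num 1024
At 3 replicas that would work out at roughly 1024 x 3 / 30 ≈ 102 PGs per OSD, which seems about right against the ~100 target.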
If I do not touch the PG count and instead increase the replica count to 3, do I run the risk of the autoscaler increasing the PG count anyway, while the cluster is at the same time trying to create the additional copy of each PG? I worry that this would reduce IO performance for clients for a prolonged period of time.
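Either way, to soften the impact on clients I was thinking of keeping recovery and backfill throttled right down while the data moves, along these lines (the values are just my conservative starting guesses, and I believe some may already be the defaults):
Code:
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1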
Any advice offered would be appreciated.
Cheers,
Brad