Hi Everybody,
this story might sound similar to one I posted recently, but I think the circumstances are different:
I have a 3-node cluster whose PGs are unequally distributed across the available nodes.
My suspicion arose while updating the cluster from 15.x to 16.x: when I rebooted the nodes one after the other, the cluster sometimes went unresponsive.
Currently the status is OK and all nodes are up.
However, I investigated a little further and found out that my PGs are not evenly distributed throughout all cluster nodes.
To confirm my suspicion, I replaced the OSD numbers with fictional node names (01-03) to get a better overview of what's happening:
Code:
ceph pg dump all | cut -d '[' -f 2 | cut -d ']' -f 1 |
  sed -e "s/\b[0-7]\b/01/g;s/\b[89]\b\|\b10\b\|\b11\b\|\b20\b\|\b21\b\|\b23\b/02/g;s/\b1[2-9]\b/03/g" |
  grep -v -e "01,02,03\|02,01,03\|03,01,02"
The result then looks like this:
As one can see, the majority of the PGs are stored on one single node instead of being distributed equally across all 3 nodes.
Code:
01,01,01
02,01,01
01,01,03
01,03,03
03,01,01
01,01,03
03,01,01
02,03,02
01,03,02
01,03,01
01,02,02
02,01,01
01,03,01
01,03,02
03,02,03
03,02,01
03,03,03
01,01,03
02,01,01
02,02,01
03,02,01
03,02,02
03,02,01
01,02,02
03,02,01
03,02,01
01,03,01
03,03,02
03,03,02
01,03,03
03,01,01
03,03,02
03,03,02
01,03,03
02,03,02
01,03,02
01,01,03
01,01,03
....
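To get a rough count of how often each host combination occurs, the same mapping can be piped through sort and uniq instead of the final grep (just a sketch of the pipeline above; note that the order within a combination still matters here, so e.g. 01,01,03 and 03,01,01 are counted separately):
Code:
ceph pg dump all | cut -d '[' -f 2 | cut -d ']' -f 1 |
  sed -e "s/\b[0-7]\b/01/g;s/\b[89]\b\|\b10\b\|\b11\b\|\b20\b\|\b21\b\|\b23\b/02/g;s/\b1[2-9]\b/03/g" |
  sort | uniq -c | sort -rn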
Here is my crushmap:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 23 osd.23 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host srv-virt-01 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 8.736
alg straw2
hash 0 # rjenkins1
item osd.0 weight 1.092
item osd.1 weight 1.092
item osd.2 weight 1.092
item osd.3 weight 1.092
item osd.4 weight 1.092
item osd.5 weight 1.092
item osd.6 weight 1.092
item osd.7 weight 1.092
}
host srv-virt-02 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 7.644
alg straw2
hash 0 # rjenkins1
item osd.8 weight 1.092
item osd.9 weight 1.092
item osd.10 weight 1.092
item osd.11 weight 1.092
item osd.20 weight 1.092
item osd.21 weight 1.092
item osd.23 weight 1.092
}
host srv-virt-03 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 8.736
alg straw2
hash 0 # rjenkins1
item osd.12 weight 1.092
item osd.13 weight 1.092
item osd.14 weight 1.092
item osd.17 weight 1.092
item osd.18 weight 1.092
item osd.19 weight 1.092
item osd.15 weight 1.092
item osd.16 weight 1.092
}
root default {
id -1 # do not change unnecessarily
id -9 class hdd # do not change unnecessarily
# weight 25.112
alg straw2
hash 0 # rjenkins1
item srv-virt-01 weight 8.733
item srv-virt-02 weight 7.644
item srv-virt-03 weight 8.735
}
root production {
id -20 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 9.828
alg straw2
hash 0 # rjenkins1
item osd.0 weight 1.092
item osd.1 weight 1.092
item osd.3 weight 1.092
item osd.9 weight 1.092
item osd.10 weight 1.092
item osd.11 weight 1.092
item osd.12 weight 1.092
item osd.14 weight 1.092
item osd.17 weight 1.092
}
root backup {
id -30 # do not change unnecessarily
id -10 class hdd # do not change unnecessarily
# weight 15.288
alg straw2
hash 0 # rjenkins1
item osd.2 weight 1.092
item osd.4 weight 1.092
item osd.5 weight 1.092
item osd.6 weight 1.092
item osd.7 weight 1.092
item osd.8 weight 1.092
item osd.13 weight 1.092
item osd.15 weight 1.092
item osd.16 weight 1.092
item osd.18 weight 1.092
item osd.19 weight 1.092
item osd.20 weight 1.092
item osd.21 weight 1.092
item osd.23 weight 1.092
}
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule production_pool {
id 1
type replicated
min_size 2
max_size 6
step take production
step chooseleaf firstn 0 type osd
step emit
}
rule backup_pool {
id 2
type replicated
min_size 2
max_size 3
step take backup
step chooseleaf firstn 0 type osd
step emit
}
# end crush map
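In case it helps with the analysis, this is roughly how I would check how the two custom rules map PGs against the compiled map offline (just a sketch; crush.bin is only a placeholder file name, rule 1 is production_pool and rule 2 is backup_pool from the map above):
Code:
# fetch the compiled crush map from the cluster
ceph osd getcrushmap -o crush.bin
# simulate placements for the production_pool rule (id 1) with 3 replicas
crushtool -i crush.bin --test --rule 1 --num-rep 3 --show-mappings | head -n 20
# summary statistics for the same rule
crushtool -i crush.bin --test --rule 1 --num-rep 3 --show-statistics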
Here are my OSDs:
Code:
ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-30 15.28793 root backup
2 hdd 1.09200 osd.2 up 1.00000 1.00000
4 hdd 1.09200 osd.4 up 1.00000 1.00000
5 hdd 1.09200 osd.5 up 1.00000 1.00000
6 hdd 1.09200 osd.6 up 1.00000 1.00000
7 hdd 1.09200 osd.7 up 1.00000 1.00000
8 hdd 1.09200 osd.8 up 1.00000 1.00000
13 hdd 1.09200 osd.13 up 1.00000 1.00000
15 hdd 1.09200 osd.15 up 1.00000 1.00000
16 hdd 1.09200 osd.16 up 1.00000 1.00000
18 hdd 1.09200 osd.18 up 1.00000 1.00000
19 hdd 1.09200 osd.19 up 1.00000 1.00000
20 hdd 1.09200 osd.20 up 1.00000 1.00000
21 hdd 1.09200 osd.21 up 1.00000 1.00000
23 hdd 1.09200 osd.23 up 1.00000 1.00000
-20 9.82796 root production
0 hdd 1.09200 osd.0 up 1.00000 1.00000
1 hdd 1.09200 osd.1 up 1.00000 1.00000
3 hdd 1.09200 osd.3 up 1.00000 1.00000
9 hdd 1.09200 osd.9 up 1.00000 1.00000
10 hdd 1.09200 osd.10 up 1.00000 1.00000
11 hdd 1.09200 osd.11 up 1.00000 1.00000
12 hdd 1.09200 osd.12 up 1.00000 1.00000
14 hdd 1.09200 osd.14 up 1.00000 1.00000
17 hdd 1.09200 osd.17 up 1.00000 1.00000
-1 25.11197 root default
-3 8.73299 host srv-virt-01
0 hdd 1.09200 osd.0 up 1.00000 1.00000
1 hdd 1.09200 osd.1 up 1.00000 1.00000
2 hdd 1.09200 osd.2 up 1.00000 1.00000
3 hdd 1.09200 osd.3 up 1.00000 1.00000
4 hdd 1.09200 osd.4 up 1.00000 1.00000
5 hdd 1.09200 osd.5 up 1.00000 1.00000
6 hdd 1.09200 osd.6 up 1.00000 1.00000
7 hdd 1.09200 osd.7 up 1.00000 1.00000
-5 7.64400 host srv-virt-02
8 hdd 1.09200 osd.8 up 1.00000 1.00000
9 hdd 1.09200 osd.9 up 1.00000 1.00000
10 hdd 1.09200 osd.10 up 1.00000 1.00000
11 hdd 1.09200 osd.11 up 1.00000 1.00000
20 hdd 1.09200 osd.20 up 1.00000 1.00000
21 hdd 1.09200 osd.21 up 1.00000 1.00000
23 hdd 1.09200 osd.23 up 1.00000 1.00000
-7 8.73499 host srv-virt-03
12 hdd 1.09200 osd.12 up 1.00000 1.00000
13 hdd 1.09200 osd.13 up 1.00000 1.00000
14 hdd 1.09200 osd.14 up 1.00000 1.00000
15 hdd 1.09200 osd.15 up 1.00000 1.00000
16 hdd 1.09200 osd.16 up 1.00000 1.00000
17 hdd 1.09200 osd.17 up 1.00000 1.00000
18 hdd 1.09200 osd.18 up 1.00000 1.00000
19 hdd 1.09200 osd.19 up 1.00000 1.00000
Code:
ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 hdd 1.09200 1.00000 1.1 TiB 589 GiB 587 GiB 96 MiB 2.0 GiB 528 GiB 52.73 1.10 62 up
4 hdd 1.09200 1.00000 1.1 TiB 587 GiB 585 GiB 191 MiB 2.0 GiB 531 GiB 52.53 1.10 68 up
5 hdd 1.09200 1.00000 1.1 TiB 507 GiB 505 GiB 139 MiB 1.5 GiB 611 GiB 45.35 0.95 57 up
6 hdd 1.09200 1.00000 1.1 TiB 579 GiB 577 GiB 180 MiB 2.0 GiB 538 GiB 51.83 1.08 66 up
7 hdd 1.09200 1.00000 1.1 TiB 508 GiB 506 GiB 188 MiB 1.8 GiB 610 GiB 45.47 0.95 58 up
8 hdd 1.09200 1.00000 1.1 TiB 569 GiB 566 GiB 197 MiB 2.0 GiB 549 GiB 50.87 1.06 65 up
13 hdd 1.09200 1.00000 1.1 TiB 569 GiB 567 GiB 175 MiB 1.8 GiB 549 GiB 50.89 1.06 64 up
15 hdd 1.09200 1.00000 1.1 TiB 499 GiB 496 GiB 222 MiB 2.1 GiB 619 GiB 44.62 0.93 59 up
16 hdd 1.09200 1.00000 1.1 TiB 568 GiB 566 GiB 144 MiB 2.3 GiB 550 GiB 50.81 1.06 65 up
18 hdd 1.09200 1.00000 1.1 TiB 510 GiB 508 GiB 257 MiB 1.4 GiB 608 GiB 45.60 0.95 60 up
19 hdd 1.09200 1.00000 1.1 TiB 558 GiB 557 GiB 109 MiB 1.8 GiB 559 GiB 49.96 1.04 63 up
20 hdd 1.09200 1.00000 1.1 TiB 498 GiB 496 GiB 146 MiB 1.7 GiB 620 GiB 44.54 0.93 55 up
21 hdd 1.09200 1.00000 1.1 TiB 528 GiB 526 GiB 247 MiB 1.9 GiB 590 GiB 47.26 0.99 63 up
23 hdd 1.09200 1.00000 1.1 TiB 588 GiB 586 GiB 74 MiB 2.0 GiB 530 GiB 52.57 1.10 61 up
0 hdd 1.09200 1.00000 1.1 TiB 498 GiB 496 GiB 11 KiB 1.9 GiB 620 GiB 44.53 0.93 82 up
1 hdd 1.09200 1.00000 1.1 TiB 492 GiB 490 GiB 7 KiB 2.1 GiB 626 GiB 44.03 0.92 81 up
3 hdd 1.09200 1.00000 1.1 TiB 491 GiB 489 GiB 3.2 MiB 2.1 GiB 627 GiB 43.89 0.92 82 up
9 hdd 1.09200 1.00000 1.1 TiB 545 GiB 543 GiB 10 KiB 1.9 GiB 573 GiB 48.74 1.02 90 up
10 hdd 1.09200 1.00000 1.1 TiB 545 GiB 543 GiB 9 KiB 1.7 GiB 573 GiB 48.73 1.02 90 up
11 hdd 1.09200 1.00000 1.1 TiB 527 GiB 525 GiB 12 KiB 2.0 GiB 590 GiB 47.18 0.98 87 up
12 hdd 1.09200 1.00000 1.1 TiB 523 GiB 521 GiB 16 KiB 2.0 GiB 595 GiB 46.78 0.98 86 up
14 hdd 1.09200 1.00000 1.1 TiB 504 GiB 502 GiB 13 KiB 1.8 GiB 614 GiB 45.05 0.94 83 up
17 hdd 1.09200 1.00000 1.1 TiB 536 GiB 534 GiB 8 KiB 1.6 GiB 582 GiB 47.91 1.00 88 up
TOTAL 25 TiB 12 TiB 12 TiB 2.3 GiB 43 GiB 13 TiB 47.91
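To sum the PG counts per host from that output, a small awk one-liner like this should do (a rough sketch only; it assumes the column layout shown above, with PGS as the second-to-last column, and the OSD-to-host assignment from my crush map):
Code:
# OSDs 0-7 -> srv-virt-01, 12-19 -> srv-virt-03, everything else -> srv-virt-02 (per my crush map)
ceph osd df | awk '$1 ~ /^[0-9]+$/ {
  host = ($1 <= 7) ? "srv-virt-01" : ($1 >= 12 && $1 <= 19) ? "srv-virt-03" : "srv-virt-02";
  sum[host] += $(NF-1)  # PGS column
} END { for (h in sum) print h, sum[h], "PGs" }'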
I'd be very grateful if anyone could help me with this one!
Regards and have a nice Sunday,
Felix