Hi,
I'm experiencing a strange behaviour in pve@6.1-7/ceph@14.2.6:
I have a lab setup with 5 physical nodes, each with two OSDs.
This is the Ceph Config + Crushmap:
Config:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.42.42.1/24
fsid = <redacted:fsid>
mon_allow_pool_delete = true
mon_host = 10.42.42.1 10.42.42.2 10.42.42.3 10.42.42.4 10.42.42.5
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.42.42.1/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
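For completeness, I also checked that the one pool I'm testing with really picked up those defaults; "vmpool" is just what the pool is called in my lab:
Code:
# confirm the effective replication settings of the test pool (pool name is specific to my lab)
ceph osd pool get vmpool size
ceph osd pool get vmpool min_size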
Crush:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host node1 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 0.232
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.116
item osd.1 weight 0.116
}
host node2 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 0.232
alg straw2
hash 0 # rjenkins1
item osd.2 weight 0.116
item osd.3 weight 0.116
}
host node3 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 0.232
alg straw2
hash 0 # rjenkins1
item osd.4 weight 0.116
item osd.5 weight 0.116
}
host node4 {
id -9 # do not change unnecessarily
id -10 class hdd # do not change unnecessarily
# weight 0.232
alg straw2
hash 0 # rjenkins1
item osd.6 weight 0.116
item osd.7 weight 0.116
}
host node5 {
id -11 # do not change unnecessarily
id -12 class hdd # do not change unnecessarily
# weight 0.232
alg straw2
hash 0 # rjenkins1
item osd.8 weight 0.116
item osd.9 weight 0.116
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 1.162
alg straw2
hash 0 # rjenkins1
item node1 weight 0.232
item node2 weight 0.232
item node3 weight 0.232
item node4 weight 0.232
item node5 weight 0.232
}
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
# end crush map
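If it helps, the rule can be sanity-checked against the map with crushtool; this is roughly what I ran (the file name is just a local scratch file):
Code:
# export the compiled crushmap and simulate rule 0 with 3 replicas per PG
ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-mappings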
So nothing fancy in here, all straightforward.
When I take down the first node, its two OSDs appear "down", and some minutes later the manager marks them out and redistribution of data throughout the cluster kicks in:
Code:
-11 0.23239 host node5
8 hdd 0.11620 osd.8 down 0 1.00000
9 hdd 0.11620 osd.9 down 0 1.00000
The node's OSDs get their reweight set to 0 (out), all fine.
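The few minutes of delay match the default down-out interval as far as I can tell; I read the current value like this:
Code:
# time after which a down OSD is automatically marked out (default 600 seconds)
ceph config get mon mon_osd_down_out_interval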
Then the second node gets shut down, and this is what the complete OSD tree looks like:
Code:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.16196 root default
-3 0.23239 host node1
0 hdd 0.11620 osd.0 down 0 1.00000
1 hdd 0.11620 osd.1 down 1.00000 1.00000
-5 0.23239 host node2
2 hdd 0.11620 osd.2 up 1.00000 1.00000
3 hdd 0.11620 osd.3 up 1.00000 1.00000
-7 0.23239 host node3
4 hdd 0.11620 osd.4 up 1.00000 1.00000
5 hdd 0.11620 osd.5 up 1.00000 1.00000
-9 0.23239 host node4
6 hdd 0.11620 osd.6 up 1.00000 1.00000
7 hdd 0.11620 osd.7 up 1.00000 1.00000
-11 0.23239 host node5
8 hdd 0.11620 osd.8 down 0 1.00000
9 hdd 0.11620 osd.9 down 0 1.00000
So one OSD of the down node gets marked out, but the other one never does. This inhibits self-healing, because redistribution of the objects on the OSD that wasn't marked out never kicks in.
Am I totally missing something here?
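In case it matters, this is what I checked so far without spotting anything obvious; mon_osd_min_in_ratio is the only knob I found that sounds like it could limit how many OSDs get marked out automatically, but I may be misreading the docs:
Code:
# make sure no flags like noout are set cluster-wide
ceph osd dump | grep flags
# settings that, as I understand them, govern the automatic mark-out behaviour
ceph config get mon mon_osd_down_out_interval
ceph config get mon mon_osd_min_in_ratio
ceph config get mon mon_osd_down_out_subtree_limit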