Hi
I'm trying to set up HA Ceph storage. I have 3 nodes running Proxmox 5.2 with Ceph Luminous. Nodes 1 and 2 each have 5 local disks used as OSDs; node 3 has no disks. I created a pool with size=2, max=3 and pg_num=256, and everything runs smoothly while all nodes are online.
When I reboot a node for maintenance, Ceph reports blocked slow requests and all running VMs stall until the node is fully back up. I thought Ceph would keep serving requests from the remaining node, but it does not.
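I suspect the pool's min_size is involved: as far as I understand, if min_size equals size (2 here), Ceph blocks I/O on a PG as soon as one of its two replica hosts goes down. These are the commands I would use to check and, if needed, lower it (the pool name "vm-pool" is just a placeholder for my actual pool):

ceph osd pool get vm-pool size        # replica count; should report 2
ceph osd pool get vm-pool min_size    # if this is also 2, I/O blocks with one host down
ceph osd pool set vm-pool min_size 1  # allow I/O from a single surviving replica

And before a planned reboot, setting the noout flag should keep Ceph from marking the node's OSDs out and rebalancing while it is offline:

ceph osd set noout
# ... reboot the node, wait for it to rejoin ...
ceph osd unset noout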
Thanks in advance!
This is my CRUSH map, as created by Proxmox:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host rafel {
	id -3		# do not change unnecessarily
	id -4 class hdd		# do not change unnecessarily
	# weight 4.093
	alg straw2
	hash 0	# rjenkins1
	item osd.0 weight 0.910
	item osd.1 weight 0.910
	item osd.2 weight 0.910
	item osd.3 weight 0.910
	item osd.9 weight 0.455
}
host tomeu {
	id -5		# do not change unnecessarily
	id -6 class hdd		# do not change unnecessarily
	# weight 4.093
	alg straw2
	hash 0	# rjenkins1
	item osd.4 weight 0.910
	item osd.5 weight 0.910
	item osd.6 weight 0.910
	item osd.7 weight 0.910
	item osd.8 weight 0.455
}
root default {
	id -1		# do not change unnecessarily
	id -2 class hdd		# do not change unnecessarily
	# weight 8.186
	alg straw2
	hash 0	# rjenkins1
	item rafel weight 4.093
	item tomeu weight 4.093
}
# rules
rule replicated_rule {
	id 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
# end crush map
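In case it helps, this is how I can dump and sanity-check the map (file names are arbitrary); the crushtool test should show each PG mapping to one OSD on rafel and one on tomeu:

ceph osd getcrushmap -o crush.bin                                   # export the compiled map
crushtool -d crush.bin -o crush.txt                                 # decompile to the text above
crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-mappings  # show candidate OSD mappings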
Logs