[SOLVED] Ceph Overlapping Roots

willybong

Apr 22, 2020
Hi,
I noticed an error in my ceph-mgr log:

Code:
2022-02-02T15:51:21.405+0100 7f13aaf3e700  0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-2, -1}
2022-02-02T15:52:21.413+0100 7f13aaf3e700  0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-2, -1}
2022-02-02T15:52:22.657+0100 7f13aff48700  0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-2, -1}
2022-02-02T15:53:21.421+0100 7f13aaf3e700  0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-2, -1}
2022-02-02T15:54:21.429+0100 7f13aaf3e700  0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-2, -1}

What does it mean?

Many thanks
 
Did you apply any changes to the crushmap?

If you go to Ceph -> Configuration in the GUI, it will show the crush map on the right side. Could you post it here?
 
Hi aaron,

here is the crush map:

Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class ssd
device 11 osd.11 class ssd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class ssd
device 17 osd.17 class ssd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host pve1 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    id -5 class ssd        # do not change unnecessarily
    # weight 12.590
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 1.819
    item osd.1 weight 1.819
    item osd.2 weight 1.819
    item osd.3 weight 1.819
    item osd.4 weight 1.747
    item osd.5 weight 1.747
    item osd.18 weight 1.819
}
host pve2 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    id -9 class ssd        # do not change unnecessarily
    # weight 12.590
    alg straw2
    hash 0    # rjenkins1
    item osd.6 weight 1.819
    item osd.7 weight 1.819
    item osd.8 weight 1.819
    item osd.9 weight 1.819
    item osd.10 weight 1.747
    item osd.11 weight 1.747
    item osd.19 weight 1.819
}
host pve3 {
    id -10        # do not change unnecessarily
    id -11 class hdd        # do not change unnecessarily
    id -12 class ssd        # do not change unnecessarily
    # weight 12.590
    alg straw2
    hash 0    # rjenkins1
    item osd.12 weight 1.819
    item osd.13 weight 1.819
    item osd.14 weight 1.819
    item osd.15 weight 1.819
    item osd.16 weight 1.747
    item osd.17 weight 1.747
    item osd.20 weight 1.819
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    id -6 class ssd        # do not change unnecessarily
    # weight 37.771
    alg straw2
    hash 0    # rjenkins1
    item pve1 weight 12.590
    item pve2 weight 12.590
    item pve3 weight 12.590
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule atlante_hdd {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}
rule atlante_ssd {
    id 2
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
 
Thank you. You do have additional rules that place pools only on HDDs or only on SSDs.

I assume that at least one pool still has the "replicated_rule" assigned, which does not distinguish between the device classes of the OSDs.

This is why you see this error: "replicated_rule" takes the whole "default" root (id -1), while the class-specific rules take its per-class shadow roots (id -2 for hdd, id -6 for ssd). Root -1 therefore contains every OSD that root -2 contains, and the autoscaler cannot decide how many PGs the pools need. Make sure that all pools are assigned a rule that limits them to a device class, and the errors should stop.
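To make the overlap concrete, here is a simplified Python model built from the crush map posted above. It is not the pg_autoscaler's actual code. It only assumes that each rule's "step take" selects one root (-1 for "default", -2 for "default class hdd", -6 for "default class ssd") and flags a pool when its root shares OSDs with a different root used by another pool. The pool name vm_hdd is hypothetical.

```python
# Simplified model of the "overlapping roots" condition, using the OSDs and
# root ids from the crush map in this thread. Illustration only; this is not
# the pg_autoscaler module's implementation.
from typing import Dict, Set

HDD = {0, 1, 2, 3, 6, 7, 8, 9, 12, 13, 14, 15, 18, 19, 20}
SSD = {4, 5, 10, 11, 16, 17}

# Root ids from the "root default" bucket: -1 is the plain root,
# -2 and -6 are its per-class shadow roots.
ROOT_OSDS: Dict[int, Set[int]] = {
    -1: HDD | SSD,  # step take default
    -2: HDD,        # step take default class hdd
    -6: SSD,        # step take default class ssd
}

RULE_ROOT = {"replicated_rule": -1, "atlante_hdd": -2, "atlante_ssd": -6}


def overlapping_roots(pool_rules: Dict[str, str]) -> Dict[str, Set[int]]:
    """Return {pool: root ids} for pools whose root shares OSDs with another pool's root."""
    roots = {pool: RULE_ROOT[rule] for pool, rule in pool_rules.items()}
    used = set(roots.values())
    result = {}
    for pool, root in roots.items():
        # A different root in use that shares at least one OSD is an overlap.
        clash = {other for other in used
                 if other != root and ROOT_OSDS[other] & ROOT_OSDS[root]}
        if clash:
            result[pool] = clash | {root}
    return result


if __name__ == "__main__":
    # Hypothetical pool layout: one pool still on the generic rule.
    print(overlapping_roots({"vm_hdd": "atlante_hdd",
                             "device_health_metrics": "replicated_rule"}))
    # After moving device_health_metrics to a class rule: no overlap.
    print(overlapping_roots({"vm_hdd": "atlante_hdd",
                             "device_health_metrics": "atlante_hdd"}))
```

In this model the overlap disappears once every pool uses a class-specific rule, because the hdd and ssd shadow roots (-2 and -6) contain disjoint sets of OSDs.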
 
It works now.
The device_health_metrics pool still had the default "replicated_rule". I changed it and everything is solved.
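The same change can also be made from the CLI with `ceph osd pool set <pool> crush_rule <rule>`. The sketch below only builds those command lines for pools still on "replicated_rule"; it does not run them. device_health_metrics is the pool from this thread, while the target rule atlante_hdd and the extra pool vm_hdd are assumptions for illustration (the thread does not say which rule was chosen).

```python
# Hypothetical helper: builds (does not execute) the ceph CLI calls that move
# every pool still on the generic rule onto a device-class rule.
from typing import Dict, List

GENERIC_RULE = "replicated_rule"


def reassign_commands(pool_rules: Dict[str, str], target_rule: str) -> List[List[str]]:
    cmds: List[List[str]] = []
    for pool, rule in sorted(pool_rules.items()):
        if rule == GENERIC_RULE:
            # Assign a class-specific rule to this pool.
            cmds.append(["ceph", "osd", "pool", "set", pool, "crush_rule", target_rule])
    # Afterwards, ask for the autoscaler's per-pool view.
    cmds.append(["ceph", "osd", "pool", "autoscale-status"])
    return cmds


if __name__ == "__main__":
    # Assumed mapping; "atlante_hdd" as the target rule is an assumption.
    pools = {"device_health_metrics": "replicated_rule", "vm_hdd": "atlante_hdd"}
    for cmd in reassign_commands(pools, "atlante_hdd"):
        print(" ".join(cmd))
```

On a node with an admin keyring, each list could be passed to `subprocess.run(cmd, check=True)`. Keeping the commands as argument lists avoids shell quoting issues with pool names.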
Thank you, aaron.

Bye
 
Good to hear. I went ahead and marked the thread as solved. :)
 