Ceph Quincy: "ceph osd crush move ... root=default" -> rbd 100%

Hello,

we upgraded our cluster from Nautilus to Quincy two weeks ago and wanted to get rid of some old settings that we have carried along since Luminous or earlier: a manual SSD / HDD split in the CRUSH map, from before Ceph had device classes:

https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

We executed:


Code:
ceph osd crush move fc-r02-ceph-osd-01 root=default
...
ceph osd crush move fc-r02-ceph-osd-06 root=default

and checked what happened .. but not closely enough, as Ceph health was OK. Today, however, I saw:

Code:
root@fc-r02-ceph-osd-01:[~]: ceph df 
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
nvme   1.8 TiB  786 GiB  1.1 TiB   1.1 TiB      57.80
ssd     21 TiB  9.8 TiB   11 TiB    11 TiB      53.86
TOTAL   23 TiB   11 TiB   13 TiB    13 TiB      54.17
 
--- POOLS ---
POOL      ID   PGS   STORED  OBJECTS     USED   %USED  MAX AVAIL
ssd-pool   1  2048  4.2 TiB    1.14M   12 TiB  100.00        0 B
db-pool    4   128   50 MiB        3  151 MiB  100.00        0 B
.mgr       5     1   43 MiB       12  130 MiB       0    2.4 TiB


root@fc-r02-ceph-osd-01:[~]: ceph -s
  cluster:
    id:     cfca8c93-f3be-4b86-b9cb-8da095ca2c26
    health: HEALTH_OK
 
  services:
    mon: 5 daemons, quorum fc-r02-ceph-osd-01,fc-r02-ceph-osd-02,fc-r02-ceph-osd-03,fc-r02-ceph-osd-05,fc-r02-ceph-osd-06 (age 2w)
    mgr: fc-r02-ceph-osd-06(active, since 2w), standbys: fc-r02-ceph-osd-02, fc-r02-ceph-osd-03, fc-r02-ceph-osd-01, fc-r02-ceph-osd-05, fc-r02-ceph-osd-04
    osd: 54 osds: 54 up (since 2w), 54 in (since 2w); 2176 remapped pgs
 
  data:
    pools:   3 pools, 2177 pgs
    objects: 1.14M objects, 4.3 TiB
    usage:   13 TiB used, 11 TiB / 23 TiB avail
    pgs:     5684530/3410754 objects misplaced (166.665%)
             2176 active+clean+remapped
             1    active+clean
 
  io:
    client:   906 KiB/s rd, 13 MiB/s wr, 38 op/s rd, 986 op/s wr

So just moving the host buckets to the default root is not enough .. and we are unsure how to fix it.
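A quick way to see which pools the remapped PGs belong to (plain ceph pg ls plus some shell; the counting pipeline is just a rough sketch, adjust to taste):

Code:
# count remapped PGs per pool id (the part of the PG id before the dot)
ceph pg ls remapped | awk '/^[0-9]/ {print $1}' | cut -d. -f1 | sort | uniq -c
# map pool ids back to pool names
ceph osd pool ls detail | grep '^pool'

This should point at pools 1 and 4 here, i.e. exactly the two pools on the fc-r02-ssdpool rule (2048 + 128 = 2176 remapped PGs).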

The CRUSH tree looks like this:

Code:
root@fc-r02-ceph-osd-01:[~]: ceph osd crush tree --show-shadow
ID   CLASS  WEIGHT    TYPE NAME                       
-39   nvme   1.81938  root default~nvme               
-30   nvme         0      host fc-r02-ceph-osd-01~nvme
-31   nvme   0.36388      host fc-r02-ceph-osd-02~nvme
 36   nvme   0.36388          osd.36                  
-32   nvme   0.36388      host fc-r02-ceph-osd-03~nvme
 40   nvme   0.36388          osd.40                  
-33   nvme   0.36388      host fc-r02-ceph-osd-04~nvme
 37   nvme   0.36388          osd.37                  
-34   nvme   0.36388      host fc-r02-ceph-osd-05~nvme
 38   nvme   0.36388          osd.38                  
-35   nvme   0.36388      host fc-r02-ceph-osd-06~nvme
 39   nvme   0.36388          osd.39                  
-38   nvme         0  root ssds~nvme                  
-37   nvme         0      datacenter fc-ssds~nvme     
-36   nvme         0          rack r02-ssds~nvme      
-29   nvme         0  root sata~nvme                  
-28   nvme         0      datacenter fc-sata~nvme     
-27   nvme         0          rack r02-sata~nvme      
-24    ssd         0  root ssds~ssd                   
-23    ssd         0      datacenter fc-ssds~ssd      
-21    ssd         0          rack r02-ssds~ssd       
-22    ssd         0  root sata~ssd                   
-19    ssd         0      datacenter fc-sata~ssd      
-20    ssd         0          rack r02-sata~ssd       
-14                0  root sata                       
-18                0      datacenter fc-sata          
-16                0          rack r02-sata           
-13                0  root ssds                       
-17                0      datacenter fc-ssds          
-15                0          rack r02-ssds           
 -4    ssd  22.17122  root default~ssd                
 -7    ssd   4.00145      host fc-r02-ceph-osd-01~ssd 
  0    ssd   0.45470          osd.0                   
  1    ssd   0.45470          osd.1                   
  2    ssd   0.45470          osd.2                   
  3    ssd   0.45470          osd.3                   
  4    ssd   0.45470          osd.4                   
  5    ssd   0.45470          osd.5                   
 41    ssd   0.36388          osd.41                  
 42    ssd   0.45470          osd.42                  
 48    ssd   0.45470          osd.48                  
 -3    ssd   3.61948      host fc-r02-ceph-osd-02~ssd 
  6    ssd   0.45470          osd.6                   
  7    ssd   0.45470          osd.7                   
  8    ssd   0.45470          osd.8                   
  9    ssd   0.45470          osd.9                   
 10    ssd   0.43660          osd.10                  
 29    ssd   0.45470          osd.29                  
 43    ssd   0.45470          osd.43                  
 49    ssd   0.45470          osd.49                  
 -8    ssd   3.63757      host fc-r02-ceph-osd-03~ssd 
 11    ssd   0.45470          osd.11                  
 12    ssd   0.45470          osd.12                  
 13    ssd   0.45470          osd.13                  
 14    ssd   0.45470          osd.14                  
 15    ssd   0.45470          osd.15                  
 16    ssd   0.45470          osd.16                  
 44    ssd   0.45470          osd.44                  
 50    ssd   0.45470          osd.50                  
-10    ssd   3.63757      host fc-r02-ceph-osd-04~ssd 
 30    ssd   0.45470          osd.30                  
 31    ssd   0.45470          osd.31                  
 32    ssd   0.45470          osd.32                  
 33    ssd   0.45470          osd.33                  
 34    ssd   0.45470          osd.34                  
 35    ssd   0.45470          osd.35                  
 45    ssd   0.45470          osd.45                  
 51    ssd   0.45470          osd.51                  
-12    ssd   3.63757      host fc-r02-ceph-osd-05~ssd 
 17    ssd   0.45470          osd.17                  
 18    ssd   0.45470          osd.18                  
 19    ssd   0.45470          osd.19                  
 20    ssd   0.45470          osd.20                  
 21    ssd   0.45470          osd.21                  
 22    ssd   0.45470          osd.22                  
 46    ssd   0.45470          osd.46                  
 52    ssd   0.45470          osd.52                  
-26    ssd   3.63757      host fc-r02-ceph-osd-06~ssd 
 23    ssd   0.45470          osd.23                  
 24    ssd   0.45470          osd.24                  
 25    ssd   0.45470          osd.25                  
 26    ssd   0.45470          osd.26                  
 27    ssd   0.45470          osd.27                  
 28    ssd   0.45470          osd.28                  
 47    ssd   0.45470          osd.47                  
 53    ssd   0.45470          osd.53                  
 -1         23.99060  root default                    
 -6          4.00145      host fc-r02-ceph-osd-01     
  0    ssd   0.45470          osd.0                   
  1    ssd   0.45470          osd.1                   
  2    ssd   0.45470          osd.2                   
  3    ssd   0.45470          osd.3                   
  4    ssd   0.45470          osd.4                   
  5    ssd   0.45470          osd.5                   
 41    ssd   0.36388          osd.41                  
 42    ssd   0.45470          osd.42                  
 48    ssd   0.45470          osd.48                  
 -2          3.98335      host fc-r02-ceph-osd-02     
 36   nvme   0.36388          osd.36                  
  6    ssd   0.45470          osd.6                   
  7    ssd   0.45470          osd.7                   
  8    ssd   0.45470          osd.8                   
  9    ssd   0.45470          osd.9                   
 10    ssd   0.43660          osd.10                  
 29    ssd   0.45470          osd.29                  
 43    ssd   0.45470          osd.43                  
 49    ssd   0.45470          osd.49                  
 -5          4.00145      host fc-r02-ceph-osd-03     
 40   nvme   0.36388          osd.40                  
 11    ssd   0.45470          osd.11                  
 12    ssd   0.45470          osd.12                  
 13    ssd   0.45470          osd.13                  
 14    ssd   0.45470          osd.14                  
 15    ssd   0.45470          osd.15                  
 16    ssd   0.45470          osd.16                  
 44    ssd   0.45470          osd.44                  
 50    ssd   0.45470          osd.50                  
 -9          4.00145      host fc-r02-ceph-osd-04     
 37   nvme   0.36388          osd.37                  
 30    ssd   0.45470          osd.30                  
 31    ssd   0.45470          osd.31                  
 32    ssd   0.45470          osd.32                  
 33    ssd   0.45470          osd.33                  
 34    ssd   0.45470          osd.34                  
 35    ssd   0.45470          osd.35                  
 45    ssd   0.45470          osd.45                  
 51    ssd   0.45470          osd.51                  
-11          4.00145      host fc-r02-ceph-osd-05     
 38   nvme   0.36388          osd.38                  
 17    ssd   0.45470          osd.17                  
 18    ssd   0.45470          osd.18                  
 19    ssd   0.45470          osd.19                  
 20    ssd   0.45470          osd.20                  
 21    ssd   0.45470          osd.21                  
 22    ssd   0.45470          osd.22                  
 46    ssd   0.45470          osd.46                  
 52    ssd   0.45470          osd.52                  
-25          4.00145      host fc-r02-ceph-osd-06     
 39   nvme   0.36388          osd.39                  
 23    ssd   0.45470          osd.23                  
 24    ssd   0.45470          osd.24                  
 25    ssd   0.45470          osd.25                  
 26    ssd   0.45470          osd.26                  
 27    ssd   0.45470          osd.27                  
 28    ssd   0.45470          osd.28                  
 47    ssd   0.45470          osd.47                  
 53    ssd   0.45470          osd.53

The rule:

Code:
root@fc-r02-ceph-osd-01:[~]: ceph osd pool get db-pool crush_rule
crush_rule: fc-r02-ssdpool

root@fc-r02-ceph-osd-01:[~]: ceph osd pool get ssd-pool crush_rule
crush_rule: fc-r02-ssdpool

The CRUSH rules:

Code:
root@fc-r02-ceph-osd-01:[~]:  ceph osd crush rule dump
[
    {
        "rule_id": 0,
        "rule_name": "replicated_rule",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -1,
                "item_name": "default"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 1,
        "rule_name": "fc-r02-ssdpool",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -15,
                "item_name": "r02-ssds"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 2,
        "rule_name": "fc-r02-satapool",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -16,
                "item_name": "r02-sata"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    },
    {
        "rule_id": 3,
        "rule_name": "fc-r02-ssd",
        "type": 1,
        "steps": [
            {
                "op": "take",
                "item": -4,
                "item_name": "default~ssd"
            },
            {
                "op": "chooseleaf_firstn",
                "num": 0,
                "type": "host"
            },
            {
                "op": "emit"
            }
        ]
    }
]

So the situation is: both pools still use the fc-r02-ssdpool rule, which takes the old r02-ssds bucket .. and that bucket is now empty, which is why every PG is reported as misplaced and MAX AVAIL shows 0 B. Which steps are required to fix this and get back to the Ceph standard layout (device classes under the default root)? I assume we need to take the datacenter offline, as it will certainly rebalance a lot. The question is whether to power off the VMs or not, as I fear filesystem corruption.
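What I am currently considering, to keep the impact low while the data moves (standard Ceph flags; whether this is enough to leave the VMs running is exactly what I am unsure about), roughly:

Code:
# pause data movement while the pool rules are changed
ceph osd set norebalance
ceph osd set nobackfill
# ... switch the pools to a device-class based rule here ...
# then release the flags and let the cluster backfill in a controlled way
ceph osd unset nobackfill
ceph osd unset norebalance
# optionally throttle backfill; with the mclock scheduler in Quincy this
# may only take effect with the override option enabled
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 1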

Any help would be great!
 
Hi,

I've also posted to the Ceph mailing list, and one reply told me that:

Code:
ceph osd pool set db-pool  crush_rule fc-r02-ssd
ceph osd pool set ssd-pool  crush_rule fc-r02-ssd

will do the job, because that rule takes "item_name": "default~ssd".

We will take most services offline anyway, as I have no idea what might break.
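Once everything is active+clean again, the plan is to also clean up the now-empty legacy buckets and rules, roughly like this (sketch only; the rules must no longer be referenced by any pool, and buckets have to be removed innermost first):

Code:
# old rules that now point at empty buckets
ceph osd crush rule rm fc-r02-ssdpool
ceph osd crush rule rm fc-r02-satapool
# empty racks, datacenters and roots from the pre-device-class layout
ceph osd crush remove r02-ssds
ceph osd crush remove r02-sata
ceph osd crush remove fc-ssds
ceph osd crush remove fc-sata
ceph osd crush remove ssds
ceph osd crush remove sata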
 
