Not solved: Created new class and CRUSH rules, now can't see VM and CT disks under the pool

sarsenal

I am moving my cluster to SSDs and created a new pool, vm-ssd, with CRUSH rule vm-ssd and device class ssd. I already have the hdd class, a CRUSH rule vm-hdd (which I created), and the pool Universe (created at install). I updated the pool and OSDs to use the correct vm-hdd rule and hdd class, and I am swapping the disks over one by one, letting the system rebalance between each change. I have not moved any images into the new pool, so everything should still be in Universe for now.

After the last change I also moved the Universe pool from the default CRUSH rule to its own vm-hdd rule. I have done this on other nodes (in a different cluster) without issue, but this time I can no longer see the VM/CT disks under the storage pool; I just get an error. Does this simply take time to rebalance before they show up again? I can still migrate VMs/CTs between nodes and they work, it is only in the storage pool view that the disks are not listed, even though the summary shows data there. So I am hoping it is just a rebalancing issue.
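For reference, this is roughly how I created the class-based rules and pointed the pools at them (rule and pool names as above; commands written from memory, so treat this as a sketch rather than the exact history):

Code:
    # replicated rules restricted to a device class (root=default, failure domain=host)
    ceph osd crush rule create-replicated vm-ssd default host ssd
    ceph osd crush rule create-replicated vm-hdd default host hdd
    # point the pools at the new rules
    ceph osd pool set vm-ssd crush_rule vm-ssd
    ceph osd pool set Universe crush_rule vm-hdd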

Error:
rbd error: rbd: listing images failed: (2) No such file or directory (500)

ceph status (the server was rebooted, so it flagged a daemon as crashed):

Code:
  cluster:
    id:     55491277-9545-49b0-a9ba-092db3b67887
    health: HEALTH_WARN
            1 daemons have recently crashed
 
  services:
    mon: 3 daemons, quorum save1,save2,save3 (age 6h)
    mgr: save3(active, since 21h), standbys: save2, save1
    mds: 1/1 daemons up, 2 standby
    osd: 26 osds: 26 up (since 6h), 26 in (since 6h); 255 remapped pgs
 
  data:
    volumes: 1/1 healthy
    pools:   5 pools, 417 pgs
    objects: 774.10k objects, 2.8 TiB
    usage:   8.9 TiB used, 39 TiB / 48 TiB avail
    pgs:     1057951/2322303 objects misplaced (45.556%)
             162 active+clean
             161 active+clean+remapped
             92  active+remapped+backfill_wait
             2   active+remapped+backfilling
 
  io:
    client:   35 MiB/s rd, 1.6 MiB/s wr, 83 op/s rd, 129 op/s wr
    recovery: 20 MiB/s, 5 objects/s
 
  progress:
    Global Recovery Event (6h)
      [=====================.......] (remaining: 118m)
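
In case it is relevant, I am only watching the rebalance with the standard status commands, nothing specific to this problem:

Code:
    ceph osd pool stats    # per-pool client and recovery I/O rates
    ceph df                # per-pool usage, to confirm Universe still holds the data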


My crush map is:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class ssd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 22 osd.22 class hdd
device 23 osd.23 class hdd
device 24 osd.24 class hdd
device 25 osd.25 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host save1 {
        id -3           # do not change unnecessarily
        id -4 class hdd         # do not change unnecessarily
        id -9 class ssd         # do not change unnecessarily
        # weight 12.939
        alg straw2
        hash 0  # rjenkins1
        item osd.0 weight 1.092
        item osd.1 weight 1.092
        item osd.2 weight 1.092
        item osd.3 weight 1.092
        item osd.4 weight 1.092
        item osd.5 weight 1.092
        item osd.6 weight 1.092
        item osd.7 weight 1.092
        item osd.8 weight 1.092
        item osd.9 weight 1.092
        item osd.10 weight 1.092
        item osd.11 weight 0.931
}
host save2 {
        id -5           # do not change unnecessarily
        id -6 class hdd         # do not change unnecessarily
        id -10 class ssd                # do not change unnecessarily
        # weight 13.099
        alg straw2
        hash 0  # rjenkins1
        item osd.12 weight 1.092
        item osd.13 weight 1.092
        item osd.14 weight 1.092
        item osd.15 weight 1.092
        item osd.16 weight 1.092
        item osd.17 weight 1.092
        item osd.18 weight 1.092
        item osd.19 weight 1.092
        item osd.20 weight 1.092
        item osd.21 weight 1.092
        item osd.22 weight 1.092
        item osd.23 weight 1.092
}
host save3 {
        id -7           # do not change unnecessarily
        id -8 class hdd         # do not change unnecessarily
        id -11 class ssd                # do not change unnecessarily
        # weight 21.828
        alg straw2
        hash 0  # rjenkins1
        item osd.24 weight 10.914
        item osd.25 weight 10.914
}
root default {
        id -1           # do not change unnecessarily
        id -2 class hdd         # do not change unnecessarily
        id -12 class ssd                # do not change unnecessarily
        # weight 47.866
        alg straw2
        hash 0  # rjenkins1
        item save1 weight 12.939
        item save2 weight 13.099
        item save3 weight 21.828
}

# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule vm-ssd {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step chooseleaf firstn 0 type host
        step emit
}
rule vm-hdd {
        id 2
        type replicated
        min_size 1
        max_size 10
        step take default class hdd
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
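
In case it helps, the rule each pool is actually using can be checked like this (pool name Universe as above):

Code:
    ceph osd pool ls detail                  # lists the crush_rule per pool
    ceph osd pool get Universe crush_rule    # rule assigned to the Universe pool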

Thanks!
 
Any ideas on this? I am still having the issue and can't find a solution or a cause.

I tried moving back to the default CRUSH rule and it made no difference. For the Ceph pool I still can't see CT or VM images under the storage entry, though I can see them from the command line. Is this a bug?
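
To be clear, listing the images directly against the pool from the shell still works, for example:

Code:
    rbd -p Universe ls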
 
Do you see an error if you run rbd ls? There might be an image missing, and the error message might show which one.

Also, please paste config files and similar info inside [code][/code] tags. That makes reading them a lot easier :)
 
Code:
    rbd ls
    rbd: error opening default pool 'rbd'
    Ensure that the default pool has been created or specify an alternate pool name.
    rbd: listing images failed: (2) No such file or directory

But if I use the pool name Universe it works, so I figure it is just defaulting to the wrong pool; I'm just not sure where to change that.

I added rbd_default_pool = Universe to the ceph.conf file and now the command line works correctly, but not the PVE GUI: under the Universe storage, the VM and CT disk views still give me: rbd error: rbd: listing images failed: (2) No such file or directory (500)
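
For reference, this is what I added to ceph.conf (the section shown is where I happened to put it), plus roughly what the matching RBD storage entry in /etc/pve/storage.cfg looks like, which as far as I can tell already points at the right pool:

Code:
    # /etc/pve/ceph.conf
    [global]
    rbd_default_pool = Universe

    # /etc/pve/storage.cfg (approximate, from memory; options may differ)
    rbd: Universe
        content images,rootdir
        pool Universe
        krbd 0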
 
