[SOLVED] Ceph nautilus: rbd error: rbd: listing images failed: (2) No such file or directory (500)

tdoubleb

New Member
Jun 9, 2020
Hi,

I have a Proxmox VE 6.2-6 cluster with Ceph 14.2.9.

Since yesterday I have been getting this error when I open the content of the Ceph pool in the GUI: rbd error: rbd: listing images failed: (2) No such file or directory (500)

Now I can't see my images, and migration doesn't work. The same error appears in the migration log when I start migrating a VM that has a disk image in Ceph, so the migration fails. My only changes from the default Ceph settings are mgr/balancer/mode: upmap and an osd_memory_target of 1073741824 (1 GiB).
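One possible way to narrow this down, as a minimal sketch rather than a confirmed diagnosis: a detailed image listing may fail as a whole if a single image cannot be opened, so it can help to list only the names with `rbd ls <pool>` and then run `rbd info <pool>/<image>` on each one. The pool name `mypool`, the helper names, and the injectable `run` parameter below are assumptions made for illustration; none of them come from this cluster.

```python
import subprocess
from typing import Callable, List, Tuple

# A runner takes an argv list and returns (exit_code, stdout).
Runner = Callable[[List[str]], Tuple[int, str]]


def run_cli(argv: List[str]) -> Tuple[int, str]:
    """Run a command and capture its exit code and stdout."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def find_unreadable_images(pool: str, run: Runner = run_cli) -> List[str]:
    """Return the images in `pool` for which `rbd info` exits non-zero."""
    code, out = run(["rbd", "ls", pool])  # names only, no per-image details
    if code != 0:
        raise RuntimeError(f"'rbd ls {pool}' failed with exit code {code}")
    broken = []
    for name in (line.strip() for line in out.splitlines()):
        if not name:
            continue
        code, _ = run(["rbd", "info", f"{pool}/{name}"])
        if code != 0:
            broken.append(name)
    return broken


if __name__ == "__main__":
    # "mypool" is a placeholder; substitute the real pool name.
    print(find_unreadable_images("mypool"))
```

Any image reported here would be a candidate for closer inspection; the script itself changes nothing in the pool.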

The rest of Ceph is working and the other VMs are running. One HDD is old and has to be replaced in the next few days, but that's all. It's a low-budget cluster, but it ran well for a long time with Proxmox VE 5.x. The new cluster is a fresh installation on all nodes (backup and restore of all VMs, every node reinstalled, and a new cluster built from them), not an upgrade from Proxmox VE 5.x. In the beginning this error wasn't there. For now the nodes have only one LAN port for everything. I know this is not the recommended setup, but it was OK for me before.

Thanks for the help. I like Proxmox VE.

Ceph-Config

Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 192.168.39.240/24
     fsid = eadb31f9-7901-4364-8ff7-aca21602326c
     mon_allow_pool_delete = true
     mon_host = 192.168.39.240 192.168.39.242 192.168.39.241 192.168.39.243 192.168.39.244
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 192.168.39.240/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.bp415-ga-1]
     public_addr = 192.168.39.240

[mon.bp415-ga-2]
     public_addr = 192.168.39.242

[mon.bp415-ga-3a]
     public_addr = 192.168.39.244

[mon.bp420-ga-1]
     public_addr = 192.168.39.241

[mon.bp420-ga-2]
     public_addr = 192.168.39.243
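As a quick sanity check on the addressing above, the following sketch uses Python's standard `ipaddress` module to show which network `192.168.39.240/24` actually denotes and whether every `mon_host` entry falls inside it. The values are copied from the `[global]` section; the variable names are only illustrative.

```python
import ipaddress

# Values copied from the [global] section of the config above.
public_network = "192.168.39.240/24"
mon_host = "192.168.39.240 192.168.39.242 192.168.39.241 192.168.39.243 192.168.39.244"

# strict=False accepts an address with host bits set and masks them off.
net = ipaddress.ip_network(public_network, strict=False)

# Collect any monitor address that is not part of that network.
outside = [ip for ip in mon_host.split() if ipaddress.ip_address(ip) not in net]

print(net)      # 192.168.39.0/24
print(outside)  # []
```

All five monitors sit inside the same /24, so the single-network setup is at least consistent with itself.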


Crush Map

Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host bp415-ga-1 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    # weight 2.729
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 0.910
    item osd.1 weight 1.819
}
host bp415-ga-2 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    # weight 2.729
    alg straw2
    hash 0    # rjenkins1
    item osd.2 weight 0.910
    item osd.3 weight 1.819
}
host bp415-ga-3a {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    # weight 2.729
    alg straw2
    hash 0    # rjenkins1
    item osd.4 weight 1.819
    item osd.5 weight 0.910
}
host bp420-ga-1 {
    id -9        # do not change unnecessarily
    id -10 class hdd        # do not change unnecessarily
    # weight 0.910
    alg straw2
    hash 0    # rjenkins1
    item osd.6 weight 0.910
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    # weight 9.097
    alg straw2
    hash 0    # rjenkins1
    item bp415-ga-1 weight 2.729
    item bp415-ga-2 weight 2.729
    item bp415-ga-3a weight 2.729
    item bp420-ga-1 weight 0.910
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
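The `# weight` comments in the buckets above should equal the sum of the item weights below them (for example 0.910 + 1.819 = 2.729 for `bp415-ga-1`, and 3 × 2.729 + 0.910 = 9.097 for `default`). A minimal sketch that checks this on a decompiled CRUSH map is shown below; the regular expressions assume the text layout shown in this post, and `check_weights` and the tolerance are hypothetical helpers, not part of any Ceph tooling.

```python
import re
from typing import List, Tuple

# A bucket starts with "host NAME {" or "root NAME {" and ends at a line starting with "}".
BUCKET_RE = re.compile(r"^(host|root)\s+(\S+)\s*\{(.*?)^\}", re.S | re.M)
ITEM_RE = re.compile(r"^\s*item\s+(\S+)\s+weight\s+([\d.]+)", re.M)
DECLARED_RE = re.compile(r"#\s*weight\s+([\d.]+)")


def check_weights(crush_text: str, tol: float = 0.002) -> List[Tuple[str, float, float]]:
    """Return (bucket, declared, summed) for buckets whose items do not add up."""
    mismatches = []
    for _kind, name, body in BUCKET_RE.findall(crush_text):
        declared = DECLARED_RE.search(body)
        if declared is None:
            continue  # no "# weight" comment to compare against
        total = sum(float(w) for _item, w in ITEM_RE.findall(body))
        if abs(total - float(declared.group(1))) > tol:
            mismatches.append((name, float(declared.group(1)), total))
    return mismatches


# Excerpt of one bucket from the map above.
sample = """\
host bp415-ga-1 {
    id -3        # do not change unnecessarily
    # weight 2.729
    alg straw2
    item osd.0 weight 0.910
    item osd.1 weight 1.819
}
"""
print(check_weights(sample))  # []
```

The small tolerance allows for the three-decimal rounding in the decompiled map.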
 
In the ceph-mgr log I found this line:

ceph-mgr[2995009]: 2020-06-25 00:00:00.995 7f0408b40700 -1 Fail to open '/proc/3616882/cmdline' error = (2) No such file or directory

And in the ceph-mon log:

ceph-mon[3757600]: 2020-06-25 09:31:17.537 7f51e2527700 -1 mon.bp415-ga-1@0(electing) e7 failed to get devid for : fallback method has serial ''but no model
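To compare such lines across daemons, a small parser can split them into fields and translate a numeric error code like `(2)` into its symbolic name. This is a sketch under assumptions: the field names (`thread`, `level`) reflect one reading of the two lines quoted above, and the pattern is not an official description of the Ceph log format.

```python
import errno
import re
from typing import Dict, Optional

# daemon[pid]: date time thread level message
LINE_RE = re.compile(
    r"^(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+(?P<level>-?\d+)\s+(?P<msg>.*)$"
)
ERRNO_RE = re.compile(r"\((\d+)\)")  # numeric code in parentheses, e.g. "(2)"


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one log line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec: Dict[str, Optional[str]] = dict(m.groupdict())
    code = ERRNO_RE.search(rec["msg"] or "")
    # Map the number to its symbolic errno name, if one is present.
    rec["errno_name"] = errno.errorcode.get(int(code.group(1))) if code else None
    return rec


mgr_line = ("ceph-mgr[2995009]: 2020-06-25 00:00:00.995 7f0408b40700 -1 "
            "Fail to open '/proc/3616882/cmdline' error = (2) No such file or directory")
rec = parse_line(mgr_line)
print(rec["daemon"], rec["errno_name"])  # ceph-mgr ENOENT
```

Error (2) is `ENOENT`, the same code that appears in the `rbd: listing images failed` message, although that alone does not show the two are related.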
 
