Problem with VM Disks list in Ceph pool

Oct 23, 2020
Hi guys! Today I noticed a problem with displaying the list of VM disks. I got the error: rbd error: rbd: listing images failed: (2) No such file or directory (500)
ceph -s:
Code:
cluster:
    id:     77161c77-31b0-4f07-a29d-d65f7bd6e18e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum 220pve01,2530pve01,2530pve02 (age 2w)
    mgr: 220pve01(active, since 6w)
    osd: 30 osds: 30 up (since 2w), 30 in (since 4M)

  data:
    pools:   2 pools, 1025 pgs
    objects: 1.74M objects, 6.5 TiB
    usage:   20 TiB used, 13 TiB / 33 TiB avail
    pgs:     1025 active+clean

   io:
    client:   0 B/s rd, 0 B/s wr, 0 op/s rd, 0 op/s wr

Am I right that I have to copy /etc/pve/priv/ceph.client.admin.keyring to all nodes? I have this file on only one node.
 
Am I right that I have to copy /etc/pve/priv/ceph.client.admin.keyring to all nodes? I have this file on only one node.
That seems wrong. If the nodes are part of the same Proxmox VE cluster, then this file should be present on all nodes. If it is not, then check the status of your Proxmox VE cluster.
 
That seems wrong. If the nodes are part of the same Proxmox VE cluster, then this file should be present on all nodes. If it is not, then check the status of your Proxmox VE cluster.
Code:
pvecm status
Cluster information
-------------------
Name:             pve-int01
Config Version:   11
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Sep 13 09:16:42 2022
Quorum provider:  corosync_votequorum
Nodes:            9
Node ID:          0x00000004
Ring ID:          1.bc3
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   9
Highest expected: 9
Total votes:      9
Quorum:           5
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 172.16.133.134
0x00000002          1 172.16.133.135
0x00000003          1 172.16.133.136
0x00000004          1 172.16.133.137 (local)
0x00000005          1 172.16.133.138
0x00000006          1 172.16.133.139
0x00000007          1 172.16.133.153
0x00000008          1 172.16.133.154
0x00000009          1 172.16.133.156

@aaron Looks like everything is OK. Is manually copying the file the correct way, or are there other solutions?
 
Back to the original error:
rbd error: rbd: listing images failed: (2) No such file or directory (500)
Where exactly do you encounter it?

Is manually copying the file the correct way, or are there other solutions?
All files in the /etc/pve directory should be available on all cluster nodes. If you create a file in there, it should show up on all other nodes right away.
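
A quick way to verify the replication is to create a throwaway file on one node and look for it on another (the file name here is arbitrary):
Code:
# on any node
touch /etc/pve/sync-test
# on a different node, the file should show up almost immediately
ls -la /etc/pve/sync-test
# clean up afterwards
rm /etc/pve/sync-test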

Does the node where you run into the error have any Ceph services set up / installed? Then you should see a symlink of the file /etc/ceph/ceph.conf pointing to /etc/pve/ceph.conf.

You can check that by running ls -la /etc/ceph.
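
If the symlink turns out to be missing on a node, it can be recreated by hand, for example:
Code:
# point the local Ceph config at the cluster-wide one managed in /etc/pve
ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf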
 
Where exactly do you encounter it?
In the web interface: select the Ceph pool under a host, then select VM Disks.
Does the node where you run into the error, have any Ceph services setup / installed?
Yes, the Ceph services are already installed on all nodes in the cluster.

FYI: the cluster has 9 nodes; 4 of them have disks and Ceph installed, the other 5 are blade servers that use the Ceph storage.

Then you should see a symlink of the file
I see the symlink only on the node which has ceph.client.admin.keyring
 

Attachments

  • Безымянный.jpg
Okay, so the other 5 Nodes never got the Ceph packages installed or any Ceph service configured?

In that case, I suspect that they are missing the symlink from /etc/ceph/ceph.conf -> /etc/pve/ceph.conf.

Can you send the output of the following commands from a node where you have the problem and from a node where things work?
Code:
ls -la /etc/ceph
ls -la /etc/pve/ceph.conf
apt list ceph-mds

Unrelated to the current problem, you should set up more MGRs because from the output of the first post, I can see that there is only one. You want at least one or two more that can take over, in case the node with the current MGR fails.
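
On each node that should run an additional manager, that is usually just one command (or the equivalent step in the GUI):
Code:
# run on a node that should host a standby MGR
pveceph mgr create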
 
5 Nodes never got the Ceph packages installed or any Ceph service configured?
No, all 9 nodes in the cluster already have the Ceph packages installed.

from a node where you have the problem and from a node where things work?
The VM Disks list is not working in the GUI on any node, but from the CLI I can get a list of all the disks located on Ceph.
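
For example, something like:
Code:
rbd ls -l Ceph_Pool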

Output from a random node:
Code:
root@220pve01:~# ls -la /etc/ceph
total 12
drwxr-xr-x  2 ceph ceph 4096 Jul 28 18:32 .
drwxr-xr-x 98 root root 4096 Aug  9 17:42 ..
lrwxrwxrwx  1 root root   18 Nov 29  2021 ceph.conf -> /etc/pve/ceph.conf
-rw-r--r--  1 root root   92 May 27  2021 rbdmap
root@220pve01:~# ls -la /etc/pve/ceph.conf
-rw-r----- 1 root www-data 652 Mar 24 16:14 /etc/pve/ceph.conf
root@220pve01:~# apt list ceph-mds
Listing... Done
ceph-mds/stable,now 16.2.9-pve1 amd64 [installed]
N: There are 5 additional versions. Please use the '-a' switch to see them.
root@ala220pve01:~#

Output from the node with the keyring file (only one node has the keyring file):
Code:
root@2530pve02:~# ls -la /etc/ceph
total 16
drwxr-xr-x  2 ceph ceph 4096 Jul 28 19:18 .
drwxr-xr-x 98 root root 4096 Aug  9 17:42 ..
-rw-------  1 ceph ceph  151 Nov 29  2021 ceph.client.admin.keyring
lrwxrwxrwx  1 root root   18 Nov 29  2021 ceph.conf -> /etc/pve/ceph.conf
-rw-r--r--  1 root root   92 May 27  2021 rbdmap
root@2530pve02:~# ls -la /etc/pve/ceph.conf
-rw-r----- 1 root www-data 652 Mar 24 16:14 /etc/pve/ceph.conf
root@2530pve02:~# apt list ceph-mds
Listing... Done
ceph-mds/stable,now 16.2.9-pve1 amd64 [installed]
N: There are 5 additional versions. Please use the '-a' switch to see them.

Good advice about the additional MGRs, thanks.
 
Code:
root@2530pve02:~# ls -la /etc/ceph
total 16
drwxr-xr-x  2 ceph ceph 4096 Jul 28 19:18 .
drwxr-xr-x 98 root root 4096 Aug  9 17:42 ..
-rw-------  1 ceph ceph  151 Nov 29  2021 ceph.client.admin.keyring
lrwxrwxrwx  1 root root   18 Nov 29  2021 ceph.conf -> /etc/pve/ceph.conf

Okay, I see where the misunderstanding was :)

Check the contents of the ceph.conf file. There should be a section like this:
Code:
[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

It will point the Ceph clients to the /etc/pve/priv directory to look for the keyring file. And in there you should see the ceph.client.admin.keyring file on all nodes.
If the keyring file is not located there, then move it there from the /etc/ceph directory on node 2530pve02.
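
On node 2530pve02 that would be something along these lines:
Code:
# move the admin keyring into the replicated pmxcfs directory so every node can use it
mv /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph.client.admin.keyring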
 
I have resolved my problem. After copying ceph.client.admin.keyring to all nodes, I started comparing the output of rbd ls -l Ceph_Pool with the VMs in the GUI and found one VM that had an HDD configured on Ceph, but its image was not in the Ceph list. After deleting this VM, the problem was solved.
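
In case it helps someone else, a rough sketch of the comparison I did (the grep pattern assumes the storage ID in the VM configs is also called Ceph_Pool; adjust it if yours differs):
Code:
# images that actually exist in the pool
rbd ls Ceph_Pool | sort > /tmp/rbd-images.txt
# disk names referenced by the VM configs on that storage, across all nodes
grep -h 'Ceph_Pool:vm-' /etc/pve/nodes/*/qemu-server/*.conf | sed 's/.*Ceph_Pool://; s/,.*//' | sort -u > /tmp/vm-disks.txt
# anything only in the second list is a VM disk without a backing RBD image
diff /tmp/rbd-images.txt /tmp/vm-disks.txt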
 
