Problem with VM Disks list in Ceph pool

lDemoNl · Sep 12, 2022

Hi guys! Today I noticed a problem with displaying the list of disks. I got error rbd error: rbd: listing images failed: (2) No such file or directory (500)
ceph -s:

Code:

cluster:
    id:     77161c77-31b0-4f07-a29d-d65f7bd6e18e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum 220pve01,2530pve01,2530pve02 (age 2w)
    mgr: 220pve01(active, since 6w)
    osd: 30 osds: 30 up (since 2w), 30 in (since 4M)

  data:
    pools:   2 pools, 1025 pgs
    objects: 1.74M objects, 6.5 TiB
    usage:   20 TiB used, 13 TiB / 33 TiB avail
    pgs:     1025 active+clean

   io:
    client:   0 B/s rd, 0 B/s wr, 0 op/s rd, 0 op/s wr

Am I right that I have to copy /etc/pve/priv/ceph.client.admin.keyring to all nodes? Because I have this file only on 1 node

aaron · Sep 12, 2022

lDemoNl said:
Am I right that I have to copy /etc/pve/priv/ceph.client.admin.keyring to all nodes? Because I have this file only on 1 node

That seems wrong. If the nodes are part of the same Proxmox VE cluster, then this file should be present on all nodes. If it is not, then check the status of your Proxmox VE cluster.

lDemoNl · Sep 13, 2022

aaron said:
That seems wrong. If the nodes are part of the same Proxmox VE cluster, then this file should be present on all nodes. If it is not, then check the status of your Proxmox VE cluster.

Code:

pvecm status
Cluster information
-------------------
Name:             pve-int01
Config Version:   11
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Sep 13 09:16:42 2022
Quorum provider:  corosync_votequorum
Nodes:            9
Node ID:          0x00000004
Ring ID:          1.bc3
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   9
Highest expected: 9
Total votes:      9
Quorum:           5
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 172.16.133.134
0x00000002          1 172.16.133.135
0x00000003          1 172.16.133.136
0x00000004          1 172.16.133.137 (local)
0x00000005          1 172.16.133.138
0x00000006          1 172.16.133.139
0x00000007          1 172.16.133.153
0x00000008          1 172.16.133.154
0x00000009          1 172.16.133.156

@aaron Looks like that everything ok. Manual presenting file is it correct way or there are other solutions?

aaron · Sep 15, 2022

Back to the original error:

lDemoNl said:
rbd error: rbd: listing images failed: (2) No such file or directory (500)

Where exactly do you encounter it?

lDemoNl said:
Manual presenting file is it correct way or there are other solutions?

All files in the /etc/pve directory should be available on all cluster nodes. If you create a file in there, it should show up on all other nodes right away.

Does the node where you run into the error, have any Ceph services setup / installed? Then you should see a symlink of the file /etc/ceph/ceph.conf pointing to /etc/pve/ceph.conf.

You can check that by running ls -la /etc/ceph.

lDemoNl · Sep 15, 2022

aaron said:
Where exactly do you encounter it?

In WEB interface, select Ceph pool under host, the select VM Disks

aaron said:
Does the node where you run into the error, have any Ceph services setup / installed?

Yes, Ceph services already installed on all nodes in cluster.

FYI. In cluster 9 nodes, 4 nodes has disks and ceph installed, another 5 nodes are blade servers which use ceph storage.

aaron said:
Then you should see a symlink of the file

I see a symlynk only on node which has ceph.client.admin.keyring

aaron · Sep 15, 2022

Okay, so the other 5 Nodes never got the Ceph packages installed or any Ceph service configured?

In that case, I suspect that they are missing the symlink from /etc/ceph/ceph.conf -> /etc/pve/ceph.conf.

Can you send the output of the following commands from a node where you have the problem and from a node where things work?

Code:

ls -la /etc/ceph
ls -la /etc/pve/ceph.conf
apt list ceph-mds

Unrelated to the current problem, you should set up more MGRs because from the output of the first post, I can see that there is only one. You want at least one or two more that can take over, in case the node with the current MGR fails.

lDemoNl · Sep 15, 2022

aaron said:
5 Nodes never got the Ceph packages installed or any Ceph service configured?

No, all 9 nodes in cluster have already installed Ceph packages

aaron said:
from a node where you have the problem and from a node where things work?

VM Disks list not working in GUI at all nodes. But from cli I can get list of all disks located on Ceph

Output from random node

Code:

root@220pve01:~# ls -la /etc/ceph
total 12
drwxr-xr-x  2 ceph ceph 4096 Jul 28 18:32 .
drwxr-xr-x 98 root root 4096 Aug  9 17:42 ..
lrwxrwxrwx  1 root root   18 Nov 29  2021 ceph.conf -> /etc/pve/ceph.conf
-rw-r--r--  1 root root   92 May 27  2021 rbdmap
root@220pve01:~# ls -la /etc/pve/ceph.conf
-rw-r----- 1 root www-data 652 Mar 24 16:14 /etc/pve/ceph.conf
root@220pve01:~# apt list ceph-mds
Listing... Done
ceph-mds/stable,now 16.2.9-pve1 amd64 [installed]
N: There are 5 additional versions. Please use the '-a' switch to see them.
root@ala220pve01:~#

Output from node with keyring file (only 1 node has keyring file)

Code:

root@2530pve02:~# ls -la /etc/ceph
total 16
drwxr-xr-x  2 ceph ceph 4096 Jul 28 19:18 .
drwxr-xr-x 98 root root 4096 Aug  9 17:42 ..
-rw-------  1 ceph ceph  151 Nov 29  2021 ceph.client.admin.keyring
lrwxrwxrwx  1 root root   18 Nov 29  2021 ceph.conf -> /etc/pve/ceph.conf
-rw-r--r--  1 root root   92 May 27  2021 rbdmap
root@2530pve02:~# ls -la /etc/pve/ceph.conf
-rw-r----- 1 root www-data 652 Mar 24 16:14 /etc/pve/ceph.conf
root@2530pve02:~# apt list ceph-mds
Listing... Done
ceph-mds/stable,now 16.2.9-pve1 amd64 [installed]
N: There are 5 additional versions. Please use the '-a' switch to see them.

About additional MGR good advice, thanks

aaron · Sep 16, 2022

Code:

root@2530pve02:~# ls -la /etc/ceph
total 16
drwxr-xr-x  2 ceph ceph 4096 Jul 28 19:18 .
drwxr-xr-x 98 root root 4096 Aug  9 17:42 ..
-rw-------  1 ceph ceph  151 Nov 29  2021 ceph.client.admin.keyring
lrwxrwxrwx  1 root root   18 Nov 29  2021 ceph.conf -> /etc/pve/ceph.conf

Okay, I see where the misunderstanding was

Check the contents of the ceph.conf file. There should be a section like this:

Code:

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

It will point the Ceph clients to the /etc/pve/priv directory to look for the keyring file. And in there you should see the ceph.client.admin.keyring file on all nodes.
If the keyring file is not located there, then move it there from the /etc/ceph directory on node 2530pve02.

lDemoNl · Sep 20, 2022

@aaron I have tried to copy ceph.client.admin.keyring to all nodes, but nothing changed.

Could the blade-servers not have their own storage causing the problem?

lDemoNl · Oct 31, 2022

I have resolved my problem. After copying ceph.client.admin.keyring on all nodes, I started comparing rbd ls -l Ceph_Pool and VM's in GUI and found 1 VM which configured HDD on Ceph, but in Ceph it wasn't in list. After deleting this VM problem solved.

Search

Search

Problem with VM Disks list in Ceph pool

lDemoNl

Member

aaron

Proxmox Staff Member

lDemoNl

Member

aaron

Proxmox Staff Member

lDemoNl

Member

Attachments

aaron

Proxmox Staff Member

lDemoNl

Member

aaron

Proxmox Staff Member

lDemoNl

Member

lDemoNl

Member