[SOLVED] rbd error: rbd: listing images failed: (2) No such file or directory (500)

skywavecomm

New Member
May 30, 2019
When viewing the content of our Ceph RBD pool in the Proxmox GUI, it displays this error message. There are no errors for running VMs on the Ceph cluster, moving disks, or anything else.

It's more annoying than anything, but how can I resolve this without having to create a new pool and transfer the data over from the existing one? That's not really a fix.

Thanks!
 

skywavecomm

New Member
May 30, 2019
I just removed the pool and rebuilt it and the issue was resolved.

Thank you though!
 

Ralf Zenklusen

New Member
Apr 24, 2016
I see the same error in the GUI after a fresh 3 node cluster install of 6.0.5.
Everything seems to run fine, but the above error is shown at GUI>node>cephpool>content.
/etc/pve/priv/ceph.client.admin.keyring is identical on all nodes.

What can be done instead of recreating?

Thanks.
 

Ralf Zenklusen

New Member
Apr 24, 2016
Unfortunately I'm not sure what you would like to point out.

There's no external cluster - just the 3 nodes with onboard SSD used to create the ceph cluster with a single pool called "cephpool".

/etc/pve/storage.cfg shows:
rbd: cephpool
        content rootdir,images
        krbd 0
        pool cephpool

And cat /etc/pve/priv/ceph/cephpool.keyring
[client.admin]
        key = same-on-all-3-nodes==
        caps mds = "allow *"
        caps mgr = "allow *"
        caps mon = "allow *"
        caps osd = "allow *"

The file /etc/ceph/ceph.client.admin.keyring only exists on one node (the first created) though.

/etc/pve/priv/ceph/cephfs.secret contains the same key, so that seems ok.

Any other idea?
 

Ralf Zenklusen

New Member
Apr 24, 2016
Ok, solved...

rbd ls -l cephpool
showed a few correct images but also:
rbd: error opening vm-124-disk-1: (2) No such file or directory
NAME           SIZE     PARENT  FMT  PROT  LOCK
vm-100-disk-0   80 GiB           2
vm-101-disk-0   80 GiB           2
vm-102-disk-0   80 GiB           2
vm-124-disk-2  550 GiB           2

vm-124-disk-1 was created via GUI > Create CT for an unprivileged container.
I then made the mistake of trying to restore a privileged container's backup - which obviously failed.
I'm not sure, but I think the container disappeared when the restore failed, or maybe I deleted it.
I then recreated the container with ID 124, which created the image vm-124-disk-2.
But vm-124-disk-1 was obviously not deleted correctly, which caused this problem.

I simply deleted the image manually:
rbd rm vm-124-disk-1 -p cephpool

So there could be a problem after a failed restore...
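The diagnose-and-remove flow above can be scripted. A minimal sketch, assuming the broken image name can be parsed out of the stderr of `rbd ls -l`; the sample error line below is the one from this post and stands in for live cluster output:

```shell
#!/bin/sh
# Sketch: pull the name of an image that rbd cannot open out of the
# stderr of `rbd ls -l`, then remove it. The sample line stands in for
# real output; on a live cluster you would capture stderr with:
#   rbd ls -l cephpool 2>err.txt >/dev/null
sample_err='rbd: error opening vm-124-disk-1: (2) No such file or directory'

# Error lines have the form: rbd: error opening <image>: (2) ...
broken=$(printf '%s\n' "$sample_err" \
  | sed -n 's/^rbd: error opening \(.*\): (2).*/\1/p')
echo "broken image: $broken"

# On a real cluster, the orphaned image would then be removed with:
#   rbd rm "$broken" -p cephpool
```

The `rbd rm` line is left as a comment on purpose: check that the image really belongs to no running guest before deleting it.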
 

skywavecomm

New Member
May 30, 2019
Ralf Zenklusen said:
Ok, solved... I simply deleted the image manually: rbd rm vm-124-disk-1 -p cephpool
Yeah, if I cancel a disk move to a Ceph pool, it does say `Removing image: 1% complete...` but is then canceled at 2%, so it seems that cancelling a disk move also cancels the removal of the disk on the Ceph pool. @Alwin
 

Alwin

Proxmox Staff Member
Aug 1, 2017
This is an old thread, please open up a new one. Also if not done so already, upgrade to the latest packages.
 

wiguyot

New Member
Oct 12, 2021
I have encountered the same kind of problem (Proxmox v6) with Ceph RBD. The problem came from a VM image on the RBD pool that wasn't destroyed during the destruction of the VM.
I found the problem by switching from "rbd -p my-pool list" to "rbd -p my-pool list --long". The short version had one line more; that extra line was the faulty image, which I removed with "rbd -p my-pool rm my-faulty-file".
 

brosky

Member
Oct 13, 2015
Just a quick follow-up: I've encountered this error when moving a disk from local storage to Ceph storage. All tasks finished without errors, but if you already have a disk with the same name on the pool, Proxmox will rename your new image and update the VM config.

This happens when you cancel migration jobs - PVE does not clean up after a cancelled job.

Just list the pool with --long and without, then compare the results in an Excel file or with diff.
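That comparison can be done with `comm` instead of a spreadsheet. A sketch, assuming the two listings have been captured as plain name lists; the image names below are invented stand-ins for real cluster output (on a live system, roughly `rbd -p my-pool ls` for the short list and `rbd -p my-pool ls -l | awk 'NR>1 {print $1}'` for the long one):

```shell
#!/bin/bash
# Sketch: images present in the short listing (`rbd ls`) but missing from
# the long one (`rbd ls -l`) are the broken/orphaned entries.
# Sample data stands in for a real cluster; the image names are invented.
short='vm-100-disk-0
vm-101-disk-0
vm-124-disk-1
vm-124-disk-2'
long='vm-100-disk-0
vm-101-disk-0
vm-124-disk-2'

# comm -23 prints lines unique to the first (sorted) input
orphans=$(comm -23 <(sort <<<"$short") <(sort <<<"$long"))
echo "only in short listing: $orphans"
# Each orphan could then be removed with: rbd rm <name> -p my-pool
```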
 

ZZ9

Member
Feb 9, 2020
Unfortunately I ran into this problem (again).
Last time the info given by wiguyot helped (removing a failed image), but not this time.
I can add an external, newly created Ceph pool via the GUI (or the command line).
As long as the pool is empty, I can run a list command on one of the Proxmox servers ("rbd ls -p rbd" or "rbd ls -p rbd --long").
But as soon as I copy something over to the new pool on one of the Ceph servers (i.e. "rbd deep cp BackupPool/vm-107-disk-0 rbd/vm-107-disk-0"),
"rbd ls -p rbd" still works (and shows vm-107-disk-0), but "rbd ls -p rbd --long" no longer does (on Proxmox). No output.
The web interface shows no images either and gives a connection error ("Connection timed out (596)").
(On the Ceph cluster itself, "rbd ls -p rbd --long" always shows the expected files.)

I have been stuck here for some days now. Could somebody point me in the right direction or give a hint where to debug further?

(Proxmox is v7.2.4 and Ceph is 17.2.0.)
Last test for today: when creating a new VM, the disk is actually created on the cluster/pool (as "rbd list --long" shows on the Ceph servers), but the VM cannot start and I cannot remove it afterwards...

Update: After rebooting every Ceph node one by one, it suddenly worked again... I have no clue what happened.
 

gunterwa

New Member
Apr 1, 2022
Ralf Zenklusen said:
Ok, solved... I simply deleted the image manually: rbd rm vm-124-disk-1 -p cephpool

Your solution works for me. The GUI can now correctly show the Ceph pool, and the error message "rbd error: rbd: listing images failed: (2) No such file or directory (500)" has disappeared. Thx! :)
 
