[SOLVED] rbd error: rbd: listing images failed: (2) No such file or directory (500)

skywavecomm

When viewing the content of our Ceph RBD storage in the Proxmox GUI, it displays this error message. There are no errors for VMs running on the Ceph cluster, for moving disks, or anything else.

It's more annoying than anything, but how can I resolve this issue without having to create a new pool and transfer the data over from the existing one? That's not really a fix.

Thanks!
 
I see the same error in the GUI after a fresh 3-node cluster install of 6.0.5.
Everything seems to run fine, but the above error is shown at GUI > node > cephpool > Content.
/etc/pve/priv/ceph.client.admin.keyring is identical on all nodes.

What can be done instead of recreating?

Thanks.
 
Unfortunately I'm not sure what you would like to point out.

There's no external cluster - just the 3 nodes with onboard SSD used to create the ceph cluster with a single pool called "cephpool".

/etc/pve/storage.cfg shows:
rbd: cephpool
        content rootdir,images
        krbd 0
        pool cephpool

And cat /etc/pve/priv/ceph/cephpool.keyring shows:
[client.admin]
key = same-on-all-3-nodes==
caps mds = "allow *"
caps mgr = "allow *"
caps mon = "allow *"
caps osd = "allow *"

The file /etc/ceph/ceph.client.admin.keyring only exists on one node (the first created) though.

/etc/pve/priv/ceph/cephfs.secret contains the same key, so that seems ok.
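For what it's worth, here is a quick way to check that the key itself authenticates from each node (just a sketch; the --id/--keyring values assume the PVE-managed keyring shown above):

rbd ls -p cephpool   # on the node that has /etc/ceph/ceph.client.admin.keyring
rbd ls -p cephpool --id admin --keyring /etc/pve/priv/ceph/cephpool.keyring   # on the other nodes, using the PVE-managed copy

Both should list the images in the pool if the key is accepted.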

Any other idea?
 
Ok, solved...

rbd ls -l cephpool
showed a few correct images but also:
rbd: error opening vm-124-disk-1: (2) No such file or directory
NAME           SIZE     PARENT  FMT  PROT  LOCK
vm-100-disk-0   80 GiB            2
vm-101-disk-0   80 GiB            2
vm-102-disk-0   80 GiB            2
vm-124-disk-2  550 GiB            2

vm-124-disk-1 was created via GUI > Create CT for an unprivileged container.
I then made the mistake of trying to restore a privileged container's backup - which obviously failed.
I'm not sure, but I think the container disappeared when the restore failed, or maybe I deleted it myself.
I then recreated the container with ID 124, which resulted in the image vm-124-disk-2 being created.
But vm-124-disk-1 was obviously not deleted correctly, and that resulted in this problem.

I simply deleted the image manually:
rbd rm vm-124-disk-1 -p cephpool

So there could be a problem after a failed restore...
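If you want to double-check before deleting (just a sketch, reusing the image name from the error line above):

rbd info cephpool/vm-124-disk-1

For an orphaned entry like this one, opening the image directly should fail with the same (2) No such file or directory; if rbd info succeeds instead, the image is real and should not be removed blindly.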
 
Yeah, if I cancel a disk move to a Ceph pool, it does say `Removing image: 1% complete...` but is then cancelled at 2%, so it seems that cancelling a disk move also cancels the removal of the disk on the Ceph pool. @Alwin
 
This is an old thread, please open up a new one. Also, if you have not done so already, upgrade to the latest packages.
 
I have encountered the same kind of problem (Proxmox v6) with Ceph RBD. The problem came from a VM image on the RBD pool that wasn't destroyed when the VM was destroyed.
I found the problem by switching from "rbd -p my-pool list" to "rbd -p my-pool list --long". There was one line more in the short version: that was the faulty image, which I removed with "rbd -p my-pool rm my-faulty-file".
 
Just a quick follow-up: I've encountered this error when moving a disk from local storage to Ceph storage. All tasks finished without errors, but somehow, if you already have a disk with the same name on the pool, Proxmox will rename your new image and update the VM config.

This happens when you cancel migration jobs - PVE does not clean up after a cancelled job.

Just list the pool with --long and without, then compare the results in an Excel file or with diff.
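For example (a rough sketch, assuming a pool named cephpool and a bash shell):

diff <(rbd ls -p cephpool | sort) <(rbd ls -l -p cephpool | awk 'NR>1 {print $1}' | sort)

Names that only appear in the first (plain) listing are the orphaned entries; those are the ones to remove with rbd rm.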
 
Unfortunately I ran into this problem (again).
The last time, the info given by wiguyot helped (removing a failed image), but not this time.
I can add an external, newly created Ceph pool via the GUI (or the command line).
As long as the pool is empty, I can run a list command on one of the Proxmox servers ("rbd ls -p rbd" or "rbd ls -p rbd --long").
But as soon as I copy something over to the new pool on one of the Ceph servers (e.g. "rbd deep cp BackupPool/vm-107-disk-0 rbd/vm-107-disk-0"),
"rbd ls -p rbd" still works (and shows vm-107-disk-0), but "rbd ls -p rbd --long" no longer does (on Proxmox). No output.
The web interface shows no images either and gives a connection error ("Connection timed out (596)").
(On the Ceph cluster itself, "rbd ls -p rbd --long" always shows the expected images.)

I have been stuck here for some days now. Could somebody point me in the right direction or give a hint where to debug further?

(Proxmox is v7.2.4 and Ceph is 17.2.0.)
Last test for today: when creating a new VM, the disk is actually created on the cluster/pool (as "rbd list --long" shows on the Ceph servers), but the VM cannot start and I cannot remove it afterwards...

Update: after rebooting every Ceph node one by one, it worked again all of a sudden... I have no clue what happened.
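In case it happens again, two things that might narrow it down before resorting to reboots (a sketch; the debug flags are generic Ceph client options, and the levels are just a guess at something verbose enough):

rbd ls -p rbd --long --debug-ms 1 --debug-rbd 20   # verbose client-side logging to see where the long listing stalls
ceph health detail                                 # look for slow/blocked requests that would explain the timeout

The reboot presumably just restarted whatever daemon or session was holding things up.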
 

The solution above (removing the orphaned image manually with rbd rm) works for me too. The GUI can now correctly show the Ceph pool, and the error message "rbd error: rbd: listing images failed: (2) No such file or directory (500)" has disappeared. Thanks!
 
In my case, the error appeared after removing a VM. The task output showed that the image was being removed ("100% complete. Done.") but was followed by this error. The VM was still shown under the respective node, and removing it again did not help. This has happened more than once.

I checked the content of the respective Ceph pool, but the disk image was nowhere to be seen.

I then just deleted the VM's conf file (in /etc/pve/qemu-server) and the VM from the GUI.
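Roughly what that cleanup looked like (a sketch; <vmid> and <pool> are placeholders, not the actual IDs from my setup):

rbd ls -l <pool> | grep vm-<vmid>     # confirm there really is no leftover disk image for the VM
rm /etc/pve/qemu-server/<vmid>.conf   # remove the stale VM definition; it then disappears from the GUI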
 
Addition: the disk was eventually removed from the virtual machine's configuration first (leaving the VM without an HDD), which of course I had not done the first time.
The task was to delete vm-106-disk-1.
Note that the different listing commands display the contents of the storage in different ways - only some of them showed vm-106-disk-1.
================================
Linux pve3 5.15.83-1-pve #1 SMP PVE 5.15.83-1 (2022-12-15T00:00Z) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Wed Feb 1 16:11:18 MSK 2023 on pts/0

root@pve3:~# rbd -p pool-seph list
vm-100-disk-0
vm-101-disk-0
vm-101-disk-1
vm-102-disk-0
vm-104-disk-0
vm-105-disk-0
vm-107-disk-0
vm-106-disk-1

root@pve3:~# rbd -p pool-seph rm vm-106-disk-1
2023-02-01T16:23:51.715+0300 7f45d2704700 -1 librbd::image::preRemoveRequest: 0x7f45b0068cb0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

root@pve3:~# rbd -p pool-seph list
vm-100-disk-0
vm-101-disk-0
vm-101-disk-1
vm-102-disk-0
vm-104-disk-0
vm-105-disk-0
vm-107-disk-0
vm-106-disk-1

root@pve3:~# rbd ls -l pool-seph
rbd: error opening vm-106-disk-1: (2) No such file or directory
NAME           SIZE     PARENT  FMT  PROT  LOCK
vm-100-disk-0   50 GiB            2        excl
vm-101-disk-0  100 GiB            2        excl
vm-101-disk-1  500 GiB            2        excl
vm-102-disk-0  200 GiB            2        excl
vm-104-disk-0  100 GiB            2        excl
vm-105-disk-0  100 GiB            2        excl
vm-107-disk-0  100 GiB            2        excl
rbd: listing images failed: (2) No such file or directory

root@pve3:~# rbd -p pool-seph list
vm-100-disk-0
vm-101-disk-0
vm-101-disk-1
vm-102-disk-0
vm-104-disk-0
vm-105-disk-0
vm-107-disk-0
vm-106-disk-1

root@pve3:~# rbd -p pool-seph rm vm-106-disk-1
Removing image: 100% complete...done.

root@pve3:~# rbd -p pool-seph list
vm-100-disk-0
vm-101-disk-0
vm-101-disk-1
vm-102-disk-0
vm-104-disk-0
vm-105-disk-0
vm-107-disk-0

root@pve3:~# rbd ls -l pool-seph
NAME           SIZE     PARENT  FMT  PROT  LOCK
vm-100-disk-0   50 GiB            2        excl
vm-101-disk-0  100 GiB            2        excl
vm-101-disk-1  500 GiB            2        excl
vm-102-disk-0  200 GiB            2        excl
vm-104-disk-0  100 GiB            2        excl
vm-105-disk-0  100 GiB            2        excl
vm-107-disk-0  100 GiB            2        excl

root@pve3:~#
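For reference, a way to see which client is still holding the image before retrying the removal (a sketch; rbd status lists the watchers on an image):

rbd status pool-seph/vm-106-disk-1   # shows the watcher (client address) that blocks the rm

Once the watcher is gone - the client has closed/unmapped the image or timed out, as the error message says - the rbd rm goes through, which matches what happened above.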
 
