Deleting VM templates fails, tasks display shows the error "listing images failed"

rainer042

Hello,

Recently I have been seeing a strange problem when trying to delete a VM template on a PVE cluster named pxa, no matter which template I try. Templates and VM disks for VMs on pxa reside on the rbd storage "ceph-storage" of another PVE cluster named pxd. pxa has no mass storage of its own; pxd is a hyperconverged setup. When I try to delete a template on pxa that uses storage on pxd, I get the message shown below:

TASK ERROR: rbd error: rbd: listing images failed: (2) No such file or directory

The template is not deleted. The rbd storage attached to pxa, which is actually stored on pxd, is named pxd_a. The Ceph pool used for pxd_a on cluster pxd is erasure-coded.
This storage is used for nearly all VMs on cluster pxa, and although deleting templates does not work at the moment, deleting VMs with storage on pxd works just fine. Running VMs on pxa with storage on pxd_a has also worked fine for a long time.

If I log in to cluster pxd and run rbd info on a template image that I could not delete from pxa, it works just fine. VM 230 is an example:
Code:
root@pxd1:/root# rbd info pxd_a-metadata/base-230-disk-0
rbd image 'base-230-disk-0':
        size 32 GiB in 8192 objects
        order 22 (4 MiB objects)
        snapshot_count: 1
        id: c2147a8927face
        data_pool: pxd_a-data
        ...

On pxa /etc/pve/storage.cfg looks like this for the storage in question:
Code:
rbd: pxd_a
        content images
        data-pool pxd_a-data
        krbd 0
        monhost 141.26.x.a,141.26.x.b,141.26.x.c
        pool pxd_a-metadata
        username admin

I also rechecked the admin keyring. On the storage cluster pxd the key is in the file
/etc/pve/priv/ceph/pxdstorage.keyring
On the Ceph client (pxa) the same key can be found in
/etc/pve/priv/ceph/pxd_a.keyring
As already said, I am able to create and delete regular VMs on cluster pxa with storage from pxd. Only deleting templates does not work.

pveversion reports the version info below on both PVE clusters pxa and pxd:
pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.13-1-pve)

Does anyone have an idea?
How could I get a more detailed error message when trying to delete a vm-template on cluster pxa to find out more?

Thanks
Rainer
I found the culprit, and also realized that I had a similar problem on a different cluster before: a broken rbd image.
The problem was that the Ceph pool pxd_a-metadata on pxd contained one rbd image that was broken. Listing the pool with rbd -p pxd_a-metadata ls shows it, but trying to get information about it with rbd -p pxd_a-metadata info vm-296-disk-0 fails. This image had nothing to do with the template I wanted to delete, which failed with the initial error, except for the fact that both are in the same Ceph pool.
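
For anyone who hits the same error, the check described above (list the pool, then try rbd info on each image) can be scripted. This is only a rough sketch: find_broken_images is a made-up helper name, and it assumes it runs on a node where plain rbd commands against the pool work without extra monitor or keyring options, such as a pxd node in my case.

```shell
#!/bin/sh
# Sketch: print every image in a pool whose metadata cannot be read.
# find_broken_images is a hypothetical helper, not part of Ceph or PVE.
find_broken_images() {
    pool="$1"
    # A plain "rbd ls" only returns image names, so a broken image
    # still appears in this list.
    rbd -p "$pool" ls | while IFS= read -r image; do
        # "rbd info" has to open the image; treat any failure as "broken".
        # stdin is redirected so rbd cannot consume the list being read.
        if ! rbd -p "$pool" info "$image" >/dev/null 2>&1 </dev/null; then
            printf '%s\n' "$image"
        fi
    done
}
```

Calling find_broken_images pxd_a-metadata should then print only the names of images for which rbd info fails. What to do with such an image (repair or remove) depends on why it is broken, so the sketch deliberately only reports.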

PVE's storage management probably does something similar to an rbd info, which also fails and is then reported as

"listing images failed"

The only thing that could have helped in this case is a more detailed error message telling what exactly failed (getting information about the broken rbd image vm-296-disk-0 in my case).

Rainer