Unable to destroy VM and browse CEPH pool

number5

New Member
Mar 22, 2021
United States
I have a couple of hosts in a cluster with Ceph set up. I attempted to clone a very large test VM, something went wrong, and the clone never completed. Right now I am having two issues. The first is that I am trying to destroy the failed clone VM, but I am unable to do so. I unlocked it, but when I try to destroy it, I get the following error: rbd error: rbd: listing images failed: (2) No such file or directory
While troubleshooting that, I found that when I attempt to browse the data pool/datastore in the web interface, I only see the same rbd error. The pool, however, seems to be working correctly otherwise: I can still create VMs, and my other VMs are working as expected. I am also not seeing any errors in the Ceph logs or on the Ceph info pages in Proxmox. Any help would be appreciated.
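In case it helps with diagnosing, here is a sketch of how one might check the pool directly from a node shell, bypassing the Proxmox API. The pool and image names below are taken from the VM config later in this thread and are placeholders for whatever your setup uses; these commands need a node with the Ceph client tools configured.

```shell
# Diagnostic sketch -- "datstore" and the image names are placeholders;
# substitute your own pool and disk names.

# List RBD images directly via the Ceph tools (bypasses the PVE API):
rbd ls -p datstore

# Inspect a specific image, e.g. the clone's leftover disk:
rbd info datstore/vm-100-disk-0

# Show the image's watchers/status, e.g. after an interrupted clone:
rbd status datstore/vm-100-disk-0
```

If `rbd ls` fails from the shell too, the problem is on the Ceph side rather than in Proxmox's storage layer.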
 

Attachments

  • rbderror.png
    11.4 KB
mhmm.. weird, can you post the output of 'pveversion -v' the storage config and a vm config?
 
Hi, absolutely. What would be the best way to get the storage config?

Here's the output from pveversion -v:

Code:
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-6
pve-kernel-helper: 6.3-6
pve-kernel-5.4.101-1-pve: 5.4.101-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.8-pve2
ceph-fuse: 15.2.8-pve2
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-network-perl: 0.4-6
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-2
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

Here are the contents of the .conf for the VM I cloned from:

Code:
root@server1:/etc/pve/qemu-server# cat 100.conf
# qmclone temporary file
agent: 1
bios: ovmf
boot: order=scsi0;ide2
cores: 10
efidisk0: datstore:vm-100-disk-1,size=1M
ide2: cephfs:iso/ubuntu-20.04.1-desktop-amd64.iso,media=cdrom
memory: 20480
name: testVM-1
net0: virtio=26:36:F8:6B:CA:36,bridge=vxlan1
numa: 0
ostype: l26
parent: asdfg
scsi0: datstore:vm-100-disk-0,size=3200G
scsihw: virtio-scsi-pci
smbios1: uuid=3afc8a4e-c7ff-4dfc-b369-bbf2112be8d9
sockets: 2
vmgenid: a2a18b96-0e46-442d-a895-a9ba2bbf4af4

[asdfg]
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 10
efidisk0: datstore:vm-100-disk-1,size=1M
ide2: cephfs:iso/ubuntu-20.04.1-desktop-amd64.iso,media=cdrom
memory: 20480
name: testVM-1
net0: virtio=36:5D:5E:F8:D8:39,bridge=vmbr2,tag=147
numa: 0
ostype: l26
scsi0: datstore:vm-100-disk-0,size=3200G
scsihw: virtio-scsi-pci
smbios1: uuid=3afc8a4e-c7ff-4dfc-b369-bbf2112be8d9
snaptime: 1615393585
sockets: 2
vmgenid: a2a18b96-0e46-442d-a895-a9ba2bbf4af4
 
cat /etc/pve/storage.cfg
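For reference, an RBD entry in /etc/pve/storage.cfg on a hyperconverged PVE+Ceph cluster typically looks something like the fragment below; the storage ID and pool name here are illustrative placeholders, not taken from this thread.

```
rbd: datstore
        content images
        krbd 0
        pool datstore
```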
Hey, thanks for following up. I was able to fix it by following the solution proposed here: https://forum.proxmox.com/threads/r...failed-2-no-such-file-or-directory-500.56577/
Once I manually deleted the broken disk with
Code:
rbd rm my_vm_disk_name -p my_ceph_pool_name
I was able to delete the VM from the Proxmox web interface, and I can now list all the disks in my Ceph pool again through the browser.
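For anyone else who ends up here, a sketch of the general recovery sequence, assuming the orphaned image belongs to the failed clone; the VMID and names below are placeholders, not values from this thread.

```shell
# Recovery sketch -- VMID 101 and the image/pool names are placeholders
# for whatever the failed clone actually created on your cluster.

# 1. Find the leftover image on the pool:
rbd ls -p my_ceph_pool_name

# 2. Remove the broken image manually (this is irreversible):
rbd rm my_vm_disk_name -p my_ceph_pool_name

# 3. Unlock and destroy the failed clone VM from a node shell:
qm unlock 101
qm destroy 101
```

Removing the image by hand clears the state that made `rbd: listing images failed` appear, after which the normal destroy path works again.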
Thanks again for your help!