Can't remove image when migrated with CEPH + KRBD

hybrid512

Hi,

I just found a nasty bug when using Proxmox 4.2 in a clustered setup with Ceph and KRBD-enabled storage.

With KRBD, a /dev/rbdXX device is created on the server to give access to the RBD image.
When migrating a VM that uses such a volume from server A to server B, the /dev/rbdXX device stays mapped on server A and is then mapped again on server B.
If you then try to delete this VM, Ceph complains because there are still watchers on the image.
In fact, the "watcher" is the /dev/rbdXX device that is still mapped on server A.
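For illustration, here is roughly the KRBD lifecycle involved (just a sketch, with an example image name in the default "rbd" pool; the device number depends on mapping order):

Code:
# KRBD maps the RBD image into the kernel and exposes it as a block device:
rbd map rbd/vm-100-disk-1        # creates e.g. /dev/rbd0 and registers a watcher
rbd showmapped                   # lists the images currently mapped on this node
# Migration maps the image again on server B, but the old mapping stays on
# server A because the corresponding unmap is never run there:
rbd unmap /dev/rbd0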

To be able to remove the image, you first have to unmap this device on server A.

So, to conclude: when migrating a VM between servers with KRBD images involved, Proxmox should not forget to properly unmap the device on the source node.

Regards.
 
Which Ceph version are you using? I cannot reproduce this here using Hammer (both client & cluster).
 
I ran into this problem when I cloned a VM with Ceph storage devices to another host. The clone itself worked fine, but I couldn't remove the cloned VM afterwards because there were still RBD devices mapped on the host the VM was cloned from.

Running Ceph 0.94.7 and the latest Proxmox (free) updates.
 
Package versions

proxmox-ve: 4.2-54 (running kernel: 4.4.10-1-pve)
pve-manager: 4.2-15 (running version: 4.2-15/6669ad2c)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.10-1-pve: 4.4.10-54
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-42
qemu-server: 4.0-81
pve-firmware: 1.1-8
libpve-common-perl: 4.0-68
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-55
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-68
pve-firewall: 2.0-29
pve-ha-manager: 1.0-32
ksm-control-daemon: 1.2-1
glusterfs-client: 3.6.9-2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
ceph: 0.94.7-1~bpo80+1
 
Here is a way to fix the problem when you run into it.

On any node with access to Ceph, first get the image's internal ID from its block_name_prefix:

Code:
rbd info vm-100-disk-1
rbd image 'vm-100-disk-1':
        size 1024 TB in 268435456 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.82072ae8944a
        format: 2
        features: layering

Then list the watchers on the image's header object (same ID, but with the rbd_header. prefix):

Code:
rados -p <ceph pool name> listwatchers rbd_header.82072ae8944a

This returns a line like the following, where the address after "watcher=" is the IP of the node on which the image is still mapped.

Code:
watcher=172.16.10.123:0/2563416157 client.4043115 cookie=1

Connect to that node and type:

Code:
rbd showmapped
id pool image          snap device    
0  rbd  vm-100-disk-1 -    /dev/rbd0  
1  rbd  vm-101-disk-1 -    /dev/rbd1  

rbd unmap /dev/rbd0

Once unmapped, you can delete the image (by deleting the corresponding VM, for example).
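For what it's worth, the manual steps above can be strung together in a small shell sketch. This is just an illustration, not a supported tool: it assumes passwordless root SSH to the cluster nodes, a format 2 image (so the header object is rbd_header.<id>), and it takes the pool and image name as arguments. Run it only after the VM has been stopped, since it unmaps the image on every node that still watches it.

Code:
#!/bin/bash
# Usage: ./unmap-stale.sh <pool> <image>    e.g. ./unmap-stale.sh rbd vm-100-disk-1
set -e
POOL="$1"; IMAGE="$2"

# Extract the internal image ID from the block_name_prefix (rbd_data.<id>)
ID=$(rbd info "$POOL/$IMAGE" | awk -F'rbd_data.' '/block_name_prefix/ {print $2}')

# Each watcher line looks like: watcher=172.16.10.123:0/2563416157 client.4043115 cookie=1
for IP in $(rados -p "$POOL" listwatchers "rbd_header.$ID" | sed 's/watcher=//; s/:.*//'); do
    echo "Stale mapping on $IP, unmapping ..."
    # On that node, find the /dev/rbdX device belonging to this image and unmap it
    ssh "root@$IP" "rbd showmapped | awk -v img=$IMAGE '\$3==img {print \$5}' | xargs -r rbd unmap"
done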
 
Thanks! A patch that fixes this is on pve-devel.
 
