Problem with removing a disk on Ceph storage

Melanxolik

Hello all. I have a problem with Proxmox 4 and removing a Ceph disk.
I saw the bug report https://bugzilla.proxmox.com/show_bug.cgi?id=553#c1

My problem can be reproduced this way:
I created a VM on the first node of the Proxmox cluster, migrated it to the third node, and about five minutes later tried to remove the VM from the cluster.
Proxmox said:

Code:
Removing all snapshots: 100% complete...done.
image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
TASK ERROR: rbd rm 'vm-624-disk-1' error: rbd: error: image still has watchers

That's strange; the image is indeed still there:

Code:
rbd -p MYPOOL ls
vm-624-disk-1

Trying to remove it from the console:
Code:
# rbd rm vm-624-disk-1 -p MYPOOL
2015-11-09 11:24:08.616860 7f4f2b28c800 -1 librbd: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

Code:
rbd status vm-624-disk-1 -p MYPOOL
Watchers:
    watcher=192.168.126.1:0/2093160405 client.2137999 cookie=10

Oh, that's very strange, because my VM runs on the third node (192.168.126.3), but the watcher is open from the first node.
After that I started and stopped the VM on the third node and tried to remove it again; I got the same error message.

In the end I migrated my VM to the first node and removed it there. Maybe this situation is a bug that needs fixing? But I don't understand how to fix it.
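To see at a glance which node still holds the image open, the watcher address can be pulled out of the "rbd status" output. This is only a sketch with a made-up helper name; the sample output is the one quoted above.

```shell
# Hypothetical helper: pull the watcher's IP out of "rbd status" output,
# so you can see which node still holds the image open.
watcher_ip() {
  sed -n 's/.*watcher=\([0-9.]*\):.*/\1/p'
}

# Fed with the output quoted above:
printf '%s\n' 'Watchers:' \
  '    watcher=192.168.126.1:0/2093160405 client.2137999 cookie=10' \
  | watcher_ip
# prints: 192.168.126.1
```

In real use you would pipe `rbd status vm-624-disk-1 -p MYPOOL` into it, then migrate the VM to the node it prints before removing it.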
 
Hmm, that error actually keeps us from removing images? It is not obvious to me from that bug report that the two are related.
 
Great news: it's been resolved in 0.94.6, whenever we get that. Glad I put some pressure on them about it. Woo hoo!
 
There is a "workaround":

  • Create a new pool.
  • Move all vDisks that reside on the old pool (and need to be kept) to the new pool.
  • Double-check that you have no data left on the old pool that you want to keep.
  • Remove the old pool.
  • Profit. Kinda. Sadly.

ps.: we encounter this issue mostly on EC pools, and we use an EC-pool-per-vDisk approach for most of what we are doing.
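On the command line, the list above could look roughly like this. Every name here (OLDPOOL, NEWPOOL, VM 624, scsi0) is a placeholder, the PG count is only an example, the target of move-disk must also exist as a PVE storage entry, and on older PVE releases the subcommand is spelled `move_disk`. The sketch only prints the plan unless you set DRY_RUN=0.

```shell
# Sketch of the pool-migration workaround; every name is a placeholder.
OLD=OLDPOOL
NEW=NEWPOOL
DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to actually execute

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run ceph osd pool create "$NEW" 64    # 64 PGs is just an example value
run qm move-disk 624 scsi0 "$NEW"     # repeat per disk / per VM as needed
# only after double-checking "rbd -p $OLD ls" holds nothing you want to keep:
run ceph osd pool delete "$OLD" "$OLD" --yes-i-really-really-mean-it
```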
 
@Melanxolik, I encountered a similar problem two days ago.
I ended up doing the same steps Q-wulf mentioned above; it is a pain when you have many large vDisks.
 
Yeah, and you might need to trigger a manual scrub
Code:
ceph osd scrub osd.x

so that Ceph actually deletes the PGs it no longer recognizes.
You will know whether you need to by watching Ceph with
Code:
ceph -w
while executing the pool delete. If your available space does not increase, you probably need to scrub.
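With many OSDs, the scrub commands can be generated from "ceph osd ls", which prints one numeric OSD id per line. The helper name is made up; in real use you would pipe `ceph osd ls | scrub_cmds | sh`.

```shell
# Hypothetical helper: turn "ceph osd ls" output (one numeric OSD id per
# line) into the matching "ceph osd scrub" commands.
scrub_cmds() {
  while read -r id; do
    echo "ceph osd scrub osd.$id"
  done
}

# Example with three OSDs:
printf '0\n1\n2\n' | scrub_cmds
# prints:
# ceph osd scrub osd.0
# ceph osd scrub osd.1
# ceph osd scrub osd.2
```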
 
No resolution?
The only way I have worked around it is to set the disk to "no-backup",
then clone the VM.
Remove the old VM, and the disks will be deleted as well.

This tells me that Proxmox is capable of deleting the disks, but a direct delete does not work.
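Spelled out with qm, that workaround might look like the sketch below. The VM ID 624, the free target ID 9624, the disk key scsi0, and the storage name MYPOOL are all assumptions; read the real values from the VM config first. The helper only prints the commands so the plan can be reviewed before running anything.

```shell
# Hypothetical sketch of the clone workaround; prints the plan instead of
# executing anything. Arguments: old VM ID, free new VM ID.
clone_plan() {
  vmid=$1 newid=$2
  echo "qm set $vmid --scsi0 MYPOOL:vm-$vmid-disk-1,backup=0"  # exclude disk from backup
  echo "qm clone $vmid $newid --full"                          # full clone of the VM
  echo "qm destroy $vmid"                                      # deleting the old VM removes its disks
}

clone_plan 624 9624
```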
 
Hello,
I have the same issue now. I still have a watcher, I do not know how to remove it, and I would like to avoid rebooting all my nodes.
The Ceph disk image in question is about 3 TB.

Code:
rbd status vm-111-disk-2 -p ceph-vm
Watchers:
    watcher=172.16.0.3:0/3271776928 client.16316856 cookie=139974259587264

Never mind, found a solution: https://forum.proxmox.com/threads/problem-with-remove-disk-on-ceph-storage.24387/

Code:
rbd lock rm vm-111-disk-2 auto\ 139974259587264 client.16316856 -p ceph-vm

(get the data via:
Code:
rbd lock ls vm-111-disk-2 -p ceph-vm
)
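To avoid retyping the lock ID by hand, the "rbd lock rm" invocation can be generated from the "rbd lock ls" output. The helper name is made up, and the column layout below is an assumption modeled on the commands in this post (librbd's automatic locks have IDs of the form "auto <cookie>", i.e. the ID spans two whitespace-separated fields):

```shell
# Hypothetical helper: build the "rbd lock rm" command from "rbd lock ls"
# output, assuming the columns: Locker, ID ("auto <cookie>"), Address.
lock_rm_cmd() {
  image=$1 pool=$2
  awk -v img="$image" -v pool="$pool" \
    '$1 ~ /^client\./ { printf "rbd lock rm %s \"%s %s\" %s -p %s\n", img, $2, $3, $1, pool }'
}

# Example with the lock from this post:
printf '%s\n' \
  'Locker          ID                    Address' \
  'client.16316856 auto 139974259587264  172.16.0.3:0/3271776928' \
  | lock_rm_cmd vm-111-disk-2 ceph-vm
# prints: rbd lock rm vm-111-disk-2 "auto 139974259587264" client.16316856 -p ceph-vm
```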
 
I had the same issue.

Code:
:~# rbd rm vNTDB-Storage/vm-112-disk-0
2019-08-01 14:22:29.567286 7fc2b97fa700 -1 librbd::image::RemoveRequest: 0x560d820f5470 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.

:~# rbd status vNTDB-Storage/vm-112-disk-0
Watchers:
        watcher=10.102.166.130:0/4114533936 client.68244147 cookie=18446462598732840961


:~# ceph osd blacklist add 10.102.166.130:0/4114533936
blacklisting 10.102.166.130:0/4114533936 until 2019-08-01 15:36:02.421699 (3600 sec)

:~# rbd rm vNTDB-Storage/vm-112-disk-0
Removing image: 100% complete...done.

:~# ceph osd blacklist rm 10.102.166.130:0/4114533936
un-blacklisting 10.102.166.130:0/4114533936
 
@kaltsi, please open up a new thread; don't post in long-dead ones. It makes it less likely that you will get help.
 
Today I had a similar problem when I removed some VMs from a PVE 7.2 system with 3 hosts and some Ceph VM images (vm-33-disk-2, ...). Storage in my setup is provided by an external Nautilus cluster.

These rbd images had watchers and I was unable to "rbd rm" them. So, just like richinbg did, I ran "rbd lock rm" for the lock that existed on each of the leftover rbds. Afterwards I was able to remove the leftover rbd images.

It seems, however, that one should be careful when removing such locks on a running PVE system:

A few minutes after unlocking and deleting all the leftover rbds (all the VMs that once owned these images had already been deleted), I noticed that other VMs running on this PVE system suddenly no longer had write access to their rbd disks. All VM consoles showed write errors, the VMs being unable to write to their own disks.

The solution was to stop all running VMs, reboot the PVE hosts, and then restart all VMs. Afterwards everything was fine again.
 