[SOLVED] ceph can not remove image - watchers

RobFantini

Hello,
I've checked other threads and the ceph lists.

For some reason a Ceph disk is listed on two different Ceph storages.

I backed up the LXC, deleted it, and restored it to ZFS.

Now, on the PVE storage content list, the disk still shows up in both places.

So I spent some hours trying to remove it. Not done yet, and I may end up deleting the pool in a day or two anyway.
I thought I'd post this info.

Code:
# rbd status -p ceph vm-213-disk-1
Watchers:
        watcher=10.11.12.3:0/2474997724 client.78184312 cookie=18446462598732840963


# rbd  info vm-213-disk-1   -p ceph
rbd image 'vm-213-disk-1':
        size 101GiB in 25856 objects
        order 22 (4MiB objects)
        block_name_prefix: rbd_data.77ad306b8b4567
        format: 2
        features: layering
        flags:
        create_timestamp: Sat Aug 25 04:44:22 2018

# rados -p ceph listwatchers rbd_header.77ad306b8b4567
watcher=10.11.12.3:0/2474997724 client.78184312 cookie=18446462598732840963

# I tried removing the mon. Did not fix; added it back.

# rbd showmapped
id pool image         snap device  
... ceph vm-213-disk-1 -    /dev/rbd0
1  ceph vm-213-disk-1 -    /dev/rbd1

# this worked
rbd unmap /dev/rbd0

# rbd unmap /dev/rbd1
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy


# rbd showmapped
id pool image         snap device  
1  ceph vm-213-disk-1 -    /dev/rbd1


cat /sys/kernel/debug/ceph/220b9a53-4556-48e3-a73c-28deff665e45.client78184312/osdc

REQUESTS 0 homeless 0
LINGER REQUESTS
18446462598732840963    osd9    13.d511aa64     13.264  [9,49,40]/9     [9,49,40]/9     e76294  rbd_header.77ad306b8b4567    0x20     2       WC/0
BACKOFFS


# I tried stopping osd9. That did not fix; another OSD showed up.

# cat /sys/kernel/debug/ceph/220b9a53-4556-48e3-a73c-28deff665e45.client78184312/osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
18446462598732840963    osd49   13.d511aa64     13.264  [49,40]/49      [49,40]/49      e76296  rbd_header.77ad306b8b4567    0x20     3       WC/0
BACKOFFS


# Started osd9 again after a few minutes.

# cat /sys/kernel/debug/ceph/220b9a53-4556-48e3-a73c-28deff665e45.client78184312/osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
18446462598732840963    osd9    13.d511aa64     13.264  [9,49,40]/9     [9,49,40]/9     e76298  rbd_header.77ad306b8b4567    0x20     4       WC/0
BACKOFFS

And that is where I left off.
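
To tie the output above together: the watcher is the kernel RBD client on 10.11.12.3 (client.78184312), which is why restarting mons or OSDs only moves the linger request to another OSD instead of clearing it. A minimal recap of how the pieces match up, reusing the IDs from the output above (the same commands as in this thread, just side by side):

Code:
# the watcher line names the client holding the watch on the image header
rados -p ceph listwatchers rbd_header.77ad306b8b4567

# on 10.11.12.3, that client ID matches a kernel client instance in debugfs
ls /sys/kernel/debug/ceph/ | grep client78184312

# and the stale mapping that keeps the watch alive
rbd showmapped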
 
Code:
pve3  ~ # ceph -s
  cluster:
    id:     220b9a53-4556-48e3-a73c-28deff665e45
    health: HEALTH_WARN
            noout flag(s) set
  services:
    mon: 3 daemons, quorum pve3,sys8,pve10
    mgr: pve3(active), standbys: sys8, pve10
    osd: 65 osds: 65 up, 65 in
         flags noout
  data:
    pools:   2 pools, 1088 pgs
    objects: 32.70k objects, 124GiB
    usage:   436GiB used, 25.1TiB / 25.5TiB avail
    pgs:     1088 active+clean
  io:
    client:   43.9KiB/s wr, 0op/s rd, 10op/s wr

Code:
# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-8-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-11
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-41
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-9
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve2~bpo1
 
The attached screenshots show the Ceph disk appearing under two different storage IDs.
 

Attachments

  • pve3_Proxmox_Virtual_Environment.png
  • pve3_Proxmox_Virtual_Environment-1-.png
Generally, if you're unable to release a mapped RBD, AND you're sure there is no outstanding IO, you can use the force switch (e.g. rbd unmap -o force /dev/rbd0).

If that doesn't work, you'll need to reboot the node. (Edit: the node would not go down quietly; you'd need to shoot it in the head.)
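
For reference, a minimal sketch of that sequence on the node holding the mapping, reusing the device, pool and image names from this thread:

Code:
# list kernel mappings on this node
rbd showmapped

# force the unmap (only if you are sure there is no outstanding IO)
rbd unmap -o force /dev/rbd1

# confirm the watcher is gone, then retry the delete
rados -p ceph listwatchers rbd_header.77ad306b8b4567
rbd rm -p ceph vm-213-disk-1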
 
This removed the watcher, thank you.
Code:
rbd unmap -o force /dev/rbd1

Which also fixed these:
Code:
rados -p ceph listwatchers rbd_header.77ad306b8b4567

cat /sys/kernel/debug/ceph/220b9a53-4556-48e3-a73c-28deff665e45.client78184312/osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
BACKOFFS

rbd showmapped

Next -

vm-213-disk-1 still shows up in both the ceph_vm and ceph_ct storage screens.

Do you have a suggestion to remove those?


Thank you for the help.
 
I am trying to get info on the disk or disks. In progress...
Code:
rbd --pool ceph  info vm-213-disk-1
rbd image 'vm-213-disk-1':
        size 101GiB in 25856 objects
        order 22 (4MiB objects)
        block_name_prefix: rbd_data.77ad306b8b4567
        format: 2
        features: layering
        flags:
        create_timestamp: Sat Aug 25 04:44:22 2018


I am not sure if that is the disk at ceph_ct or ceph_vm ... or if there is only one disk being reported at both storages.
 
This is the expected behavior, and the rest of your disks should be showing in both places as well.

Both of your Proxmox storage definitions point to the same Ceph pool, one in krbd (kernel) mode for containers (ct) and the other in librbd (userspace) mode for VMs.
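
As a sketch of how that looks in the config: two entries in /etc/pve/storage.cfg pointing at the same pool, differing only in content type and the krbd flag. The storage IDs and pool name below are from this thread; the remaining options are illustrative:

Code:
rbd: ceph_vm
        pool ceph
        content images
        krbd 0

rbd: ceph_ct
        pool ceph
        content rootdir
        krbd 1

Since both entries reference the same pool, every image in it is listed under both storage IDs.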
 

OK thank you.

-----------------------------------------

Now I am still trying to remove the stranded disk.
Code:
# rbd rm -f  -p ceph   vm-213-disk-1
2018-11-27 14:22:02.709825 7f6d5effd700 -1 librbd::image::RemoveRequest: 0x5559bfb876e0 check_image_watchers: image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.
pve3  ~ #

The watchers are back:
Code:
# cat /sys/kernel/debug/ceph/220b9a53-4556-48e3-a73c-28deff665e45.client78498966/osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
18446462598732840961    osd9    13.d511aa64     13.264  [9,49,40]/9     [9,49,40]/9     e76420  rbd_header.77ad306b8b4567       0x20    0       WC/0
BACKOFFS
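
The changed client ID (client78498966 here vs. client78184312 earlier) suggests the image was opened or mapped again on that node. A hedged sketch of the usual next checks, reusing the names from this thread; the blacklist step was not tried here and is only a common fallback for stuck watchers:

Code:
# see whether the image got mapped again by the kernel client
rbd showmapped

# if so, force-unmap once more and retry the delete right away
rbd unmap -o force /dev/rbd1
rbd rm -p ceph vm-213-disk-1

# fallback (not tried in this thread): blocklist the stuck client on the Ceph side,
# using the watcher address from a fresh listwatchers run
rados -p ceph listwatchers rbd_header.77ad306b8b4567
ceph osd blacklist add 10.11.12.3:0/2474997724   # example address from earlier output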

That disk is the only one left on Ceph. We moved all our VMs to ZFS while we upgrade the storage network.
We had bad Ceph slow request issues, and I wonder if that is related to the LINGER REQUESTS / BACKOFFS.

Is it normal to have 'LINGER REQUESTS'?
 
