[SOLVED] "Phantom" destroyed OSD

Belokan

Hello,

I've replaced a failing OSD (osd.1) on my cluster.

GUI: Stop -> Out -> Destroy, then Create OSD with the new disk (same sdX device as the removed, failed one).
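
For reference, if I remember the PVE 5.x CLI right, the equivalent of those GUI steps would roughly be (sdX is just the placeholder from above, 1 is the OSD id being replaced):

systemctl stop ceph-osd@1        # stop the failing OSD daemon
ceph osd out 1                   # mark it out
pveceph destroyosd 1             # destroy the old OSD
pveceph createosd /dev/sdX       # create the replacement OSD on the new disk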

In the GUI, on the OSD tab, my 3 OSDs were present and Up/In, but I did not notice at first that the new OSD was created as osd.3 instead of reusing osd.1.

Now, on the Ceph tab, the status is 3 Up/In and 1 Down/Out, as ceph osd tree shows:

root@pve1:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       1.36409 root default
-2       0.45470     host pve1
 2   hdd 0.45470         osd.2      up  1.00000 1.00000
-3       0.45470     host pve2
 3   hdd 0.45470         osd.3      up  1.00000 1.00000
-4       0.45470     host pve3
 0   hdd 0.45470         osd.0      up  1.00000 1.00000
 1       0               osd.1    down        0 1.00000

I've manually tried to destroy the "phantom" OSD:

root@pve1:~# ceph osd destroy 1 --yes-i-really-mean-it
destroyed osd.1

But it still appears as Down/Out in the GUI, and now shows as "destroyed" in the CLI:

root@pve1:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       1.36409 root default
-2       0.45470     host pve1
 2   hdd 0.45470         osd.2         up  1.00000 1.00000
-3       0.45470     host pve2
 3   hdd 0.45470         osd.3         up  1.00000 1.00000
-4       0.45470     host pve3
 0   hdd 0.45470         osd.0         up  1.00000 1.00000
 1       0               osd.1  destroyed        0 1.00000

root@pve1:~# ceph osd stat
4 osds: 3 up, 3 in
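
I guess the destroyed id is still counted in the OSD map, which would explain the "4 osds" above. If I'm not mistaken, it should also still show up with a "destroyed" flag in the OSD map dump:

ceph osd dump | grep '^osd.1 '   # should list osd.1 with a destroyed state flag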

How can I clean up the situation now?

root@pve1:~# pveversion -v
proxmox-ve: 5.1-43 (running kernel: 4.15.17-1-pve)
pve-manager: 5.1-52 (running version: 5.1-52/ba597a64)
pve-kernel-4.15: 5.1-4
pve-kernel-4.15.17-1-pve: 4.15.17-8
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-15
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-20
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-17
pve-cluster: 5.0-27
pve-container: 2.0-22
pve-docs: 5.1-18
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-3
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.8-pve1~bpo9

Thanks in advance,

Olivier
 
Remove the OSD from the auth list (ceph auth del osd.ID) and from the CRUSH map (ceph osd crush rm osd.ID). This should remove the OSD entry and its auth key.
 
Hello Alwin,

Thanks for your answer. Unfortunately, it seems you've underestimated the "phantom" factor ;)

root@pve1:~# ceph auth del osd.1
entity osd.1 does not exist

root@pve1:~# ceph osd crush rm osd.1
device 'osd.1' does not appear in the crush map

It does not show up in ceph auth ls either, and it has no key:

root@pve1:~# ceph auth print-key osd.1
Error ENOENT: don't have osd.1

Have a nice day.
 
Got it with purge ...

root@pve1:~# ceph osd purge 1 --yes-i-really-mean-it
purged osd.1

root@pve1:~# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       1.36409 root default
-2       0.45470     host pve1
 2   hdd 0.45470         osd.2      up  1.00000 1.00000
-3       0.45470     host pve2
 3   hdd 0.45470         osd.3      up  1.00000 1.00000
-4       0.45470     host pve3
 0   hdd 0.45470         osd.0      up  1.00000 1.00000

Solved!
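
For the record, ceph osd purge was added in Luminous and removes all traces of the OSD from the cluster. As far as I understand it, it is roughly equivalent to running:

ceph osd crush remove osd.1    # drop it from the CRUSH map (if still present)
ceph auth del osd.1            # drop its cephx key (if still present)
ceph osd rm 1                  # remove the id from the OSD map

which would explain why it cleaned up the leftover entry even though the first two steps had nothing left to remove.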
 
