SOLVED: duplicate Ceph OSD IDs - how to resolve?

sigmarb

Dear Proxmox/Ceph users,

I have a strange problem: two disks seem to use the same OSD ID. This is a 3-node Proxmox 6 cluster.

Code:
root@adm-proxmox02:~# ceph-volume lvm list

Code:
====== osd.19 ======

  [block]       /dev/ceph-0baaac44-3e68-49ef-b79d-e7ec2027052a/osd-block-9877ee85-a4a8-41e6-af93-360b041b3e38

      block device              /dev/ceph-0baaac44-3e68-49ef-b79d-e7ec2027052a/osd-block-9877ee85-a4a8-41e6-af93-360b041b3e38
      block uuid                8EHC3G-kMkc-ECyZ-bbk9-0ajx-jg1z-t6wNHB
      cephx lockbox secret     
      cluster fsid              8c76cfc1-ad2e-4e2b-b770-d3d2a28dc2c1
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  9877ee85-a4a8-41e6-af93-360b041b3e38
      osd id                    19
      type                      block
      vdo                       0
      devices                   /dev/nvme3n1


Code:
root@adm-proxmox01:~# ceph-volume lvm list

Code:
====== osd.19 ======

  [block]       /dev/ceph-211487ed-4662-45f5-b8b6-2a3bf2df82f5/osd-block-a3b030e8-5037-4dd9-a6c3-37ad8fddb690

      block device              /dev/ceph-211487ed-4662-45f5-b8b6-2a3bf2df82f5/osd-block-a3b030e8-5037-4dd9-a6c3-37ad8fddb690
      block uuid                LgLfjK-IlpR-6Wre-wDB8-fU5z-i6Bf-CmaKWB
      cephx lockbox secret     
      cluster fsid              8c76cfc1-ad2e-4e2b-b770-d3d2a28dc2c1
      cluster name              ceph
      crush device class        None
      encrypted                 0
      osd fsid                  a3b030e8-5037-4dd9-a6c3-37ad8fddb690
      osd id                    19
      osdspec affinity         
      type                      block
      vdo                       0
      devices                   /dev/sdh
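
For anyone else hitting this, a quick way to see which of the two LVs the cluster actually treats as osd.19 should be to compare the two "osd fsid" values above against the uuid in the osd map (I did not need this in the end, but ceph osd dump prints the uuid per OSD):

Code:
root@adm-proxmox01:~# ceph osd dump | grep '^osd.19 '
The uuid at the end of that line should match exactly one of the two osd fsid values; the other LV is just a leftover.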

However, the disk on proxmox02 is obviously not in use by the cluster; osd.19 only appears under adm-proxmox01 in the tree:

Code:
root@adm-proxmox01:~# ceph osd tree
ID CLASS     WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
-1           47.56541 root default                                   
-3           16.47614     host adm-proxmox01                         
13 intel-ssd  0.87329         osd.13             up  1.00000 1.00000
19 intel-ssd  0.87329         osd.19             up  1.00000 1.00000
12      nvme  1.86299         osd.12             up  1.00000 1.00000
14      nvme  1.86299         osd.14             up  1.00000 1.00000
15      nvme  1.86299         osd.15             up  1.00000 1.00000
24      nvme  1.86299         osd.24             up  1.00000 1.00000
 0       ssd  1.81940         osd.0              up  1.00000 1.00000
 1       ssd  1.81940         osd.1              up  1.00000 1.00000
 5       ssd  1.81940         osd.5              up  1.00000 1.00000
 6       ssd  1.81940         osd.6              up  1.00000 1.00000
-5           14.61314     host adm-proxmox02                         
25 intel-ssd  0.87329         osd.25             up  1.00000 1.00000
26 intel-ssd  0.87329         osd.26             up  1.00000 1.00000
16      nvme  1.86299         osd.16             up  1.00000 1.00000
17      nvme  1.86299         osd.17             up  1.00000 1.00000
18      nvme  1.86299         osd.18             up  1.00000 1.00000
 2       ssd  1.81940         osd.2              up  1.00000 1.00000
 3       ssd  1.81940         osd.3              up  1.00000 1.00000
 4       ssd  1.81940         osd.4              up  1.00000 1.00000
 7       ssd  1.81940         osd.7              up  1.00000 1.00000
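
If someone wants to double-check before zapping anything, ceph osd metadata should show which host and device the running daemon actually uses (host and device names below are just the ones from this thread):

Code:
root@adm-proxmox01:~# ceph osd metadata 19 | grep -E '"hostname"|"devices"'
This should report adm-proxmox01 and sdh here, not nvme3n1.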
 
I could solve the problem on my own with:
Code:
ceph-volume lvm zap /dev/nvme3n1 --destroy
fdisk /dev/nvme3n1   (just hit w to write an empty partition table and quit)
and then re-add the disk via the Proxmox GUI.
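
If you prefer the CLI over the GUI for the re-add, pveceph should do the same thing (not tested for this exact recovery; device path as above, run on the node that holds the disk):

Code:
root@adm-proxmox02:~# pveceph osd create /dev/nvme3n1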

Finally, I assigned the device class again (the re-added OSD came back with a new ID, osd.29):

Code:
ceph osd crush rm-device-class osd.29
ceph osd crush set-device-class nvme osd.29
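
To verify the class stuck, a quick look at the tree is enough:

Code:
root@adm-proxmox02:~# ceph osd tree | grep 'osd\.29'
osd.29 should now be listed with device class nvme.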
 