Remove and re-add Ceph OSD

vaschthestampede

I'm trying to familiarize myself with problematic ceph situations.

I can't find the solution to a situation that seems simple enough.
The problem is this:
After adding several OSDs, I removed them via Stop -> Out -> Destroy.
Now I'm trying to add them again.
The problem is that the re-added OSDs remain down.

How can it be solved?
Am I doing something wrong?

Available for any clarification.
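
For reference, the steps I am doing correspond roughly to this command sequence (only a sketch with a placeholder OSD id and device; I use the GUI buttons, which should be equivalent):

Code:
# remove an OSD: Stop -> Out -> Destroy
systemctl stop ceph-osd@0
ceph osd out 0
pveceph osd destroy 0 --cleanup

# re-add the same disk as a new OSD
pveceph osd create /dev/sdb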
 
When you added the new OSDs, did you see any error output?

No.

pveversion -v
Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.11-4-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.0.9
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
proxmox-kernel-6.5: 6.5.11-4
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.4
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.9
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-1
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.2
pve-qemu-kvm: 8.1.2-4
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve3

pveceph status
Code:
  cluster:
    id:     058adf89-fed2-49a1-89b4-ece054dd484c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum pve1,pve2,pve3 (age 4d)
    mgr: pve1(active, since 5d), standbys: pve3, pve2
    osd: 3 osds: 0 up (since 5d), 3 in (since 8m)
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
 
What's the output of 'ceph osd tree' and 'systemctl status ceph-osd@<your osd id>'? I once had an OSD in the down state because the daemon wasn't running, and 'systemctl restart ceph-osd@<osd id>' got it working again.
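
The quick check/restart sequence I mean looks roughly like this (a sketch using osd id 0 as a placeholder):

Code:
systemctl status ceph-osd@0      # is the daemon running at all?
systemctl restart ceph-osd@0     # try to bring it up again
journalctl -u ceph-osd@0 -n 50   # recent log lines if it keeps exiting
ceph osd tree                    # check whether the OSD is now up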
 
ceph osd tree
Code:
ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default                          
-3         0.03119      host pve1                          
 0     os  0.03119          osd.0    down   1.00000  1.00000
-5         0.03119      host pve2                          
 1     os  0.03119          osd.1    down   1.00000  1.00000
-7         0.03119      host pve3                          
 2     os  0.03119          osd.2    down   1.00000  1.00000

systemctl status ceph-osd@0
Code:
○ ceph-osd@0.service - Ceph object storage daemon osd.0
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: inactive (dead) since Fri 2024-04-12 18:40:27 CEST; 4 days ago
   Duration: 4.100s
   Main PID: 42854 (code=exited, status=0/SUCCESS)
        CPU: 414ms

Apr 12 18:40:23 pve3 systemd[1]: Starting ceph-osd@0.service - Ceph object storage daemon osd.0...
Apr 12 18:40:23 pve3 systemd[1]: Started ceph-osd@0.service - Ceph object storage daemon osd.0.
Apr 12 18:40:26 pve3 ceph-osd[42854]: 2024-04-12T18:40:26.450+0200 7f46ca2c96c0 -1 osd.0 240 log_to_monitors true
Apr 12 18:40:26 pve3 ceph-osd[42854]: 2024-04-12T18:40:26.454+0200 7f46ca2c96c0 -1 osd.0 240 mon_cmd_maybe_osd_create fail: 'osd.0 has already bound to class 'os', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <i>
Apr 12 18:40:27 pve3 ceph-osd[42854]: 2024-04-12T18:40:27.846+0200 7f46b77dd6c0 -1 osd.0 280 map says i am stopped by admin. shutting down.
Apr 12 18:40:27 pve3 ceph-osd[42854]: 2024-04-12T18:40:27.846+0200 7f46c72816c0 -1 received  signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Apr 12 18:40:27 pve3 ceph-osd[42854]: 2024-04-12T18:40:27.846+0200 7f46c72816c0 -1 osd.0 280 *** Got signal Interrupt ***
Apr 12 18:40:27 pve3 ceph-osd[42854]: 2024-04-12T18:40:27.846+0200 7f46c72816c0 -1 osd.0 280 *** Immediate shutdown (osd_fast_shutdown=true) ***
Apr 12 18:40:27 pve3 systemd[1]: ceph-osd@0.service: Deactivated successfully.
 
Hi @vaschthestampede

Did you zap the destroyed OSDs? I believe the 'pveceph osd destroy' command does this, but if you use the plain ceph commands it needs to be done manually, for example along the lines sketched below...
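
A rough sketch of the manual cleanup (osd.0 and /dev/sdX are placeholders; double-check the device before zapping):

Code:
# remove the OSD from the CRUSH map, auth database and OSD map
ceph osd purge 0 --yes-i-really-mean-it

# wipe the LVM volume and the first sectors so the disk can be reused
ceph-volume lvm zap /dev/sdX --destroy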
 
OK, the GUI command should do the zap. And if you were able to re-create the OSDs from the GUI, that means the disks were zapped (first sectors zeroed). The command behind it is 'ceph-volume lvm zap ...'

I can see that your OSDs are assigned some class 'os'. Not sure where that came from; maybe you were experimenting with something. You can try to reset it with 'ceph osd crush set-device-class hdd osd.0 osd.1 osd.2'
 
ceph osd crush set-device-class hdd osd.0 osd.1 osd.2
Code:
Error EBUSY: osd.0 has already bound to class 'os', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first

So I ran:
Code:
ceph osd crush rm-device-class 0
ceph osd crush rm-device-class 1
ceph osd crush rm-device-class 2
ceph osd crush set-device-class hdd osd.0 osd.1 osd.2

The OSDs are now class hdd, but they are still down.

ceph osd tree
Code:
ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default                           
-3         0.03119      host pve1                           
 0    hdd  0.03119          osd.0    down   1.00000  1.00000
-5         0.03119      host pve2                           
 1    hdd  0.03119          osd.1    down   1.00000  1.00000
-7         0.03119      host pve3                           
 2    hdd  0.03119          osd.2    down   1.00000  1.00000

systemctl status ceph-osd@0
Code:
○ ceph-osd@0.service - Ceph object storage daemon osd.0
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: inactive (dead) since Thu 2024-04-18 08:53:17 CEST; 1min 56s ago
   Duration: 4.494s
    Process: 1167647 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 0 (code=exited, status=0/SUCCESS)
    Process: 1167651 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 0 --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
   Main PID: 1167651 (code=exited, status=0/SUCCESS)
        CPU: 519ms

Apr 18 08:53:13 pve1 systemd[1]: Starting ceph-osd@0.service - Ceph object storage daemon osd.0...
Apr 18 08:53:13 pve1 systemd[1]: Started ceph-osd@0.service - Ceph object storage daemon osd.0.
Apr 18 08:53:16 pve1 ceph-osd[1167651]: 2024-04-18T08:53:16.175+0200 7f66d8c2b6c0 -1 osd.0 240 log_to_monitors true
Apr 18 08:53:17 pve1 ceph-osd[1167651]: 2024-04-18T08:53:17.851+0200 7f66c613f6c0 -1 osd.0 280 map says i am stopped by admin. shutting down.
Apr 18 08:53:17 pve1 ceph-osd[1167651]: 2024-04-18T08:53:17.851+0200 7f66d5be36c0 -1 received  signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Apr 18 08:53:17 pve1 ceph-osd[1167651]: 2024-04-18T08:53:17.851+0200 7f66d5be36c0 -1 osd.0 280 *** Got signal Interrupt ***
Apr 18 08:53:17 pve1 ceph-osd[1167651]: 2024-04-18T08:53:17.851+0200 7f66d5be36c0 -1 osd.0 280 *** Immediate shutdown (osd_fast_shutdown=true) ***
Apr 18 08:53:17 pve1 systemd[1]: ceph-osd@0.service: Deactivated successfully.
 
Same here. I accidentally destroyed the wrong OSD... Obviously I can just format it and add it back in as a new OSD, and it'll rebuild, but I would like to just add it back as it is.

The disk partition name still shows the OSD name, and 'ceph-volume inventory /dev/sdX' gives me all the IDs.

'ceph-volume lvm activate --all' shows it being activated (the others are skipped) and it created all the directories under osd.x.

But 'ceph osd crush set x 1.0 host=xxxx' says the OSD has not been created.

What's the command to create an OSD? I found in the documentation how to create AND initialize one, but not how to create one and add an existing one back in.

I even found documentation where people moved OSDs to an entirely new cluster due to hardware failure, but they had to create a monitor, and I don't need to do that. They used the ceph-objectstore-tool.

Or did I just unlock a new achievement... first one to accidentally delete the wrong OSD? :)
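
For reference, the manual re-registration path I have been trying to piece together looks roughly like this (only a sketch; the id, fsid and host are placeholders taken from 'ceph-volume lvm list', and I am not sure it is enough for an OSD that was marked destroyed):

Code:
# find the OSD id and fsid of the still-existing volume
ceph-volume lvm list

# re-register the id/fsid in the OSD map and re-add the auth key
ceph osd new <osd-fsid> <osd-id>
ceph auth add osd.<osd-id> osd 'allow *' mon 'allow profile osd' \
    -i /var/lib/ceph/osd/ceph-<osd-id>/keyring

# put it back into the CRUSH map, activate and start it
ceph osd crush add osd.<osd-id> 1.0 host=<hostname>
ceph-volume lvm activate --all
systemctl start ceph-osd@<osd-id>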
 
Updates.
The problem seems to be resolved, at least in part, if the metadata servers are stopped first.
I say "at least in part" because some OSDs still require a manual start.
The system where I tested this is not the same one, but the Ceph version is the same.
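
What I did, roughly, was the following (a sketch; the osd id is a placeholder for whichever OSDs stay down):

Code:
# stop the metadata servers before re-creating the OSDs
systemctl stop ceph-mds.target

# re-create the OSDs (GUI or 'pveceph osd create /dev/sdX'), then start
# any OSD that stays down by hand
systemctl start ceph-osd@0

# start the metadata servers again afterwards
systemctl start ceph-mds.target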

pveversion -v
Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.4-2-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.0-1
proxmox-backup-file-restore: 3.2.0-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.1
pve-cluster: 8.0.6
pve-container: 5.0.10
pve-docs: 8.2.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.5
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2

pveceph status
Code:
  cluster:
    id:     a78f999f-d592-4fc6-a0e6-7dde2d0fb778
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum pve1,pve2,pve3 (age 2d)
    mgr: pve1(active, since 2d), standbys: pve3, pve2
    osd: 5 osds: 4 up (since 2m), 5 in (since 2m)
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   51 GiB used, 512 GiB / 563 GiB avail
    pgs:
 
Hi!

Got the same problem.
3-node Ceph cluster, Proxmox 8.2, one Ceph pool with 4x20 TB HDDs on each node, Ceph v17.2.7.
I added 2x4 TB SSDs on each node; they were added to the same pool as the HDDs, but no PGs were replicated to them.
So I decided to create separate pools and rules for HDD and SSD, and set these new SSDs down and then out.
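
The per-class rules I created look roughly like this (a sketch; the rule and pool names are my own placeholders, not anything predefined):

Code:
# one replicated CRUSH rule per device class
ceph osd crush rule create-replicated replicated_hdd default host hdd
ceph osd crush rule create-replicated replicated_ssd default host ssd

# point the existing pool at the HDD rule and a new pool at the SSD rule
ceph osd pool set <hdd-pool> crush_rule replicated_hdd
ceph osd pool set <ssd-pool> crush_rule replicated_ssd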

Now I can't re-add the SSDs as OSDs anymore.
These SSDs are not in the osd tree.
So I wonder if I can just wipe them?


Code:
root@pve1:~# ceph osd tree
ID  CLASS  WEIGHT     TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         218.27875  root default
-7          72.75958      host pve1
 8    hdd   18.18990          osd.8      up   1.00000  1.00000
 9    hdd   18.18990          osd.9      up   1.00000  1.00000
10    hdd   18.18990          osd.10     up   1.00000  1.00000
11    hdd   18.18990          osd.11     up   1.00000  1.00000
-5          72.75958      host pve2
 4    hdd   18.18990          osd.4      up   1.00000  1.00000
 5    hdd   18.18990          osd.5      up   1.00000  1.00000
 6    hdd   18.18990          osd.6      up   1.00000  1.00000
 7    hdd   18.18990          osd.7      up   1.00000  1.00000
-3          72.75958      host pve3
 0    hdd   18.18990          osd.0      up   1.00000  1.00000
 1    hdd   18.18990          osd.1      up   1.00000  1.00000
 2    hdd   18.18990          osd.2      up   1.00000  1.00000
 3    hdd   18.18990          osd.3      up   1.00000  1.00000
root@pve1:~# ceph health
HEALTH_OK


lsblk shows me the following.
As you can see, sdg and sdh, which are the new SSDs, still have LVM and Ceph records on them.

Code:
sdc 8:32 0 18.2T 0 disk
└─ceph--2cb5e04f--cfe1--486f--8109--a0a917a947ff-osd--block--0c46025e--a9e9--42c2--b8b7--c2f3811d3f9b 252:6 0 18.2T 0 lvm
sdd 8:48 0 18.2T 0 disk
└─ceph--599c0891--10a4--47fc--8da1--c1a7cf1e700c-osd--block--c1d1e405--6b5e--4cbb--bed5--c0d09cccde45 252:7 0 18.2T 0 lvm
sde 8:64 0 18.2T 0 disk
└─ceph--c298cd1e--163c--4894--8f76--697d601eee2b-osd--block--c7889ae0--2c83--440c--aad2--7b05663d689d 252:9 0 18.2T 0 lvm
sdf 8:80 0 18.2T 0 disk
└─ceph--ef4044e4--f931--4622--9a57--626db564199d-osd--block--cc2d6589--c031--4d90--a12e--feb2537dc774 252:8 0 18.2T 0 lvm

sdg 8:96 0 3.5T 0 disk
└─ceph--0bbc52c4--7d5d--4371--8952--a1bd6415325c-osd--block--1b26ef42--526c--40bb--8068--0c7bf3395e15 252:0 0 3.5T 0 lvm
sdh 8:112 0 3.5T 0 disk
└─ceph--99fa4939--1f3a--417f--80e0--73493990050a-osd--block--ba4b641e--3ce0--4eff--8acc--3230f46807aa 252:1 0 3.5T 0 lvm

sdi 8:128 0 28T 0 disk
└─mpatha 252:12 0 28T 0 mpath
  ├─ru2nas4_vg-vm--138--disk--0 252:13 0 12.1T 0 lvm
  ├─ru2nas4_vg-vm--138--disk--1 252:14 0 300G 0 lvm
  └─ru2nas4_vg-vm--138--disk--2 252:15 0 200G 0 lvm
 
So I wonder if I can just wipe them?
Yes.

There is an integrated function for this: <node> --> Disks --> <select your disk> --> "Wipe Disk".

After that the disk is clean and "unused" and can be reused for any other purpose - or a new OSD :-)
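
If you prefer the shell, something along these lines should achieve the same result (a sketch; /dev/sdg is a placeholder, so double-check the device name first):

Code:
# remove the leftover Ceph LVM volume and wipe the disk
ceph-volume lvm zap /dev/sdg --destroy

# or, more generically, clear all filesystem/LVM signatures
wipefs --all /dev/sdg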
 
Thanks for the encouragement :)
I destroyed the volumes through Disks -> LVM and then the disks were added as OSDs successfully.
 