Remove and re-add Ceph OSD

vaschthestampede

I'm trying to familiarize myself with problematic Ceph situations.

I can't find the solution to a situation that seems simple enough.
The problem is this:
After adding several OSDs, I removed them via Stop -> Out -> Destroy.
Now I try to add them again.
The problem is that the re-added disks remain down.

How can it be solved?
Am I doing something wrong?

Available for any clarification.
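
For reference, the command-line equivalent of what I did in the GUI should be roughly this (the OSD id and device are just examples from my test setup):
Code:
# stop, mark out, and destroy an OSD (example: osd.0)
systemctl stop ceph-osd@0
ceph osd out 0
pveceph osd destroy 0
# later, re-create it on the same disk (device is just an example)
pveceph osd create /dev/sdb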
 
When you added the new OSDs, did you see any error output?

No.

pveversion -v
Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.11-4-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.0.9
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
proxmox-kernel-6.5: 6.5.11-4
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.4
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.9
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-1
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.2
pve-qemu-kvm: 8.1.2-4
pve-xtermjs: 5.3.0-2
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve3

pveceph status
Code:
  cluster:
    id:     058adf89-fed2-49a1-89b4-ece054dd484c
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum pve1,pve2,pve3 (age 4d)
    mgr: pve1(active, since 5d), standbys: pve3, pve2
    osd: 3 osds: 0 up (since 5d), 3 in (since 8m)
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
 
What's the output of 'ceph osd tree' and 'systemctl status ceph-osd@<your osd id>'? I had an OSD in the down state and the daemon wasn't running, so 'systemctl restart ceph-osd@<osd id>' got it working again.
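
In other words, something like this (replace 0 with your own OSD id):
Code:
ceph osd tree
systemctl status ceph-osd@0
systemctl restart ceph-osd@0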
 
ceph osd tree
Code:
ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default                          
-3         0.03119      host pve1                          
 0     os  0.03119          osd.0    down   1.00000  1.00000
-5         0.03119      host pve2                          
 1     os  0.03119          osd.1    down   1.00000  1.00000
-7         0.03119      host pve3                          
 2     os  0.03119          osd.2    down   1.00000  1.00000

systemctl status ceph-osd@0
Code:
○ ceph-osd@0.service - Ceph object storage daemon osd.0
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: inactive (dead) since Fri 2024-04-12 18:40:27 CEST; 4 days ago
   Duration: 4.100s
   Main PID: 42854 (code=exited, status=0/SUCCESS)
        CPU: 414ms

Apr 12 18:40:23 pve3 systemd[1]: Starting ceph-osd@0.service - Ceph object storage daemon osd.0...
Apr 12 18:40:23 pve3 systemd[1]: Started ceph-osd@0.service - Ceph object storage daemon osd.0.
Apr 12 18:40:26 pve3 ceph-osd[42854]: 2024-04-12T18:40:26.450+0200 7f46ca2c96c0 -1 osd.0 240 log_to_monitors true
Apr 12 18:40:26 pve3 ceph-osd[42854]: 2024-04-12T18:40:26.454+0200 7f46ca2c96c0 -1 osd.0 240 mon_cmd_maybe_osd_create fail: 'osd.0 has already bound to class 'os', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <i>
Apr 12 18:40:27 pve3 ceph-osd[42854]: 2024-04-12T18:40:27.846+0200 7f46b77dd6c0 -1 osd.0 280 map says i am stopped by admin. shutting down.
Apr 12 18:40:27 pve3 ceph-osd[42854]: 2024-04-12T18:40:27.846+0200 7f46c72816c0 -1 received  signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Apr 12 18:40:27 pve3 ceph-osd[42854]: 2024-04-12T18:40:27.846+0200 7f46c72816c0 -1 osd.0 280 *** Got signal Interrupt ***
Apr 12 18:40:27 pve3 ceph-osd[42854]: 2024-04-12T18:40:27.846+0200 7f46c72816c0 -1 osd.0 280 *** Immediate shutdown (osd_fast_shutdown=true) ***
Apr 12 18:40:27 pve3 systemd[1]: ceph-osd@0.service: Deactivated successfully.
 
Hi @vaschthestampede

Did you zap the destroyed OSDs? I believe the 'pveceph osd destroy' command does this, but if you use the plain ceph commands it needs to be done manually...
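
A manual zap would be something along these lines (the device path is only an example, and the command wipes the disk, so double-check it first):
Code:
ceph-volume lvm zap /dev/sdX --destroy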
 
OK, the GUI command should do the zap. And if you were able to re-create the OSDs from the GUI, that means the disks were zapped (first sectors zeroed). The command is 'ceph-volume lvm zap ...'

I can see that your OSDs are assigned some class 'os'. I'm not sure where it came from; maybe you were experimenting with something. You can try to reset it with 'ceph osd crush set-device-class hdd osd.0 osd.1 osd.2'
 
ceph osd crush set-device-class hdd osd.0 osd.1 osd.2
Code:
Error EBUSY: osd.0 has already bound to class 'os', can not reset class to 'hdd'; use 'ceph osd crush rm-device-class <id>' to remove old class first

So I ran:
Code:
ceph osd crush rm-device-class 0
ceph osd crush rm-device-class 1
ceph osd crush rm-device-class 2
ceph osd crush set-device-class hdd osd.0 osd.1 osd.2

The OSDs now have the hdd class, but they are still down.

ceph osd tree
Code:
ID  CLASS  WEIGHT   TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         0.09357  root default                           
-3         0.03119      host pve1                           
 0    hdd  0.03119          osd.0    down   1.00000  1.00000
-5         0.03119      host pve2                           
 1    hdd  0.03119          osd.1    down   1.00000  1.00000
-7         0.03119      host pve3                           
 2    hdd  0.03119          osd.2    down   1.00000  1.00000

systemctl status ceph-osd@0
Code:
○ ceph-osd@0.service - Ceph object storage daemon osd.0
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: inactive (dead) since Thu 2024-04-18 08:53:17 CEST; 1min 56s ago
   Duration: 4.494s
    Process: 1167647 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 0 (code=exited, status=0/SUCCESS)
    Process: 1167651 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 0 --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
   Main PID: 1167651 (code=exited, status=0/SUCCESS)
        CPU: 519ms

Apr 18 08:53:13 pve1 systemd[1]: Starting ceph-osd@0.service - Ceph object storage daemon osd.0...
Apr 18 08:53:13 pve1 systemd[1]: Started ceph-osd@0.service - Ceph object storage daemon osd.0.
Apr 18 08:53:16 pve1 ceph-osd[1167651]: 2024-04-18T08:53:16.175+0200 7f66d8c2b6c0 -1 osd.0 240 log_to_monitors true
Apr 18 08:53:17 pve1 ceph-osd[1167651]: 2024-04-18T08:53:17.851+0200 7f66c613f6c0 -1 osd.0 280 map says i am stopped by admin. shutting down.
Apr 18 08:53:17 pve1 ceph-osd[1167651]: 2024-04-18T08:53:17.851+0200 7f66d5be36c0 -1 received  signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
Apr 18 08:53:17 pve1 ceph-osd[1167651]: 2024-04-18T08:53:17.851+0200 7f66d5be36c0 -1 osd.0 280 *** Got signal Interrupt ***
Apr 18 08:53:17 pve1 ceph-osd[1167651]: 2024-04-18T08:53:17.851+0200 7f66d5be36c0 -1 osd.0 280 *** Immediate shutdown (osd_fast_shutdown=true) ***
Apr 18 08:53:17 pve1 systemd[1]: ceph-osd@0.service: Deactivated successfully.
 
Same problem here. I accidentally destroyed the wrong OSD. Obviously I could just wipe the disk and add it back in, and it would rebuild, but I would like to simply re-add it as it is.

The disk partition name still shows the OSD name, and 'ceph-volume inventory /dev/sdX' gives me all the IDs.

'ceph-volume lvm activate --all' shows it being activated and the others skipped, and it created all the directories under osd.x.

But 'ceph osd crush set x 1.0 host=xxxx' says the OSD has not been created.

What's the command to create an OSD? The documentation explains how to create AND initialize a new one, but not how to re-add an existing one (the closest sequence I could piece together is sketched below).

I even found documentation where people moved OSDs to an entirely new cluster after a hardware failure, but they had to create a monitor, which I don't need to do. They used ceph-objectstore-tool.

Or did I just unlock a new achievement: first one to accidentally destroy the wrong OSD? :)
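
This is my best guess so far, but I'm not sure it's the right sequence (ids, fsid, weight and host are placeholders):
Code:
# list the existing OSD volumes and note the osd id and osd fsid
ceph-volume lvm list
# re-create the OSD entry in the map with the old fsid and id
ceph osd new <osd-fsid> <osd-id>
# bring the existing data volume back up
ceph-volume lvm activate <osd-id> <osd-fsid>
# put it back into the CRUSH map
ceph osd crush add osd.<osd-id> 1.0 host=<hostname>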
 
Update.
The problem seems to be resolved, at least in part, if the metadata servers are stopped.
I say "at least in part" because some OSDs still require a manual start.
The system where I tested this workaround is a different one, but the Ceph version is the same.
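
What I did boils down to roughly this (the OSD id is an example; on my nodes the MDS daemons are covered by ceph-mds.target):
Code:
# stop the metadata servers
systemctl stop ceph-mds.target
# manually start the OSDs that stay down (repeat per id)
systemctl restart ceph-osd@0
# start the metadata servers again afterwards
systemctl start ceph-mds.target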

pveversion -v
Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.4-2-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.0-1
proxmox-backup-file-restore: 3.2.0-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.1
pve-cluster: 8.0.6
pve-container: 5.0.10
pve-docs: 8.2.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.5
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2

pveceph status
Code:
  cluster:
    id:     a78f999f-d592-4fc6-a0e6-7dde2d0fb778
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum pve1,pve2,pve3 (age 2d)
    mgr: pve1(active, since 2d), standbys: pve3, pve2
    osd: 5 osds: 4 up (since 2m), 5 in (since 2m)
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   51 GiB used, 512 GiB / 563 GiB avail
    pgs:
 
