Remove crashed disk in Ceph

stefek143

Hello,
My Proxmox version is 6.4-9, with Ceph 15.2.13.

I had a problem with a disk, and when I tried to remove it from the pool I got these errors:

Code:
destroy OSD osd.61
Remove osd.61 from the CRUSH map
Remove the osd.61 authentication key.
Remove OSD osd.61
--> Zapping: /dev/ceph-eca617c5-b240-4cc7-bcf0-d44ef8d23a40/osd-block-40e96b0c-8030-4b05-bf04-498794f85715
--> Unmounting /var/lib/ceph/osd/ceph-61
Running command: /bin/umount -v /var/lib/ceph/osd/ceph-61
stderr: umount: /var/lib/ceph/osd/ceph-61 unmounted
Running command: /bin/dd if=/dev/zero of=/dev/ceph-eca617c5-b240-4cc7-bcf0-d44ef8d23a40/osd-block-40e96b0c-8030-4b05-bf04-498794f85715 bs=1M count=10 conv=fsync
stderr: /bin/dd: fsync failed for '/dev/ceph-eca617c5-b240-4cc7-bcf0-d44ef8d23a40/osd-block-40e96b0c-8030-4b05-bf04-498794f85715': Input/output error
stderr: 10+0 records in
10+0 records out
stderr: 10485760 bytes (10 MB, 10 MiB) copied, 4578.14 s, 2.3 kB/s
--> RuntimeError: command returned non-zero exit status: 1
command '/usr/sbin/ceph-volume lvm zap --osd-id 61 --destroy' failed: exit code 1
command '/sbin/pvremove /dev/sdh' failed: exit code 5
TASK OK
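
The dd/pvremove part obviously failed on the broken disk. I guess those LVM leftovers could be wiped by hand, something like this (only a sketch, the VG name and /dev/sdh are taken from the log above; on a disk that throws I/O errors even these commands may fail, in which case pulling the disk is probably the only option left):

Code:
vgremove -f ceph-eca617c5-b240-4cc7-bcf0-d44ef8d23a40   # drops the VG together with its osd-block LV
pvremove -ff -y /dev/sdh                                 # wipe the PV label; double force because the VG is damaged
wipefs -a /dev/sdh                                       # optional: clear any remaining signatures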

In the OSD list and CRUSH map I no longer see osd.61, so it looks like it is already out of Ceph, but the LV, the PV, and the systemd service still exist:

a) osd service:

Code:
● ceph-osd@61.service - Ceph object storage daemon osd.61
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
           └─ceph-after-pve-cluster.conf
   Active: failed (Result: timeout) since Sat 2021-11-06 01:19:25 CET; 2 days ago
 Main PID: 102374

Nov 06 01:08:20 proxmox-07 ceph-osd[102374]: 2021-11-06T01:08:20.307+0100 7f17d1b82e00 -1 bdev(0x55a974f08380 /var/lib/ceph/osd/ceph-61/block) _sync_write sync_file_range error: (5) Input/output error
Nov 06 01:13:24 proxmox-07 systemd[1]: Stopping Ceph object storage daemon osd.61...
Nov 06 01:14:54 proxmox-07 systemd[1]: ceph-osd@61.service: State 'stop-sigterm' timed out. Killing.
Nov 06 01:14:54 proxmox-07 systemd[1]: ceph-osd@61.service: Killing process 102374 (ceph-osd) with signal SIGKILL.
Nov 06 01:16:25 proxmox-07 systemd[1]: ceph-osd@61.service: Processes still around after SIGKILL. Ignoring.
Nov 06 01:17:55 proxmox-07 systemd[1]: ceph-osd@61.service: State 'stop-final-sigterm' timed out. Killing.
Nov 06 01:17:55 proxmox-07 systemd[1]: ceph-osd@61.service: Killing process 102374 (ceph-osd) with signal SIGKILL.
Nov 06 01:19:25 proxmox-07 systemd[1]: ceph-osd@61.service: Processes still around after final SIGKILL. Entering failed mode.
Nov 06 01:19:25 proxmox-07 systemd[1]: ceph-osd@61.service: Failed with result 'timeout'.
Nov 06 01:19:25 proxmox-07 systemd[1]: Stopped Ceph object storage daemon osd.61.
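
The failed unit itself could probably be cleared once the OSD is really gone, something like this (unit name taken from the status above):

Code:
systemctl reset-failed ceph-osd@61.service        # clear the 'failed' state
systemctl disable --runtime ceph-osd@61.service   # the instance shows as enabled-runtime above

I assume the ceph-osd process that survived the SIGKILL is stuck in uninterruptible I/O on the dead disk, so it will probably only disappear after the disk is pulled or the node is rebooted.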


How can I cleanly and safely remove this disk?
 
Have you tried using the web interface? OSD down, out, and destroy? (pveceph osd destroy 61 --cleanup)

Otherwise maybe try: systemctl stop ceph-osd@61 && ceph osd down osd.61 && ceph osd purge osd.61
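
Note that ceph osd purge asks for an explicit confirmation flag, so roughly:

Code:
systemctl stop ceph-osd@61.service
ceph osd down osd.61
ceph osd purge 61 --yes-i-really-mean-it   # removes the OSD from the CRUSH map, its auth key and the OSD map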
 
