Ceph OSD stopped and out

frantek

Renowned Member
May 30, 2009
Hi,

I have a problem with one OSD in my Ceph cluster:

Code:
# ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 7.2fa is active+clean+inconsistent, acting [13,6,16]
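For reference, my understanding (from the Ceph docs, not something I have tried yet) is that an inconsistent PG can be inspected and, if only one replica is damaged, repaired roughly like this. The sketch below only prints the commands instead of running them; 7.2fa is the PG from the output above:

```shell
# Dry-run sketch: print the inspection/repair commands for the inconsistent PG.
# pg 7.2fa is taken from `ceph health detail` above; nothing is executed here.
PG=7.2fa
for cmd in \
    "rados list-inconsistent-obj $PG --format=json-pretty" \
    "ceph pg repair $PG"
do
    echo "would run: $cmd"
done
```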

Code:
# systemctl status ceph-osd@14.service
● ceph-osd@14.service - Ceph object storage daemon osd.14
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-osd@.service.d
           └─ceph-after-pve-cluster.conf
   Active: failed (Result: signal) since Wed 2019-05-15 10:20:26 CEST; 6min ago
  Process: 3225166 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 14 --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 3225161 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 14 (code=exited, status=0/SUCCESS)
 Main PID: 3225166 (code=killed, signal=ABRT)

Mai 15 10:20:26 pve03 systemd[1]: ceph-osd@14.service: Start request repeated too quickly.
Mai 15 10:20:26 pve03 systemd[1]: Failed to start Ceph object storage daemon osd.14.
Mai 15 10:20:26 pve03 systemd[1]: ceph-osd@14.service: Unit entered failed state.
Mai 15 10:20:26 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.
Mai 15 10:23:20 pve03 systemd[1]: ceph-osd@14.service: Start request repeated too quickly.
Mai 15 10:23:20 pve03 systemd[1]: Failed to start Ceph object storage daemon osd.14.
Mai 15 10:23:20 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.
Mai 15 10:24:14 pve03 systemd[1]: ceph-osd@14.service: Start request repeated too quickly.
Mai 15 10:24:14 pve03 systemd[1]: Failed to start Ceph object storage daemon osd.14.
Mai 15 10:24:14 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.

Code:
# pveversion --verbose
proxmox-ve: 5.4-1 (running kernel: 4.15.18-12-pve)
pve-manager: 5.4-5 (running version: 5.4-5/c6fdb264)
pve-kernel-4.15: 5.4-2
pve-kernel-4.15.18-14-pve: 4.15.18-38
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
ceph: 12.2.12-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-9
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-42
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-37
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-2
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-51
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

How can I fix this?

TIA
 
Hmm, the scrub errors are now gone without me doing anything. Instead I get:

Code:
~# ceph health detail
HEALTH_WARN 1 osds down; 44423/801015 objects misplaced (5.546%)
OSD_DOWN 1 osds down
    osd.14 (root=default,host=pve03) is down
OBJECT_MISPLACED 44423/801015 objects misplaced (5.546%)
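Just to double-check the percentage HEALTH_WARN reports: it is simply misplaced objects over total objects, using the two counts from the line above:

```shell
# Sanity-check the OBJECT_MISPLACED percentage: misplaced / total * 100,
# with the counts from `ceph health detail` above.
misplaced=44423
total=801015
pct=$(awk "BEGIN { printf \"%.3f\", $misplaced / $total * 100 }")
echo "$pct%"   # 5.546%, matching what Ceph reports
```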

Code:
# systemctl status ceph-osd@14.service
● ceph-osd@14.service - Ceph object storage daemon osd.14
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-osd@.service.d
           └─ceph-after-pve-cluster.conf
   Active: activating (auto-restart) (Result: signal) since Wed 2019-05-15 19:38:15 CEST; 7s ago
  Process: 3633324 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 14 --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 3633319 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 14 (code=exited, status=0/SUCCESS)
 Main PID: 3633324 (code=killed, signal=ABRT)

Mai 15 19:38:15 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.

I can start the OSD but then it crashes with:

--
-- Unit pvesr.service has begun starting up.
Mai 15 19:39:01 pve03 systemd[1]: Started Proxmox VE replication runner.
-- Subject: Unit pvesr.service has finished start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit pvesr.service has finished starting up.
--
-- The start-up result is done.
Mai 15 19:39:05 pve03 systemd[1]: ceph-osd@14.service: Service hold-off time over, scheduling restart.
Mai 15 19:39:05 pve03 systemd[1]: Stopped Ceph object storage daemon osd.14.
-- Subject: Unit ceph-osd@14.service has finished shutting down
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit ceph-osd@14.service has finished shutting down.
Mai 15 19:39:05 pve03 systemd[1]: Starting Ceph object storage daemon osd.14...
-- Subject: Unit ceph-osd@14.service has begun start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit ceph-osd@14.service has begun starting up.
Mai 15 19:39:05 pve03 systemd[1]: Started Ceph object storage daemon osd.14.
-- Subject: Unit ceph-osd@14.service has finished start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit ceph-osd@14.service has finished starting up.
--
-- The start-up result is done.
Mai 15 19:39:05 pve03 ceph-osd[3634089]: starting osd.14 at - osd_data /var/lib/ceph/osd/ceph-14 /var/lib/ceph/osd/ceph-14/journal
Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Sense Key : Medium Error [current]
Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Add. Sense: Unrecovered read error
Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 CDB: Read(10) 28 00 05 d8 4c 60 00 00 08 00
Mai 15 19:39:08 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408
Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Sense Key : Medium Error [current]
Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Add. Sense: Unrecovered read error
Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 CDB: Read(10) 28 00 05 d8 4c 60 00 00 08 00
Mai 15 19:39:09 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408
Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Sense Key : Medium Error [current]
Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Add. Sense: Unrecovered read error
Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 CDB: Read(10) 28 00 05 d8 4c 60 00 00 08 00
Mai 15 19:39:12 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.065699 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) _write(3544): wri
Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: In function 'v
Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: 3185: FAILED a
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.065753 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) error (5) Input/
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563d3983f262]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.069017 7fccb4415e00 -1 /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/sr
Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: 3185: FAILED a
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563d3983f262]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mai 15 19:39:12 pve03 ceph-osd[3634089]: -4> 2019-05-15 19:39:12.065699 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) _write(35
Mai 15 19:39:12 pve03 ceph-osd[3634089]: -3> 2019-05-15 19:39:12.065753 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) error (5
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 0> 2019-05-15 19:39:12.069017 7fccb4415e00 -1 /mnt/pve/store/tlamprecht/sources/ceph/ceph-12
Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: 3185: FAILED a
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563d3983f262]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mai 15 19:39:12 pve03 ceph-osd[3634089]: *** Caught signal (Aborted) **
Mai 15 19:39:12 pve03 ceph-osd[3634089]: in thread 7fccb4415e00 thread_name:ceph-osd
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (()+0xa59c94) [0x563d397f6c94]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (()+0x110e0) [0x7fccb1a1d0e0]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (gsignal()+0xcf) [0x7fccb09e4fff]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (abort()+0x16a) [0x7fccb09e642a]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x563d3983f3ee]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 10: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 11: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 12: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 13: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.072802 7fccb4415e00 -1 *** Caught signal (Aborted) **
Mai 15 19:39:12 pve03 ceph-osd[3634089]: in thread 7fccb4415e00 thread_name:ceph-osd
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (()+0xa59c94) [0x563d397f6c94]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (()+0x110e0) [0x7fccb1a1d0e0]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (gsignal()+0xcf) [0x7fccb09e4fff]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (abort()+0x16a) [0x7fccb09e642a]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x563d3983f3ee]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 10: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 11: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 12: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 13: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 0> 2019-05-15 19:39:12.072802 7fccb4415e00 -1 *** Caught signal (Aborted) **
Mai 15 19:39:12 pve03 ceph-osd[3634089]: in thread 7fccb4415e00 thread_name:ceph-osd
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (()+0xa59c94) [0x563d397f6c94]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (()+0x110e0) [0x7fccb1a1d0e0]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (gsignal()+0xcf) [0x7fccb09e4fff]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (abort()+0x16a) [0x7fccb09e642a]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x563d3983f3ee]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 10: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 11: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 12: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 13: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mai 15 19:39:12 pve03 systemd[1]: ceph-osd@14.service: Main process exited, code=killed, status=6/ABRT
Mai 15 19:39:12 pve03 systemd[1]: ceph-osd@14.service: Unit entered failed state.
Mai 15 19:39:12 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.

Code:
Mai 15 19:39:08 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408

Sounds like I should replace the disk ... right?
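If it does come down to swapping the disk, the rough procedure I would expect to follow on Luminous/PVE 5 looks like this. Again only a dry-run sketch that prints the steps; /dev/sdX is a placeholder for the replacement disk, and osd.14 is the failed OSD from the logs:

```shell
# Dry-run sketch of replacing the failed OSD (osd.14 on /dev/sdg per the logs).
# Nothing is executed; each step is only printed. /dev/sdX is a placeholder
# for the new disk after the physical swap.
OSD_ID=14
for cmd in \
    "ceph osd out $OSD_ID" \
    "systemctl stop ceph-osd@$OSD_ID.service" \
    "ceph osd purge $OSD_ID --yes-i-really-mean-it" \
    "pveceph createosd /dev/sdX"
do
    echo "would run: $cmd"
done
```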

Well, smartctl does not look good ...

Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   091   091   006    Pre-fail  Always       -       181999400
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       15
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       584
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       538350167
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       21203 (23 243 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       15
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       664
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   064   045    Old_age   Always       -       31 (Min/Max 29/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       2
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       11
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       71
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 25 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   076   076   000    Old_age   Offline      -       21202 (114 252 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       48702137992
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       41224941657
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0
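A quick way to pull the worrying counters out of that table (a throwaway filter over the `smartctl -A` text above, nothing smartmontools-specific). Non-zero reallocated, pending, or uncorrectable sectors all point the same way:

```shell
# Flag the SMART attributes from the table above that indicate media trouble.
# Column 2 is the attribute name, column 10 the raw value.
smart='  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       584
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       664
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16'
echo "$smart" | awk '$2 ~ /Reallocated_Sector_Ct|Reported_Uncorrect|Current_Pending_Sector|Offline_Uncorrectable/ && $10 > 0 { print $2, $10 }'
```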
 
Hmm, now scrubbing errors are gone by doing nothing. Now I get:

Code:
~# ceph health detail
HEALTH_WARN 1 osds down; 44423/801015 objects misplaced (5.546%)
OSD_DOWN 1 osds down
    osd.14 (root=default,host=pve03) is down
OBJECT_MISPLACED 44423/801015 objects misplaced (5.546%)

Code:
# systemctl status ceph-osd@14.service
● ceph-osd@14.service - Ceph object storage daemon osd.14
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-osd@.service.d
           └─ceph-after-pve-cluster.conf
   Active: activating (auto-restart) (Result: signal) since Wed 2019-05-15 19:38:15 CEST; 7s ago
  Process: 3633324 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 14 --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 3633319 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 14 (code=exited, status=0/SUCCESS)
 Main PID: 3633324 (code=killed, signal=ABRT)

Mai 15 19:38:15 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.

I can start the OSD but then it crashes with:

--
-- Unit pvesr.service has begun starting up.
Mai 15 19:39:01 pve03 systemd[1]: Started Proxmox VE replication runner.
-- Subject: Unit pvesr.service has finished start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit pvesr.service has finished starting up.
--
-- The start-up result is done.
Mai 15 19:39:05 pve03 systemd[1]: ceph-osd@14.service: Service hold-off time over, scheduling restart.
Mai 15 19:39:05 pve03 systemd[1]: Stopped Ceph object storage daemon osd.14.
-- Subject: Unit ceph-osd@14.service has finished shutting down
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit ceph-osd@14.service has finished shutting down.
Mai 15 19:39:05 pve03 systemd[1]: Starting Ceph object storage daemon osd.14...
-- Subject: Unit ceph-osd@14.service has begun start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit ceph-osd@14.service has begun starting up.
Mai 15 19:39:05 pve03 systemd[1]: Started Ceph object storage daemon osd.14.
-- Subject: Unit ceph-osd@14.service has finished start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit ceph-osd@14.service has finished starting up.
--
-- The start-up result is done.
Mai 15 19:39:05 pve03 ceph-osd[3634089]: starting osd.14 at - osd_data /var/lib/ceph/osd/ceph-14 /var/lib/ceph/osd/ceph-14/journal
Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Sense Key : Medium Error [current]
Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Add. Sense: Unrecovered read error
Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 CDB: Read(10) 28 00 05 d8 4c 60 00 00 08 00
Mai 15 19:39:08 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408
Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Sense Key : Medium Error [current]
Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Add. Sense: Unrecovered read error
Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 CDB: Read(10) 28 00 05 d8 4c 60 00 00 08 00
Mai 15 19:39:09 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408
Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Sense Key : Medium Error [current]
Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Add. Sense: Unrecovered read error
Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 CDB: Read(10) 28 00 05 d8 4c 60 00 00 08 00
Mai 15 19:39:12 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.065699 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) _write(3544): wri
Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: In function 'v
Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: 3185: FAILED a
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.065753 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) error (5) Input/
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563d3983f262]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.069017 7fccb4415e00 -1 /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/sr
Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: 3185: FAILED a
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563d3983f262]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mai 15 19:39:12 pve03 ceph-osd[3634089]: -4> 2019-05-15 19:39:12.065699 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) _write(35
Mai 15 19:39:12 pve03 ceph-osd[3634089]: -3> 2019-05-15 19:39:12.065753 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) error (5
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 0> 2019-05-15 19:39:12.069017 7fccb4415e00 -1 /mnt/pve/store/tlamprecht/sources/ceph/ceph-12
Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: 3185: FAILED a
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563d3983f262]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mai 15 19:39:12 pve03 ceph-osd[3634089]: *** Caught signal (Aborted) **
Mai 15 19:39:12 pve03 ceph-osd[3634089]: in thread 7fccb4415e00 thread_name:ceph-osd
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (()+0xa59c94) [0x563d397f6c94]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (()+0x110e0) [0x7fccb1a1d0e0]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (gsignal()+0xcf) [0x7fccb09e4fff]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (abort()+0x16a) [0x7fccb09e642a]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x563d3983f3ee]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 10: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 11: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 12: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 13: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.072802 7fccb4415e00 -1 *** Caught signal (Aborted) **
Mai 15 19:39:12 pve03 ceph-osd[3634089]: in thread 7fccb4415e00 thread_name:ceph-osd
Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (()+0xa59c94) [0x563d397f6c94]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (()+0x110e0) [0x7fccb1a1d0e0]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (gsignal()+0xcf) [0x7fccb09e4fff]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (abort()+0x16a) [0x7fccb09e642a]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x563d3983f3ee]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (FileStore::mount()+0x48f8) [0x563d395e4d18]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 10: (OSD::init()+0x3e2) [0x563d3923e772]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 11: (main()+0x3092) [0x563d391481c2]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 12: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: 13: (_start()+0x2a) [0x563d391d48ca]
Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Mai 15 19:39:12 pve03 systemd[1]: ceph-osd@14.service: Main process exited, code=killed, status=6/ABRT
Mai 15 19:39:12 pve03 systemd[1]: ceph-osd@14.service: Unit entered failed state.
Mai 15 19:39:12 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.

Code:
Mai 15 19:39:08 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408

Sounds like I should replace the disk ... right?

Well, smartctl does not look good either ...

Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   091   091   006    Pre-fail  Always       -       181999400
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       15
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       584
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       538350167
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       21203 (23 243 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       15
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       664
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   064   045    Old_age   Always       -       31 (Min/Max 29/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       2
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       11
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       71
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 25 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   076   076   000    Old_age   Offline      -       21202 (114 252 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       48702137992
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       41224941657
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0
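For anyone reading along, the attributes worth focusing on can be filtered out of the smartctl output. A minimal sketch; the "any nonzero raw count is bad" rule of thumb on these counters is my own, not an official vendor threshold:

```shell
# Filter the SMART attributes that actually predict imminent disk failure
# (reallocated / pending / uncorrectable sectors and reported uncorrectables).
# Reads "smartctl -A"-style table rows on stdin, prints the nonzero offenders.
smart_check() {
  awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect)$/ && $10+0 > 0 {print $2 "=" $10}'
}

# On the live host this would be:  smartctl -A /dev/sdg | smart_check
# Against the values posted above:
smart_check <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       584
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       664
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
EOF
```

With 584 reallocated sectors, 664 reported uncorrectables and 16 sectors still pending, that disk is past saving.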


I agree, it sounds like a bad disk. I'm new to Ceph as well, but I'm trying to keep up with threads like this so I can learn from others' experience. Let me know how you make out. Hopefully a new disk will get you up and running again.
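For the replacement itself, the usual flow on a Luminous-era Proxmox cluster looks roughly like the following. This is only a dry-run sketch (the commands are echoed, not executed), so double-check each step against the Ceph and Proxmox docs for your versions before running anything for real:

```shell
#!/bin/sh
# Sketch of the usual failed-OSD replacement flow (Ceph Luminous / PVE 5.x).
# DRY RUN: run() only echoes each command; drop the echo to actually execute.
run() { echo "+ $*"; }

OSD=14            # the dead OSD, matching ceph-osd@14.service above
DEV=/dev/sdg      # the failing disk from the kernel log

run ceph osd out "$OSD"                           # stop new data landing on it
run systemctl stop "ceph-osd@$OSD"                # already dead here, but be explicit
run ceph osd purge "$OSD" --yes-i-really-mean-it  # remove it from crush map, auth and osd map
run pveceph createosd "$DEV"                      # after physically swapping the disk
run ceph pg repair 7.2fa                          # separately, repair the inconsistent PG
```

Note that pg 7.2fa is acting on [13,6,16], so the scrub error is not on osd.14 itself; the repair is a separate step from the disk swap, and worth doing only once the cluster is otherwise healthy.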