OSD keeps going down and out

Adam Koczarski

I have an OSD which keeps toggling to down and out. Here's what I'm seeing in the syslog. Any clue here why this would be happening?

Sep 23 02:22:26 SeaC01N02 kernel: [533115.376053] sd 0:0:16:0: [sdo] tag#262 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 23 02:22:26 SeaC01N02 kernel: [533115.376846] sd 0:0:16:0: [sdo] tag#262 Sense Key : Medium Error [current]
Sep 23 02:22:26 SeaC01N02 kernel: [533115.377630] sd 0:0:16:0: [sdo] tag#262 Add. Sense: Unrecovered read error
Sep 23 02:22:26 SeaC01N02 kernel: [533115.378294] sd 0:0:16:0: [sdo] tag#262 CDB: Read(16) 88 00 00 00 00 00 00 54 e8 80 00 00 00 80 00 00
Sep 23 02:22:26 SeaC01N02 kernel: [533115.378964] print_req_error: critical medium error, dev sdo, sector 5564592 flags 0
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: /root/sources/pve/ceph/ceph-14.2.2/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, uint64_t)' threa
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: /root/sources/pve/ceph/ceph-14.2.2/src/os/bluestore/BlueStore.cc: 8786: FAILED ceph_assert(r == 0)
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 2019-09-23 02:22:26.599 7ff715337700 -1 bdev(0x5580918fa000 /var/lib/ceph/osd/ceph-28/block) read stalled read 0xa9c10000~10000 (direct) since 533150s, timeout is 5s
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 2019-09-23 02:22:26.599 7ff715337700 -1 bluestore(/var/lib/ceph/osd/ceph-28) _do_read bdev-read failed: (61) No data available
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable)
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x5580850d0e84]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 2: (()+0x51905c) [0x5580850d105c]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 3: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int, unsigned long)+0x3e6a) [0x5580856e601a]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 4: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0x1d3) [0x5580856e6303]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 5: (ReplicatedBackend::be_deep_scrub(hobject_t const&, ScrubMap&, ScrubMapBuilder&, ScrubMap::object&)+0x2cb) [0x5580855645db]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 6: (PGBackend::be_scan_list(ScrubMap&, ScrubMapBuilder&)+0x6db) [0x5580854816fb]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 7: (PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x83) [0x558085320b13]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 8: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x194b) [0x55808534cf7b]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 9: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x4bb) [0x55808534e09b]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 10: (PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1a) [0x5580855043ca]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x7d7) [0x558085282667]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x55808585f7d4]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5580858621d0]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 14: (()+0x7fa3) [0x7ff730066fa3]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 15: (clone()+0x3f) [0x7ff72fc164cf]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: *** Caught signal (Aborted) **
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: in thread 7ff715337700 thread_name:tp_osd_tp
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 2019-09-23 02:22:26.603 7ff715337700 -1 /root/sources/pve/ceph/ceph-14.2.2/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph:
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: /root/sources/pve/ceph/ceph-14.2.2/src/os/bluestore/BlueStore.cc: 8786: FAILED ceph_assert(r == 0)
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable)
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x5580850d0e84]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 2: (()+0x51905c) [0x5580850d105c]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 3: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int, unsigned long)+0x3e6a) [0x5580856e601a]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 4: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0x1d3) [0x5580856e6303]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 5: (ReplicatedBackend::be_deep_scrub(hobject_t const&, ScrubMap&, ScrubMapBuilder&, ScrubMap::object&)+0x2cb) [0x5580855645db]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 6: (PGBackend::be_scan_list(ScrubMap&, ScrubMapBuilder&)+0x6db) [0x5580854816fb]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 7: (PG::build_scrub_map_chunk(ScrubMap&, ScrubMapBuilder&, hobject_t, hobject_t, bool, ThreadPool::TPHandle&)+0x83) [0x558085320b13]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 8: (PG::chunky_scrub(ThreadPool::TPHandle&)+0x194b) [0x55808534cf7b]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 9: (PG::scrub(unsigned int, ThreadPool::TPHandle&)+0x4bb) [0x55808534e09b]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 10: (PGScrub::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1a) [0x5580855043ca]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x7d7) [0x558085282667]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x55808585f7d4]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5580858621d0]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 14: (()+0x7fa3) [0x7ff730066fa3]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 15: (clone()+0x3f) [0x7ff72fc164cf]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: ceph version 14.2.2 (a887fe9a5d3d97fe349065d3c1c9dbd7b8870855) nautilus (stable)
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 1: (()+0x12730) [0x7ff730071730]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 2: (gsignal()+0x10b) [0x7ff72fb547bb]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 3: (abort()+0x121) [0x7ff72fb3f535]
Sep 23 02:22:26 SeaC01N02 ceph-osd[1518564]: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x5580850d0ed5]
 
Sep 23 02:22:26 SeaC01N02 kernel: [533115.376053] sd 0:0:16:0: [sdo] tag#262 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 23 02:22:26 SeaC01N02 kernel: [533115.376846] sd 0:0:16:0: [sdo] tag#262 Sense Key : Medium Error [current]
Sep 23 02:22:26 SeaC01N02 kernel: [533115.377630] sd 0:0:16:0: [sdo] tag#262 Add. Sense: Unrecovered read error
Sep 23 02:22:26 SeaC01N02 kernel: [533115.378294] sd 0:0:16:0: [sdo] tag#262 CDB: Read(16) 88 00 00 00 00 00 00 54 e8 80 00 00 00 80 00 00
Sep 23 02:22:26 SeaC01N02 kernel: [533115.378964] print_req_error: critical medium error, dev sdo, sector 5564592 flags 0
This looks like a failing disk - the kernel is reporting unrecovered medium errors on sdo, so replace sdo.
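If you want to double-check from the shell before pulling the drive, something along these lines should do it (assuming smartmontools is installed; the device and OSD id are the ones from your log):

smartctl -H /dev/sdo     # overall SMART health verdict for the suspect disk
smartctl -a /dev/sdo     # full SMART report, including pending/reallocated sector counters
ceph-volume lvm list     # confirm which OSD id (ceph-28 in your log) sits on which device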

I hope this helps!
 
Just ran smartctl and got the following. I also ran it on the other 16 drives in this node; sdm and sdn also show Raw_Read_Errors. The rest look clean. I might have a few drives that need replacing??

(attached screenshot: smartctl output for the affected drive)
 
Hmm - the raw read error rate would not worry me that much; the 4600 current pending sectors and the offline uncorrectable count are what would worry me.
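For reference, you can pull just those counters without scrolling the whole report - a rough sketch, with the device name taken from the syslog above:

smartctl -A /dev/sdo | grep -E 'Raw_Read_Error_Rate|Current_Pending_Sector|Offline_Uncorrectable|Power_On_Hours'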

Since the disk looks rather new (949 hours), maybe also check the cables and power supply.

What kind of disk is this?
 
These are 8TB spinners in five brand-new Dell R740xd servers. I'll contact Dell and get this drive replaced. I'll also check the Raw_Read_Errors on the other spinners. I did have a couple of the adjacent drives go down and out last week. I *think* it was the other drives with Raw_Read_Errors.
 
You should be checking every disk for errors before putting them into production. Run a test on the disk before trying to RMA - if it's fine, your issue is elsewhere.
 
I'm in pre-production with Proxmox/Ceph now. As for running disk tests, what would you recommend?

The usual. Smartctl can run long tests on offline disks, and there are other options out there if you do some research.
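A minimal sketch of what that looks like, assuming smartmontools and e2fsprogs (for badblocks) are installed - the device name is just a placeholder, and the badblocks write test wipes the disk, so only run it before the disk holds data:

smartctl -t long /dev/sdX       # kick off an extended SMART self-test (runs inside the drive, takes hours on an 8TB disk)
smartctl -l selftest /dev/sdX   # read the self-test log once it has finished
badblocks -wsv /dev/sdX         # optional destructive write/read surface scan - destroys all data on the disk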

They were in the cluster already, so they're technically in use even if the cluster itself isn't in production yet. You want to test the disks for errors right after getting them so you can return them to the seller for replacement/refund during the 15-30 day RMA period. Otherwise you're stuck going through Seagate/WD unless you have a contract with the seller.
 
Dell confirmed the failing drive via iDRAC. The replacement is on the way. Is the process for replacing a drive with an associated DB/WAL via the Proxmox VE 6 GUI documented somewhere?
 
Can anyone confirm whether destroying an OSD via the GUI will also destroy the associated DB/WAL I initially created on the NVMe? Then I'd just create the replacement OSD on the new drive referencing the NVMe as before??

TIA!
 
New drive installed. Since the OSD was already down and out I destroyed it, shut down the node, and replaced this non-hot-swappable drive in the mid-bay of the server. Booted it back up, tested the drive, recreated the OSD, and associated it with the NVMe for DB/WAL. Worked like a charm!
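For the record, the rough CLI equivalent on PVE 6 looks like the following - treat it as a sketch, since option names can vary between versions; the OSD id and sdo come from this thread, but the NVMe path is only an example:

pveceph osd destroy 28 --cleanup                    # remove the dead OSD (it must already be down/out); --cleanup wipes its volumes
pveceph osd create /dev/sdo --db_dev /dev/nvme0n1   # recreate the OSD on the new disk with its DB/WAL on the NVMe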

Thx for the help...
 
