[SOLVED] Problems with Ceph: osd goes down

grefabu

Hi,

in a setup with Ceph we have a problem:

An OSD goes down immediately and the pool goes read-only.

In the log I have these entries:

Code:
root@ph-pve006:~# grep 'osd.8' /var/log/ceph/ceph.audit.log
2020-09-17 13:05:11.773212 mon.ph-pve004 (mon.0) 3575117 : audit [INF] from='osd.8 ' entity='osd.8' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["8"]}]: dispatch
2020-09-17 13:05:11.775337 mon.ph-pve004 (mon.0) 3575118 : audit [INF] from='osd.8 ' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "id": 8, "weight":1.7461, "args": ["host=ph-pve006", "root=default"]}]: dispatch
2020-09-17 13:05:11.781545 mon.ph-pve006 (mon.1) 3338019 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509242,v1:10.0.45.43:6817/509242]' entity='osd.8' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["8"]}]: dispatch
2020-09-17 13:05:11.783787 mon.ph-pve006 (mon.1) 3338020 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509242,v1:10.0.45.43:6817/509242]' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "id": 8, "weight":1.7461, "args": ["host=ph-pve006", "root=default"]}]: dispatch
2020-09-17 13:05:19.608085 mon.ph-pve004 (mon.0) 3575178 : audit [INF] from='osd.8 ' entity='osd.8' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["8"]}]: dispatch
2020-09-17 13:05:19.611308 mon.ph-pve004 (mon.0) 3575179 : audit [INF] from='osd.8 ' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "id": 8, "weight":1.7461, "args": ["host=ph-pve006", "root=default"]}]: dispatch
2020-09-17 13:05:19.616324 mon.ph-pve005 (mon.2) 3410338 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509449,v1:10.0.45.43:6817/509449]' entity='osd.8' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["8"]}]: dispatch
2020-09-17 13:05:19.619604 mon.ph-pve005 (mon.2) 3410339 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509449,v1:10.0.45.43:6817/509449]' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "id": 8, "weight":1.7461, "args": ["host=ph-pve006", "root=default"]}]: dispatch
2020-09-17 13:05:28.323236 mon.ph-pve004 (mon.0) 3575231 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509654,v1:10.0.45.43:6817/509654]' entity='osd.8' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["8"]}]: dispatch
2020-09-17 13:05:28.325814 mon.ph-pve004 (mon.0) 3575232 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509654,v1:10.0.45.43:6817/509654]' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "id": 8, "weight":1.7461, "args": ["host=ph-pve006", "root=default"]}]: dispatch
2020-09-17 13:05:44.364742 mon.ph-pve004 (mon.0) 3575288 : audit [INF] from='mgr.204125 10.0.45.41:0/1975' entity='mgr.ph-pve004' cmd=[{"prefix":"config-key set","key":"mgr/crash/crash/2020-09-17_11:05:21.190075Z_1ab3986c-6cd8-4976-ac2e-0751e967f4e9","val":"{\"os_version_id\": \"10\", \"assert_condition\": \"r == 0\", \"utsname_release\": \"5.3.18-1-pve\", \"os_name\": \"Debian GNU/Linux 10 (buster)\", \"entity_name\": \"osd.8\", \"assert_file\": \"/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/os/bluestore/BlueStore.cc\", \"timestamp\": \"2020-09-17 11:05:21.190075Z\", \"process_name\": \"ceph-osd\", \"utsname_machine\": \"x86_64\", \"assert_line\": 9152, \"utsname_sysname\": \"Linux\", \"os_version\": \"10 (buster)\", \"os_id\": \"10\", \"assert_thread_name\": \"tp_osd_tp\", \"utsname_version\": \"#1 SMP PVE 5.3.18-1 (Wed, 05 Feb 2020 11:49:10 +0100)\", \"backtrace\": [\"(()+0x12730) [0x7f3b108a6730]\", \"(gsignal()+0x10b) [0x7f3b103897bb]\", \"(abort()+0x121) [0x7f3b10374535]\", \"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x560ece4e7ad7]\", \"(()+0x518c5e) [0x560ece4e7c5e]\", \"(BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int, unsigned long)+0x39e7) [0x560ecea527f7]\", \"(BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0x1b5) [0x560ecea56595]\", \"(ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned long, unsigned long, unsigned int, ceph::buffer::v14_2_0::list*)+0xa3) [0x560ece8c59a3]\", \"(PrimaryLogPG::do_sparse_read(PrimaryLogPG::OpContext*, OSDOp&)+0x5bd) [0x560ece77048d]\", \"(PrimaryLogPG::do_osd_ops(PrimaryLogPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x7c77) [0x560ece787127]\", \"(PrimaryLogPG::prepare_transaction(PrimaryLogPG::OpContext*)+0x14f) [0x560ece79251f]\", \"(PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x385) [0x560ece792cb5]\", \"(PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x3101) [0x560ece797371]\", \"(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd77) [0x560ece799777]\", \"(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x397) [0x560ece5c8b77]\", \"(PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x560ece86aa52]\", \"(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x7d7) [0x560ece5e3d17]\", \"(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x560eceba2864]\", \"(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x560eceba5270]\", \"(()+0x7fa3) [0x7f3b1089bfa3]\", \"(clone()+0x3f) [0x7f3b1044b4cf]\"], \"utsname_hostname\": \"ph-pve006.peiker-holding.de\", \"assert_msg\": \"/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, uint64_t)' thread 7f3af390f700 time 2020-09-17 13:05:21.166856\\n/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/os/bluestore/BlueStore.cc: 9152: FAILED ceph_assert(r == 0)\\n\", \"crash_id\": \"2020-09-17_11:05:21.190075Z_1ab3986c-6cd8-4976-ac2e-0751e967f4e9\", \"assert_func\": \"int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, uint64_t)\", \"ceph_version\": 
\"14.2.6\"}"}]: dispatch

This occurred for the second time this week; on Monday night two OSDs on another node went down.
After killing all VMs and waiting some time we could reactivate the OSDs.

Now, two days later, the OSD on the other node is down.

Any ideas for a reason or a solution?

Bye

Gregor
 
Update to the latest version.
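
For reference, a minimal sketch of how such an update could look on a PVE/Ceph node (an assumption of the usual procedure, not taken from this thread: repositories already configured, cluster healthy before you start, one node at a time):

Code:
# check cluster state before touching anything
ceph -s
# update the node's packages
apt update && apt full-upgrade
# restart the OSDs on this node and verify the running versions
systemctl restart ceph-osd.target
ceph versions
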
Yes, and buy a subscription, I know...

But nevertheless, here is a grep of the syslog of the node:

Code:
Sep 17 13:05:00 ph-pve006 systemd[1]: Starting Proxmox VE replication runner...
Sep 17 13:05:00 ph-pve006 systemd[1]: pvesr.service: Succeeded.
Sep 17 13:05:00 ph-pve006 systemd[1]: Started Proxmox VE replication runner.
Sep 17 13:05:00 ph-pve006 kernel: [18390941.433628] sd 1:0:2:0: [sdc] tag#593 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Sep 17 13:05:00 ph-pve006 kernel: [18390941.433634] sd 1:0:2:0: [sdc] tag#593 Sense Key : Medium Error [current]
Sep 17 13:05:00 ph-pve006 kernel: [18390941.433638] sd 1:0:2:0: [sdc] tag#593 Add. Sense: Unrecovered read error
Sep 17 13:05:00 ph-pve006 kernel: [18390941.433643] sd 1:0:2:0: [sdc] tag#593 CDB: Read(10) 28 00 16 06 93 c0 00 00 80 00
Sep 17 13:05:00 ph-pve006 kernel: [18390941.433648] blk_update_request: critical medium error, dev sdc, sector 369529864 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
Sep 17 13:05:00 ph-pve006 ceph-osd[1803]: 2020-09-17 13:05:00.938 7fb3917e8700 -1 bluestore(/var/lib/ceph/osd/ceph-8) _do_read bdev-read failed: (61) No data available
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]: /mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, uint64_t)' thread 7fb3917e8700 time 2020-09-17 13:05:01.011064
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]: /mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/os/bluestore/BlueStore.cc: 9152: FAILED ceph_assert(r == 0)
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  ceph version 14.2.6 (ba51347bdbe28c7c0e2e9172fa2983111137bb60) nautilus (stable)
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x55aa840d8a86]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  2: (()+0x518c5e) [0x55aa840d8c5e]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  3: (BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int, unsigned long)+0x39e7) [0x55aa846437f7]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  4: (BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0x1b5) [0x55aa84647595]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  5: (ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned long, unsigned long, unsigned int, ceph::buffer::v14_2_0::list*)+0xa3) [0x55aa844b69a3]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  6: (PrimaryLogPG::do_sparse_read(PrimaryLogPG::OpContext*, OSDOp&)+0x5bd) [0x55aa8436148d]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  7: (PrimaryLogPG::do_osd_ops(PrimaryLogPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x7c77) [0x55aa84378127]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  8: (PrimaryLogPG::prepare_transaction(PrimaryLogPG::OpContext*)+0x14f) [0x55aa8438351f]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  9: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x385) [0x55aa84383cb5]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  10: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x3101) [0x55aa84388371]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd77) [0x55aa8438a777]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x397) [0x55aa841b9b77]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  13: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x55aa8445ba52]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x7d7) [0x55aa841d4d17]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x55aa84793864]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  16: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55aa84796270]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  17: (()+0x7fa3) [0x7fb3ae774fa3]
Sep 17 13:05:01 ph-pve006 ceph-osd[1803]:  18: (clone()+0x3f) [0x7fb3ae3244cf]
 
I've found a strange, interesting thing.

There is a mismatch with the mounted devices:

I have four SSDs for Ceph, plus one for a local LVM.

Yet one of the SSDs that I normally assign to Ceph is shown as LVM. Here is a screenshot:
[Screenshot: 1600360404455.png]

I took out the disk that is really meant for the LVM, in the hope that the right one gets allocated to Ceph once more.
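
To double-check which physical disk actually backs which OSD, rather than trusting the GUI view, a small sketch (assuming ceph-volume and the usual LVM tools are available on the node):

Code:
# show which block device / LV belongs to which OSD id
ceph-volume lvm list
# compare against the overall block device and LVM layout
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
pvs && lvs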
 
Sep 17 13:05:00 ph-pve006 kernel: [18390941.433648] blk_update_request: critical medium error, dev sdc, sector 369529864 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
Well, this seems to be the more concerning issue.
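
A quick way to confirm whether /dev/sdc is really failing; a sketch, assuming smartmontools is installed and sdc is the disk behind osd.8:

Code:
# overall SMART health verdict
smartctl -H /dev/sdc
# relevant error counters (reallocated / pending / uncorrectable sectors)
smartctl -a /dev/sdc | grep -iE 'reallocat|pending|uncorrect|media'
# kernel messages for the same device
dmesg | grep -i sdc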
 
At the moment the status is that 4 PGs are recovery_unfound.

We are trying to identify the objects that are affected by the unfound PGs. How could we manage that?
We tried a repair/revoke of the PGs, but the pool still isn't readable, even with min_size = 1 (we set it back to 2 immediately).

I think we could restore the affected VMs from an image, but I'd like to identify them first...
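
For reference, a sketch of the commands involved at this stage; <pool> and <pgid> are placeholders for the pool name and an affected PG id:

Code:
# which PGs report unfound objects?
ceph health detail | grep unfound
# list the unfound objects of one affected PG
ceph pg <pgid> list_unfound
# temporarily lowering / restoring min_size on the pool
ceph osd pool set <pool> min_size 1
ceph osd pool set <pool> min_size 2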
 
We are trying to identify the objects that are affected by the unfound PGs. How could we manage that?
The pg dump will tell you which PGs they are. Then you can query the PG with the rados tool and see which objects are inside. Each RBD image has a prefix, and this prefix is also visible in the object names. With that you know which image belongs to which object.
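
A possible sketch of that workflow; <pool>, <image> and <pgid> are placeholders:

Code:
# PGs in trouble, e.g. recovery_unfound
ceph pg dump | grep -i unfound
# the objects affected in such a PG
ceph pg <pgid> list_unfound
# every RBD image has a unique block_name_prefix (e.g. rbd_data.1234abcd5678)
rbd -p <pool> ls
rbd -p <pool> info <image> | grep block_name_prefix
# object names in the pool carry that prefix, so they can be matched to the image
rados -p <pool> ls | grep <block_name_prefix>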
 
The problem is solved. Today, before destroying the pool, I tried one last time to revoke the unfound PGs, and now they are 'clean'. Don't know why it didn't work the first time...
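
For anyone landing here later: what presumably did the trick is the mark_unfound_lost operation; a sketch, with <pgid> as placeholder (revert rolls unfound objects back to their last known version, delete discards them):

Code:
ceph pg <pgid> mark_unfound_lost revert
# verify that the PGs become active+clean again
ceph -s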
 
