Hi,
In a setup with Ceph we have a problem:
An OSD goes down immediately and the pool goes read-only.
In the log I have these entries:
Code:
root@ph-pve006:~# grep 'osd.8' /var/log/ceph/ceph.audit.log
2020-09-17 13:05:11.773212 mon.ph-pve004 (mon.0) 3575117 : audit [INF] from='osd.8 ' entity='osd.8' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["8"]}]: dispatch
2020-09-17 13:05:11.775337 mon.ph-pve004 (mon.0) 3575118 : audit [INF] from='osd.8 ' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "id": 8, "weight":1.7461, "args": ["host=ph-pve006", "root=default"]}]: dispatch
2020-09-17 13:05:11.781545 mon.ph-pve006 (mon.1) 3338019 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509242,v1:10.0.45.43:6817/509242]' entity='osd.8' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["8"]}]: dispatch
2020-09-17 13:05:11.783787 mon.ph-pve006 (mon.1) 3338020 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509242,v1:10.0.45.43:6817/509242]' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "id": 8, "weight":1.7461, "args": ["host=ph-pve006", "root=default"]}]: dispatch
2020-09-17 13:05:19.608085 mon.ph-pve004 (mon.0) 3575178 : audit [INF] from='osd.8 ' entity='osd.8' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["8"]}]: dispatch
2020-09-17 13:05:19.611308 mon.ph-pve004 (mon.0) 3575179 : audit [INF] from='osd.8 ' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "id": 8, "weight":1.7461, "args": ["host=ph-pve006", "root=default"]}]: dispatch
2020-09-17 13:05:19.616324 mon.ph-pve005 (mon.2) 3410338 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509449,v1:10.0.45.43:6817/509449]' entity='osd.8' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["8"]}]: dispatch
2020-09-17 13:05:19.619604 mon.ph-pve005 (mon.2) 3410339 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509449,v1:10.0.45.43:6817/509449]' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "id": 8, "weight":1.7461, "args": ["host=ph-pve006", "root=default"]}]: dispatch
2020-09-17 13:05:28.323236 mon.ph-pve004 (mon.0) 3575231 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509654,v1:10.0.45.43:6817/509654]' entity='osd.8' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["8"]}]: dispatch
2020-09-17 13:05:28.325814 mon.ph-pve004 (mon.0) 3575232 : audit [INF] from='osd.8 [v2:10.0.45.43:6816/509654,v1:10.0.45.43:6817/509654]' entity='osd.8' cmd=[{"prefix": "osd crush create-or-move", "id": 8, "weight":1.7461, "args": ["host=ph-pve006", "root=default"]}]: dispatch
2020-09-17 13:05:44.364742 mon.ph-pve004 (mon.0) 3575288 : audit [INF] from='mgr.204125 10.0.45.41:0/1975' entity='mgr.ph-pve004' cmd=[{"prefix":"config-key set","key":"mgr/crash/crash/2020-09-17_11:05:21.190075Z_1ab3986c-6cd8-4976-ac2e-0751e967f4e9","val":"{\"os_version_id\": \"10\", \"assert_condition\": \"r == 0\", \"utsname_release\": \"5.3.18-1-pve\", \"os_name\": \"Debian GNU/Linux 10 (buster)\", \"entity_name\": \"osd.8\", \"assert_file\": \"/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/os/bluestore/BlueStore.cc\", \"timestamp\": \"2020-09-17 11:05:21.190075Z\", \"process_name\": \"ceph-osd\", \"utsname_machine\": \"x86_64\", \"assert_line\": 9152, \"utsname_sysname\": \"Linux\", \"os_version\": \"10 (buster)\", \"os_id\": \"10\", \"assert_thread_name\": \"tp_osd_tp\", \"utsname_version\": \"#1 SMP PVE 5.3.18-1 (Wed, 05 Feb 2020 11:49:10 +0100)\", \"backtrace\": [\"(()+0x12730) [0x7f3b108a6730]\", \"(gsignal()+0x10b) [0x7f3b103897bb]\", \"(abort()+0x121) [0x7f3b10374535]\", \"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x560ece4e7ad7]\", \"(()+0x518c5e) [0x560ece4e7c5e]\", \"(BlueStore::_do_read(BlueStore::Collection*, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int, unsigned long)+0x39e7) [0x560ecea527f7]\", \"(BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v14_2_0::list&, unsigned int)+0x1b5) [0x560ecea56595]\", \"(ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned long, unsigned long, unsigned int, ceph::buffer::v14_2_0::list*)+0xa3) [0x560ece8c59a3]\", \"(PrimaryLogPG::do_sparse_read(PrimaryLogPG::OpContext*, OSDOp&)+0x5bd) [0x560ece77048d]\", \"(PrimaryLogPG::do_osd_ops(PrimaryLogPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x7c77) [0x560ece787127]\", \"(PrimaryLogPG::prepare_transaction(PrimaryLogPG::OpContext*)+0x14f) [0x560ece79251f]\", 
\"(PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x385) [0x560ece792cb5]\", \"(PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x3101) [0x560ece797371]\", \"(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd77) [0x560ece799777]\", \"(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x397) [0x560ece5c8b77]\", \"(PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x560ece86aa52]\", \"(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x7d7) [0x560ece5e3d17]\", \"(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b4) [0x560eceba2864]\", \"(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x560eceba5270]\", \"(()+0x7fa3) [0x7f3b1089bfa3]\", \"(clone()+0x3f) [0x7f3b1044b4cf]\"], \"utsname_hostname\": \"ph-pve006.peiker-holding.de\", \"assert_msg\": \"/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/os/bluestore/BlueStore.cc: In function 'int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, uint64_t)' thread 7f3af390f700 time 2020-09-17 13:05:21.166856\\n/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/os/bluestore/BlueStore.cc: 9152: FAILED ceph_assert(r == 0)\\n\", \"crash_id\": \"2020-09-17_11:05:21.190075Z_1ab3986c-6cd8-4976-ac2e-0751e967f4e9\", \"assert_func\": \"int BlueStore::_do_read(BlueStore::Collection*, BlueStore::OnodeRef, uint64_t, size_t, ceph::bufferlist&, uint32_t, uint64_t)\", \"ceph_version\": \"14.2.6\"}"}]: dispatch
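The relevant part of that dump is the BlueStore read assert. It can be isolated with a one-liner like this (sketch; the here-document stands in for the real `/var/log/ceph/ceph.audit.log`, which the same pattern works on directly):

```shell
# Extract the failed assert from a crash-dump log excerpt.
# Replace the here-document with the real audit log file if you have it.
grep -o 'FAILED ceph_assert([^)]*)' <<'EOF'
BlueStore.cc: 9152: FAILED ceph_assert(r == 0)
EOF
# prints: FAILED ceph_assert(r == 0)
```

A `FAILED ceph_assert(r == 0)` in `BlueStore::_do_read` usually means a read from the underlying device returned an error, which often points at failing hardware under that OSD.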
This has occurred for the second time this week; on Monday night two OSDs on another node went down.
After killing all VMs and waiting some time we could reactivate the OSDs.
Now, two days later, an OSD on another node is down.
Any ideas about the cause or a solution?
Bye
Gregor