Monitors crash after upgrading from Proxmox 4.4 to Proxmox 5.1

ignaqui

Jan 12, 2017
Hello,

After successfully upgrading from Proxmox 4.4 to Proxmox 5.1, we decided to switch from FileStore to BlueStore. As soon as I deleted one FileStore OSD and created a BlueStore OSD in its place, Ceph became unresponsive. It turned out that 2 of the 3 monitors had crashed and failed to start.

Here is a snippet of the journal (full journal dump attached):

Nov 30 08:57:56 tw-brk-prx-01 ceph-mon[17814]: /home/builder/source/ceph-12.2.1/src/crush/CrushWrapper.cc: In function 'int CrushWrapper::insert_item(CephContext*, int, float, std::__cxx11::string, const std::map<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >&)' thread 7f33e5bf9700 time 2017-11-30 08:57:56.530888
Nov 30 08:57:56 tw-brk-prx-01 ceph-mon[17814]: /home/builder/source/ceph-12.2.1/src/crush/CrushWrapper.cc: 963: FAILED assert(!r)
Nov 30 08:57:56 tw-brk-prx-01 ceph-mon[17814]: ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)

Please advise.

Thanks,
ignaqui
 


I tried a complete reboot (with the VMs shut down), but it didn't help. The only thing that partly worked was adding two small machines as temporary monitors to restore quorum. However, I need to resolve the root cause ASAP :(
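The quorum arithmetic explains why adding two monitors brought the cluster back. A minimal sketch, assuming the majority rule Ceph monitors use (a quorum needs strictly more than half of the monitors listed in the monmap). The post does not say whether the crashed monitors stayed in the monmap, so both cases are shown; has_quorum is a hypothetical helper, not a Ceph API.

```python
def has_quorum(monitors_up, monitors_in_monmap):
    """Return True if the running monitors form a strict majority of the monmap."""
    return monitors_up > monitors_in_monmap // 2


# Original cluster: 3 monitors in the monmap, 2 crashed, 1 still running.
print(has_quorum(1, 3))  # False
# Two temporary monitors added, crashed ones still in the monmap: 3 of 5 up.
print(has_quorum(3, 5))  # True
# Two temporary monitors added, crashed ones removed from the monmap: 3 of 3 up.
print(has_quorum(3, 3))  # True
```

This is also why an odd number of monitors is the usual recommendation: going from 3 to 4 monitors raises the majority from 2 to 3 without tolerating any additional failure.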
 
It turned out that this particular OSD is broken; in the CRUSH map, device 6 appears as device6 instead of osd.6:
# devices
device 0 osd.0
...
device 5 osd.5
device 6 device6
device 7 osd.7
...
device 23 osd.23
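To spot entries like device6 in a larger map, the "# devices" section of a decompiled CRUSH map (for example one produced with ceph osd getcrushmap -o crush.bin followed by crushtool -d crush.bin -o crush.txt) can be scanned for names that do not follow the osd.<id> pattern. A minimal sketch, assuming that a placeholder name such as device6 marks an ID slot with no OSD name attached, as in the excerpt above; find_unnamed_devices is a hypothetical helper.

```python
import re

# Matches lines from the "# devices" section of a decompiled CRUSH map,
# e.g. "device 5 osd.5" or "device 5 osd.5 class hdd" (Luminous adds device classes).
DEVICE_LINE = re.compile(r"^device\s+(\d+)\s+(\S+)(?:\s+class\s+(\S+))?\s*$")


def find_unnamed_devices(crush_text):
    """Return (id, name) pairs for device entries not named osd.<id>."""
    suspicious = []
    for line in crush_text.splitlines():
        m = DEVICE_LINE.match(line.strip())
        if not m:
            continue  # skip comments, "...", buckets, rules, etc.
        dev_id, name = int(m.group(1)), m.group(2)
        if name != f"osd.{dev_id}":
            suspicious.append((dev_id, name))
    return suspicious


if __name__ == "__main__":
    sample = """# devices
device 0 osd.0
device 5 osd.5
device 6 device6
device 7 osd.7
device 23 osd.23
"""
    print(find_unnamed_devices(sample))  # [(6, 'device6')]
```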

When I deleted that OSD, everything recovered.
But when I tried to add it back, the issue returned.

Any thoughts?

I have seen similar posts, but none of them helped me fix the issue:
https://stackoverflow.com/questions/39301452/how-to-abandon-ceph-pgs-that-are-stuck-in-incomplete
https://medium.com/@george.shuklin/osd-does-not-exist-5f031c8aab41
https://stackoverflow.com/questions...leftover-osd-devices-in-the-ceph-crush-map-re