INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'

tomtom13

Well-Known Member
Dec 28, 2016
Hi gentlemen,

I'm running a tiny cluster with 4 nodes. On Saturday there was a failure and one node started to behave weirdly. To let Ceph chug along, I decided to delete the monitor from that node and let everything go its own way. Unfortunately I deleted the monitor from the wrong node ... and then I deleted the right one. After some patching up on the server, everything is back to nearly normal.

Now, this is where it starts to get strange. At this point Ceph refuses to work 100% and tells me that the manager does not exist, so I can go and whistle ... OK ... why does the mgr not exist?! I tried to create a manager through the GUI, but the only option is to create it along with a monitor. I hit it, and this is what I get:

Code:
Created symlink /etc/systemd/system/ceph-mon.target.wants/ceph-mon@proxmox-dl180-14bay-2.service -> /lib/systemd/system/ceph-mon@.service.
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
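When a new mon sits in `synchronizing` like this, you can ask it directly what it thinks is going on through its admin socket, and compare that with what the surviving monitor reports. A sketch, using the hostname from the logs above (substitute your own mon ID); this obviously needs a live cluster:

```shell
# Ask the stuck monitor for its own view of the monmap and sync state:
ceph daemon mon.proxmox-dl180-14bay-2 mon_status

# Same thing via the socket path, if the daemon name form fails:
ceph --admin-daemon /var/run/ceph/ceph-mon.proxmox-dl180-14bay-2.asok mon_status

# Compare against the quorum the existing monitor(s) report:
ceph quorum_status --format json-pretty
```

If the `monmap` the new mon holds lists peers that no longer exist (e.g. the monitors deleted earlier), it can sit in `synchronizing` forever, which would match the symptoms here.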

After 10 minutes I gave up ... OK, something is wrong ... hmm, let's try the old "update trick", so I moved from v5.0 to the most recent version (as of yesterday; I think it's 5.1?).

Anyway, I tried to create a monitor again and got the same thing ... and the whole cluster became unresponsive. So I deleted those monitors from /etc/pve/ceph.conf and the change was nicely distributed to the other machines ... at least corosync works. I had to manually remove all remnants of the created monitor (systemctl disable, etc.) to make it work.
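For reference, the manual cleanup after a failed createmon roughly looks like this. A sketch, assuming the node's hostname is proxmox-dl180-14bay-2 as in the logs (substitute your own), and a cluster that still has a working monitor to accept the `mon remove`:

```shell
# Stop and disable the half-created mon unit (this is the systemctl part):
systemctl stop ceph-mon@proxmox-dl180-14bay-2
systemctl disable ceph-mon@proxmox-dl180-14bay-2

# Remove the stale monitor store so a later createmon starts clean:
rm -rf /var/lib/ceph/mon/ceph-proxmox-dl180-14bay-2

# Drop the dead mon from the monmap, if it ever made it in:
ceph mon remove proxmox-dl180-14bay-2
```

On Proxmox, `pveceph destroymon <monid>` wraps most of this, but when creation failed halfway it doesn't always clean up everything, hence the manual steps.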

Now the state of my cluster is that there is only one monitor ... strangely, it follows the old naming convention: it's mon.0, not mon.hostname. I tried to create managers with pveceph createmgr and it works perfectly (thanks, I remember that in the past it was broken).
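The mon.0 vs mon.hostname difference shows up directly in /etc/pve/ceph.conf. A sketch of the two section styles, with hypothetical host and address values for illustration:

```ini
# Old-style numeric monitor section, as written by early Proxmox/Ceph setups:
[mon.0]
    host = proxmox-node-1
    mon addr = 192.168.123.241:6789

# Hostname-based section, as pveceph createmon writes today:
[mon.proxmox-dl180-14bay-2]
    host = proxmox-dl180-14bay-2
    mon addr = 192.168.123.242:6789
```

Both forms work; the surviving mon simply predates the naming change, which is consistent with the cluster having started life on the 5.0 beta.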

Now everything works: there are 3 managers and 1 monitor, RBD works, and all VMs are humming away.

So I stopped all VMs to avoid causing more damage and tried to see what's happening on the machine that's creating a monitor; this is what "ps aux | grep ceph" gives:

Code:
ceph      2026  2.0  4.3 2867712 2160148 ?     Ssl  00:18  23:26 /usr/bin/ceph-osd -f --cluster ceph --id 6 --setuser ceph --setgroup ceph
ceph      2160  1.9  4.3 2851180 2135456 ?     Ssl  00:18  22:22 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
ceph      2292  1.9  4.2 2822092 2106240 ?     Ssl  00:18  21:57 /usr/bin/ceph-osd -f --cluster ceph --id 7 --setuser ceph --setgroup ceph
ceph      2420  1.8  4.3 2855252 2147712 ?     Ssl  00:18  20:30 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
ceph      9779  0.2  0.0 428020 47544 ?        Ssl  00:59   3:04 /usr/bin/ceph-mgr -f --cluster ceph --id proxmox-dl180-14bay-2 --setuser ceph --setgroup ceph
root     16129  0.0  0.2 548388 103052 ?       Ss   19:10   0:00 task UPID:proxmox-dl180-14bay-2:00003F01:0067A4E1:5A89CFAE:cephcreatemon:mon.proxmox-dl180-14bay-2:root@pam:
root     16166  0.0  0.1 548388 97632 ?        S    19:10   0:00 task UPID:proxmox-dl180-14bay-2:00003F01:0067A4E1:5A89CFAE:cephcreatemon:mon.proxmox-dl180-14bay-2:root@pam:
root     16168  0.1  0.0  34692  9804 ?        S    19:10   0:00 /usr/bin/python /usr/sbin/ceph-create-keys -i proxmox-dl180-14bay-2
ceph     16169  0.4  0.1 468072 73516 ?        Ssl  19:10   0:01 /usr/bin/ceph-mon -f --cluster ceph --id proxmox-dl180-14bay-2 --setuser ceph --setgroup ceph
root     18006  0.6  0.0 566624 18564 ?        Sl   19:16   0:00 /usr/bin/rados -p rbd -m 192.168.123.241,192.168.123.242 --auth_supported cephx -n client.admin --keyring /etc/pve/priv/ceph/ceph_storage.keyring df
root     18022  0.0  0.0  12788   984 pts/0    S+   19:16   0:00 grep ceph

As a side note, this installation started out as a Proxmox 5.0 beta, so I know some of these issues might be a consequence of that.
 
