I saw some threads on something similar but none of them looked like the issue I have. Apologies if I overlooked a thread already covering this.
Anyway:
I cannot create a new monitor on my cluster. Or rather, it gets created but never joins the quorum.
I tried doing this via:
- Web UI
- pveceph via command line
- manually, as described here: http://docs.ceph.com/docs/luminous/rados/operations/add-or-rm-mons/ (roughly the commands sketched below)
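For reference, the manual route from that doc boils down to roughly the following - mon ID 0 and the /tmp paths are just the placeholders I used:

# ceph auth get mon. -o /tmp/mon.keyring (fetch the monitor keyring)
# ceph mon getmap -o /tmp/monmap (fetch the current monmap)
# ceph-mon -i 0 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
# ceph-mon -i 0 --public-addr 192.168.1.1:6789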
I always get the following situation:
pveceph or the UI will show:
# pveceph createmon -id 0
Created symlink /etc/systemd/system/ceph-mon.target.wants/ceph-mon@0.service -> /lib/systemd/system/ceph-mon@.service.
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
without ever ending.
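In case it matters: the new mon's state can also be queried directly over its admin socket once the socket exists (path assuming the default layout), to see whether it ever gets past synchronizing:

# ceph --admin-daemon /var/run/ceph/ceph-mon.0.asok mon_status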
And the log file for the monitor will have something similar to this:
# tail -f /var/log/ceph/ceph-mon.0.log
2017-09-03 09:28:44.269377 7f5cdebe4f80 4 rocksdb: [/home/builder/source/ceph-12.1.2/src/rocksdb/db/version_set.cc:2395] Creating manifest 26
2017-09-03 09:28:44.270652 7f5cdebe4f80 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1504430924270649, "job": 1, "event": "recovery_finished"}
2017-09-03 09:28:44.273735 7f5cdebe4f80 4 rocksdb: [/home/builder/source/ceph-12.1.2/src/rocksdb/db/db_impl_open.cc:1063] DB pointer 0x5645b299a000
2017-09-03 09:28:44.275134 7f5cdebe4f80 0 starting mon.0 rank 0 at public addr 192.168.1.1:6789/0 at bind addr 192.168.1.1:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 20b519f3-4988-4ac5-ac3c-7cd352431ebb
2017-09-03 09:28:44.275294 7f5cdebe4f80 0 starting mon.0 rank 0 at 192.168.1.1:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 20b519f3-4988-4ac5-ac3c-7cd352431ebb
2017-09-03 09:28:44.722781 7f5cdebe4f80 0 mon.0@-1(probing) e1 my rank is now 0 (was -1)
2017-09-03 09:28:44.725226 7f5cd17de700 0 -- 192.168.1.1:6789/0 >> 192.168.1.2:6789/0 conn(0x5645b55ad000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=13618560 cs=1 l=0).process missed message? skipped from seq 0 to 124134995
2017-09-03 09:28:44.725381 7f5cd47e4700 0 mon.0@0(probing) e64 my rank is now -1 (was 0)
2017-09-03 09:28:44.727080 7f5cd17de700 0 -- 192.168.1.1:6789/0 >> 192.168.1.2:6789/0 conn(0x5645b5dd9000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=13618561 cs=1 l=0).process missed message? skipped from seq 0 to 313340281
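The rank flipping from 0 back to -1 right after the mon talks to 192.168.1.2 made me wonder which monmap the cluster actually has (the log shows e1 on the new mon vs. e64 afterwards); that can be checked with:

# ceph mon dump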
My main issue right now is that I only have one monitor up, which - test cluster or not - I want to change ASAP. ;-)
Every node (4 of them) can reach the others, and port 6789 is not blocked by the firewall (see the check below).
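The reachability check was nothing fancier than a port probe along these lines, run from each node against every other node's public address:

# nc -zv 192.168.1.2 6789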
Does anyone have an idea how to fix this?
Version and config info:
# pveversion --verbose
proxmox-ve: 5.0-20 (running kernel: 4.10.17-2-pve)
pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
pve-kernel-4.10.17-2-pve: 4.10.17-20
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-4
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve17~bpo90
openvswitch-switch: 2.7.0-2
ceph: 12.1.2-pve1
# cat /etc/pve/ceph.conf
[global]
auth client required = none
auth cluster required = none
auth service required = none
auth supported = cephx
cluster network = 192.168.2.0/24
fsid = 20b519f3-4988-4ac5-ac3c-7cd352431ebb
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 10240
public network = 192.168.1.0/24
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
[mon.1]
host = srv02
mon addr = 192.168.1.2:6789
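For completeness: once creation succeeds, I would expect a matching section for the new monitor to appear here, something like the following (srv01 is my guess at the hostname behind 192.168.1.1):

[mon.0]
host = srv01
mon addr = 192.168.1.1:6789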