[SOLVED] PVE 5 and Ceph luminous: unable to create monitor

Waschbüsch

I saw some threads on similar problems, but none of them seemed to match the issue I have. Apologies if I overlooked a thread that already covers this.
Anyway:

I cannot create a new monitor on my cluster. Or rather, it gets created but never joins quorum.

I tried doing this via:

- Web UI
- pveceph via command line
- manually as described here: http://docs.ceph.com/docs/luminous/rados/operations/add-or-rm-mons/ (a rough sketch of those steps is shown below)
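
For reference, the manual route from those docs boils down to roughly the following; this is only a sketch, with the monitor ID (0) and paths taken from my setup:

Code:
# fetch the cluster's mon keyring and the current monmap
ceph auth get mon. -o /tmp/mon.keyring
ceph mon getmap -o /tmp/monmap
# build the new mon's data directory from them and start the daemon
mkdir -p /var/lib/ceph/mon/ceph-0
ceph-mon -i 0 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
chown -R ceph:ceph /var/lib/ceph/mon/ceph-0
systemctl start ceph-mon@0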

I always get the following situation:

pveceph and the web UI both show:

Code:
# pveceph createmon -id 0
Created symlink /etc/systemd/system/ceph-mon.target.wants/ceph-mon@0.service -> /lib/systemd/system/ceph-mon@.service.
admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'synchronizing'

without ever ending.

And the log file for the monitor will have something similar to this:

Code:
# tail -f /var/log/ceph/ceph-mon.0.log
2017-09-03 09:28:44.269377 7f5cdebe4f80 4 rocksdb: [/home/builder/source/ceph-12.1.2/src/rocksdb/db/version_set.cc:2395] Creating manifest 26

2017-09-03 09:28:44.270652 7f5cdebe4f80 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1504430924270649, "job": 1, "event": "recovery_finished"}
2017-09-03 09:28:44.273735 7f5cdebe4f80 4 rocksdb: [/home/builder/source/ceph-12.1.2/src/rocksdb/db/db_impl_open.cc:1063] DB pointer 0x5645b299a000
2017-09-03 09:28:44.275134 7f5cdebe4f80 0 starting mon.0 rank 0 at public addr 192.168.1.1:6789/0 at bind addr 192.168.1.1:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 20b519f3-4988-4ac5-ac3c-7cd352431ebb
2017-09-03 09:28:44.275294 7f5cdebe4f80 0 starting mon.0 rank 0 at 192.168.1.1:6789/0 mon_data /var/lib/ceph/mon/ceph-0 fsid 20b519f3-4988-4ac5-ac3c-7cd352431ebb
2017-09-03 09:28:44.722781 7f5cdebe4f80 0 mon.0@-1(probing) e1 my rank is now 0 (was -1)
2017-09-03 09:28:44.725226 7f5cd17de700 0 -- 192.168.1.1:6789/0 >> 192.168.1.2:6789/0 conn(0x5645b55ad000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=13618560 cs=1 l=0).process missed message? skipped from seq 0 to 124134995
2017-09-03 09:28:44.725381 7f5cd47e4700 0 mon.0@0(probing) e64 my rank is now -1 (was 0)
2017-09-03 09:28:44.727080 7f5cd17de700 0 -- 192.168.1.1:6789/0 >> 192.168.1.2:6789/0 conn(0x5645b5dd9000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=13618561 cs=1 l=0).process missed message? skipped from seq 0 to 313340281

My main issue right now is that I only have one monitor up, which, test cluster or not, I want to change asap. ;-)
All four nodes can reach each other, and port 6789 is not blocked by the firewall.
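
For reference, a quick way to check reachability of the mon port from each node (nc is netcat; the IP is the existing monitor's public address):

Code:
# should report the port as open/succeeded on every node
nc -zv 192.168.1.2 6789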

Does anyone have any ideas how to fix this?

Version and config info:

Code:
# pveversion --verbose
proxmox-ve: 5.0-20 (running kernel: 4.10.17-2-pve)
pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
pve-kernel-4.10.17-2-pve: 4.10.17-20
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-4
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve17~bpo90
openvswitch-switch: 2.7.0-2
ceph: 12.1.2-pve1

Code:
# cat /etc/pve/ceph.conf
[global]
auth client required = none
auth cluster required = none
auth service required = none
auth supported = cephx
cluster network = 192.168.2.0/24
fsid = 20b519f3-4988-4ac5-ac3c-7cd352431ebb
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 10240
public network = 192.168.1.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.1]
host = srv02
mon addr = 192.168.1.2:6789
 
what does
Code:
ceph status
say?
 
Well, if I try to add a monitor, all I ever get is a timeout.
Otherwise:
Code:
# ceph status
  cluster:
    id:     20b519f3-4988-4ac5-ac3c-7cd352431ebb
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum 1
    mgr: 1(active), standbys: 0, 2, 3
    osd: 12 osds: 12 up, 12 in
 
  data:
    pools:   4 pools, 2048 pgs
    objects: 1500k objects, 2396 GB
    usage:   7455 GB used, 20257 GB / 27712 GB avail
    pgs:     2048 active+clean
 
  io:
    client:   955 kB/s rd, 5911 kB/s wr, 214 op/s rd, 218 op/s wr
 
what does the log of mon.1 say?
 
during the attempted creation via
Code:
pveceph createmon
I get:

Code:
2017-09-05 06:34:48.283245 7f1a35f4b700  0 mon.1@0(leader) e82 handle_command mon_command({"format":"plain","prefix":"mon getmap"} v 0) v1
2017-09-05 06:34:48.283295 7f1a35f4b700  0 log_channel(audit) log [DBG] : from='client.? 192.168.1.1:0/3896201511' entity='client.admin' cmd=[{"format":"plain","prefix":"mon getmap"}]: dispatch
2017-09-05 06:34:48.667232 7f1a35f4b700  1 mon.1@0(leader) e82  adding peer 192.168.1.1:6789/0 to list of hints
2017-09-05 06:34:48.667853 7f1a35f4b700  1 mon.1@0(leader) e82  adding peer 192.168.1.1:6789/0 to list of hints
2017-09-05 06:34:49.959705 7f1a32744700  1 mon.1@0(leader).log v49898403 check_sub sending message to client.76084147 192.168.1.2:0/632939495 with 0 entries (version 49898403)
2017-09-05 06:35:40.240509 7f1a38750700  0 mon.1@0(leader).data_health(13721) update_stats avail 59% total 7935 MB, used 2785 MB, avail 4724 MB

I probably need to increase the debug level? Which settings do you recommend?
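
For reference, mon-side debugging can be raised at runtime; something like the following should do (10 is just a guess at a useful level, the defaults are 1/5):

Code:
# bump monitor and paxos debug output on the existing mon
ceph tell mon.1 injectargs '--debug-mon 10 --debug-paxos 10'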
 
mon status:
Code:
{
    "name": "0",
    "rank": -1,
    "state": "synchronizing",
    "election_epoch": 0,
    "quorum": [],
    "features": {
        "required_con": "144115188077969408",
        "required_mon": [
            "kraken",
            "luminous"
        ],
        "quorum_con": "0",
        "quorum_mon": []
    },
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "sync": {
        "sync_provider": "mon.0 192.168.1.2:6789/0",
        "sync_cookie": 230267289601,
        "sync_start_version": 103105711
    },
    "monmap": {
        "epoch": 83,
        "fsid": "20b519f3-4988-4ac5-ac3c-7cd352431ebb",
        "modified": "2017-09-03 10:34:27.803779",
        "created": "2015-07-15 17:40:37.481052",
        "features": {
            "persistent": [
                "kraken",
                "luminous"
            ],
            "optional": []
        },
        "mons": [
            {
                "rank": 0,
                "name": "1",
                "addr": "192.168.1.2:6789/0",
                "public_addr": "192.168.1.2:6789/0"
            }
        ]
    },
    "feature_map": {
        "mon": {
            "group": {
                "features": 2305244844532236283,
                "release": "luminous",
                "num": 2
            }
        },
        "client": {
            "group": {
                "features": 2305244844532236283,
                "release": "luminous",
                "num": 4
            }
        }
    }
}
 
does the existing monitor have a big mon store? if so, try increasing the timeout for the systemd unit, maybe it is able to successfully sync then:

Code:
mkdir /etc/systemd/system/ceph-mon@.service.d
echo -e '[Service]\nTimeoutStartSec=infinity\n' > /etc/systemd/system/ceph-mon@.service.d/wait-longer.conf
systemctl daemon-reload

removing the wait-longer.conf file reverts to the default timeout again. you can verify whether it is enabled with "systemctl cat ceph-mon@"
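
To double-check (the drop-in's path should be listed as a comment after the main unit file):

Code:
systemctl cat ceph-mon@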
 
The setting (applied on both the existing node and the node to be added) did not change the behavior. :-(

Btw, the store does not look overly large to me. ;-)

Code:
# du -sh /var/lib/ceph/mon/ceph-1
98M    ceph-1
 
could you provide the journal output around the time of the failed creation as well?
 
Code:
journalctl -u "ceph*" -u "pve*" --since "-10m"
(adapt the time if necessary)
 
okay, and once more:
Code:
journalctl --since "11:00" --until "11:01"
 
Code:
-- Logs begin at Sun 2017-09-03 03:23:10 UTC, end at Tue 2017-09-05 11:29:31 UTC. --
Sep 05 11:00:00 srv01 systemd[1]: Starting Proxmox VE replication runner...
Sep 05 11:00:01 srv01 systemd[1]: Started Proxmox VE replication runner.
Sep 05 11:00:14 srv01 pveceph[482]: <root@pam> starting task UPID:srv01:000001E4:013187AF:59AE83BE:ceph
Sep 05 11:00:15 srv01 systemd[1]: Started Ceph cluster monitor daemon.
Sep 05 11:00:15 srv01 systemd[1]: Reloading.
Sep 05 11:00:15 srv01 systemd[1]: apt-daily.timer: Adding 3h 52min 8.583306s random time.
Sep 05 11:00:15 srv01 systemd[1]: apt-daily-upgrade.timer: Adding 24min 8.778454s random time.
Sep 05 11:00:42 srv01 pveceph[484]: received interrupt
Sep 05 11:00:42 srv01 pveceph[520]: command 'ceph-create-keys -i 0' failed: received interrupt
Sep 05 11:00:42 srv01 pveceph[482]: <root@pam> end task UPID:srv01:000001E4:013187AF:59AE83BE:cephcreat
Sep 05 11:00:46 srv01 systemd[1]: Stopping Ceph cluster monitor daemon...
Sep 05 11:00:46 srv01 ceph-mon[523]: 2017-09-05 11:00:46.599548 7ffafb697700 -1 received  signal: Termi
Sep 05 11:00:46 srv01 ceph-mon[523]: 2017-09-05 11:00:46.599572 7ffafb697700 -1 mon.0@-1(synchronizing)
Sep 05 11:00:46 srv01 systemd[1]: Stopped Ceph cluster monitor daemon.

11:00:42 was when I stopped the attempt.
And it seems it was still trying to do 'ceph-create-keys -i 0' when I did.
The weird thing is that other procedures that presumably also need to create keys, such as 'pveceph createmgr', work without issues.
 
sorry if this was not clear - please keep the creation attempt running for at least a couple of minutes. the new monitor needs to sync its store with the existing one, and depending on your network and the size of the store this can take a while..

also, you can try with the 12.2.0 packages available in the test repo ;)
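
in case it helps, that would be the ceph-luminous test component; a sketch assuming the standard PVE 5 / Debian stretch repository layout:

Code:
# /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-luminous stretch test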
 
When I added monitors before, it was near-instant, but I'll give it more time.
As for 12.2.0: I'll update and let you know if it makes a difference.
 
Twice, I waited for the pveceph command to time out on its own:
Code:
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
INFO:ceph-create-keys:ceph-mon is not in quorum: u'probing'
ceph-mon was not able to join quorum within 0 seconds
TASK ERROR: command 'ceph-create-keys -i 0' failed: exit code 1

Then I followed the surviving mon's log during an attempt and found this repeatedly:

Code:
2017-09-05 13:40:52.783440 7f9362301700  1 mon.1@1(peon).paxos(paxos recovering c 103116574..103117208) lease_timeout -- calling new election
2017-09-05 13:40:52.783899 7f935fafc700  1 mon.1@1(probing) e93 handle_timecheck drop unexpected msg

Apparently the continuous re-elections caused the syncing of the new mon to take much longer.

Anyway, why the timeouts? In the end, it was clock skew. I never thought to check for it, and

Code:
ceph status

could not warn me about it as long as I only had the one surviving monitor.
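
For anyone hitting the same thing: the skew can be spotted by hand; for example (the node names are from my setup):

Code:
# compare the clocks on all nodes in one go
for h in srv01 srv02 srv03 srv04; do echo -n "$h: "; ssh $h date +%s.%N; done
# and check that time synchronization is actually active on each node
timedatectl | grep -i synchronized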

Thanks, Fabian, for your patience in helping me with this!