Reinstall/remove dead monitor

EvilBox

Hi community!
I upgraded Ceph from 12 (Luminous) to 14 (Nautilus) following this document - https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus - but now I have a problem with a monitor:

Bash:
root@pve03:/etc/ceph# pveceph createmon
monitor 'pve03' already exists

Bash:
root@pve03:/etc/ceph# pveceph destroymon pve03
no such monitor id 'pve03'
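For anyone else hitting this: as far as I understand it, a leftover monitor entry has to be cleaned up completely before it can be re-created. A rough sketch of what that cleanup would look like on my node (mon id pve03; this is only a sketch, adapt before running anything):
Bash:
# stop and disable the leftover systemd unit of the old mon
systemctl stop ceph-mon@pve03.service
systemctl disable ceph-mon@pve03.service

# remove the stale (here: already empty) mon data directory
rm -rf /var/lib/ceph/mon/ceph-pve03

# if /etc/pve/ceph.conf still contained a [mon.pve03] section it would have
# to be removed there as well before running "pveceph createmon" again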

Bash:
root@pve03:/etc/ceph# cat /etc/pve/ceph.conf
[global]
    auth client required = cephx
    auth cluster required = cephx
    auth service required = cephx
    cluster network = 192.168.0.0/24
    fsid = e1ee6b28-xxxx-xxxx-xxxx-11d1f6efab9b
    mon allow pool delete = true
    osd journal size = 5120
    osd pool default min size = 2
    osd pool default size = 3
    public network = 192.168.0.0/24
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring
[mon.pve02]
    host = pve02
    mon addr = 192.168.0.57

Bash:
root@pve03:/etc/ceph# ll /var/lib/ceph/mon/
total 0

Bash:
root@pve03:/etc/ceph# ps aux | grep ceph
root 861026 0.0 0.0 17308 9120 ? Ss 19:03 0:00 /usr/bin/python2.7 /usr/bin/ceph-crash
ceph 863641 0.0 0.2 492588 169916 ? Ssl 19:08 0:04 /usr/bin/ceph-mgr -f --cluster ceph --id pve03 --setuser ceph --setgroup ceph
root 890587 0.0 0.0 6072 892 pts/0 S+ 20:43 0:00 grep ceph

Bash:
root@pve03:~# ceph mon dump
dumped monmap epoch 9
epoch 9
fsid e1ee6b28-xxxx-xxxx-xxxx-11d1f6efab9b
last_changed 2019-10-05 19:07:48.598830
created 2019-05-11 01:28:04.534419
min_mon_release 14 (nautilus)
0: [v2:192.168.0.57:3300/0,v1:192.168.0.57:6789/0] mon.pve02

Bash:
root@pve03:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-2-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-8
pve-kernel-helper: 6.0-8
pve-kernel-4.15: 5.4-9
pve-kernel-5.0.21-2-pve: 5.0.21-6
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph: 14.2.4-pve1
ceph-fuse: 14.2.4-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.12-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2

Code:
syslog:
Oct 05 19:57:47 pve03 systemd[1]: Started Ceph cluster monitor daemon.
Oct 05 19:57:47 pve03 ceph-mon[875279]: 2019-10-05 19:57:47.506 7ffb1227f440 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve03' does not exist: have you run 'mkfs'?
Oct 05 19:57:47 pve03 systemd[1]: ceph-mon@pve03.service: Main process exited, code=exited, status=1/FAILURE
Oct 05 19:57:47 pve03 systemd[1]: ceph-mon@pve03.service: Failed with result 'exit-code'.
Oct 05 19:57:57 pve03 systemd[1]: ceph-mon@pve03.service: Service RestartSec=10s expired, scheduling restart.
Oct 05 19:57:57 pve03 systemd[1]: ceph-mon@pve03.service: Scheduled restart job, restart counter is at 4.
Oct 05 19:57:57 pve03 systemd[1]: Stopped Ceph cluster monitor daemon.
Oct 05 19:57:57 pve03 systemd[1]: ceph-mon@pve03.service: Start request repeated too quickly.
Oct 05 19:57:57 pve03 systemd[1]: ceph-mon@pve03.service: Failed with result 'exit-code'.
Oct 05 19:57:57 pve03 systemd[1]: Failed to start Ceph cluster monitor daemon.
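The "Start request repeated too quickly" part means systemd has given up on the unit, so as far as I know it has to be reset before another start attempt does anything. Roughly (mon id pve03, and only once the data directory actually exists again):
Bash:
# clear the rate-limit/failed state of the unit and try again
systemctl reset-failed ceph-mon@pve03.service
systemctl start ceph-mon@pve03.service

# check the result
journalctl -u ceph-mon@pve03.service -n 50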

Is it possible to get around the error? Thanks!
 
This solution does not work for me:
Bash:
root@pve05:~# ceph-mon -i pve05 --extract-monmap tmp/map
2019-10-07 16:50:42.590 7f647b8ce440 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve05' is empty: have you run 'mkfs'?

I'm trying to add the monitor manually:
Bash:
ceph auth get mon. -o tmp/key
ceph mon getmap -o tmp/map
ceph-mon -i mon.pve05 --mkfs --monmap tmp/map --keyring tmp/key
and to remove it:
Bash:
ceph mon remove mon.pve05
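For reference, as far as I can tell from the Ceph docs, both ceph-mon and ceph mon remove expect the bare mon id (pve05) rather than mon.pve05, so the same manual steps would look roughly like this:
Bash:
# keyring and current monmap from the working cluster
ceph auth get mon. -o tmp/key
ceph mon getmap -o tmp/map

# initialise the mon store with the bare id; ceph-mon then uses
# /var/lib/ceph/mon/ceph-pve05, which must end up owned by the ceph user
ceph-mon -i pve05 --mkfs --monmap tmp/map --keyring tmp/key
chown -R ceph:ceph /var/lib/ceph/mon/ceph-pve05

# removal also takes the bare id
ceph mon remove pve05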

After a reboot, the Ceph services on the node did not start. It looks like it has died completely.
Bash:
root@pve05:~# ceph -s
Cluster connection aborted
Bash:
root@pve05:~# systemctl status ceph-crash.service ceph-fuse.target ceph-mds.target ceph-mgr.target ceph-mon.target ceph-osd.target ceph.target
● ceph-crash.service - Ceph crash dump collector
   Loaded: loaded (/lib/systemd/system/ceph-crash.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-10-07 16:51:50 MSK; 34min ago
 Main PID: 2616543 (ceph-crash)
    Tasks: 1 (limit: 4915)
   Memory: 5.0M
   CGroup: /system.slice/ceph-crash.service
           └─2616543 /usr/bin/python2.7 /usr/bin/ceph-crash

Oct 07 16:51:50 pve05 systemd[1]: Started Ceph crash dump collector.
Oct 07 16:51:50 pve05 ceph-crash[2616543]: INFO:__main__:monitoring path /var/lib/ceph/crash, delay 600s

● ceph-fuse.target - ceph target allowing to start/stop all ceph-fuse@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-fuse.target; enabled; vendor preset: enabled)
   Active: active since Mon 2019-10-07 16:51:50 MSK; 34min ago

Oct 07 16:51:50 pve05 systemd[1]: Stopped target ceph target allowing to start/stop all ceph-fuse@.service instances at once.
Oct 07 16:51:50 pve05 systemd[1]: Stopping ceph target allowing to start/stop all ceph-fuse@.service instances at once.
Oct 07 16:51:50 pve05 systemd[1]: Reached target ceph target allowing to start/stop all ceph-fuse@.service instances at once.

● ceph-mds.target - ceph target allowing to start/stop all ceph-mds@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mds.target; enabled; vendor preset: enabled)
   Active: active since Mon 2019-10-07 16:51:50 MSK; 34min ago

Oct 07 16:51:50 pve05 systemd[1]: Stopped target ceph target allowing to start/stop all ceph-mds@.service instances at once.
Oct 07 16:51:50 pve05 systemd[1]: Stopping ceph target allowing to start/stop all ceph-mds@.service instances at once.
Oct 07 16:51:50 pve05 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mds@.service instances at once.

● ceph-mgr.target - ceph target allowing to start/stop all ceph-mgr@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mgr.target; enabled; vendor preset: enabled)
   Active: active since Mon 2019-10-07 16:51:50 MSK; 34min ago

Oct 07 16:51:50 pve05 systemd[1]: Stopping ceph target allowing to start/stop all ceph-mgr@.service instances at once.
Oct 07 16:51:50 pve05 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mgr@.service instances at once.

● ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-mon.target; enabled; vendor preset: enabled)
   Active: active since Mon 2019-10-07 16:51:50 MSK; 34min ago

Oct 07 16:51:50 pve05 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.

● ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph-osd.target; enabled; vendor preset: enabled)
   Active: active since Mon 2019-10-07 16:51:51 MSK; 34min ago

Oct 07 16:51:51 pve05 systemd[1]: Reached target ceph target allowing to start/stop all ceph-osd@.service instances at once.

● ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
   Loaded: loaded (/lib/systemd/system/ceph.target; enabled; vendor preset: enabled)
   Active: active since Mon 2019-10-07 16:51:51 MSK; 34min ago

Oct 07 16:51:51 pve05 systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.
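Since ceph -s just aborts, the only thing I can still check locally is whether any monitor is listening on the mon ports at all and what the mon unit itself reports, e.g.:
Bash:
# anything listening on the msgr v2/v1 monitor ports?
ss -tlnp | grep -E ':3300|:6789'

# state and recent log of the local mon unit (id pve05)
systemctl status ceph-mon@pve05.service
journalctl -u ceph-mon@pve05.service --since "1 hour ago"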
No new entries appear in the logs:
Bash:
root@pve05:~# date
Mon 07 Oct 2019 05:15:07 PM MSK
Bash:
root@pve05:~# tail -n 11 /var/log/ceph/ceph.log
2019-10-07 14:15:26.353911 mgr.pve01 (mgr.1934238) 77718 : cluster [DBG] pgmap v77741: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:28.354646 mgr.pve01 (mgr.1934238) 77719 : cluster [DBG] pgmap v77742: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:30.355225 mgr.pve01 (mgr.1934238) 77720 : cluster [DBG] pgmap v77743: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:32.356000 mgr.pve01 (mgr.1934238) 77721 : cluster [DBG] pgmap v77744: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:34.356471 mgr.pve01 (mgr.1934238) 77722 : cluster [DBG] pgmap v77745: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:36.357053 mgr.pve01 (mgr.1934238) 77723 : cluster [DBG] pgmap v77746: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:38.357720 mgr.pve01 (mgr.1934238) 77724 : cluster [DBG] pgmap v77747: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:40.358319 mgr.pve01 (mgr.1934238) 77725 : cluster [DBG] pgmap v77748: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:42.358999 mgr.pve01 (mgr.1934238) 77726 : cluster [DBG] pgmap v77749: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:44.359448 mgr.pve01 (mgr.1934238) 77727 : cluster [DBG] pgmap v77750: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:46.360119 mgr.pve01 (mgr.1934238) 77728 : cluster [DBG] pgmap v77751: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
2019-10-07 14:15:48.360842 mgr.pve01 (mgr.1934238) 77729 : cluster [DBG] pgmap v77752: 128 pgs: 128 active+clean; 0 B data, 335 GiB used, 4.9 TiB / 5.2 TiB avail
 
root@pve03:/etc/ceph# pveceph createmon
Why did you want to create a MON after the upgrade?

Can you please post the output of systemctl status ceph-mon*?
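For example, with the pager disabled so the full output can be pasted:
Bash:
systemctl status 'ceph-mon@*' --no-pager --full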
 
