Reinstall monitor

Hi, I've reinstalled a node following this doc https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster#Re-installing_a_cluster_node but I have a problem with the monitor:

# pveceph createmon
monitor 'mon.int101' already exists

# pveceph destroymon int101
monitor filesystem '/var/lib/ceph/mon/ceph-int101' does not exist on this node


# pvecm status

Quorum information
------------------
Date: Sun Mar 4 21:11:02 2018
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1/80
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.10.10.101 (local)
0x00000002 1 10.10.10.102
0x00000003 1 10.10.10.103


Is there any way to force the monitor creation? Everything seems to work fine except the Ceph monitor quorum.

Regards
 
In the /etc/pve/ceph.conf you still have the monitor 'int101' configured.
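A minimal cleanup sketch, assuming the leftover entry is a [mon.int101] section and no mon process is left running on the reinstalled node; /etc/pve/ceph.conf lives on the cluster-wide pmxcfs, so editing it once on any node is enough:

# grep -A 2 'mon.int101' /etc/pve/ceph.conf    (show the stale section)
# nano /etc/pve/ceph.conf                      (delete the whole [mon.int101] block)
# pveceph createmon                            (recreate the monitor on the reinstalled node)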
 
Thanks, I've recreated the monitor after deleting it from /etc/pve/ceph.conf.

Regards
 
What if I don't have an entry for the monitor in /etc/pve/ceph.conf? It's already been removed, but I can't re-create the monitor.

I also can't manually delete /var/lib/ceph/mon/ceph-pve3
 
@CastyMcBoozer, well, the MON might still be running if you can't even remove the directory. And did you remove it through 'pveceph mon destroy' or just from the ceph.conf?
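A quick check could look like this (a sketch; the mon id 'pve3' and the unit name are guessed from the directory /var/lib/ceph/mon/ceph-pve3):

# ps aux | grep ceph-mon             (is a mon process still running?)
# systemctl status ceph-mon@pve3     (is the systemd unit still active or enabled?)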
 
@CastyMcBoozer, well, the MON might still be running if you can't even remove the directory. And did you remove it through 'pveceph mon destroy' or just from the ceph.conf?

I originally removed it from the GUI. I'm not sure which service I should be looking for, but:


root@pve3:~# ps aux | grep pve
ceph 1454 0.0 0.3 357504 29712 ? Ssl Apr22 0:18 /usr/bin/ceph-mds -f --cluster ceph --id pve3 --setuser ceph --setgroup ceph
root 1597 0.0 1.0 517172 81836 ? Ss Apr22 2:00 pve-firewall
root 1616 0.1 1.0 515192 82508 ? Ss Apr22 4:51 pvestatd
root 1712 0.0 1.3 556884 110580 ? Ss Apr22 0:01 pvedaemon
root 1717 0.0 1.4 567568 120112 ? S Apr22 0:01 pvedaemon worker
root 1718 0.0 1.4 567604 120124 ? S Apr22 0:01 pvedaemon worker
root 1719 0.0 1.4 566988 119552 ? S Apr22 0:01 pvedaemon worker
root 1873 0.0 1.1 526468 91560 ? Ss Apr22 0:09 pve-ha-crm
www-data 1898 0.0 1.6 564712 131724 ? Ss Apr22 0:02 pveproxy
root 1979 0.0 1.1 526164 91220 ? Ss Apr22 0:19 pve-ha-lrm
www-data 141158 0.0 1.4 567048 119380 ? S Apr23 0:01 pveproxy worker
www-data 141159 0.0 1.4 567048 119380 ? S Apr23 0:01 pveproxy worker
www-data 141160 0.0 1.4 567048 119380 ? S Apr23 0:01 pveproxy worker
root 377870 0.0 0.0 94144 180 ? Ssl 06:25 0:00 /usr/sbin/pvefw-logger
root 417519 0.0 0.0 12784 936 pts/0 S+ 10:26 0:00 grep pve
 
and:


ps aux | grep ceph
ceph 1454 0.0 0.3 357504 29712 ? Ssl Apr22 0:18 /usr/bin/ceph-mds -f --cluster ceph --id pve3 --setuser ceph --setgroup ceph
ceph 1705 0.1 14.7 1917600 1184888 ? Ssl Apr22 4:50 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
root 417832 0.0 0.0 12784 944 pts/0 S+ 10:28 0:00 grep ceph
 
You can find out with 'lsof' what files/directories are open by what process at that location.
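For example (directory name taken from the post above):

# lsof +D /var/lib/ceph/mon/ceph-pve3    (lists every process holding files open under that directory)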
 
And can you delete it now?
 
And can you delete it now?
Are you asking if lsof fixed it? What is this? Quantum mechanics double slit experiment? LOL no nothing has changed. I could just rebuild the node entirely but I kept hearing how stable and production ready Ceph was. I’m not convinced.
 
Are you asking if lsof fixed it? What is this? Quantum mechanics double slit experiment? LOL no nothing has changed.
It's called time between actions (see the timestamps on the posts), during which the state of a running system can definitely change.
Alwin, Yesterday at 07:39
CastyMcBoozer, Yesterday at 19:14
Alwin, Today at 10:38
CastyMcBoozer, 46 minutes ago (Today 14:03)

I could just rebuild the node entirely but I kept hearing how stable and production ready Ceph was. I’m not convinced.
If you don't send any error messages or logfiles, everything is just fishing in the dark.
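The usual starting points on the affected node would be, for example (mon id 'pve3' assumed from the directory name above):

# journalctl -u ceph-mon@pve3                  (systemd journal of the monitor unit)
# tail -n 100 /var/log/ceph/ceph-mon.pve3.log  (Ceph mon log, if it exists)
# ceph -s                                      (overall cluster state)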
 
Hi all!
Looks like I'm having a similar problem:

root@pve03:/etc/ceph# pveceph createmon
monitor 'pve03' already exists

root@pve03:/etc/ceph# pveceph destroymon pve03
no such monitor id 'pve03'

root@pve03:/etc/ceph# cat /etc/pve/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 192.168.0.0/24
fsid = e1ee6b28-xxxx-xxxx-xxxx-11d1f6efab9b
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 192.168.0.0/24
ms_bind_ipv4 = true
ms_bind_ipv6 = false
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mon.pve02]
host = pve02
mon addr = 192.168.0.57

root@pve03:/etc/ceph# ll /var/lib/ceph/mon/
total 0

root@pve03:/etc/ceph# ps aux | grep ceph
root 861026 0.0 0.0 17308 9120 ? Ss 19:03 0:00 /usr/bin/python2.7 /usr/bin/ceph-crash
ceph 863641 0.0 0.2 492588 169916 ? Ssl 19:08 0:04 /usr/bin/ceph-mgr -f --cluster ceph --id pve03 --setuser ceph --setgroup ceph
root 890587 0.0 0.0 6072 892 pts/0 S+ 20:43 0:00 grep ceph

root@pve03:~# ceph mon dump
dumped monmap epoch 9
epoch 9
fsid e1ee6b28-xxxx-xxxx-xxxx-11d1f6efab9b
last_changed 2019-10-05 19:07:48.598830
created 2019-05-11 01:28:04.534419
min_mon_release 14 (nautilus)
0: [v2:192.168.0.57:3300/0,v1:192.168.0.57:6789/0] mon.pve02

syslog:
Oct 05 19:57:47 pve03 systemd[1]: Started Ceph cluster monitor daemon.
Oct 05 19:57:47 pve03 ceph-mon[875279]: 2019-10-05 19:57:47.506 7ffb1227f440 -1 monitor data directory at '/var/lib/ceph/mon/ceph-pve03' does not exist: have you run 'mkfs'?
Oct 05 19:57:47 pve03 systemd[1]: ceph-mon@pve03.service: Main process exited, code=exited, status=1/FAILURE
Oct 05 19:57:47 pve03 systemd[1]: ceph-mon@pve03.service: Failed with result 'exit-code'.
Oct 05 19:57:57 pve03 systemd[1]: ceph-mon@pve03.service: Service RestartSec=10s expired, scheduling restart.
Oct 05 19:57:57 pve03 systemd[1]: ceph-mon@pve03.service: Scheduled restart job, restart counter is at 4.
Oct 05 19:57:57 pve03 systemd[1]: Stopped Ceph cluster monitor daemon.
Oct 05 19:57:57 pve03 systemd[1]: ceph-mon@pve03.service: Start request repeated too quickly.
Oct 05 19:57:57 pve03 systemd[1]: ceph-mon@pve03.service: Failed with result 'exit-code'.
Oct 05 19:57:57 pve03 systemd[1]: Failed to start Ceph cluster monitor daemon.
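(For reference, the state of the failing unit can be inspected directly; unit name as in the syslog above, just an inspection sketch, not a fix.)

systemctl status ceph-mon@pve03
systemctl is-enabled ceph-mon@pve03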

Is it possible to get around the error? Thanks!
 
@EvilBox, could you please put your posting into a new thread and add the output of pveversion -v to it?
 
