I am running an experimental 3-node Proxmox VE 6 + Ceph hyperconverged cluster (blade01, blade02, blade10). I had an issue with Ceph versions that got corrected. However, I am now seeing an issue with the monitor on Blade02.
GUI screenshot attached. I see a "?" and on hovering I get "Address Unknown / Stopped".
If I go to the Monitor screen, I see only one monitor for Blade02. The Start, Stop, and Restart actions give a "Done" popup.
Why the discrepancy??
I am unable to create a new Monitor on Blade02 as well.
Appreciate the help.
Thanks
Vivek
The syslog for Blade02 shows:
Code:
Oct 23 09:44:41 systemd[1]: Started Ceph cluster monitor daemon.
Oct 23 09:44:41 ceph-mon[39041]: 2019-10-23 09:44:41.764 7f36adf6a440 -1 rocksdb: IO error: while open a file for lock: /var/lib/ceph/mon/ceph-dell0104blade02/store.db/LOCK: Permission denied
Oct 23 09:44:41 ceph-mon[39041]: 2019-10-23 09:44:41.764 7f36adf6a440 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-dell0104blade02': (22) Invalid argument
Oct 23 09:44:41 systemd[1]: ceph-mon@dell0104blade02.service: Main process exited, code=exited, status=1/FAILURE
Oct 23 09:44:41 systemd[1]: ceph-mon@dell0104blade02.service: Failed with result 'exit-code'.
Oct 23 09:44:45 systemd[1]: Stopped Ceph cluster monitor daemon.
Oct 23 09:44:45 systemd[1]: Started Ceph cluster monitor daemon.
Oct 23 09:44:45 ceph-mon[39131]: 2019-10-23 09:44:45.244 7fd5fd4ad440 -1 rocksdb: IO error: while open a file for lock: /var/lib/ceph/mon/ceph-dell0104blade02/store.db/LOCK: Permission denied
Oct 23 09:44:45 dell0104blade10 ceph-mon[39131]: 2019-10-23 09:44:45.244 7fd5fd4ad440 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-dell0104blade02': (22) Invalid argument
Oct 23 09:44:45 systemd[1]: ceph-mon@dell0104blade02.service: Main process exited, code=exited, status=1/FAILURE
Oct 23 09:44:45 systemd[1]: ceph-mon@dell0104blade02.service: Failed with result 'exit-code'.
Oct 23 09:44:55 systemd[1]: ceph-mon@dell0104blade02.service: Service RestartSec=10s expired, scheduling restart.
Oct 23 09:44:55 systemd[1]: ceph-mon@dell0104blade02.service: Scheduled restart job, restart counter is at 1.
Oct 23 09:44:55 systemd[1]: Stopped Ceph cluster monitor daemon.
Oct 23 09:44:55 systemd[1]: Started Ceph cluster monitor daemon.
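If a longer log would help, I assume the same errors can be pulled straight from the systemd unit named in the lines above; these are the commands I would run on Blade02 (output not pasted here):
Code:
# Unit name taken from the syslog above
systemctl status ceph-mon@dell0104blade02
# Full history for that unit around the time of the failures
journalctl -u ceph-mon@dell0104blade02 --since "2019-10-23"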
If I go to the directory /var/lib/ceph/mon on Blade02, it is in fact empty. The mon directory itself is owned by user/group ceph/ceph with rwxr-x--- (750) permissions.
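For reference, this is how I checked (plain ls, nothing Proxmox-specific). Since /var/lib/ceph/mon is empty, the data directory the log complains about should not even exist:
Code:
# Top-level mon directory: owned ceph:ceph, mode rwxr-x--- (750), no entries inside
ls -ld /var/lib/ceph/mon
ls -la /var/lib/ceph/mon
# Data directory referenced in the syslog; with an empty parent this path is missing
ls -ld /var/lib/ceph/mon/ceph-dell0104blade02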
ceph -s shows only TWO monitors on Blade01 and Blade10.
Code:
root@dell0104blade02:~# ceph -s
  cluster:
    id:     09fc106c-d4cf-4edc-867f-db170301f857
    health: HEALTH_OK

  services:
    mon: 2 daemons, quorum dell0104blade01,dell0104blade10 (age 2w)
    mgr: dell0104blade01(active, since 2w), standbys: dell0104blade10, dell0104blade02
    osd: 3 osds: 3 up (since 2w), 3 in (since 2w)

  data:
    pools:   1 pools, 128 pgs
    objects: 13.33k objects, 51 GiB
    usage:   121 GiB used, 995 GiB / 1.1 TiB avail
    pgs:     128 active+clean

  io:
    client:   1023 B/s wr, 0 op/s rd, 0 op/s wr
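I can also post the monmap if that is useful; I believe these are the right commands to dump it, and I would expect the output to list only dell0104blade01 and dell0104blade10, matching ceph -s above:
Code:
# Monitor map as stored by the cluster
ceph mon dump
# Quorum details; should agree with the two-mon quorum shown by ceph -s
ceph quorum_status --format json-pretty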
The Ceph global configuration in the GUI also shows only two mons:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.15.31/24
fsid = 09fc106c-d4cf-4edc-867f-db170301f857
mon_allow_pool_delete = true
mon_host = 192.168.15.31 192.168.15.204
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 192.168.15.31/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
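Regarding not being able to create a new monitor on Blade02: I assume the command below is the CLI equivalent of creating one from the GUI (I am assuming pveceph mon create is the right command here); if useful I can run it on Blade02 and post the output.
Code:
# Run on Blade02 itself; should create a monitor and its data directory on the local node
pveceph mon create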