Howdy, NOOB again. With a little help I got a POC cluster going with Ceph. Unfortunately, I had to re-install one of the nodes. I removed it from the cluster (pvecm delnode pve102) and wiped all of its disks before re-installation.
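For reference, the removal went roughly like this. The pvecm command is what I actually ran; the wipe step below is just representative of how I cleared the drives (device name is an example, the exact wipe method may have differed):

Code:
# on one of the remaining nodes, after shutting pve102 down
pvecm delnode pve102

# clear each former OSD disk before re-installing (example device)
ceph-volume lvm zap /dev/sdX --destroy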
After re-installation and re-creation of the OSDs on all disks, the Ceph cluster returned to a semi-healthy state. I created a manager and a monitor on the re-installed node without error.
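For reference, the CLI equivalent of what I did on pve102 is roughly:

Code:
# on the re-installed node pve102 (the GUI buttons should do the same thing)
pveceph mgr create
pveceph mon create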
Unfortunately the monitor shows as stopped in the GUI, and clicking start doesn't help. I did some googling and checked the monitor service's status at the CLI, where it shows as running.
Here is some info. What do I do to get the GUI to show complete health (or, better yet, to actually get back to complete health)?
Code:
root@pve102:~# systemctl status ceph-mon@pve102.service
● ceph-mon@pve102.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Wed 2024-10-02 16:57:40 PDT; 4min 12s ago
   Main PID: 12526 (ceph-mon)
      Tasks: 25
     Memory: 136.5M
        CPU: 1.260s
     CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve102.service
             └─12526 /usr/bin/ceph-mon -f --cluster ceph --id pve102 --setuser ceph --setgroup ceph

Oct 02 16:57:40 pve102 systemd[1]: Started ceph-mon@pve102.service - Ceph cluster monitor daemon.

root@pve102:~# pvecm status
Cluster information
-------------------
Name:             sv4-pve-c1
Config Version:   7
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Oct 2 17:11:27 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000002
Ring ID:          1.6c
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 172.31.30.101
0x00000002          1 172.31.30.102 (local)
0x00000003          1 172.31.30.103

root@pve102:~# ceph -s
  cluster:
    id:     1101c540-2741-48d9-b64d-189700d0b84f
    health: HEALTH_WARN
            3 daemons have recently crashed

  services:
    mon: 2 daemons, quorum pve101,pve103 (age 4h)
    mgr: pve101(active, since 4h), standbys: pve103, pve102
    osd: 34 osds: 34 up (since 17m), 34 in (since 99m)

  data:
    pools:   2 pools, 33 pgs
    objects: 4.78k objects, 19 GiB
    usage:   61 GiB used, 113 TiB / 114 TiB avail
    pgs:     33 active+clean

  io:
    client: 3.3 KiB/s wr, 0 op/s rd, 0 op/s wr

root@pve103:/etc/pve# cat /etc/pve/ceph.conf
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.0.201.1/16
        fsid = 1101c540-2741-48d9-b64d-189700d0b84f
        mon_allow_pool_delete = true
        mon_host = 10.0.201.1 10.0.203.1 10.0.202.1
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 10.0.201.1/16

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.pve101]
        public_addr = 10.0.201.1

[mon.pve102]
        public_addr = 10.0.202.1

[mon.pve103]
        public_addr = 10.0.203.1

root@pve103:/etc/pve#