[SOLVED] Unable to create ceph-mgr or OSD on new setup

semira uthsala

Well-Known Member
Nov 19, 2019
Singapore
Hi all,

My PVE setup has 3 storage nodes on which I installed Ceph Nautilus 14.2.9. I went through the GUI installation and created 3 mons after the initial Ceph installation.

Right after adding the 3 monitors, it warned me "clock skew detected on mon.01 and mon.02". I configured NTP, and the time is correct and identical on all 3 nodes.

After the mon setup I tried to add a mgr on the other two nodes (one mgr was already created during the configuration step), and I get a timeout error every time I try to create a manager.

Code:
root@storage-node-02:~# pveceph mgr create
creating manager directory '/var/lib/ceph/mgr/ceph-storage-node-02'
creating keys for 'mgr.storage-node-02'
got timeout

I use two separate networks, one for the frontend and one for the cluster. All three nodes can reach each other on both networks.


After trying many times, I purged the cluster using the commands below:

Code:
#!/bin/bash
rm -rf /etc/systemd/system/ceph*
killall -9 ceph-mon ceph-mgr ceph-mds
rm -rf /var/lib/ceph/mon/  /var/lib/ceph/mgr/  /var/lib/ceph/mds/
pveceph purge
rm -rf /etc/pve/ceph.conf
rm -rf /etc/ceph/ceph.conf
apt -y purge ceph-mon ceph-osd ceph-mgr ceph-mds
apt -y autoremove
rm /etc/init.d/ceph

I then rebooted the nodes, did a clean Ceph install, and tried to create the manager again. I still get the same error: a timeout when trying to create ceph-mgr (both GUI and CLI).

Code:
root@storage-node-02:~# ceph -s
  cluster:
    id:     ed50689b-ba7b-4a2d-af27-f8007d22d8ff
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3
            clock skew detected on mon.storage-node-02

  services:
    mon: 2 daemons, quorum storage-node-01,storage-node-02 (age 6m)
    mgr: no daemons active (since 6m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

After this, I tried to create an OSD and got an error (timeout 500). Is there anything wrong with the Proxmox version or Ceph version I'm using?

Code:
root@storage-node-02:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.9-pve1
ceph-fuse: 14.2.9-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
Hi All,

My NTP was not working, even though I had configured it properly, because of a firewall rule. I fixed that and did a clean reinstall of Ceph. Now everything works properly: I can create ceph-mgr and ceph-mon daemons without any timeout issues, and the "clock skew detected" warning is gone as well.
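
For anyone hitting the same thing, here is a rough sketch of how the time sync could be checked on each node before creating more daemons. The function names are my own and purely illustrative, and it assumes the default systemd time sync of PVE 6 plus a working `ceph` CLI. NTP uses UDP port 123, so that is the traffic a firewall rule must not block. The 0.05 s limit is the default of Ceph's `mon_clock_drift_allowed` option.

Code:
```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers.

# Return 0 (true) if the absolute offset in seconds exceeds the allowed drift.
# The limit defaults to 0.05 s, the default of Ceph's mon_clock_drift_allowed.
skew_exceeds() {
    awk -v off="$1" -v lim="${2:-0.05}" \
        'BEGIN { if (off < 0) off = -off; exit !(off > lim) }'
}

# Print the sync state as seen by systemd and by the Ceph monitors.
# Assumes timedatectl and the ceph CLI are available on the node.
show_time_sync() {
    timedatectl status       # look for "System clock synchronized: yes"
    ceph time-sync-status    # skew of each monitor as seen by the quorum
    ceph health detail       # lists any remaining clock skew warnings
}
```

Running `show_time_sync` on every node and comparing the reported skew with something like `skew_exceeds 0.12 && echo "skew too high"` should show quickly whether the clocks are really in sync, regardless of what the NTP configuration says.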