[SOLVED] new node could NOT join ceph cluster

huky

Renowned Member
Jul 1, 2016
Chongqing, China
I have a cluster with 7 nodes.
Last week I upgraded PVE from v5 to v6 and Ceph from Luminous to Nautilus.
Now I want to join 2 new nodes (node007 and node009) into it.
After pveceph install, the new nodes could not join the Ceph cluster, but they did mount the CephFS:

node009:
Code:
root@node009:~# df
Filesystem                                                                          1K-blocks     Used  Available Use% Mounted on
udev                                                                                263453716        0  263453716   0% /dev
tmpfs                                                                                52695644    10516   52685128   1% /run
/dev/mapper/pve-root                                                                 98559220  5303836   88205836   6% /
tmpfs                                                                               263478200    64368  263413832   1% /dev/shm
tmpfs                                                                                    5120        0       5120   0% /run/lock
tmpfs                                                                               263478200        0  263478200   0% /sys/fs/cgroup
/dev/fuse                                                                               30720      136      30584   1% /etc/pve
10.10.10.1,10.10.10.1:6789,10.10.10.2,10.10.10.2:6789,10.10.10.3,10.10.10.3:6789:/ 7903268864 44044288 7859224576   1% /mnt/pve/cephfs1
tmpfs                                                                                52695640        0   52695640   0% /run/user/0
root@node009:~# ceph -s
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)

and node007:
Code:
root@node007:~# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
no monitors specified to connect to.
2021-01-05 14:10:19.677533 7fa834173700 -1 failed for service _ceph-mon._tcp
[errno 2] error connecting to the cluster
root@node007:~# df
Filesystem                                                                          1K-blocks     Used  Available Use% Mounted on
udev                                                                                263453696        0  263453696   0% /dev
tmpfs                                                                                52695640    18704   52676936   1% /run
/dev/mapper/pve-root                                                                 98559220  3938996   89570676   5% /
tmpfs                                                                               263478184    58128  263420056   1% /dev/shm
tmpfs                                                                                    5120        0       5120   0% /run/lock
tmpfs                                                                               263478184        0  263478184   0% /sys/fs/cgroup
/dev/fuse                                                                               30720      136      30584   1% /etc/pve
10.10.10.1,10.10.10.1:6789,10.10.10.2,10.10.10.2:6789,10.10.10.3,10.10.10.3:6789:/ 7903268864 44044288 7859224576   1% /mnt/pve/cephfs1
tmpfs                                                                                52695636        0   52695636   0% /run/user/0
root@node007:~# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
no monitors specified to connect to.
2021-01-05 14:10:22.963703 7f13cfb83700 -1 failed for service _ceph-mon._tcp
[errno 2] error connecting to the cluster
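
Both errors come down to the same thing: the ceph CLI on the new nodes has no usable /etc/ceph/ceph.conf, so it has no monitor list (the CephFS mount still works because PVE mounts it with the monitor IPs from the storage config, as you can see in the df output). A quick check on a new node, just a diagnostic sketch using the standard PVE/Ceph paths:
Code:
# the local client config should be a symlink to the cluster-wide one
ls -l /etc/ceph/ceph.conf
# the cluster-wide config lives on the pmxcfs mount
ls -l /etc/pve/ceph.conf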

node001:
Code:
root@node001:~# ceph -s
  cluster:
    id:     225397cb-7b69-4c24-8c34-f43951f42974
    health: HEALTH_WARN
            BlueFS spillover detected on 1 OSD(s)
            mon 2 is low on available space

  services:
    mon: 3 daemons, quorum 0,1,2 (age 5d)
    mgr: node002(active, since 5d), standbys: node001, node003
    mds: cephfs1:3 {0=node002=up:active,1=node001=up:active,2=node003=up:active}
    osd: 43 osds: 43 up, 43 in

  task status:
    scrub status:
        mds.node001: idle
        mds.node002: idle
        mds.node003: idle

  data:
    pools:   7 pools, 2720 pgs
    objects: 4.14M objects, 16 TiB
    usage:   47 TiB used, 36 TiB / 83 TiB avail
    pgs:     2720 active+clean

  io:
    client:   2.7 KiB/s rd, 1.3 MiB/s wr, 3 op/s rd, 100 op/s wr


Solved: just needed a symlink on each new node:
Code:
ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf
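# run the symlink on each new node (node007 and node009); this assumes the
# standard PVE layout where the cluster-wide config is /etc/pve/ceph.conf
# afterwards the local ceph client can read the monitor list:
ceph -s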
 