[SOLVED] New nodes could not join the Ceph cluster

huky

I have a cluster with 7 nodes.
Last week I upgraded PVE from v5 to v6 and Ceph from Luminous to Nautilus.
Now I want to join 2 new nodes (node007 and node009) into it.
After pveceph install, the new nodes could not join the Ceph cluster, although the mounted CephFS still works. A sketch of what was run is below, then the output from each node.
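
What was run on each new node, roughly; the exact invocation is from memory, so treat it as an assumption rather than a transcript:
Code:
# run on node007 and node009, which were already joined to the PVE cluster
pveceph install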

node009:
Code:
root@node009:~# df
Filesystem                                                                          1K-blocks     Used  Available Use% Mounted on
udev                                                                                263453716        0  263453716   0% /dev
tmpfs                                                                                52695644    10516   52685128   1% /run
/dev/mapper/pve-root                                                                 98559220  5303836   88205836   6% /
tmpfs                                                                               263478200    64368  263413832   1% /dev/shm
tmpfs                                                                                    5120        0       5120   0% /run/lock
tmpfs                                                                               263478200        0  263478200   0% /sys/fs/cgroup
/dev/fuse                                                                               30720      136      30584   1% /etc/pve
10.10.10.1,10.10.10.1:6789,10.10.10.2,10.10.10.2:6789,10.10.10.3,10.10.10.3:6789:/ 7903268864 44044288 7859224576   1% /mnt/pve/cephfs1
tmpfs                                                                                52695640        0   52695640   0% /run/user/0
root@node009:~# ceph -s
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)

and node007:
Code:
root@node007:~# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
no monitors specified to connect to.
2021-01-05 14:10:19.677533 7fa834173700 -1 failed for service _ceph-mon._tcp
[errno 2] error connecting to the cluster
root@node007:~# df
Filesystem                                                                          1K-blocks     Used  Available Use% Mounted on
udev                                                                                263453696        0  263453696   0% /dev
tmpfs                                                                                52695640    18704   52676936   1% /run
/dev/mapper/pve-root                                                                 98559220  3938996   89570676   5% /
tmpfs                                                                               263478184    58128  263420056   1% /dev/shm
tmpfs                                                                                    5120        0       5120   0% /run/lock
tmpfs                                                                               263478184        0  263478184   0% /sys/fs/cgroup
/dev/fuse                                                                               30720      136      30584   1% /etc/pve
10.10.10.1,10.10.10.1:6789,10.10.10.2,10.10.10.2:6789,10.10.10.3,10.10.10.3:6789:/ 7903268864 44044288 7859224576   1% /mnt/pve/cephfs1
tmpfs                                                                                52695636        0   52695636   0% /run/user/0
root@node007:~# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
no monitors specified to connect to.
2021-01-05 14:10:22.963703 7f13cfb83700 -1 failed for service _ceph-mon._tcp
[errno 2] error connecting to the cluster

node001 (one of the existing cluster nodes):
Code:
root@node001:~# ceph -s
  cluster:
    id:     225397cb-7b69-4c24-8c34-f43951f42974
    health: HEALTH_WARN
            BlueFS spillover detected on 1 OSD(s)
            mon 2 is low on available space

  services:
    mon: 3 daemons, quorum 0,1,2 (age 5d)
    mgr: node002(active, since 5d), standbys: node001, node003
    mds: cephfs1:3 {0=node002=up:active,1=node001=up:active,2=node003=up:active}
    osd: 43 osds: 43 up, 43 in

  task status:
    scrub status:
        mds.node001: idle
        mds.node002: idle
        mds.node003: idle

  data:
    pools:   7 pools, 2720 pgs
    objects: 4.14M objects, 16 TiB
    usage:   47 TiB used, 36 TiB / 83 TiB avail
    pgs:     2720 active+clean

  io:
    client:   2.7 KiB/s rd, 1.3 MiB/s wr, 3 op/s rd, 100 op/s wr
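
Both errors on the new nodes point to the same cause: the ceph client cannot read a config file ('error calling conf_read_file'), so it falls back to a DNS SRV lookup for the monitors, which also fails. On a Proxmox node, /etc/ceph/ceph.conf is normally a symlink to the cluster-wide /etc/pve/ceph.conf, and that link was missing on the freshly installed nodes. A quick check, assuming the standard paths:
Code:
# compare a new node against a working one
ls -l /etc/ceph/ceph.conf /etc/pve/ceph.conf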


In the end it just needed a link:
Code:
ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf
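
After creating the link on both new nodes, ceph -s should read the shared config and reach the monitors. A quick verification, same paths as above:
Code:
ls -l /etc/ceph/ceph.conf   # should now point at /etc/pve/ceph.conf
ceph -s                     # should report the cluster status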
 