Re-adding Ceph Node

gdi2k

Aug 13, 2016

I run a 3-node cluster with Ceph. Over the weekend, node 3's system disk (SSD, no RAID) failed. I replaced the disk, removed the old node from the cluster, reinstalled it and re-added it per the instructions, and all is well: the cluster is complete again.

Now I'm having trouble with Ceph. I removed all of the node's OSDs and its monitor (from the command line) and am now trying to add them back. The problem is that I get:

Code:
root@smiles3:~# pveceph createmon --mon-address 10.15.15.52
monitor address '10.15.15.52:6789' already in use by 'mon.2'
root@smiles3:~# ceph mon remove mon.2
mon.mon.2 does not exist or has already been removed

My ceph.conf file looks like this:
Code:
root@smiles3:~# cat /etc/ceph/ceph.conf
[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 10.15.15.0/24
     filestore xattr use omap = true
     fsid = ab9b66eb-4363-4fca-85dd-e67e47aef05f
     keyring = /etc/pve/priv/$cluster.$name.keyring
     mon allow pool delete = true
     osd journal size = 5120
     osd pool default min size = 1
     public network = 10.15.15.0/24

[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.1]
     host = smiles2
     mon addr = 10.15.15.51:6789

[mon.2]
     host = smiles3
     mon addr = 10.15.15.52:6789

[mon.0]
     host = smiles1
     mon addr = 10.15.15.50:6789

What's the best approach for resolving this? I could just remove mon.2 from the config file, but I don't know what repercussions that would have.
 
I should add that the list of monitors looks like the attached screenshot, and if I try to remove mon.2 from that list I get:
"monitor filesystem '/var/lib/ceph/mon/ceph-2' does not exist on this node (500)".
 

Attachment: Selection_061.jpg
Just remove it from the config before trying to add it again. Our tooling checks ceph.conf and aborts if it finds the given IP there, since it assumes the config reflects the current configuration.
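One way to see which `[mon.*]` section still claims the address before re-running `pveceph createmon` is to scan the config for it. This is a minimal sketch, assuming the `[mon.X]` / `mon addr = IP:PORT` layout of the ceph.conf posted above; `find_mon_by_addr` is a hypothetical helper, not part of pveceph.

```shell
# Print the name of every [mon.*] section in a ceph.conf-style file whose
# "mon addr" is the given IP, with or without a ":port" suffix.
# Usage: find_mon_by_addr <config-file> <ip>   (hypothetical helper)
find_mon_by_addr() {
    awk -v ip="$2" '
        {
            # Strip all blanks: "mon addr = 1.2.3.4:6789" -> "monaddr=1.2.3.4:6789"
            line = $0
            gsub(/[ \t]/, "", line)
        }
        substr(line, 1, 1) == "[" {
            # Section header such as "[mon.2]" -> "mon.2"
            section = substr(line, 2, length(line) - 2)
            next
        }
        section ~ /^mon\./ && (line ~ /^monaddr=/ || line ~ /^mon_addr=/) {
            value = substr(line, index(line, "=") + 1)
            # Exact IP, or IP immediately followed by ":port"
            if (value == ip || index(value, ip ":") == 1) print section
        }
    ' "$1"
}
```

For the config posted above, `find_mon_by_addr /etc/ceph/ceph.conf 10.15.15.52` should print `mon.2`, which is the section to delete. Separately, the doubled `mon.mon.2` in the earlier error suggests that `ceph mon remove` expects the bare monitor name (here `2`) rather than `mon.2`.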
 
Thanks dcsapak. I removed mon.2 from ceph.conf on an old node, then ran pveceph createmon -mon-address 10.15.15.52 on the newly installed node, but I still have the same issue.

It shows as Quorum = No in the list of monitors in the web UI. If I check ceph.conf on the old node, the new node is not included in the list (although it is in the ceph.conf on the new node), so the /etc/ceph/ directories are out of sync.

Can I somehow completely reset everything ceph related on the new node and start again?
 
If you are not running Proxmox VE as a cluster, the config is only added on the node where the command was issued, and you have to edit ceph.conf by hand on the other nodes.
 
I am using PVE in a 3-node cluster configuration. The PVE side works fine (I can see all nodes, they have green arrows, I can move VMs between all of them, etc.). It's just Ceph that I can't get working.
 
Is '/etc/ceph/ceph.conf' a symlink to '/etc/pve/ceph.conf'? If not, then please create it.
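The check and fix suggested here can be scripted roughly as follows. This is a sketch assuming, as stated above, that /etc/pve/ceph.conf is the cluster-wide copy and /etc/ceph/ceph.conf should point at it; `ensure_ceph_conf_link` is a hypothetical helper.

```shell
# Make sure LINK is a symlink to TARGET; a regular file there is a stale local
# copy, so it is moved aside to LINK.bak before the link is created.
# Usage: ensure_ceph_conf_link [target] [link]   (hypothetical helper)
ensure_ceph_conf_link() {
    target=${1:-/etc/pve/ceph.conf}
    link=${2:-/etc/ceph/ceph.conf}
    # Already correct: nothing to do.
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
        echo "ok: $link -> $target"
        return 0
    fi
    # Keep a backup of a regular file instead of silently overwriting it.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        mv "$link" "$link.bak" || return 1
    fi
    # -f replaces a wrong or dangling symlink, -n avoids following it.
    ln -sfn "$target" "$link" && echo "fixed: $link -> $target"
}
```

Run it with no arguments on every node; it should report `ok` everywhere. Before discarding a `.bak` file, compare it with /etc/pve/ceph.conf, since the local copy may hold edits that never reached the cluster file.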
 
Many thanks, this was helpful. On smiles1, /etc/ceph/ceph.conf was no longer a symlink, so the changes did not propagate. After fixing the symlink and manually removing the entry for the new server, I was able to add it back.

All monitors now show quorum (the new one is named "mon.hostname3", but I don't think that's a problem).

However, when I try to add an OSD on the newly added node (Proxmox VE 5.2-12), it doesn't appear in the list of OSDs, although the log shows it was added successfully. Log output:

Code:
create OSD on /dev/sdb (bluestore)
wipe disk/partition: /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.520219 s, 403 MB/s
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
Setting name!
partNum is 1
REALLY setting name!
The operation has completed successfully.
The operation has completed successfully.
meta-data=/dev/sdb1              isize=2048   agcount=4, agsize=6400 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=25600, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=864, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
TASK OK

The OSD table remains the same (it lists no OSDs on the new node); see the attached screenshot (Selection_063.jpg).

Nearly there. How can I fix this last step? When I added the new node, there was something in the output about the management directory already existing. I lost the exact message in the terminal scrollback, but I wonder if it is related?
 
Did you hit 'reload' after the creation? And is the OSD mounted?
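To answer the "is the OSD mounted?" question without reading the whole mount table, the output of `mount` can be filtered for OSD data directories. A sketch, assuming the data directories live under /var/lib/ceph/osd/ceph-ID (the path used by the `[osd] keyring` setting in the ceph.conf above); `list_mounted_osds` is a hypothetical helper.

```shell
# Read `mount` output on stdin and print each mounted OSD data directory as
# "osd.<id> (<device>)". Prints nothing if no OSD is mounted.
# (hypothetical helper; assumes the /var/lib/ceph/osd/ceph-<id> layout)
list_mounted_osds() {
    awk '$2 == "on" && $3 ~ /^\/var\/lib\/ceph\/osd\/ceph-[0-9]+$/ {
        id = $3
        sub(/.*ceph-/, "", id)    # "/var/lib/ceph/osd/ceph-4" -> "4"
        print "osd." id " (" $1 ")"
    }'
}
```

`mount | list_mounted_osds` prints nothing for the mount table posted in the reply below, i.e. no OSD data directory is mounted on smiles3.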
 
Yes, I did; reloading doesn't help, it never shows up. The OSD is not mounted:

Code:
root@smiles3:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=16341840k,nr_inodes=4085460,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=3272576k,mode=755)
/dev/mapper/pve-root on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=34,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15249)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)

Thanks for all your help!
 
Since I don't see OSDs 4, 5 and 8 in the screenshot, do they show up in 'ceph osd crush tree'? If not, check 'ceph auth list'. If they aren't in use, you need to delete their ceph auth entries; otherwise the OSD can't be added to the cluster.
 
No, they're not visible through 'ceph osd crush tree':

Code:
root@smiles3:~# ceph osd crush tree
ID CLASS WEIGHT  TYPE NAME       
-1       4.96675 root default     
-2       2.52208     host smiles1
 0   ssd 0.22240         osd.0   
 1   ssd 0.89999         osd.1   
 6   ssd 0.95000         osd.6   
 9   ssd 0.44969         osd.9   
-3       2.44467     host smiles2
 2   ssd 0.21999         osd.2   
 3   ssd 0.87500         osd.3   
 7   ssd 0.89999         osd.7   
10   ssd 0.44969         osd.10   
-4             0     host smiles3

I believe I previously deleted these before re-adding the newly reinstalled node.
 
Any more suggestions? I would really like to get this last piece of the puzzle figured out so I can fully restore the cluster.
 
Did you check 'ceph auth list'?
 
It shows the following:

Code:
root@smiles1:~# ceph auth list
installed auth entries:

osd.0
    key: AQAWH7paYyXFKRAAR955203HR1WKXiDMYmJeJA==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.1
    key: AQBDNSxYQdD+ARAAGn0DoTbPOKt+6sM5uRlA9Q==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.10
    key: AQA91kpbjbBPAxAAvZcyR2boRDJJqQDwur0c+Q==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.11
    key: AQDV1kpbgeC7JBAAyXjUPCHHofGvLzNTFKDM3Q==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.2
    key: AQDPNSxYqUhGLxAAoaDfS4eEM9GuI32uR03z0w==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.3
    key: AQCwNixY1fPPNxAAxk8laWEMN+l2rJUtqerF9w==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.4
    key: AQDlNixYqC86ERAAHs6GC/74Qh1fRY7vQ7/LaA==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.5
    key: AQD5NixYfgZ6OhAAnOxwf1hC+tOJn+6vYqKcXQ==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.6
    key: AQANSrJYpPkDHhAADgFcenCKTOSM6ChxJhZOnA==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.7
    key: AQAhSrJYdXTvMhAAhQ0BT328zNgqtuXVJ9a3Sg==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.8
    key: AQA5SrJYdF0SFxAAlgQkrq0S8cXJdJPIp3BIUQ==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
osd.9
    key: AQAl1kpbRBCYJBAApVz/0Txaq1cn35Z/pa2ypQ==
    caps: [mgr] allow profile osd
    caps: [mon] allow profile osd
    caps: [osd] allow *
client.admin
    key: AQCNNCxYRPObLhAAKvdlMBI6VWaS7M8SOxR4YQ==
    auid: 0
    caps: [mds] allow *
    caps: [mgr] allow *
    caps: [mon] allow *
    caps: [osd] allow *
client.bootstrap-mds
    key: AQCONCxYQOIuFhAAXFXoieIvDKBffKH3PKNFdw==
    caps: [mgr] allow r
    caps: [mon] allow profile bootstrap-mds
client.bootstrap-mgr
    key: AQB/Hv5bRBVmMRAA2hAtSKnPSdqu1lBCJd5VMA==
    caps: [mon] allow profile bootstrap-mgr
client.bootstrap-osd
    key: AQCONCxYg+DGChAAVYAPCrYEKLN/Rj/QXJfalQ==
    caps: [mgr] allow r
    caps: [mon] allow profile bootstrap-osd
client.bootstrap-rbd
    key: AQA6WCRdHm3zMxAAWTscgRD5VbjJMQbuyHBnuA==
    caps: [mon] allow profile bootstrap-rbd
client.bootstrap-rgw
    key: AQCONCxYyL53EBAAxbSaQKi95whZfD+IMJxNJg==
    caps: [mgr] allow r
    caps: [mon] allow profile bootstrap-rgw
mgr.smiles2
    key: AQChFv5b1JqZORAAOcQOm3Bwjpu/H715OAIZnQ==
    caps: [mds] allow *
    caps: [mgr] allow r
    caps: [mon] allow profile mgr
    caps: [osd] allow *
mgr.smiles3
    key: AQCkFv5bLCpsLhAAQTOe1lbhDsnZaYpY0lyB5w==
    caps: [mds] allow *
    caps: [mgr] allow r
    caps: [mon] allow profile mgr
    caps: [osd] allow *

So it seems like the OSDs removed from the UI after the node died are still present here. Is that expected?
 
As I said above, if those OSDs aren't in use, their ceph auth entries have to be deleted; otherwise the OSD can't be added to the cluster.
If you are sure the OSDs don't exist anymore, you can delete the auth key with 'ceph auth del osd.ID'.
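The cross-check described here (OSD ids that have an auth key but are missing from the CRUSH tree) can be scripted. A minimal sketch, assuming the output formats of `ceph auth list` and `ceph osd crush tree` quoted in this thread; `orphan_osd_auth` is a hypothetical helper, and it only prints the `ceph auth del` commands so each one can be verified before it is run.

```shell
# Print a "ceph auth del osd.N" line for every osd.N that still has an auth key
# but does not appear in the CRUSH tree. It only prints; nothing is deleted.
#   $1: saved output of `ceph auth list`
#   $2: saved output of `ceph osd crush tree`
# (hypothetical helper)
orphan_osd_auth() {
    awk -v crush="$2" '
        FILENAME == crush {
            # CRUSH tree rows end with the item name, e.g. "osd.6".
            if ($NF ~ /^osd\.[0-9]+$/) in_crush[$NF] = 1
            next
        }
        # Auth entity names start in column 0, e.g. "osd.4".
        /^osd\.[0-9]+/ && !($1 in in_crush) { print "ceph auth del " $1 }
    ' "$2" "$1"
}
```

Save both outputs first, e.g. `ceph auth list > auth.txt` and `ceph osd crush tree > crush.txt`, then run `orphan_osd_auth auth.txt crush.txt`. Applied to the outputs quoted in this thread, it would list osd.11, osd.4, osd.5 and osd.8; osd.11 does not appear in the posted CRUSH tree either, so it deserves the same check before its key is deleted.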

Regarding whether it is expected that the OSDs removed from the UI are still listed there: either the removal was not complete, or the creation failed and left the auth key behind. These are usually the only two cases where this happens, so it is just something to keep in mind.
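For reference, the manual OSD removal procedure in the Ceph documentation touches the OSD map, the CRUSH map and the auth database in separate steps, so an interrupted removal can leave an auth key behind as described here. Below is a dry-run sketch that only prints the steps; `print_osd_removal` is a hypothetical helper, and the `ceph-osd@ID` systemd unit name is an assumption about a systemd-based install. The Ceph docs also recommend waiting for rebalancing after `ceph osd out` before stopping the daemon, and on Luminous and later `ceph osd purge <id> --yes-i-really-mean-it` is documented to combine the CRUSH, auth and OSD-map removal.

```shell
# Dry run: print the manual removal steps for each OSD id given as an argument.
# Nothing is executed; review the output and run the commands by hand.
# (hypothetical helper; unit name ceph-osd@<id> is an assumption)
print_osd_removal() {
    for id in "$@"; do
        # Reject empty or non-numeric ids.
        case $id in
            '' | *[!0-9]*) echo "invalid OSD id: $id" >&2; return 1 ;;
        esac
        printf '%s\n' \
            "ceph osd out $id" \
            "systemctl stop ceph-osd@$id" \
            "ceph osd crush remove osd.$id" \
            "ceph auth del osd.$id" \
            "ceph osd rm $id"
    done
}
```

`print_osd_removal 4 5 8` would print the sequence for each OSD discussed above; if `ceph auth del` is the step that was skipped, that is the one left to run.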
 
This was the solution, thank you! Deleting the auth entries of the non-existent OSDs with 'ceph auth del osd.ID' and then re-adding the OSDs via the web UI worked perfectly.

Thanks all so much for the help! Ceph is rebalancing now...
 
