Rejoin the same failed node to Ceph

selcuk2

New Member
May 19, 2024
Hi everyone
My environment: a 3-node PVE 8.1.3 cluster with Ceph 18.2.1.

One of my nodes failed because its boot disk crashed. The Ceph disks are healthy. We plugged in a new boot disk and installed Proxmox. For the cluster side, I deleted the node and rejoined it to the cluster. The cluster is now healthy.

My second task is to rejoin this node to Ceph.
I installed Ceph on this node, but it installed 18.2.4. My problems:
First, after the installation it did not let me configure Ceph; it says the configuration is already done. (I took that as a good thing.) But...
MONs
In the GUI, the new node's monitor is listed, but it shows "Host is unknown" and I cannot start, stop, or destroy it. I also cannot add a new mon with pveceph:

Code:
pveceph mon destroy pve3
monitor filesystem '/var/lib/ceph/mon/ceph-pve3' does not exist on this node
root@pve3:~# pveceph mon create
monitor address '10.10.10.153' already in use

In the GUI it says: "mon.pve3 (rank 2) is down (out of quorum)"
and: "a newer version was installed but old version still running, please restart"


It also asks me to upgrade the other nodes from 18.2.1 to 18.2.4. How can I do that? Would the following work?
Code:
apt-get update
apt-get upgrade ceph ceph-common ceph-fuse ceph-mds ceph-volume gdisk nvme-cli

OSDs
In the GUI: "other cluster members use a newer version of this service, please upgrade and restart"

In the GUI, I can see the OSDs on my rejoined node, but they are down and out, and I cannot bring them up. I also do not see any OSD services (systemctl start ceph-osd..) on the newly rejoined host.

The second thing that comes to mind is to run "pveceph purge" on this node and start from scratch for this node.


How can I proceed?

Thank you for your help.
 
I have successfully deleted and re-added the mon after deleting /var/lib/ceph/mon/*.
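
The monitor cleanup described here can be sketched roughly as follows. This is not a transcript of what was run: the `run` helper and the `DRY_RUN` switch are illustrative additions, `pve3` is the node name from this thread, and the `ceph mon remove` step is an assumption about how a stale entry is cleared from the monmap before the monitor is recreated.

```shell
#!/bin/sh
# Sketch: clear a stale monitor and recreate it on the reinstalled node.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute.
MON_ID="${MON_ID:-pve3}"   # node name taken from this thread

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

readd_mon() {
    # Assumption: drop the stale monitor from the monmap if it is still listed.
    run ceph mon remove "$MON_ID"
    # Remove leftover monitor data on this node (the step reported in this post).
    run rm -rf "/var/lib/ceph/mon/ceph-$MON_ID"
    # Recreate the monitor on this node with the Proxmox tooling.
    run pveceph mon create
    # Check whether the monitor rejoined the quorum.
    run ceph mon stat
}

readd_mon
```

If `pveceph mon create` still reports the address as in use, a leftover entry for the old monitor in /etc/pve/ceph.conf may be the cause; that is a guess worth checking, not something confirmed in this thread.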

Now I have a problem with the OSDs. I was able to mark them IN, but I am unable to start them. On the new node, /var/lib/ceph/osd is empty, while the same directory is filled with files on the other nodes.

Can I use these OSDs, or should I remove the current OSDs and re-add them from scratch?

My second question:
How can I upgrade Ceph? My new node was installed with 18.2.4 while the other cluster nodes have 18.2.1. I need to upgrade my cluster to 18.2.4.

Thanks.
 
This command worked for me:
Code:
root@pve3:~# ceph-volume lvm activate --all

I have one OSD which has no LVM on it. I think I will destroy it and re-add it to Ceph.
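
The activation step can be wrapped with a check before and after, along these lines. It is a minimal sketch: the `run` helper and `DRY_RUN` switch are illustrative additions, and the `ceph-volume lvm list` and `ceph osd tree` checks are suggested extras that were not part of what was run in this thread.

```shell
#!/bin/sh
# Sketch: reactivate existing LVM-based OSDs after the OS was reinstalled.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

activate_osds() {
    # Show which OSDs ceph-volume can discover on this node's LVM volumes.
    run ceph-volume lvm list
    # Activate every discovered OSD (the command that worked in this thread).
    run ceph-volume lvm activate --all
    # Check whether the OSDs on this host are now reported as up.
    run ceph osd tree
}

activate_osds
```

An OSD that was not created on LVM will not appear in the `ceph-volume lvm list` output, which matches the one disk here that had to be destroyed and recreated.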
 
I have deleted the OSD and recreated it. Ceph automatically started using it and is redistributing the data.

My only remaining question is how to upgrade Ceph from 18.2.1 to 18.2.4. This is required so that the existing cluster members run the same version as the newly added one.

New node's Ceph version: 18.2.4
Existing nodes' version: 18.2.1

The GUI tells me that I need to upgrade the older ones and restart the OSD services.
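
For a point-release update like 18.2.1 to 18.2.4, a rolling upgrade could look roughly like the sketch below. Nothing in it was verified in this thread: it assumes the Ceph packages come from the configured Proxmox repositories, that each node runs a mon, a mgr, and OSDs, and that the nodes are done one at a time with the cluster healthy again before moving on. The `run` helper, the `DRY_RUN` switch, and the function names are illustrative.

```shell
#!/bin/sh
# Sketch: rolling minor-version Ceph upgrade, one node at a time.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run once, from any node, before touching the first node.
begin_upgrade() {
    # Keep Ceph from rebalancing data while daemons are restarted.
    run ceph osd set noout
}

# Run on each node that is still on the old version.
upgrade_node() {
    run apt update
    run apt full-upgrade
    # Restart the daemons so they pick up the new binaries.
    run systemctl restart ceph-mon.target
    run systemctl restart ceph-mgr.target
    run systemctl restart ceph-osd.target
    # Check cluster health before moving on to the next node.
    run ceph -s
}

# Run once after every node has been upgraded.
finish_upgrade() {
    run ceph osd unset noout
    # All daemons should now report the same version.
    run ceph versions
}

begin_upgrade
upgrade_node
finish_upgrade
```

This uses `apt full-upgrade` rather than `apt-get upgrade` with a package list, since a plain upgrade may hold packages back when dependencies change.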
 
