Rename a node with HCI and Ceph

lknite

Member
Sep 27, 2024
Goal:
- I have a 3-node cluster with one node I wish to rename.
- I'd like some help making sure I don't break Ceph.

Detail:
- I have enough resources that I was able to move all VMs off the node.

The Plan (on the node to be renamed; a rough CLI sketch follows this list)
- Set OSDs to 'out'
- PGs will become 'active+clean+remapped'; wait until this count drops to zero (is this true?)
- Once 'Used %' shows 0.00, stop the OSDs
- Once stopped, select More and choose Destroy for each OSD
- If a Ceph monitor is enabled on the node, stop the monitor and, once it has stopped, destroy it
- If a Ceph manager is enabled on the node, stop the manager and, once it has stopped, destroy it
- (at least one manager is required, enable on another node if needed)
- Remove the node from the proxmox cluster using: pvecm delnode <node> (is this necessary?)
- Rename the node using: hostnamectl set-hostname <new_name>
- Reboot
- Rename node in /etc/corosync/corosync.conf (or should this be on an existing node in the cluster /etc/pve/corosync.conf?)
- Add renamed node back into cluster: pvecm add <ip_of_existing_cluster_member> (is this necessary?)
- Add the OSDs back in
- Enable a Ceph monitor
- Optionally, enable a Ceph manager on the node
- If you are using HA, add the new node name to the relevant HA groups.
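
For reference, here is a rough CLI sketch of the plan above. It is untested and uses placeholders: OSD id 0, disk /dev/sdX, node names oldname/newname, and 10.0.0.21 as an existing cluster member; most of these steps can also be done from the GUI.

Code:
# --- on the node to be renamed: drain and remove Ceph services ---
ceph osd out 0                      # repeat for each OSD on this node
ceph -s                             # wait until PGs are active+clean again
ceph osd df                         # 'Used %' on the drained OSDs should reach 0
systemctl stop ceph-osd@0
pveceph osd destroy 0
pveceph mon destroy oldname         # only if a monitor runs here
pveceph mgr destroy oldname         # only if a manager runs here

# --- remove the node from the cluster (run on a remaining node) ---
pvecm delnode oldname

# --- rename (on the node itself), then reboot ---
hostnamectl set-hostname newname    # also update the name in /etc/hosts
reboot
# corosync edits are normally made in /etc/pve/corosync.conf on a cluster node;
# remember to bump config_version if you change it by hand

# --- re-join and re-create Ceph services on the renamed node ---
pvecm add 10.0.0.21                 # IP of an existing cluster member
pveceph osd create /dev/sdX         # repeat per disk
pveceph mon create
pveceph mgr create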
 
I'm trying the plan above, and after the reboot I can SSH to the renamed node, but the GUI isn't coming up.

What's needed to get the GUI to come up so I can join the node back into the cluster?
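
One thing that may be worth trying here (a guess, not something confirmed in this thread): the web UI is served by pveproxy/pvedaemon, which depend on pve-cluster, so restarting those services and checking their logs might bring the GUI back without another reboot.

Code:
# on the renamed node
systemctl restart pve-cluster pvedaemon pveproxy
systemctl status pveproxy
journalctl -u pveproxy -b    # look for certificate/hostname errors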

Or, how can I join the cluster via the CLI? I tried this but got an error:
Code:
# pvecm add 10.0.0.21
Please enter superuser (root) password for '10.0.0.21': **********
detected the following error(s):
* authentication key '/etc/corosync/authkey' already exists
* cluster config '/etc/pve/corosync.conf' already exists
* this host already contains virtual guests
Check if node may join a cluster failed!
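
Those errors suggest the node never actually left the cluster (the corosync authkey and config are still present), so it may not need pvecm add at all; a sanity check before forcing anything:

Code:
pvecm status                             # is the renamed node already listed as a member?
systemctl status corosync pve-cluster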
 
OK, I'm not sure what changed; I did another reboot and the GUI came up... I didn't even have to add the node back in, it just worked.

If someone knows what step I did wrong let me know and I'll update the issue so other folks can reference this in the future.
 
You may not have done anything wrong; it just may have taken a couple of reboots to get it back into the fold.
I'd like to know if it all worked myself. :D
We might rename some nodes to take 'em from POC to test/build and if those are the right steps we'll be in good shape.
 
It's doing its recovery. I'll report back when it finishes, whether it reaches 100% or there's anything left over.

A couple of attempts to migrate a VM back to the node didn't work, but I figure that's because Ceph is busy recovering.
 
Finished after about 10 hours. Ceph is healthy. I had to add the new node name to HA before I could move VMs to it.
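
For anyone repeating this: the HA group membership can be edited in the GUI under Datacenter -> HA -> Groups or, as a rough sketch, via the CLI (the group name 'mygroup' is a placeholder):

Code:
ha-manager groupset mygroup --nodes node1,node2,newname
ha-manager groupconfig               # verify the renamed node is listed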
 
