Rename node w/ HCI and Ceph

lknite

Goal:
- I have a 3-node cluster with one node I wish to rename.
- I'd like an assist to ensure that I don't break Ceph.

Detail:
- I have enough resources that I was able to move all VMs off the node.

The Plan (on the node to be renamed)
- Set OSDs to 'out'
- PGs will become 'active+clean+remapped'; wait until the count of remapped PGs drops to zero (is this true?)
- Once 'Used %' shows 0.00, Stop the OSDs
- Once stopped, select More and choose Destroy for each OSD
- If a ceph monitor is enabled on the node, Stop the monitor, and after stopped Destroy it
- If a ceph manager is enabled on the node, Stop the manager, and after stopped Destroy it
- (at least one manager is required, enable on another node if needed)
- Remove the node from the proxmox cluster using: pvecm delnode <node> (is this necessary?)
- Rename the node using: hostnamectl set-hostname <new_name>
- Reboot
- Rename the node in /etc/corosync/corosync.conf (or should this be edited as /etc/pve/corosync.conf on an existing cluster node?)
- Add renamed node back into cluster: pvecm add <ip_of_existing_cluster_member> (is this necessary?)
- Add the OSDs back in
- Enable ceph monitor
- Optionally, enable ceph manager on the node
- If you are using HA, then add the new node name to the relevant HA groups (a CLI sketch of these steps follows the list).
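
For reference, here's a rough CLI sketch of the plan above. The OSD id (2), the mon/mgr id, the device path, and the hostnames are all placeholders rather than my actual values, so adjust before running anything:

Code:
# --- on the node to be renamed: drain and remove its Ceph services ---
ceph osd out 2                        # mark this node's OSD out
ceph -s                               # wait for rebalance to finish (PGs active+clean)
systemctl stop ceph-osd@2             # stop the OSD once it holds no data
pveceph osd destroy 2 --cleanup       # destroy the OSD and wipe the disk
pveceph mon destroy <old_name>        # only if a monitor runs on this node
pveceph mgr destroy <old_name>        # only if a manager runs on this node

# --- from another cluster node: drop the old node name from corosync ---
pvecm delnode <old_name>

# --- on the node itself: rename and reboot ---
hostnamectl set-hostname <new_name>
# also update /etc/hosts and any remaining references to the old name before rebooting
reboot

# --- once the node is back in the cluster: recreate the Ceph services ---
pveceph osd create /dev/sdX
pveceph mon create
pveceph mgr create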
 
I'm trying the plan above, and after the reboot I can SSH to the renamed node, but the GUI isn't coming up.

What's needed to get the GUI to come up so I can join the node back into the cluster?

Or, how can I add the node back to the cluster via the CLI? I tried this but got an error:
Code:
# pvecm add 10.0.0.21
Please enter superuser (root) password for '10.0.0.21': **********
detected the following error(s):
* authentication key '/etc/corosync/authkey' already exists
* cluster config '/etc/pve/corosync.conf' already exists
* this host already contains virtual guests
Check if node may join a cluster failed!
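
Presumably the things to check while the GUI is down are something like this (standard PVE service names, nothing specific to my node):

Code:
# check whether the web GUI services are running, and restart them if not
systemctl status pve-cluster pvedaemon pveproxy
systemctl restart pvedaemon pveproxy

# check whether the node already considers itself a cluster member
# (the errors above suggest the old cluster config is still in place)
pvecm status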
 
OK, I'm not sure what changed; I did another reboot and the GUI came up... I didn't even have to add the node back in, it just worked.

If someone knows what step I did wrong, let me know and I'll update this thread so other folks can reference it in the future.
 
You may not have done anything wrong; it may just have taken a couple of reboots to get it back into the fold.
I'd like to know if it all worked myself. :D
We might rename some nodes to take 'em from POC to test/build, and if those are the right steps we'll be in good shape.
 
It's doing its recovery. I'll report back when it finishes as to whether it reaches 100% or if there's anything left over.

A couple of attempts to migrate a VM back to the node didn't work, but I figure that's because Ceph is busy recovering.
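
For anyone following along, recovery progress can be watched with the usual status commands (nothing cluster-specific here):

Code:
ceph -s        # one-shot health summary, shows recovery/backfill progress
ceph -w        # stream cluster status updates as recovery proceeds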
 
Finished after about 10 hours. Ceph is healthy. I had to add the new node name to the HA groups before I could move VMs to it.
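
For reference, the HA group change can also be done from the CLI; a minimal sketch, assuming a group called 'my-group' (group and node names are placeholders, so check the option spelling against your ha-manager version):

Code:
# show the current HA group definitions
ha-manager groupconfig
# replace the node list for the group, including the renamed node
ha-manager groupset my-group --nodes node1,node2,renamed-node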
 
