I was able to do this last night. Do you think this is worth documenting in the Proxmox wiki?
The first part was to remove the node from Ceph (rough CLI equivalents are sketched below the list).
- From the Web UI, I went to Ceph, then Monitor, and removed the Ceph monitor daemon on that node.
- I then went to Ceph, CephFS, and removed the MDS (Metadata Server) that was on that node as well.
- I then went to Ceph, OSD and for the four OSDs on that node, I clicked on "Out", then "Stop", then "Destroy".
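For reference, I believe the rough CLI equivalents are along these lines (I did it all via the Web UI, so the node name and OSD ID below are just examples from my setup):
Code:
# remove the monitor and MDS that lived on the node being removed
pveceph mon destroy vwnode1
pveceph mds destroy vwnode1
# for each OSD on that node (e.g. OSD 0): mark it out, stop it on that node, then destroy it
ceph osd out 0
systemctl stop ceph-osd@0
pveceph osd destroy 0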
The second part was to remove the node (following the steps here):
I ran pvecm nodes to list the nodes:
Code:
root@vwnode2:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 10.7.17.3
         2          1 10.7.17.4 (local)
         3          1 10.7.17.5
I then shut down the node.
There are two quirks (or maybe bugs?) here:
- When you run pvecm nodes again with the node shut down, that node doesn't appear at all - not even as "offline". There don't appear to be any flags to show it?
- The value in the "Name" column is the IP address, not the actual node name that you need to pass to pvecm delnode. I just happened to know what it was in this case (a way to look it up is sketched below).
Does anybody know if this is expected behaviour, or if I should file bugs for these two?
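As a workaround for the second quirk, the node names corosync knows about can be looked up on any remaining node - just a sketch, but something like:
Code:
# the node {} entries in the corosync config show name and ring0_addr together
cat /etc/pve/corosync.conf
# the per-node directories in the cluster filesystem are also named after the nodes
ls /etc/pve/nodes/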
After the node is shut down, I then delete it:
Code:
root@vwnode2:~# pvecm delnode vwnode1
Killing node 1
The third part is removing the SSH key - alexskysilk@ mentioned above removing it from /root/.ssh/known_hosts - however, in my case, this file didn't exist. Because I am running a Proxmox cluster, the known_hosts file actually lives in the shared PVE cluster filesystem. Hence, on one of the remaining two nodes, I edited /etc/pve/priv/known_hosts and removed the line for that node.
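If you'd rather not edit the file by hand, ssh-keygen can remove entries by host - something along these lines (vwnode1 and the IP are just my values, and the file may have entries for both):
Code:
ssh-keygen -f /etc/pve/priv/known_hosts -R vwnode1
ssh-keygen -f /etc/pve/priv/known_hosts -R 10.7.17.3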
I then restarted the node, and reinstalled Proxmox from scratch.
(I did notice at this point that I was getting a whole bunch of errors about invalid options in a cephpoolname_cephfs.secret file at the console. Not sure why this was, but it seemed to go away once I got Ceph running again.)
I booted it up, configured the network interfaces and static IPs as before (including for my Ceph interfaces), and was able to rejoin the cluster.
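For anyone doing this from the CLI instead of the Web UI, the rejoin is roughly this, run on the freshly reinstalled node against one of the remaining cluster members (10.7.17.4 in my case):
Code:
pvecm add 10.7.17.4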
For Ceph, it was simply a matter of running:
Code:
pveceph install
pveceph init
I then re-created the Ceph monitor daemon and CephFS MDS via the Web UI.
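The CLI equivalents, run on the reinstalled node, would be roughly:
Code:
pveceph mon create
pveceph mds create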
For the OSDs, I had to do this manually via the command line, as I wanted four OSDs per disk (which isn't possible via the Web UI - maybe that's a feature request).
Code:
ceph-volume lvm zap --destroy /dev/nvme0n1
ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1
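To confirm everything came back healthy, I'd then check something like:
Code:
ceph osd tree
ceph -s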