Proxmox VE 4.0 - Delete Cluster + Ceph Node

pjkenned

I have been working on a few STH articles on Proxmox VE 4.0 (e.g. http://www.servethehome.com/add-raid-1-mirrored-zpool-to-proxmox-ve/ and http://www.servethehome.com/proxmox...ceph-osd-option-being-unavailable-grayed-out/). Great job on 4.0; it is awesome how well the cluster is performing.

I did run into a minor issue with the test cluster. The 4-node cluster has 3x Intel Xeon D-1540 nodes and 1x Intel Xeon E5 V3 node (fmt-pve-01). All four were running Ceph. The "big" fmt-pve-01 node had both of its Kingston V200 240GB SSDs fail within 72 hours, which took out the ZFS mirrored boot volume.

[Screenshot: Proxmox VE Ceph OSD listing]

That leaves the other three nodes, which still have quorum. I do have two more nodes ready to join, but I do not want to proceed and mess up the cluster further. With a non-Ceph cluster I would normally just remove the PVE node from the cluster, install new boot drives, and then re-join the node to the cluster. That is not too hard. What I am wondering/worried about is the Ceph side of things.
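For reference, the non-Ceph workflow I would normally follow is roughly this (a sketch only; the IP is just an example of an existing cluster member):

# on one of the surviving cluster nodes, drop the dead node from the PVE cluster
pvecm delnode fmt-pve-01
# reinstall Proxmox VE on new boot drives, then on the freshly installed node
# join it back by pointing it at an existing member
pvecm add 192.168.1.11
# check membership and quorum afterwards
pvecm status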

My questions are:
1. Do I need to do something to remove the node/OSDs from the Ceph config before removing the node from the cluster? Or does Proxmox take care of the Ceph config when I pvecm delnode fmt-pve-01?
2. I do have two more nodes ready to join with additional disks. Would it be best to add these nodes to the Proxmox/Ceph cluster before removing the first node?

Any tips would be appreciated! Thank you again.

Patrick
 
Hi Patrick,

1. Do I need to do something to remove the node/OSDs from the Ceph config before removing the node from the cluster? Or does Proxmox take care of the Ceph config when I pvecm delnode fmt-pve-01?

pvecm only takes care of the PVE cluster - it does not change anything in Ceph.

2. I do have two more nodes ready to join with additional disks. Would it be best to add these nodes to the Proxmox/Ceph cluster before removing the first node?

Either way the Ceph data will get moved multiple times (if you add/remove your nodes one by one and wait for a healthy Ceph cluster in between).
But it looks like your OSDs are quite empty, so the data movement should not be that high.
I would stop/reweight/remove (down, reweight to 0, then out) the OSDs on fmt-pve-01 one by one before removing fmt-pve-01 from the CRUSH map.
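As a rough sketch (osd.0 is just an example id; run from any node with a working admin keyring, and wait for a healthy cluster between steps):

# drain the OSD by taking its CRUSH weight to 0, then mark it out
ceph osd crush reweight osd.0 0
ceph osd out osd.0
# once the data has moved off, remove it from CRUSH, auth and the OSD map
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm osd.0
# after the last OSD on the host is gone, remove the empty host bucket
ceph osd crush remove fmt-pve-01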

Udo
 
Udo, thank you for this. fmt-pve-01 is now offline due to the failure of the mirrored SSDs. Those 4 OSDs are offline and the hostname of fmt-pve-01 cannot be resolved (it is offline).

Because the node is offline, Ceph still has the monitor/OSDs listed, but there is no way to delete them via the GUI.

I am guessing I just need to manually update the CRUSH map (under devices/buckets) and the config file to remove the monitor.
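Something like the following is what I have in mind (a sketch; osd.0 stands for each of the dead OSDs and <mon-id> is whatever id the fmt-pve-01 monitor has in /etc/pve/ceph.conf), run from one of the surviving nodes:

# remove each dead OSD from CRUSH, auth and the OSD map
ceph osd out osd.0
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm osd.0
# remove the now-empty fmt-pve-01 host bucket from the CRUSH map
ceph osd crush remove fmt-pve-01
# remove the dead monitor from the mon map
ceph mon remove <mon-id>
# then delete the matching [mon.<mon-id>] section from /etc/pve/ceph.conf by hand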
 
