Hello,
I have a 3-node Proxmox cluster running on top of NUC hardware. Each node has a SATA SSD used for the OS, as well as an NVMe SSD used for Ceph.
Recently, the OS disk on one of the nodes failed - it's completely dead. I initially planned to attempt recovery with gdisk/fsck, but the disk isn't even recognized by any of the recovery tools I've tried.
My current plan is to drop a new SSD into the node and re-install PVE on it, but I'm left with a host of questions...
- Do I need to decommission and remove the original node from PVE first?
- Related: the Ceph disk is, as far as I know, fine. The monitor's data would (should) have lived on the failed OS disk - so how would I go about bringing the existing Ceph OSD back into the cluster once PVE is reinstalled?
- My other nodes are currently running PVE 6.4-4... will I have the best chance of a successful recovery by installing the same 6.4-4 on the replacement disk, especially given the Ceph version considerations?
- Are there any general best practices, or other things I should know before proceeding?
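To clarify what I'm asking, here's my rough mental model of the recovery, sketched as commands. This is untested and based on my assumptions - the node name "pve3" and the peer IP are placeholders, and I'm not sure about the ordering (which is really what my questions above boil down to):

```shell
# ASSUMED recovery sketch, not tested - node name "pve3" and the
# peer address are placeholders for my actual setup.

# 1. On a surviving node: remove the dead node from the cluster
pvecm delnode pve3

# 2. After installing PVE on the new OS disk, join the reinstalled
#    node back into the cluster (run on the reinstalled node,
#    pointing at any healthy member)
pvecm add 192.0.2.10

# 3. Once Ceph packages are installed and configured, scan the
#    existing NVMe disk and reactivate any OSDs found on it
ceph-volume lvm activate --all
```

Is something along these lines right, or does the old node need to keep its identity rather than being removed and re-added?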
Thanks in advance!