Recovering from failed OS disk

obsolete

New Member
Mar 23, 2021
Hello,

I have a 3 node Proxmox cluster running on top of NUC hardware. Each node has a SATA SSD used for the OS, as well as a NVMe SSD used for CEPH.

Recently, the disk used for the OS on one of the nodes failed completely. I initially planned to recover the disk using gdisk/fsck, but it isn't recognized by any of the recovery tools I've tried.

My current plan is to drop a new SSD into the node and re-install PVE on it, but I'm left with a host of questions...

  1. Do I need to decommission and remove the original node from PVE first?
    • Related: the CEPH disk is, as far as I know, fine. The monitor, however, would (should) have been configured on the failed disk - so how would I go about recovering the existing CEPH disk once PVE is reinstalled?
  2. My other nodes are currently running PVE 6.4-4... am I going to have the greatest chance for successful recovery by installing 6.4-4 on the failed node? Especially given the considerations regarding CEPH?
  3. Are there any general best practices, or other things I should know before proceeding?

Thanks in advance!
 
1. I would do that.
1.a) Ceph: search the web.
2. Yes.
3. I don't know.
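For what it's worth, a rough sketch of the command sequence I'd expect for this kind of rebuild (node names and the IP are placeholders — verify each step against the Proxmox and Ceph docs for your exact versions before running anything, since several of these commands are destructive):

```shell
# On a surviving node: remove the dead node from the cluster.
# "pve3" is a placeholder for the failed node's name.
pvecm delnode pve3

# Remove the dead node's monitor so the Ceph quorum stops
# waiting for it (the mon data died with the OS disk anyway).
pveceph mon destroy pve3

# Reinstall PVE (matching the 6.4-x version on the other nodes)
# on the new SSD, then, from the freshly installed node, join it
# back to the cluster:
pvecm add <ip-of-a-surviving-node>

# With the Ceph packages installed on the rebuilt node, scan the
# untouched NVMe and re-activate any OSDs found on it. The OSD's
# LVM metadata lives on the OSD disk itself, so it should survive
# the OS disk failure:
ceph-volume lvm activate --all

# Optionally recreate a monitor on the rebuilt node:
pveceph mon create
```

Afterwards, `ceph -s` should show the monitors in quorum and the OSD back up/in while the cluster backfills.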