Recovering from failed OS disk

obsolete

New Member
Mar 23, 2021
Hello,

I have a three-node Proxmox cluster running on NUC hardware. Each node has a SATA SSD used for the OS, as well as an NVMe SSD used for CEPH.

Recently, the disk used for the OS on one of the nodes failed - it's completely dead. I was initially going to attempt to recover the disk using gdisk/fsck, but the disk isn't recognized by any of the myriad tools used for recovery.

My current plan is to drop a new SSD into the node and re-install PVE on it, but I'm left with a host of questions...

  1. Do I need to decommission and remove the original node from PVE first?
    • Related: the CEPH disk is, as far as I know, fine. The monitor would (should) have been configured on the failed disk - so how would I go about recovering the existing CEPH disk once PVE is installed?
  2. My other nodes are currently running PVE 6.4-4. Will installing 6.4-4 on the failed node give me the best chance of a successful recovery, especially given the considerations regarding CEPH?
  3. Are there any general best practices, or other things I should know before proceeding?

Thanks in advance!
 
1. I would do that.
1.a) Ceph: search the web for that.
2. Yes.
3. I don't know.
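
On point 2, a quick way to double-check that the rebuilt node ends up on the same release as the others is to compare the version strings each node reports. The following is only a minimal sketch: the node names and sample strings are hypothetical, and it assumes the strings contain a version in the `6.4-4` style mentioned above (for example copied by hand from each node).

```python
import re


def parse_pve_version(text):
    """Extract (major, minor, patch) from text containing e.g. '6.4-4'."""
    match = re.search(r"(\d+)\.(\d+)-(\d+)", text)
    if match is None:
        raise ValueError(f"no PVE version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def mismatched_nodes(versions, reference):
    """Return the sorted names of nodes whose version differs from the reference."""
    ref = parse_pve_version(reference)
    return sorted(
        name for name, text in versions.items()
        if parse_pve_version(text) != ref
    )


# Hypothetical node names and made-up sample strings, for illustration only.
reported = {
    "node1": "pve-manager/6.4-4/abcdef01",
    "node2": "pve-manager/6.4-4/abcdef01",
    "node3": "pve-manager/7.0-8/abcdef02",
}

print(mismatched_nodes(reported, "6.4-4"))
```

Any node listed in the output is on a different release than the reference and would be worth bringing in line before rejoining it to the cluster.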
 
