Ceph Node Down - Proper Restore Procedures

MrPaul

Active Member
Apr 27, 2019
I've just lost my root drive, which was on Cisco FlexFlash. It was supposed to be RAID-0, but when I forced the master switch, it completely fails to boot, whereas on the other disk I was getting fsck errors and was unable to write to the filesystem even after a repair. For simplicity, I think I'll just consider the two FlexFlash disks a loss at this point.

I expect the first order of business will be to reinstall Proxmox, but what is the proper procedure to repair or replace all the Ceph mon/OSD nodes? I've got the data on my other two servers, as I was running with triple redundancy, so even though I suspect the local data is still complete, I don't mind replacing the server in its entirety and letting Ceph rebuild if that's easier.

For what it's worth I'm running on 6.2-4 and everything was up to date with patches as of about a week ago.
 
Upon further research, I see that running Proxmox off the FlexFlash is less than ideal because of the FlexFlash's limited write endurance. Luckily for me, this is just a lab environment, so I'll probably just let the cluster die off.