4-node cluster with a QDevice. This is a homelab.
Well, I really messed it up. I had an OS SSD fail on a node. No problem: I reinstalled the OS on a fresh SSD, brought the node back up, and gave it a new name and IP. I added the OSDs back in and all was good, but I didn't delete the old node. Then I noticed the Ceph versions didn't match, which made sense since it was a fresh install, so I did an apt upgrade of everything. Not thinking, I did this on all the nodes at the same time. I left for a few hours, came back, and assumed it was done.
Ugh, two of the nodes won't come back online. I attached a monitor and they show a ton of ceph errors. Can't ping them from other machines. So now the cluster thinks I have 3 nodes down and is in a really bad state.
Any ideas on how to proceed?