So, I have a node that dropped off the network last week. Turns out the switch port went bad.
I plugged it into a different 10 gig port and brought it back up. But corosync kept restarting, and the node kept joining and leaving the quorum. I'm guessing the NIC is also flaky, so I wanted to replace it, but when I did, corosync couldn't see the other nodes. So I put the suspect NIC back, and all was good (except for the flapping). Then I started moving the (shut down) LXC containers and QEMU VMs to the other nodes.
OK. The weekend is over. I come in and find that corosync on the problem node no longer sees the other nodes in the quorum (and vice versa). I can't start anything (because there is no quorum). I'm trying to figure out the best way to move everything off the problem node (since migrate obviously won't work), so that I can remove the node, replace the NIC, and rejoin it to the cluster. (I'll probably also take the opportunity to upgrade from 4.3 to 4.4.)
Any suggestions would be appreciated. Google tells me of people who tried moving the files, only to find that the GUI doesn't recognize the moved containers/VMs. I saw something about using backups, but you need to define a target storage that isn't used for anything else. I also saw a post about using a USB drive to move everything between the nodes. Is that really my best option?