Node seems to no longer be in quorum

Apr 15, 2017
So, I have a node that dropped off the network last week. Turns out the switch port went bad.

I got it plugged into a different 10-gigabit port and brought it back up. But corosync kept restarting, and the node kept joining and leaving the quorum. I am guessing that the NIC is also flaky, and I wanted to replace it -- but when I did that, corosync wouldn't see the other nodes. So I put the suspect NIC back, and all was good (except for the flapping). Then I started moving the (shut down) LXC containers and QEMU VMs to the other nodes.

OK. Weekend is over. I come in and find that corosync on the problem node no longer sees the other nodes in the quorum (and vice versa). I can't start anything up (because there isn't a quorum). I am trying to figure out the best way to move everything off the problem node (since migrate obviously won't work), so that I can remove the node, replace the NIC, and rejoin it to the cluster. (Oh, and probably also take the opportunity to move from 4.3 to 4.4.)

Any suggestions would be appreciated. Google tells me of people who have tried moving the files -- only to find that the GUI doesn't recognize the moved containers/VMs. I saw something about using backup -- but you need to define a target storage that isn't used for anything else. I also saw some post about using a USB drive to move everything between the nodes. Is that really my best option?
 
I got it plugged into a different 10-gigabit port and brought it back up. But corosync kept restarting, and the node kept joining and leaving the quorum. I am guessing that the NIC is also flaky, and I wanted to replace it -- but when I did that, corosync wouldn't see the other nodes. So I put the suspect NIC back, and all was good (except for the flapping). Then I started moving the (shut down) LXC containers and QEMU VMs to the other nodes.

Managed switch? Can you do the omping checks on this cluster: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#cluster-network-requirements
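
For reference, here is a rough sketch of the two omping runs described on that page. It only prints the commands so you can review them first. The node names are placeholders (an assumption, substitute your real hostnames or IPs), and each command has to be started on all nodes at roughly the same time.

```shell
#!/bin/sh
# Hypothetical node names; replace with your real hostnames or IPs.
NODES="${NODES:-node1 node2 node3}"

# Print the two omping invocations (dry run, nothing is executed).
omping_cmds() {
    # Short burst test (about 10 seconds): look for multicast packet loss.
    echo "omping -c 10000 -i 0.001 -F -q $*"
    # Long test (about 10 minutes): catches multicast dropping out after a few minutes.
    echo "omping -c 600 -i 1 -q $*"
}

# Word splitting of $NODES is intentional here.
omping_cmds $NODES
```

If the long run starts losing multicast packets after a few minutes, IGMP snooping on the switch would be one thing to look at.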

OK. Weekend is over. I come in and find that corosync on the problem node no longer sees the other nodes in the quorum (and vice versa). I can't start anything up (because there isn't a quorum). I am trying to figure out the best way to move everything off the problem node (since migrate obviously won't work), so that I can remove the node, replace the NIC, and rejoin it to the cluster. (Oh, and probably also take the opportunity to move from 4.3 to 4.4.)

Could you give me the output of:
Code:
journalctl -u corosync
# redact public IP addresses from the next commands
pvecm status
cat /etc/pve/corosync.conf
ip addr
 
Any suggestions would be appreciated. Google tells me of people who have tried moving the files -- only to find that the GUI doesn't recognize the moved containers/VMs. I saw something about using backup -- but you need to define a target storage that isn't used for anything else. I also saw some post about using a USB drive to move everything between the nodes. Is that really my best option?

Are the storages shared? If so, you can manually move the config file from within the quorate part of the cluster, once you have made sure that the VM/CT really is no longer running on the problematic node.
If the VM/CT backing storage isn't shared and accessible from the other cluster nodes, you could use the backup/restore procedure.
If at some point this procedure fails for lack of quorum, you can use `pvecm expected 1` on the problematic node to tell corosync that it's OK to run with just its own vote.
Do not alter the state in any cluster-affecting way from this point on, i.e. no `pvecm add` or VM/CT creation (on the problematic node).
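
A rough sketch of both options, as a dry run that only prints the commands. The node names `badnode` and `goodnode`, the VMIDs, and the `/mnt/backup` directory are made-up placeholders, so adapt them to your setup. The `mv` lines are meant to be run on a node in the quorate part of the cluster, the `vzdump` line on the problematic node.

```shell
#!/bin/sh
# Dry-run sketch: prints commands instead of executing them.
# Node names, VMIDs and /mnt/backup are hypothetical placeholders.

# Shared storage: reassign a guest by moving its config file.
# kind is "qemu-server" for VMs or "lxc" for containers.
move_guest_cmd() {
    kind=$1; vmid=$2; from=$3; to=$4
    echo "mv /etc/pve/nodes/$from/$kind/$vmid.conf /etc/pve/nodes/$to/$kind/$vmid.conf"
}

# Local storage: back up a stopped guest on the problematic node.
backup_guest_cmd() {
    vmid=$1; dumpdir=$2
    echo "vzdump $vmid --dumpdir $dumpdir --mode stop"
}

move_guest_cmd qemu-server 100 badnode goodnode
move_guest_cmd lxc 101 badnode goodnode
backup_guest_cmd 102 /mnt/backup
```

After copying the resulting archive to a healthy node, a VM can be restored there with `qmrestore <archive> <vmid>` and a container with `pct restore <vmid> <archive>`.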
 
