[SOLVED] Node removed from Cluster, Cluster not ready - quorum

coyote

Well-Known Member
Dec 4, 2018
59
3
48
41
Hi,
I have a 2-node cluster and had to remove one of the nodes. I did that, and the node no longer appears in the GUI. However, I can no longer start any VM/LXC on the remaining node.
The problem is the quorum, so I tried to solve it with "pvecm expect 1", but without success. I can't get the cluster working anymore...
I also tried to add a new node to the cluster to make it work again, but Proxmox does not allow it, again because of the quorum.

Code:
root@pve:~# pvecm status
Cluster information
-------------------
Name:             CoyoteHome
Config Version:   6
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Nov 19 09:10:06 2020
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.805
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.66.24 (local)
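For reference, temporarily lowering the expected vote count so that a single node can become quorate again is normally done like this (a sketch; the documented spelling of the subcommand is "expected"). As the rest of this thread shows, it did not restore quorum in this case:

Code:
# tell votequorum to expect only one vote (temporary, reset when corosync restarts)
pvecm expected 1

# check whether the node is quorate again
pvecm status | grep -E 'Expected votes|Total votes|Quorate'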
 
Did you follow all the steps in the documentation [1] when removing the node?

Do you receive any particular error message when you enter the mentioned command?

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node
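For reference, the removal procedure in [1] comes down to roughly this (a sketch; <nodename> is a placeholder for the node being removed):

Code:
# on a node that stays in the cluster, identify the node to remove
pvecm nodes

# power off the node that is being removed
# (it must not come back online with its old cluster configuration)

# then remove it from the cluster configuration
pvecm delnode <nodename>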
Yes, I tried, but I couldn't delete the node while it was turned off; I got a quorum error.
So I deleted it while it was still online, which went without any problems.
It is still present under /etc/pve/nodes, but it is gone from the GUI. The node's name also still appears in the HA group.

No, there is no error when I enter "pvecm e 1".


Here is syslog:

Code:
Nov 19 11:25:09 pve pvesr[2686]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:25:10 pve pvesr[2686]: error during cfs-locked 'file-replication_cfg' operation: no quorum!
Nov 19 11:25:10 pve systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Nov 19 11:25:10 pve systemd[1]: pvesr.service: Failed with result 'exit-code'.
Nov 19 11:25:10 pve systemd[1]: Failed to start Proxmox VE replication runner.
Nov 19 11:25:20 pve systemd[1]: session-836.scope: Succeeded.
Nov 19 11:25:20 pve pvedaemon[7383]: <root@pam> end task UPID:pve:00007D28:06100E85:5FB61B13:vncshell::root@pam: OK
Nov 19 11:25:21 pve pveproxy[18148]: worker exit
Nov 19 11:25:53 pve pvedaemon[3047]: starting termproxy UPID:pve:00000BE7:0621AE31:5FB64831:vncshell::root@pam:
Nov 19 11:25:53 pve pvedaemon[7383]: <root@pam> starting task UPID:pve:00000BE7:0621AE31:5FB64831:vncshell::root@pam:
Nov 19 11:25:53 pve pvedaemon[7262]: <root@pam> successful auth for user 'root@pam'
Nov 19 11:25:53 pve systemd[1]: Started Session 847 of user root.
Nov 19 11:26:00 pve systemd[1]: Starting Proxmox VE replication runner...
Nov 19 11:26:01 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:02 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:03 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:04 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:05 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:06 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:07 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:08 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:09 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...

Do you need any other logs?
 
Could you post the output of pvecm status?
 
ah sorry... my head is somewhere else... Could you post the output of cat /etc/pve/corosync.conf? Perhaps there are some clues in there :)
 
(Screenshot attached: Screenshot_20201119-151917.png, showing the requested corosync.conf.)

Hm, I don't know what the quorum entry 192.168.66.67 is...
It could be a container on my NAS with a QDevice, but that container no longer exists.
Maybe this is the problem...
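The screenshot itself is not reproduced here, but the 192.168.66.67 entry it shows is most likely a QDevice definition in the quorum section of corosync.conf, roughly along these lines (an illustrative sketch, not the actual file):

Code:
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      algorithm: ffsplit
      # the external vote daemon (corosync-qnetd), here the old NAS container
      host: 192.168.66.67
      tls: on
    }
  }
}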
 
To test whether the problem is the QDevice, I created a new container on my new node with the IP 192.168.66.67 and installed the QDevice software there. Unfortunately, pvecm status remains unchanged.
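For reference, setting up a QDevice normally involves the steps below (a sketch; <QDEVICE-IP> is a placeholder). In this situation the last step presumably cannot complete, because the cluster file system in /etc/pve is read-only while the node has no quorum:

Code:
# on the external QDevice host (e.g. the NAS container)
apt install corosync-qnetd

# on every cluster node
apt install corosync-qdevice

# then, from one cluster node
pvecm qdevice setup <QDEVICE-IP>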

pvecm status:

Code:
root@pve:~# pvecm status
Cluster information
-------------------
Name:             CoyoteHome
Config Version:   6
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Nov 19 21:11:16 2020
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.814
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:          

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.66.24 (local)


Any ideas?
 
Sorry for the delayed response. I've been setting up and testing a similar scenario on my system, to try to understand whether there's a way to fix the problem.

The cluster must have gotten messed up when you deleted the node while the QDevice was still there. From section 5.9.4 of the docs [1]:

Adding/Deleting Nodes After QDevice Setup

If you want to add a new node or remove an existing one from a cluster with a QDevice setup, you need to remove the QDevice first. After that, you can add or remove nodes normally. Once you have a cluster with an even node count again, you can set up the QDevice again as described above.
Since the QDevice was also lost, without ever being removed from the cluster, the cluster is now stuck in a state where it still expects two votes for quorum, which blocks anything corosync-related.

In general, section 5.9 of [1] discusses all the uses and implications of QDevices.
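In command form, the order described there is roughly the following (a sketch; the values in angle brackets are placeholders):

Code:
# 1. remove the QDevice from the cluster first
pvecm qdevice remove

# 2. now add or remove nodes as usual, e.g.
pvecm delnode <nodename>

# 3. once the cluster has an even node count again, set up the QDevice again
pvecm qdevice setup <QDEVICE-IP>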

I'll get back to you in another 15-20 minutes if I've managed to find a fix, but I'm not sure it will be possible.

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
 
As I see it, there is no way to restore quorum to the cluster in its current state. The next best thing you could try is to completely remove the cluster configuration from the node and start over. See the forum post below for details.
https://forum.proxmox.com/threads/proxmox-ve-6-removing-cluster-configuration.56259/

______________________
edit: In addition, remove or move the data in /etc/pve/nodes/node2 on node1; otherwise it could interfere with VM/CT creation.
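In command form, the procedure from the linked thread comes down to roughly this (a sketch; the backup path for the old node directory is only an example):

Code:
# stop the cluster services
systemctl stop pve-cluster corosync

# start the cluster file system in local mode and drop the corosync configuration
pmxcfs -l
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
killall pmxcfs

# bring pve-cluster back up on the now standalone node
systemctl start pve-cluster

# move the removed node's directory out of the way (example target path)
mv /etc/pve/nodes/node2 /root/node2-backup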
 
I've done these steps and it seems to work; I can start VM/LXC on node 1 again.
Now I can create a new cluster and add the new node, right?
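For reference, re-creating the cluster and joining the new node is roughly the following (a sketch; <clustername> is a placeholder):

Code:
# on this node (pve)
pvecm create <clustername>

# on the new node that should join, using the IP of this node
pvecm add 192.168.66.24

# verify on either node
pvecm status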



pvecm status:

Code:
root@pve:~# pvecm status
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_ADDRESS = "de_DE.UTF-8",
    LC_NAME = "de_DE.UTF-8",
    LC_MONETARY = "de_DE.UTF-8",
    LC_PAPER = "de_DE.UTF-8",
    LC_IDENTIFICATION = "de_DE.UTF-8",
    LC_TELEPHONE = "de_DE.UTF-8",
    LC_MEASUREMENT = "de_DE.UTF-8",
    LC_TIME = "de_DE.UTF-8",
    LC_NUMERIC = "de_DE.UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
 
Happy to hear it :)
 
