[SOLVED] Node removed from Cluster, Cluster not ready - quorum

coyote

Well-Known Member
Dec 4, 2018
59
3
48
41
Hi,
I have a 2-node cluster and had to remove one of the nodes. I did that, and the node no longer appears in the GUI. However, I can no longer start any VM/LXC on the remaining node.
The problem is the quorum, so I tried to solve it with "pvecm expect 1", but without success. I can't get the cluster working anymore...
I also tried to add a new node to the cluster to make it work again, but Proxmox does not allow it, again because of the quorum.

Code:
root@pve:~# pvecm status
Cluster information
-------------------
Name:             CoyoteHome
Config Version:   6
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Nov 19 09:10:06 2020
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.805
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.66.24 (local)
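For reference, temporarily lowering the expected vote count so that a single node can become quorate again is normally done like this (a sketch; the documented spelling of the subcommand is "expected"). As the rest of this thread shows, it did not restore quorum in this case:

Code:
# tell votequorum to expect only one vote (temporary, reset when corosync restarts)
pvecm expected 1

# check whether the node is quorate again
pvecm status | grep -E 'Expected votes|Total votes|Quorate'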
 
Did you follow all the steps in the documentation [1] when removing the node?

Do you receive any particular error message when you enter the mentioned command?

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node
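For reference, the removal procedure in [1] comes down to roughly this (a sketch; <nodename> is a placeholder for the node being removed):

Code:
# on a node that stays in the cluster, identify the node to remove
pvecm nodes

# power off the node that is being removed
# (it must not come back online with its old cluster configuration)

# then remove it from the cluster configuration
pvecm delnode <nodename>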
Yes, I tried, but I couldn't delete the node while it was turned off; I got a quorum error.
So I deleted it while it was still online, which went without any problems.
It is still present under /etc/pve/nodes, but it is gone from the GUI. The node's name also still appears in the HA group.

No, there is no error when I enter "pvecm e 1".


Here is syslog:

Code:
Nov 19 11:25:09 pve pvesr[2686]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:25:10 pve pvesr[2686]: error during cfs-locked 'file-replication_cfg' operation: no quorum!
Nov 19 11:25:10 pve systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Nov 19 11:25:10 pve systemd[1]: pvesr.service: Failed with result 'exit-code'.
Nov 19 11:25:10 pve systemd[1]: Failed to start Proxmox VE replication runner.
Nov 19 11:25:20 pve systemd[1]: session-836.scope: Succeeded.
Nov 19 11:25:20 pve pvedaemon[7383]: <root@pam> end task UPID:pve:00007D28:06100E85:5FB61B13:vncshell::root@pam: OK
Nov 19 11:25:21 pve pveproxy[18148]: worker exit
Nov 19 11:25:53 pve pvedaemon[3047]: starting termproxy UPID:pve:00000BE7:0621AE31:5FB64831:vncshell::root@pam:
Nov 19 11:25:53 pve pvedaemon[7383]: <root@pam> starting task UPID:pve:00000BE7:0621AE31:5FB64831:vncshell::root@pam:
Nov 19 11:25:53 pve pvedaemon[7262]: <root@pam> successful auth for user 'root@pam'
Nov 19 11:25:53 pve systemd[1]: Started Session 847 of user root.
Nov 19 11:26:00 pve systemd[1]: Starting Proxmox VE replication runner...
Nov 19 11:26:01 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:02 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:03 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:04 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:05 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:06 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:07 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:08 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...
Nov 19 11:26:09 pve pvesr[3104]: trying to acquire cfs lock 'file-replication_cfg' ...

Do you need any other logs?
 
Could you post the output of pvecm status?
 
ah sorry... my head is somewhere else... Could you post the output of cat /etc/pve/corosync.conf? Perhaps there are some clues in there :)
 
(Screenshot attached: Screenshot_20201119-151917.png, showing the requested corosync.conf.)

Hm, I don't know what the quorum entry 192.168.66.67 is...
It could be a container on my NAS with a QDevice, but that container no longer exists.
Maybe this is the problem...
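The screenshot itself is not reproduced here, but the 192.168.66.67 entry it shows is most likely a QDevice definition in the quorum section of corosync.conf, roughly along these lines (an illustrative sketch, not the actual file):

Code:
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      algorithm: ffsplit
      # the external vote daemon (corosync-qnetd), here the old NAS container
      host: 192.168.66.67
      tls: on
    }
  }
}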
 
To test whether the problem is the QDevice, I created a new container on my new node with the IP 192.168.66.67 and installed the QDevice software there. Unfortunately, pvecm status remains unchanged.
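For reference, setting up a QDevice normally involves the steps below (a sketch; <QDEVICE-IP> is a placeholder). In this situation the last step presumably cannot complete, because the cluster file system in /etc/pve is read-only while the node has no quorum:

Code:
# on the external QDevice host (e.g. the NAS container)
apt install corosync-qnetd

# on every cluster node
apt install corosync-qdevice

# then, from one cluster node
pvecm qdevice setup <QDEVICE-IP>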

pvecm status:

Code:
root@pve:~# pvecm status
Cluster information
-------------------
Name:             CoyoteHome
Config Version:   6
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Nov 19 21:11:16 2020
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.814
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:          

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.66.24 (local)


Any ideas?
 
Sorry for the delayed response. I've been setting up and testing a similar scenario on my system, to try to understand whether there's a way to fix the problem.

The cluster must have gotten messed up when you deleted the node while the QDevice was still there. From section 5.9.4 of the docs [1]:

Adding/Deleting Nodes After QDevice Setup

If you want to add a new node or remove an existing one from a cluster with a QDevice setup, you need to remove the QDevice first. After that, you can add or remove nodes normally. Once you have a cluster with an even node count again, you can set up the QDevice again as described above.
Since the QDevice was also lost, without ever being removed from the cluster, the cluster is now stuck in a state where it still expects two votes for quorum, which blocks anything corosync-related.

In general, section 5.9 of [1] discusses all the uses and implications of QDevices.
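In command form, the order described there is roughly the following (a sketch; the values in angle brackets are placeholders):

Code:
# 1. remove the QDevice from the cluster first
pvecm qdevice remove

# 2. now add or remove nodes as usual, e.g.
pvecm delnode <nodename>

# 3. once the cluster has an even node count again, set up the QDevice again
pvecm qdevice setup <QDEVICE-IP>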

I'll get back to you in another 15-20 minutes if I've managed to find a fix, but I'm not sure it will be possible.

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
 
As I see it, there is no way to restore quorum to the cluster in its current state. The next best thing you could try is to completely remove the cluster configuration from the node and start over. See the forum post below for details.
https://forum.proxmox.com/threads/proxmox-ve-6-removing-cluster-configuration.56259/

______________________
edit: In addition, remove or move the data in /etc/pve/nodes/node2 on node1; otherwise it could interfere with VM/CT creation.
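In command form, the procedure from the linked thread comes down to roughly this (a sketch; the backup path for the old node directory is only an example):

Code:
# stop the cluster services
systemctl stop pve-cluster corosync

# start the cluster file system in local mode and drop the corosync configuration
pmxcfs -l
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
killall pmxcfs

# bring pve-cluster back up on the now standalone node
systemctl start pve-cluster

# move the removed node's directory out of the way (example target path)
mv /etc/pve/nodes/node2 /root/node2-backup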
 
I've done these steps and it seems to work; I can start VM/LXC on node 1 again.
Now I can create a new cluster and add the new node, right?
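For reference, re-creating the cluster and joining the new node is roughly the following (a sketch; <clustername> is a placeholder):

Code:
# on this node (pve)
pvecm create <clustername>

# on the new node that should join, using the IP of this node
pvecm add 192.168.66.24

# verify on either node
pvecm status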



pvecm status:

Code:
root@pve:~# pvecm status
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_ADDRESS = "de_DE.UTF-8",
    LC_NAME = "de_DE.UTF-8",
    LC_MONETARY = "de_DE.UTF-8",
    LC_PAPER = "de_DE.UTF-8",
    LC_IDENTIFICATION = "de_DE.UTF-8",
    LC_TELEPHONE = "de_DE.UTF-8",
    LC_MEASUREMENT = "de_DE.UTF-8",
    LC_TIME = "de_DE.UTF-8",
    LC_NUMERIC = "de_DE.UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
 
Happy to hear it :)
 
