[SOLVED] No quorum error

Dec 17, 2021
Hi all.

I had a cluster composed of two Proxmox VE servers.

One of the servers suffered a hard-disk failure and died.

Now I'm blocked in a lot of operations with the error "cluster not ready - no quorum?"

I have replaced the HD in the failed server and reinstalled Proxmox, but when trying to join the cluster I get the same error.

Is there a way to avoid the no quorum error or to manually remove the old failed server from the cluster?

Thanks in advance.
 
On your running node, enter the following command in the CLI.
xxxx is the name of the dead node; you can find it in the left-hand panel of the running node's web GUI.

Code:
pvecm delnode xxxx
 
I found that running
pvecm expect 1
reduces the number of votes needed and removes the issue I was having in most operations.

Still, if I try to remove the node I get a different error:
Killing node 2
Could not kill node (error = CS_ERR_NOT_EXIST)
command 'corosync-cfgtool -k 2' failed: exit code 1
It looks like the node does not exist (!?)
 
What's the output of pvecm status?
 
Now it looks like this:
root@farm1:~# pvecm status
Cluster information
-------------------
Name: SpincoCluster
Config Version: 4
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Wed Aug 10 08:48:09 2022
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1.11
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 xx.xx.xx.xx (local)
0x00000002 1 xx.xx.xx.xx

I hid the IPs.
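
For anyone scripting around this, the votequorum numbers can be pulled out of that output. Here is a minimal Python sketch, assuming the "Key: value" layout pasted above. The helper name parse_votequorum is made up for illustration and is not part of any Proxmox tooling.

```python
# Hypothetical helper: extract the integer votequorum fields from
# `pvecm status` text. Assumes the "Key: value" layout pasted above.
STATUS = """\
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate
"""

def parse_votequorum(text):
    wanted = {"Expected votes", "Highest expected", "Total votes", "Quorum"}
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            # Keep only the leading number; trailing words are ignored.
            fields[key.strip()] = int(value.split()[0])
    return fields

info = parse_votequorum(STATUS)
# The node is quorate when the votes present reach the quorum threshold.
quorate = info["Total votes"] >= info["Quorum"]
print(info, quorate)
```

With the numbers from this post, both votes are present and the threshold is 2, so the cluster reports itself as quorate.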
 
Just as a pointer in case anyone with the same issue finds this thread.

In my case the solution was to use
pvecm expect 1
This command reduces the number of votes needed for quorum.
By reducing it to 1 I was able to run commands again.
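
A rough sketch of why this helps, assuming quorum is a simple majority of the expected votes (floor(expected / 2) + 1). That rule is an assumption for illustration, and the function names are made up. In a two-node cluster both votes are needed, so the surviving node alone stays blocked until the expectation is lowered.

```python
def quorum_threshold(expected_votes):
    # Simple-majority rule (an assumption for this sketch): more than half.
    return expected_votes // 2 + 1

def is_quorate(votes_present, expected_votes):
    return votes_present >= quorum_threshold(expected_votes)

# Two nodes with one vote each, one node dead:
print(is_quorate(1, 2))  # False

# After lowering the expectation with `pvecm expect 1`:
print(is_quorate(1, 1))  # True
```

As far as I can tell this only changes the value in the running cluster, so treat it as a recovery step rather than a permanent setting.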
 
So, I found this post on server fault

https://serverfault.com/questions/433356/proxmox-ve-cluster-not-working

It refers to this documentation, which doesn't appear to have anything to do with Proxmox:

https://access.redhat.com/documenta...ter_administration/s1-creating-cluster-cli-ca

But it does mention that lines like the ones below would allow a cluster to have quorum with just one vote:

<cman two_node="1" expected_votes="1"/>

<cluster name="mycluster" config_version="2">
<cman two_node="1" expected_votes="1"/>

But there is no cluster.conf file to be found on proxmox so ....

I did find somewhere else, about the corosync.conf file

/etc/pve/corosync.conf

In that file, each node appears to have a number of votes assigned, so I'll try giving all 3 nodes 3 votes each, so that even one node would still work.
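
A quick check of that idea, assuming quorum is a simple majority over the sum of all configured votes (an assumption for this sketch, not taken from the corosync source). Giving every node the same weight does not change who holds a majority; only unequal weights do. The node names are placeholders and quorate_alone is a made-up helper.

```python
def quorate_alone(votes, node):
    # votes: mapping of node name -> quorum_votes, as in a nodelist
    total = sum(votes.values())
    needed = total // 2 + 1  # simple majority, assumed for this sketch
    return votes[node] >= needed

equal = {"proxmox": 3, "proxmox2": 3, "proxmox3": 3}
print(quorate_alone(equal, "proxmox"))     # False: 3 of 9, needs 5

weighted = {"proxmox": 3, "proxmox2": 1, "proxmox3": 1}
print(quorate_alone(weighted, "proxmox"))  # True: 3 of 5, needs 3
```

The weighted layout has a cost: under the same rule, the two light nodes together hold only 2 of 5 votes, so they cannot be quorate without the heavy one.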

My cluster puts the other nodes to sleep when one computer can do everything, since they idle often and draw 200 watts at idle!

Strangely, corosync.conf cannot be edited, not even by root !

Ah, you need to run "pvecm expect 1" before you can edit corosync.conf


Oh no, I broke it probably for good this time

pvecm expect 1
Unable to set expected votes: CS_ERR_INVALID_PARAM
 
Ok,
For some reason now I need to "expect 3"
Because I tried giving 3 votes to each node !


Code:
root@proxmox:~# pvecm expect 1
Unable to set expected votes: CS_ERR_INVALID_PARAM
root@proxmox:~# pvecm expect 2
Unable to set expected votes: CS_ERR_INVALID_PARAM
root@proxmox:~# pvecm expect 3
root@proxmox:~#


Even though I changed the corosync.conf file back to quorum_votes: 1, after a reboot it always goes back to quorum_votes: 3 !

Here's what corosync.conf looks like now

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox
    nodeid: 1
    quorum_votes: 3
    ring0_addr: 192.168.1.20
  }
  node {
    name: proxmox2
    nodeid: 2
    quorum_votes: 3
    ring0_addr: 192.168.1.21
  }
  node {
    name: proxmox3
    nodeid: 3
    quorum_votes: 3
    ring0_addr: 192.168.1.23
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: mycluster
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
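
One thing worth checking in a file like this is config_version: the Proxmox admin guide says to increase it every time corosync.conf is edited. Below is a sketch of doing that on a copy of the text. bump_config_version is a made-up helper, and whether a missing bump is what caused the revert here is only a guess.

```python
import re

def bump_config_version(conf_text):
    # Increment the first "config_version: N" found in the text by one.
    def repl(match):
        return f"{match.group(1)}{int(match.group(2)) + 1}"

    new_text, count = re.subn(
        r"(config_version:\s*)(\d+)", repl, conf_text, count=1
    )
    if count == 0:
        raise ValueError("no config_version line found")
    return new_text

# Shortened stand-in for the totem section above.
sample = "totem {\n  cluster_name: mycluster\n  config_version: 3\n}\n"
print(bump_config_version(sample))
```

The function only rewrites a string, so it can be tried on a copy of the file before anything under /etc/pve is touched.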


So I asked chatgpt for help fixing that problem so I can get back to what I was actually trying to do !

Here is what it said

Me:

Code:
In proxmox, I have a cluster where all the other nodes are temporarily disabled. 
Therefore I cannot log in because my cluster isn't quorate. 
How can I make my current proxmox cluster work with only one node up ?

chatgpt:

Code:
If you want to make your Proxmox cluster work with only one node, you need to temporarily disable the cluster functionality on that node. This will allow you to log in and manage the virtual machines on that node, even though the other nodes are down.

To disable the cluster functionality on a Proxmox node, follow these steps:

Log in to the Proxmox Web interface on the node that you want to make active.

Go to the Cluster menu and select the "Corosync" tab.

Locate the "Cluster Configuration" section and click the "Edit" button.

In the "Cluster Configuration" dialog, set the "Cluster Mode" option to "Stand-alone node".

Click the "Save" button to apply the changes.

Restart the cluster service by running the following command on the node:




systemctl restart pve-cluster


After the cluster service has restarted, you should be able to log in to the Proxmox Web interface and manage the virtual machines on that node.

Note: Keep in mind that the virtual machines on the other nodes will not be available until the cluster functionality is re-enabled and those nodes are brought back online. Additionally, any changes made to the virtual machines on the active node will not be synchronized with the other nodes until the cluster functionality is re-enabled.


Me :
Code:
I would like instead for the cluster to be quorate even if only one
node is present. This is because the other nodes only turn on as
needed and I do not want to leave and join the cluster manually
each time a server turns on or off

chatgpt:


Code:
You can configure your Proxmox cluster to be quorate even if only one node is present by changing the "Quorum Policy" in the cluster configuration. By default, Proxmox requires a majority of nodes to be present in the cluster in order for it to be quorate. However, you can change this behavior by setting the "Quorum Policy" to "ignore".

Here's how you can change the quorum policy in Proxmox:

Log in to the Proxmox Web interface on any node in the cluster.

Go to the Cluster menu and select the "Corosync" tab.

Locate the "Cluster Configuration" section and click the "Edit" button.

In the "Cluster Configuration" dialog, set the "Quorum Policy" option to "ignore".

Click the "Save" button to apply the changes.

Restart the cluster service by running the following command on all nodes in the cluster:


systemctl restart pve-cluster


After making these changes, your Proxmox cluster will be quorate even if only one node is present. This means that you will be able to log in and manage the virtual machines on that node without having to manually join and leave the cluster each time a server turns on or off.

Me :
Code:
How can I do all of that using the ssh console access ?
Which files can I edit by hand to perform this task ?
Because I cannot log in into the web interface while the
cluster isn't quorate, I just get a login failed error


chatgpt:

Code:
You can configure the Proxmox cluster's quorum policy using the command line via SSH. The cluster configuration is stored in the file /etc/pve/corosync.conf on each node.

Here's how you can change the quorum policy in the /etc/pve/corosync.conf file:

Log in to the Proxmox node via SSH.

Open the /etc/pve/corosync.conf file using your preferred text editor, for example:

nano /etc/pve/corosync.conf

Locate the section for the "Quorum Policy" and change the value to "ignore". The section should look like this:

quorum {
  provider: corosync_votequorum
  expected_votes: 1
  two_node: 0
  wait_for_all: 0
}


Save the changes to the /etc/pve/corosync.conf file and exit the text editor.

Restart the cluster service by running the following command:




systemctl restart pve-cluster


After making these changes, your Proxmox cluster will be quorate even if only one node is present. This means that you will be able to log in and manage the virtual machines on that node without having to manually join and leave the cluster each time a server turns on or off.
 
I had this error too, and this solution helped me a lot: use scp to copy these files from a node that is working fine in your cluster to the node that has the issue (such as "proxmox no quorum 500"):
scp -r /etc/corosync/* root@xx.xx.xx.xx:/etc/corosync/
scp /etc/pve/corosync.conf root@xx.xx.xx.xx:/etc/pve/
systemctl restart pve-cluster
 
Oh, you only have 2 nodes and 1 of them has died.

You can take a look at this:
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_separate_node_without_reinstal

Code:
systemctl stop pve-cluster corosync
pmxcfs -l
rm /etc/corosync/*
rm /etc/pve/corosync.conf
killall pmxcfs
systemctl start pve-cluster
proxmox is designed by diletants, to put node aivalability and managment into one protocol is not even acceptable in home systems sometimes, there are dozens of structural design faults what makes the system totally not usable for enterprise scenarios.
 
I love how you use diletants to describe the designers, yet manage to mispell availability and management in one fell swoop. At the same time,using a run-on sentance that leads to incorrect grammar "faults what makes the system". You are using protocol in the wrong reference. It isn't a protocol. It's not as if they have embedded "node aivalability and managment [sic]" into TCP/IP. Amazing that just with that one run-on sentance I can find a true diletant behind the keyboard.
 
I love how you use diletants to describe the designers, yet manage to mispell availability and management in one fell swoop

Did you just misspell mispell? ;)

It isn't a protocol. It's not as if they have embedded "node aivalability and managment [sic]" into TCP/IP.

I think he was referring to the situation
Code:
In proxmox, I have a cluster where all the other nodes are temporarily disabled.
Therefore I cannot log in because my cluster isn't quorate.
How can I make my current proxmox cluster work with only one node up ?

chatgpt:

Code:
If you want to make your Proxmox cluster work with only one node, you need to temporarily disable the cluster functionality on that node. This will allow you to log in and manage the virtual machines on that node, even though the other nodes are down.

It doesn't matter what spelling or terms they use, not everyone is native or an architect.
 
What a wreck of a thread lmao. Guy 1 is rude to proxmox devs and can't spell. Guy 2 grills guy 1 for not spelling correctly and then proceeds to misspell multiple times in the grilling.

Good lord.
 