No quorum on node

tirili

Having created a cluster on the master node using

pvecm create CLUSTER

and added a new node using

pvecm add EXTIP -ring0_addr LOCALINTIP

Please enter superuser (root) password for 'EXTIP':
Password for root@EXTIP: ********
Establishing API connection with host 'EXTIP'
The authenticity of host 'EXTIP' can't be established.
X509 SHA256 key fingerprint is XX:YY:ZZ....99.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1537991275.sql.gz'
waiting for quorum...
(waited endlessly, until I pressed Ctrl+C)

the behaviour gets weird.
On my master node the filesystem for /etc/pve went read-only, so I cannot change any firewall rules.

# chmod +rw /etc/pve/firewall/cluster.fw
chmod: changing permissions of '/etc/pve/firewall/cluster.fw': Operation not permitted


On the new node, /etc/pve is nearly empty:

# find /etc/pve
/etc/pve
/etc/pve/.debug
/etc/pve/local
/etc/pve/.version
/etc/pve/.rrd
/etc/pve/.vmlist
/etc/pve/openvz
/etc/pve/lxc
/etc/pve/.clusterlog
/etc/pve/qemu-server
/etc/pve/.members
/etc/pve/corosync.conf

and authorized_keys got cleared, as we have:

~/.ssh # ll
total 16
lrwxrwxrwx 1 root root 29 Sep 26 20:59 authorized_keys -> /etc/pve/priv/authorized_keys
-rw-r----- 1 root root 117 Sep 26 20:59 config
-rw------- 1 root root 1675 Sep 26 20:59 id_rsa
-rw-r--r-- 1 root root 392 Sep 26 20:59 id_rsa.pub
-rw-r--r-- 1 root root 389 Sep 26 21:44 known_hosts


How can I force / create / activate a quorum on my master node, to get it up and running again?

How can I fix this, as I do not have any shared storage?
 
On new node I have:

# pvecm status
Quorum information
------------------
Date: Wed Sep 26 22:23:19 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000002
Ring ID: 2/12728
Quorate: No
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000002 1 10.40.20.84 (local)


on master node I have:

# pvecm status
Quorum information
------------------
Date: Wed Sep 26 22:23:55 2018
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/12944
Quorate: No
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.40.20.92 (local)

Any help is highly appreciated.
 
Ok, I got it!

systemctl stop pve-cluster.service
# start pmxcfs in local mode so /etc/pve becomes writeable again
/usr/bin/pmxcfs -l
touch /etc/pve/firewall/cluster.fw # now it works again!

# Set quorum to 1
pvecm expected 1
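
To get back to normal operation afterwards, a minimal sketch (assuming pmxcfs was started manually with -l as above):

# stop the manually started local pmxcfs again
kill $(pidof pmxcfs)
# bring the regular cluster filesystem service back up
systemctl start pve-cluster.service
# verify membership and quorum
pvecm status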

Still the question - How can I define a quorum for my master node, e.g. an iSCSI disk?
https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster is helpful but currently not sufficient.
 
Still the question - How can I define a quorum for my master node, e.g. an iSCSI disk?

What do you mean by an iSCSI disk for quorum? Quorum disks were abandoned for good. They do not exist anymore and had lots of issues.

If your goal is to make a 2-node cluster HA, or you simply want to keep quorum if one node goes down, then it may be worth taking a look at QDevice for corosync, which is a daemon (called qnetd) that can run on a third (non-PVE) node, as long as it runs Linux (FreeBSD had corosync once too, I'm just not sure if its version is recent enough...). It communicates over TCP (not UDP multicast) and does not have the latency restrictions corosync has. If the quorum configuration changes (i.e., a node goes down, or the two nodes "don't see each other"), then the remaining node queries the qdevice and it decides which node has quorum, naturally only if the qdevice connection is still up.

In a reply of mine I explain a few other steps: https://pve.proxmox.com/pipermail/pve-devel/2017-July/027732.html
This is a bit of work as we have not yet integrated tooling, but it works quite well for a simple 2-node cluster (or, for that matter, any X-node cluster with even X (2, 4, 6, ...)).
 
Hello Thomas,

this sounds promising. Do you have some more information beyond the post mentioned?
A mini-howto would be helpful, e.g. which commands and firewall settings do I have to apply on an external 3rd node (e.g. Debian) to provide such a qnetd quorum, and which commands do I have to run in the PVE cluster?

Best regards
Thomas
 
Hi Thomas :)

Hmm, there are man pages for these tools:
man corosync-qdevice
man corosync-qnetd
(also web accessible, e.g.: https://www.systutorials.com/docs/linux/man/8-corosync-qdevice/ )

Process flow is, more or less:

* install the packages: corosync-qdevice on the PVE nodes, corosync-qnetd on the node which will provide the external quorum vote.

* set up certificates, as this works with client-certificate-based authentication; a helper exists which automates this.
Code:
/usr/sbin/corosync-qdevice-net-certutil -Q -n Cluster qnetd_server node1 node2 ... nodeN
If you just run corosync-qdevice-net-certutil without arguments, it prints out all the steps done by this helper; good to read once, IMO.
(It's suggested that during this setup the PVE hosts and the qnetd-hosting server can connect to each other over public key auth, as otherwise you will need to enter each server's password around 8 times.)

* ensure that all daemons are started on the respective nodes/server

* add an entry to the corosync config's quorum section; you should get all necessary entries from reading the corosync-qdevice man page. My quorum section currently looks like this:
Code:
quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: 192.168.30.15
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}
To do that sanely, please read: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#edit-corosync-conf

after that you should be done.

If you never did this, I highly recommend testing it first. The simplest way (i.e., without consuming additional hardware) would be to create three VMs, install PVE in two of them (nested setup) and Debian in the third, and test it out there. With this you can also easily simulate power loss (stop a VM) or network loss (set the network interface to disconnected through the PVE GUI).
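
A condensed, hedged sketch of the steps above as shell commands (package names as in the Debian/PVE repositories; QNETD_HOST, node1 and node2 are placeholders for your own hostnames):
Code:
# on the external quorum host:
apt install corosync-qnetd

# on every PVE node:
apt install corosync-qdevice

# on one PVE node, generate and distribute the certificates
/usr/sbin/corosync-qdevice-net-certutil -Q -n Cluster QNETD_HOST node1 node2

# then add the quorum.device section shown above to corosync.conf
# (edit a copy and bump config_version, see the linked pvecm docs)
# and make sure the daemons are running:
systemctl status corosync-qnetd      # on the quorum host
systemctl status corosync-qdevice    # on the PVE nodes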
 
Hello Thomas,

unfortunately I always get:

# /usr/sbin/corosync-qdevice -d -f
Initializing votequorum
shm size:1048589; real_size:1052672; rb->word_size:263168
shm size:1048589; real_size:1052672; rb->word_size:263168
shm size:1048589; real_size:1052672; rb->word_size:263168
Initializing local socket
Registering qdevice models
Configuring qdevice
Configuring master_wins
Getting configuration node list
Initializing qdevice model
Initializing qdevice_net_instance
Registering algorithms
Initializing NSS
Cast vote timer remains stopped.
Initializing cmap tracking
Waiting for ring id
Votequorum nodelist notify callback:
Ring_id = (1.14)
Node list (size = 1):
0 nodeid = 1
Algorithm decided to not send list and result vote is No change
Votequorum quorum notify callback:
Quorate = 0
Node list (size = 2):
0 nodeid = 1, state = 1
1 nodeid = 0, state = 0
Algorithm decided to not send list and result vote is No change
Running qdevice model
Executing qdevice-net
Trying connect to qnetd server 1.2.3.4:5403 (timeout = 8000ms)
Sending preinit msg to qnetd
Received preinit reply msg
Connect timeout
Algorithm result vote is NACK
Cast vote timer is now scheduled every 5000ms voting NACK.
Sleeping for 835 ms before reconnect


and on qnetd server I have

# corosync-qnetd -d -f
Oct 01 13:20:32 debug Initializing nss
Oct 01 13:20:32 debug Initializing local socket
Oct 01 13:20:32 debug Creating listening socket
Oct 01 13:20:32 debug Registering algorithms
Oct 01 13:20:32 debug QNetd ready to provide service
Oct 01 13:36:52 error Unhandled error when reading from client. Disconnecting client (-5938): Encountered end of file
Oct 01 13:36:52 debug Client ::ffff:1.2.3.1:42998 (init_received 0, cluster pcluster, node_id 0) disconnect
Oct 01 13:37:01 error Unhandled error when reading from client. Disconnecting client (-5938): Encountered end of file
Oct 01 13:37:01 debug Client ::ffff:1.2.3.1:43000 (init_received 0, cluster pcluster, node_id 0) disconnect

The client certificates were exchanged...
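
To narrow this down, a basic connectivity check may help; a sketch assuming standard Debian tools (5403 is the qnetd TCP port seen in the log above):

# from the PVE node: can a TCP connection to the qnetd port be opened at all?
nc -vz 1.2.3.4 5403
# on the qnetd server: is the daemon listening on 5403, and does the firewall allow it?
ss -tlnp | grep 5403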
 
If the problem is this:

Cannot initialize CMAP service
Quorum: X Activity blocked (where X is a number)

this is the solution:

Obviously there is a problem with the cluster state synchronization. I searched for information on the Internet and found a solution. The node with the first error can be repaired by running the following commands on the main or master node:


$ systemctl restart pve-cluster.service

$ systemctl restart pvedaemon.service

$ systemctl restart pveproxy.service

$ systemctl restart corosync.service

And the second type of faulty node must be repaired by running the following commands:

$ pvecm expected 1

$ systemctl restart pve-cluster.service

$ systemctl restart pvedaemon.service

$ systemctl restart pveproxy.service

$ systemctl restart corosync.service
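
Afterwards you can check that quorum is back, e.g. with the standard PVE commands (hedged suggestion):

$ pvecm status

$ systemctl status pve-cluster.service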

these are the references, they may help:

https://godleon.github.io/blog/Proxmox/Proxmox-Fix-No-Quorum-issue/
http://blog.pulipuli.info/2014/08/proxmox-ve-fix-proxmox-ve-cluster-not.html#menu-anchor
https://www.hostloc.com/thread-394364-1-1.html
 
