PVE5 and quorum device

Got this all working now, thx for the help.

Current setup: 2 x Proxmox (Intel i5-8400, 6 cores, 32 GB RAM) using GlusterFS as shared storage, and a Raspberry Pi as witness for Proxmox and arbiter for GlusterFS.

The only thing I didn't manage to get working is an Open vSwitch 2-NIC LACP bond with VLANs, but I'll create another post for that.

Kind regards,
Eric
 

de Thysebaert

New Member
Hi,
I'm trying to set up a new witness server because the first one I installed is down.
I ran the following command:
Code:
corosync-qdevice-net-certutil -Q -n proxmox 172.16.0.5 172.16.0.2 172.16.0.3
and got the following error:
Code:
Node 172.16.0.2 seems to be already initialized. Please delete /etc/corosync/qdevice/net/nssdb
I removed the directory on the two Proxmox servers and ran the command again, but it failed with this error:
Code:
corosync-qdevice-net-certutil -Q -n proxmox 172.16.0.5 172.16.0.2 172.16.0.3
bash: corosync-qnetd-certutil: command not found
/etc/corosync/qnetd/nssdb/qnetd-cacert.crt: No such file or directory
Can't open certificate file /tmp/qnetd-cacert.crt
/etc/corosync/qnetd/nssdb/qnetd-cacert.crt: No such file or directory
Can't open certificate file /tmp/qnetd-cacert.crt
Certificate database doesn't exists. Use /usr/sbin/corosync-qdevice-net-certutil -i to create it
/etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq: No such file or directory
bash: corosync-qnetd-certutil: command not found
/etc/corosync/qnetd/nssdb/cluster-proxmox.crt: No such file or directory
Can't open certificate file /etc/corosync/qdevice/net/nssdb/cluster-proxmox.crt
/etc/corosync/qdevice/net/nssdb//qdevice-net-node.p12: No such file or directory
Can't open certificate file /etc/corosync/qdevice/net/nssdb//qdevice-net-node.p12
root@prox02:/etc/corosync# /usr/sbin/corosync-qdevice-net-certutil -i
Can't open certificate file
How can I solve this issue and reinstall the qnet-device (witness) on the new dedicated witness server?

thx
 

de Thysebaert

New Member
Answering my own question: corosync-qnetd must be installed on the witness server first.
Now everything is running well.

Sorry for the inconvenience.
 

tawh

New Member
I'd like to report that I got the corosync-qdevice thing to work for my 2-node cluster.

Previously I was using the raspberry-pi-as-a-third-node approach which seemed like a hacky solution. The dummy node shows up in the proxmox cluster info as unusable nodes (because they are) and it blocks me from creating a new VM until I temporarily remove those dummy nodes from the corosync config and restart corosync. It wasn't really ideal.

Based on this nugget of information from pve mail post, I did the following to make this work in my 2-node cluster environment.

For context this is my environment:
  • host: proxmox (one of the nodes in my cluster)
  • host: proxmox-b (the other node in my cluster)
  • host: witness (the non-proxmox raspberry pi node that I use as a corosync 'witness' for quorum votes)
On all three hosts, I installed corosync-qdevice & corosync-qnetd (I think that qnetd is only needed on the non-proxmox host but not sure):
Code:
apt-get install corosync-qdevice
apt-get install corosync-qnetd

Next I made sure that proxmox, proxmox-b, and witness could all ssh to each-other cleanly as root.

On proxmox, I ran the following (where the last three arguments are the IP addresses of witness, proxmox, and proxmox-b, in that order):
Code:
corosync-qdevice-net-certutil -Q -n <cluster name> <ip address for witness> <ip address for proxmox> <ip address for proxmox-b>
(You can determine the cluster name by looking at the cluster_name value in /etc/corosync/corosync.conf.)

Edited /etc/corosync/corosync.conf and added the following to the quorum section:
Code:
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      tls: on
      host: <ip address for witness>
      algorithm: ffsplit
    }
  }
}

Then restarted corosync service and corosync-qdevice service:
Code:
service corosync restart
service corosync-qdevice start

Did the same steps of editing corosync.conf and restarting stuff on proxmox-b
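
Since the file is edited by hand on each node, it may help to confirm that both nodes ended up with the same qdevice entry. The following is a minimal sketch, not part of the original post: the `qdevice_host` helper and the 192.0.2.10 address are made up for illustration, and it only looks for a `host:` key inside the first `device {` block.

```shell
#!/bin/sh
# Print the "host:" value found inside the first "device {" block of a
# corosync.conf read from stdin; exit non-zero if there is none.
# (qdevice_host is a hypothetical helper name, not a corosync tool.)
qdevice_host() {
    awk '
        $1 == "device" && $2 == "{" { in_device = 1 }
        in_device && $1 == "host:"  { print $2; found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# Demo with an inline sample; on a real node you would run
#   qdevice_host < /etc/corosync/corosync.conf
qdevice_host <<'EOF'
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      tls: on
      host: 192.0.2.10
      algorithm: ffsplit
    }
  }
}
EOF
```

Running the helper on proxmox and on proxmox-b and comparing the two results is a cheap sanity check before restarting corosync.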

Afterwards, corosync-quorumtool shows the following:
Code:
root@proxmox:/etc/corosync# corosync-quorumtool
Quorum information
------------------
Date:             Sun May 27 00:54:41 2018
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          1
Ring ID:          1/4656
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1    A,V,NMW <ip address of proxmox> (local)
         2          1    A,V,NMW <ip address of proxmox-b>
         0          1            Qdevice

I tested this by rebooting witness. Everything was fine. While witness was rebooting, corosync-quorumtool showed that witness did not have any votes, but the cluster was otherwise healthy.

I tested further by rebooting proxmox-b. While it was rebooting, corosync-quorumtool showed that the node had dropped off, but the cluster still had 2 votes and quorum and was otherwise healthy.

This may or may not work for you, so beware if you start tinkering!
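
The arithmetic behind both reboot tests can be sketched as follows. This assumes the default majority rule, where quorum is more than half of the expected votes (which matches "Expected votes: 3" and "Quorum: 2" in the output above). The function names are made up for illustration.

```shell
#!/bin/sh
# Majority quorum: more than half of the expected votes
# (assumption: default votequorum behaviour, no special options set).
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) when the votes present reach the quorum threshold.
is_quorate() {
    expected=$1
    present=$2
    [ "$present" -ge "$(quorum_needed "$expected")" ]
}

expected=3   # two nodes plus one qdevice vote, as in the output above
echo "quorum needed: $(quorum_needed "$expected")"
for present in 3 2 1; do
    if is_quorate "$expected" "$present"; then
        echo "$present of $expected votes: quorate"
    else
        echo "$present of $expected votes: NOT quorate"
    fi
done
```

With the witness down, or with one node down, 2 of 3 votes remain, so the cluster stays quorate. A lone node without the qdevice vote would not.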

I am new to Proxmox and am setting up a 2-node cluster with a quorum device. However, when I tried to install corosync-qdevice, the following error occurred:

Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
corosync-qdevice : Depends: corosync (= 2.4.2-3+deb9u1) but 2.4.4-pve1 is to be installed
E: Unable to correct problems, you have held broken packages.

I installed Proxmox VE 5.3-8 on both nodes, and the corosync-qnetd package installed successfully.
Can anyone help with this issue? Could I override the dependency check and force the installation? Would that have any drawbacks?

Thanks a lot.
 

mir

Well-Known Member
corosync-qdevice
You need to ensure that you have the corosync package from proxmox installed:
dpkg -s corosync
Package: corosync
Status: install ok installed
Priority: optional
Section: admin
Installed-Size: 1431
Maintainer: Debian HA Maintainers <debian-ha-maintainers@lists.alioth.debian.org>
Architecture: amd64
Version: 2.4.4-pve1
Replaces: corosync-pve (<< 2.4.2-2)
Provides: corosync-pve (= 2.4.4-pve1)
 

tawh

New Member
This package is not supposed to be installed on proxmox nodes but on the qdevice server.
:eek:

Oh, I saw the instruction given by other member:

For context this is my environment:
  • host: proxmox (one of the nodes in my cluster)
  • host: proxmox-b (the other node in my cluster)
  • host: witness (the non-proxmox raspberry pi node that I use as a corosync 'witness' for quorum votes)
On all three hosts, I installed corosync-qdevice & corosync-qnetd (I think that qnetd is only needed on the non-proxmox host but not sure):
So which one is correct?o_O
 

mir

Well-Known Member
And match it with this one:
dpkg -s corosync-qdevice
Package: corosync-qdevice
Status: install ok installed
Priority: optional
Section: admin
Installed-Size: 504
Maintainer: Debian HA Maintainers <debian-ha-maintainers@lists.alioth.debian.org>
Architecture: amd64
Source: corosync
Version: 2.4.4-pve1
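
The comparison of the two `dpkg -s` outputs can also be scripted. This is a minimal sketch rather than an official check: `pkg_version` is a made-up helper that pulls the `Version:` field out of `dpkg -s` output, and the script only reports whether the two installed versions are identical.

```shell
#!/bin/sh
# Extract the "Version:" field from `dpkg -s <package>` output on stdin.
# (pkg_version is a hypothetical helper name.)
pkg_version() {
    awk -F': ' '$1 == "Version" { print $2; exit }'
}

# Only query dpkg where it exists; a missing package yields an empty string.
if command -v dpkg >/dev/null 2>&1; then
    a=$(dpkg -s corosync 2>/dev/null | pkg_version)
    b=$(dpkg -s corosync-qdevice 2>/dev/null | pkg_version)
    if [ -n "$a" ] && [ "$a" = "$b" ]; then
        echo "versions match: $a"
    else
        echo "mismatch or not installed: corosync='$a' corosync-qdevice='$b'"
    fi
fi
```

A mismatch such as 2.4.4-pve1 against 2.4.2-3+deb9u1 is the situation behind the unmet-dependency error reported earlier in the thread.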
 

tawh

New Member
Thanks for your clarification, mir.
However, the following command only gives me version 2.4.2-3+deb9u1 of corosync-qdevice:
Code:
apt-get install corosync-qdevice
I tried to refresh the package lists with
Code:
apt-get update
but I still got the old version.

Is there a repository I need to add to apt to get the latest version? Or where can I get the corosync-qdevice 2.4.4-pve1 package directly?

Thanks a lot!
 

tawh

New Member
After checking the apt repository configuration in
Code:
/etc/apt/sources.list
I realized that the content of this file differs from the recommended "pve-no-subscription" section at https://pve.proxmox.com/wiki/Package_Repositories#_proxmox_ve_enterprise_repository
I don't know why the Proxmox installation modified this file to retrieve packages from geographically nearby servers.

After resetting the content of
Code:
/etc/apt/sources.list
and updating the package lists, I am now able to install the correct version of corosync-qdevice!
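
A quick way to see whether a node has the repository configured is to look for an active pve-no-subscription line in the apt sources. The sketch below is an illustration only; `has_no_subscription_repo` is a made-up helper. For PVE 5.x on Debian Stretch, the line given on the wiki page is, to my knowledge, `deb http://download.proxmox.com/debian/pve stretch pve-no-subscription`, but check the wiki for your release.

```shell
#!/bin/sh
# Succeeds if any of the given apt source files has an active (uncommented)
# "deb" line mentioning pve-no-subscription.
# (has_no_subscription_repo is a hypothetical helper name.)
has_no_subscription_repo() {
    grep -Eqs '^[[:space:]]*deb[[:space:]].*pve-no-subscription' "$@"
}

# Typical locations on a Debian-based system; adjust as needed.
if has_no_subscription_repo /etc/apt/sources.list /etc/apt/sources.list.d/*.list; then
    echo "pve-no-subscription repository is configured"
else
    echo "pve-no-subscription repository not found in apt sources"
fi
```

After fixing the sources, `apt-get update` followed by `apt-cache policy corosync-qdevice` shows which version apt would install and from which repository.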

Thanks for mir's assistance in this case. :)
 

t.lamprecht

Proxmox Staff Member
Staff member
Always keep this in mind, or visit https://pve.proxmox.com/wiki/Package_Repositories after every installation.
Or get a subscription if you want to support the project and get some more benefits ;)

BTW, since pve-cluster version 5.0-34 there are built-in setup and remove commands for QDevices in pvecm. If you have that version (or newer), see:
Code:
pvecm help qdevice
You then only need to install the respective packages; the rest should be doable through pvecm.
 

harvie

Member
BTW., Since pve-cluster in version 5.0-34 there's a built in setup and remove command for QDevices in pvecm
pvecm qdevice setup MY_IP_ADDR

says this:

INFO: initializing qnetd server
bash: corosync-qnetd-certutil: command not found


I've tried installing corosync-qnetd and corosync-qdevice, but it's still not working.


UPDATE, SOLUTION: corosync-qnetd and corosync-qdevice have to be installed on all cluster nodes, including the qdevice!

Also had to do this on nodes (not on qdevice):

systemctl enable corosync-qdevice.service
systemctl start corosync-qdevice.service

And on qdevice:

systemctl enable corosync-qnetd.service
systemctl start corosync-qnetd.service
 

Vigeland

New Member
Hi,
I tried to set this up, but I failed.

All systems can log in to each other as root.
apt install works on every system.

corosync-qdevice-net-certutil -Q -n Clustername pve1 pve2 PI works only if PI is the last argument; otherwise I get the cert error.

On the first start on the Pi, the auth file was missing, so I copied it from one of the PVE nodes to the Pi.

1. If I copy the corosync.conf, the Pi goes into an endless loop. Which conf do I need?
I see 'Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'

2. When I add expected_votes,
I get [TOTEM ] Received message has invalid digest... ignoring on the PVE nodes.

3. I stopped the corosync service on all 3 systems, edited corosync.conf to add the qdevice, and copied the conf to all 3 systems.
After restarting the service, one of the PVE nodes removed the qdevice from the conf on one of the systems.

Any idea what is going wrong? I'm running the latest PVE release, and the latest Debian release on the Pi.

Thanks.
 

n1nj4888

Member
Hi There,

Can someone please clarify which corosync packages need to be installed and started on which PVE nodes / qdevice?

The wiki (https://pve.proxmox.com/wiki/Cluster_Manager ) states that:

First install the corosync-qnetd package on your external server and the corosync-qdevice package on all cluster nodes.

So I've: (A) Installed, enabled and started just corosync-qnetd on the qdevice and (B) Installed, enabled and started just corosync-qdevice on my PVE Node1 and PVE Node 2.

When I try to add the qdevice using pvecm, I get the following error:

root@pve-node1:~# pvecm qdevice setup <QDEVICE IP>
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
(if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes

node 'pve-node2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve-node2': Creating new key and cert db
node 'pve-node2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve-node2': Importing CA
node 'pve-node1': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve-node1': Creating new key and cert db
node 'pve-node1': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve-node1': Importing CA
INFO: generating cert request
Creating new certificate request


Generating key. This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server
INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-pve-clustertest.crt

INFO: copy exported CRT

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes

node 'pve-node2': Importing cluster certificate and key
node 'pve-node2': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'pve-node1': Importing cluster certificate and key
node 'pve-node1': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration

INFO: start and enable corosync qdevice daemon on node 'pve-node2'...
Job for corosync-qdevice.service failed because the control process exited with error code.
See "systemctl status corosync-qdevice.service" and "journalctl -xe" for details.
command 'ssh -o 'BatchMode=yes' -lroot <PVE-NODE2 IP> systemctl start corosync-qdevice' failed: exit code 1


The cluster/qdevice status is then listed as follows, which implies that the qdevice is not contributing to "Total votes":


root@pve-node1:~# pvecm status
Quorum information
------------------
Date:             Mon Jun 3 10:19:04 2019
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1/360
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1  NA,NV,NMW <PVE-NODE1 IP> (local)
0x00000002          1  NA,NV,NMW <PVE-NODE2 IP>
0x00000000          0            Qdevice (votes 1)
 

n1nj4888

Member
I managed to get it working by rebooting both PVE nodes. When they came back up, they were successfully receiving a vote from the qdevice:

root@pve-node1:~# pvecm status
Quorum information
------------------
Date:             Mon Jun 3 11:21:30 2019
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1/392
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1    A,V,NMW <PVE-NODE1 IP> (local)
0x00000002          1    A,V,NMW <PVE-NODE2 IP>
0x00000000          1            Qdevice
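
The visible difference between the failing and the working status output is "Total votes" (2 versus 3) against "Expected votes" (3). That comparison can be automated with a small filter. This is a sketch under the assumption that the output keeps the "Expected votes:" and "Total votes:" labels shown above; `check_votes` is a made-up helper name.

```shell
#!/bin/sh
# Read `pvecm status` (or corosync-quorumtool) output on stdin and report
# whether every expected vote is currently present.
# Exit codes: 0 = all votes present, 1 = votes missing, 2 = could not parse.
check_votes() {
    awk '
        /^Expected votes:/ { expected = $NF }
        /^Total votes:/    { total = $NF }
        END {
            if (expected == "" || total == "") {
                print "could not parse vote counts"
                exit 2
            }
            if (total + 0 < expected + 0) {
                printf "missing votes: %d of %d present\n", total, expected
                exit 1
            }
            printf "all %d expected votes present\n", expected
        }'
}

# Demo with the vote lines from the failing output; on a node you would run
#   pvecm status | check_votes
if printf 'Expected votes: 3\nTotal votes: 2\n' | check_votes; then
    echo "qdevice vote is counted"
else
    echo "qdevice vote is missing; check corosync-qdevice on the nodes"
fi
```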
 

Vigeland

New Member
Hi,

I can confirm that it works.

My steps:

Preliminary steps (maybe not needed):
make sure root login from and to the Pi is possible
copy the keys to .ssh/authorized_keys

a) Install corosync-qnetd on the Pi
b) Install corosync-qdevice on all Proxmox nodes
c) pvecm qdevice setup <PI IP Address>

It seems to make a difference on which PVE node pvecm is executed.
On one server I got the same error as n1nj4888; on the other PVE node it worked without an error. Mystery ...
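
Steps a) to c) can be written down as a dry run that only prints the commands, so they can be reviewed before anything is executed. This is a rough sketch: the `print_plan` helper, the 192.0.2.10 address, the pve1/pve2 host names, and running the installs over ssh are all assumptions for illustration. The package names and the `pvecm qdevice setup` call come from the steps above.

```shell
#!/bin/sh
# Print (do not execute) the commands for a QDevice setup.
# Usage: print_plan <qdevice-ip> <pve-node>...
# (print_plan is a hypothetical helper name.)
print_plan() {
    qdevice=$1
    shift
    # a) qnetd goes on the external witness host
    echo "ssh root@$qdevice apt-get install -y corosync-qnetd"
    # b) qdevice goes on every Proxmox node
    for node in "$@"; do
        echo "ssh root@$node apt-get install -y corosync-qdevice"
    done
    # c) run once, on one of the Proxmox nodes
    echo "pvecm qdevice setup $qdevice"
}

print_plan 192.0.2.10 pve1 pve2
```

Printing the plan first makes it easy to see that corosync-qnetd is only targeted at the witness and corosync-qdevice only at the cluster nodes.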

 

Luc Brouard

New Member
I used those 3 steps.

The error was slightly different for me (but I'm using corosync 3/PVE 6).
Step c) was unable to enable corosync-qdevice for the reasons described here: bugs.launchpad.net/ubuntu/+source/corosync-qdevice/+bug/1809682
The "fix" described in comment #6 (edit file/update-rc and then enable) worked for me (i.e. bugs.launchpad.net/ubuntu/+source/corosync-qdevice/+bug/1809682/comments/6).

Sorry about the links, just copy and paste them and add the prefix (as a new user I'm not allowed to post links, as an anti-spam measure).
 

Yuan Ren

New Member
I got a lot further, but now i can not start the qdevice on the raspberry pi

Code:
root@pve1:~# pvecm status
Quorum information
------------------
Date:             Tue Feb 19 15:35:19 2019
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1/16
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1   A,NV,NMW 10.0.200.220 (local)
0x00000002          1   A,NV,NMW 10.0.200.221
0x00000000          0            Qdevice (votes 1)
Start qnetd on raspberry pi:
Code:
Feb 19 15:35:23 pvew systemd[1]: Starting Corosync Qdevice Network daemon...
-- Subject: Unit corosync-qnetd.service has begun start-up
-- Defined-By: systemd
-- Support
--
-- Unit corosync-qnetd.service has begun starting up.
Feb 19 15:35:23 pvew corosync-qnetd[799]: Feb 19 15:35:23 crit    NSS error (-8015): The certificate/key database is in an old, unsupported format.
Feb 19 15:35:23 pvew systemd[1]: corosync-qnetd.service: Main process exited, code=exited, status=1/FAILURE
Feb 19 15:35:23 pvew systemd[1]: Failed to start Corosync Qdevice Network daemon.
-- Subject: Unit corosync-qnetd.service has failed
-- Defined-By: systemd
-- Support:
--
-- Unit corosync-qnetd.service has failed.
--
-- The result is failed.
Feb 19 15:35:23 pvew systemd[1]: corosync-qnetd.service: Unit entered failed state.
Feb 19 15:35:23 pvew systemd[1]: corosync-qnetd.service: Failed with result 'exit-code'.
Anyone an idea?

Ha, this happens because an old version of corosync-qdevice (2.4.4 or 2.90.0) doesn't support the new NSS database format. The patch is b561a902f7351e in the corosync-qdevice repository.
 

mir

Well-Known Member
Apr 14, 2012
3,489
97
48
Copenhagen, Denmark
ha, this is caused by an old version of corosync-qdevice (2.4.4 or 2.90.0) doesn't sup new NSS database format. The patch is b561a902f7351e in corosync-qdevice repository
corosync-qdevice is at version 3.0.0-4 in Debian Buster for ARM, so simply upgrading your Pi, as well as your PVE nodes, to Debian Buster should solve the issue.

PS: corosync-qdevice 3.0.0-4 is backwards compatible with corosync 2.x, so you can upgrade your Pi to Buster before upgrading your PVE nodes.
 
