Can someone point me to proper docs about creating an HA environment with 2 nodes plus an external quorum device? I know this is possible (even with a Raspberry Pi), but I don't remember how to set it up.
I recently searched for possible solutions too and found this on the mailing list:
https://pve.proxmox.com/pipermail/pve-devel/2017-July/027732.html
I've been using corosync-qdevice for 2 months now without problems.
apt-get install corosync-qdevice
apt-get install corosync-qnetd
corosync-qdevice-net-certutil -Q -n <cluster name> <ip address for witness> <ip address for proxmox> <ip address for proxmox-b>
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      tls: on
      host: <ip address for witness>
      algorithm: ffsplit
    }
  }
}
service corosync restart
service corosync-qdevice start
I'd like to report that I got the corosync-qdevice thing to work for my 2-node cluster.
Previously I was using the raspberry-pi-as-a-third-node approach, which seemed like a hacky solution. The dummy node shows up in the Proxmox cluster info as an unusable node (because it is), and it blocks me from creating a new VM until I temporarily remove the dummy node from the corosync config and restart corosync. It wasn't really ideal.
Based on this nugget of information from the pve mailing list post, I did the following to make this work in my 2-node cluster environment.
For context this is my environment:
- host: proxmox (one of the nodes in my cluster)
- host: proxmox-b (the other node in my cluster)
- host: lb (the non-proxmox raspberry pi node that I use as a corosync 'witness' for quorum votes)
On all three hosts, I installed corosync-qdevice & corosync-qnetd (I think that qnetd is only needed on the non-proxmox host but not sure):
Code:
apt-get install corosync-qdevice
apt-get install corosync-qnetd
Next I made sure that proxmox, proxmox-b, and lb could all ssh to each other cleanly as root.
On proxmox, I ran the following (where the last three arguments are the IP address of the witness, followed by the IP addresses of the two cluster nodes):
Code:
corosync-qdevice-net-certutil -Q -n proxmox <ip address for lb> <ip address for proxmox> <ip address for proxmox-b>
Edited /etc/corosync/corosync.conf and added the following to the quorum section:
Code:
quorum {
  provider: corosync_votequorum
  device {
    model: net
    votes: 1
    net {
      tls: on
      host: 10.0.7.16
      algorithm: ffsplit
    }
  }
}
Then restarted corosync service and corosync-qdevice service:
Code:
service corosync restart
service corosync-qdevice start
Did the same steps of editing corosync.conf and restarting stuff on proxmox-b
Afterwards, corosync-quorumtool shows the following:
Code:
root@proxmox:/etc/corosync# corosync-quorumtool
Quorum information
------------------
Date: Sun May 27 00:54:41 2018
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 1
Ring ID: 1/4656
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 A,V,NMW <ip address of proxmox> (local)
2 1 A,V,NMW <ip address of proxmox-b>
0 1 Qdevice
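If you want to check this output from a script rather than by eye, the following is a minimal sketch. The function name check_votes is made up for illustration, and it assumes the "Total votes:" and "Quorum:" lines look like the ones shown in this thread; other corosync versions may format them differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of corosync): reads corosync-quorumtool or
# pvecm status output on stdin, prints the vote counts, and returns
# non-zero when the total votes are below the quorum threshold.
check_votes() {
    awk -F: '
        $1 == "Total votes" { total = $2 + 0 }   # e.g. "Total votes: 3"
        $1 == "Quorum"      { quorum = $2 + 0 }  # e.g. "Quorum: 2"
        END {
            printf "total=%d quorum=%d\n", total, quorum
            # exit status 0 only if both values were found and total >= quorum
            exit !(quorum > 0 && total >= quorum)
        }'
}

# Example (run on a cluster node):
#   corosync-quorumtool | check_votes && echo "votes are sufficient"
```

Splitting on ":" keeps the "Quorum provider:" line from being mistaken for the "Quorum:" line, since the whole text before the colon has to match.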
I tested this by rebooting lb. Everything was fine. While lb was rebooting, corosync-quorumtool showed that lb did not have any votes, but the cluster was otherwise healthy.
I tested further by rebooting proxmox-b. While it was rebooting, corosync-quorumtool showed that the node had dropped off, but the cluster still had 2 votes and quorum and was otherwise healthy.
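Both reboot tests are consistent with simple majority arithmetic. As a rough sketch (the function names are made up, and the formula is my understanding of how votequorum derives the threshold, matching "Expected votes: 3" and "Quorum: 2" in the output above):

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Majority threshold: more than half of the expected votes.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# is_quorate <expected votes> <votes currently present>
is_quorate() {
    [ "$2" -ge "$(quorum_needed "$1")" ]
}

# With two nodes plus one qdevice vote (3 expected votes):
#   quorum_needed 3   -> threshold of 2
#   is_quorate 3 2    -> succeeds: one voter (lb or proxmox-b) is down
#   is_quorate 3 1    -> fails: only a single vote is left
```

Without the qdevice, a plain 2-node cluster has 2 expected votes and a threshold of 2, so losing either node loses quorum. That is the gap the extra vote closes.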
This may or may not work for you, so beware if you start tinkering!
Thanks for this doc, very interesting.
At the moment I always get errors when I run this command:
corosync-qdevice-net-certutil -Q -n proxmox <ip address for lb> <ip address for proxmox> <ip address for proxmox-b>
In the doc, the first IP is the IP of the box acting as the quorum device, and the other IPs are for the nodes.
The first error is :
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db
and then:
Can't open certificate file /tmp/qnetd-cacert.crt
....
I'm unable to create the qdevice at this time.
Thanks if you have an idea.
fr
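The first error above says a certificate database already exists at /etc/corosync/qnetd/nssdb and has to be deleted before a new one can be initialized. A cautious variant, sketched here as my own assumption and not as a documented procedure, is to move the directory aside instead of deleting it. The function name move_aside is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: renames a directory to <dir>.bak.<timestamp> so it
# can be restored later. Does nothing if the directory does not exist.
move_aside() {
    dir=$1
    [ -d "$dir" ] || return 0
    backup="$dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$dir" "$backup" && echo "moved $dir to $backup"
}

# Example (path taken from the error message; run on the host that
# reported it, before re-running corosync-qdevice-net-certutil):
#   move_aside /etc/corosync/qnetd/nssdb
```

Whether a leftover database is the real cause here is not confirmed in this thread, so check that nothing else depends on the existing certificates before moving them.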
On all three hosts, I installed corosync-qdevice & corosync-qnetd (I think that qnetd is only needed on the non-proxmox host but not sure):
corosync-qdevice needs to be installed only on the real Proxmox nodes (proxmox and proxmox-b).
systemctl enable corosync-qnetd
systemctl start corosync-qnetd
Edited /etc/corosync/corosync.conf and added the following to the quorum section:
...
Then restarted corosync service and corosync-qdevice service:
Code:
service corosync restart
service corosync-qdevice start
Did the same steps of editing corosync.conf and restarting stuff on proxmox-b
systemctl enable corosync-qdevice
systemctl start corosync-qdevice
Hi,
Coming back to this post: I forget whether the command "corosync-qdevice-net-certutil -Q -n proxmox <ip address for lb> <ip address for proxmox> <ip address for proxmox-b>" must be run on each Proxmox node?
Thanks for your advice.
Author of the "nugget" this tutorial is based on here...
No, only once, on one PVE node. It connects to all passed hosts through SSH (so it is recommended to have SSH public key auth set up) and sets up the certificates for all of them. Just swap 'proxmox' with your respective cluster name and ensure that the first address is the "witness", not a PVE node.
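Since the tool logs in to every listed host over SSH, it can help to confirm non-interactive root logins before running it. The sketch below is only an illustration: the function name check_root_ssh is made up, and it assumes the OpenSSH client, whose BatchMode option makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# Hypothetical helper: tries a key-based root login on each host and
# returns non-zero if any of them fails.
check_root_ssh() {
    rc=0
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so only key auth can succeed
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
            echo "ok:   $host"
        else
            echo "FAIL: $host"
            rc=1
        fi
    done
    return "$rc"
}

# Example, run on the PVE node where you plan to run the certutil command:
#   check_root_ssh <ip address for lb> <ip address for proxmox> <ip address for proxmox-b>
```

It may be worth running the same check from the other hosts as well, since the original write-up made sure all three machines could ssh to each other as root.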
I'm having a hard time getting this to work.
No matter what I try I always end up with the same errors:
Code:
root@pve1:~# corosync-qdevice-net-certutil -Q -n vdbcluster 10.0.200.159 10.0.200.153 10.0.200.156
Creating /etc/corosync/qnetd/nssdb
Creating new key and cert db
password file contains no data
Creating new noise file /etc/corosync/qnetd/nssdb/noise.txt
Creating new CA
Generating key. This may take a few moments...
Is this a CA certificate [y/N]?
Enter the path length constraint, enter to skip [<0 for unlimited path]: > Is this a critical extension [y/N]?
Generating key. This may take a few moments...
Notice: Trust flag u is set automatically if the private key is present.
QNetd CA certificate is exported as /etc/corosync/qnetd/nssdb/qnetd-cacert.crt
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
lost connection
Can't open certificate file /tmp/qnetd-cacert.crt
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
lost connection
Can't open certificate file /tmp/qnetd-cacert.crt
Certificate database doesn't exists. Use /usr/sbin/corosync-qdevice-net-certutil -i to create it
/etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq: No such file or directory
Can't open certificate file /tmp/qdevice-net-node.crq
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,password).
lost connection
Can't open certificate file /etc/corosync/qdevice/net/nssdb/cluster-vdbcluster.crt
/etc/corosync/qdevice/net/nssdb//qdevice-net-node.p12: No such file or directory
Can't open certificate file /etc/corosync/qdevice/net/nssdb//qdevice-net-node.p12
ssh public key auth is used:
Code:
root@pve1:~# ssh 10.0.200.159
Linux raspberrypi 4.14.50+ #1122 Tue Jun 19 12:21:21 BST 2018 armv6l
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Feb 17 19:50:49 2019 from 10.0.200.153
root@raspberrypi:~#
Does someone have a clue on what might be wrong?
Thanks in advance
That's not the case here. I can ssh to the qdevice as root without entering any password: ssh root@10.0.200.159
I think the problem is (at least in my case) root@pi (qdevice) was unable to ssh into the two PVE hosts using public key. Can you confirm the qdevice can successfully ssh into root via public key on each PVE host?
root@pve1:~# pvecm status
Quorum information
------------------
Date: Tue Feb 19 15:35:19 2019
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1/16
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 2
Quorum: 2
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 A,NV,NMW 10.0.200.220 (local)
0x00000002 1 A,NV,NMW 10.0.200.221
0x00000000 0 Qdevice (votes 1)
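In this status output the Qdevice row shows 0 votes and both nodes carry NV instead of V, which matches "Total votes: 2" rather than 3. The sketch below spells out the flag abbreviations. The meanings are my reading of the corosync-quorumtool man page, so treat them as an assumption and check them against your installed version; the function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: prints one explanatory line per flag from the
# Qdevice column, e.g. "A,NV,NMW".
describe_qdevice_flags() {
    printf '%s\n' "$1" | tr ',' '\n' | while read -r flag; do
        case $flag in
            A)   echo "A: qdevice is alive (talking to corosync)" ;;
            NA)  echo "NA: qdevice is not alive" ;;
            V)   echo "V: qdevice has given its vote to this node" ;;
            NV)  echo "NV: qdevice has not given a vote to this node" ;;
            MW)  echo "MW: master-wins is set" ;;
            NMW) echo "NMW: master-wins is not set" ;;
            NR)  echo "NR: qdevice is not registered" ;;
            *)   echo "$flag: unknown flag" ;;
        esac
    done
}

# Example with the flags from the output above:
#   describe_qdevice_flags A,NV,NMW
```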
Feb 19 15:35:23 pvew systemd[1]: Starting Corosync Qdevice Network daemon...
-- Subject: Unit corosync-qnetd.service has begun start-up
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit corosync-qnetd.service has begun starting up.
Feb 19 15:35:23 pvew corosync-qnetd[799]: Feb 19 15:35:23 crit NSS error (-8015): The certificate/key database is in an old, unsupported format.
Feb 19 15:35:23 pvew systemd[1]: corosync-qnetd.service: Main process exited, code=exited, status=1/FAILURE
Feb 19 15:35:23 pvew systemd[1]: Failed to start Corosync Qdevice Network daemon.
-- Subject: Unit corosync-qnetd.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- Unit corosync-qnetd.service has failed.
--
-- The result is failed.
Feb 19 15:35:23 pvew systemd[1]: corosync-qnetd.service: Unit entered failed state.
Feb 19 15:35:23 pvew systemd[1]: corosync-qnetd.service: Failed with result 'exit-code'.