Hi everyone
I set up a two-node cluster with a QDevice a few days ago, and setup went smoothly with no errors.
However, during testing I run into a problem whenever I reboot one of the two nodes: Datacenter -> HA shows something like "No quorum on node1" (node1 being the node that stays online). As I understand it, it should say "Quorum OK", right?
My main LAN (10.1.10.0/24) is only 1 Gbps, and I also have a dedicated 10 Gbps network (10.2.10.0/24) for cluster traffic. My QDevice has NICs on both networks, and I have confirmed that the two Proxmox nodes and the QDevice can all ping each other on both NICs.
Could someone please look at the output of `pvecm nodes` and `pvecm status` below and check whether I have done anything obviously wrong?
Here is how it looks when things are normal:
Code:
# pvecm nodes
Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 A,NV,NMW node2
2 1 A,NV,NMW node1 (local)
0 0 Qdevice (votes 1)
# pvecm status
Cluster information
-------------------
Name: clu01
Config Version: 3
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Sat Sep 20 13:06:28 2025
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000002
Ring ID: 1.59
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 2
Quorum: 2
Flags: Quorate Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 A,NV,NMW 10.2.10.131
0x00000002 1 A,NV,NMW 10.2.10.132 (local)
0x00000000 0 Qdevice (votes 1)
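Worth noting even in this "normal" state: the Qdevice row shows 0 votes, both nodes carry the NV flag, and Total votes is 2 out of 3 expected. Below is a minimal sketch that decodes that flags column, assuming the meanings corosync-quorumtool uses (A/NA: QDevice alive or not, V/NV: QDevice casting a vote or not, MW/NMW: master wins or not). The helper `decode_qdevice_flags` is hypothetical, not part of any Proxmox or corosync tool.

```python
# Meanings of the Qdevice flag tokens, assumed from corosync-quorumtool:
# A/NA = QDevice alive or not, V/NV = QDevice casting a vote or not,
# MW/NMW = master wins or not.
FLAG_MEANINGS = {
    "A": ("alive", True),
    "NA": ("alive", False),
    "V": ("casting_vote", True),
    "NV": ("casting_vote", False),
    "MW": ("master_wins", True),
    "NMW": ("master_wins", False),
}


def decode_qdevice_flags(flags: str) -> dict[str, bool]:
    """Turn a flag string such as "A,NV,NMW" into named booleans."""
    decoded = {}
    for token in flags.split(","):
        name, value = FLAG_MEANINGS[token.strip()]
        decoded[name] = value
    return decoded


if __name__ == "__main__":
    # Flags from the output above: QDevice alive, but not casting a vote.
    print(decode_qdevice_flags("A,NV,NMW"))
```

"Alive but not voting" is consistent with the QDevice link being up while its vote is never counted.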
And this is how it looks when node2 is offline (for example, during a reboot):
Code:
# pvecm nodes
Membership information
----------------------
Nodeid Votes Qdevice Name
2 1 A,NV,NMW node1 (local)
0 0 Qdevice (votes 1)
# pvecm status
Cluster information
-------------------
Name: clu01
Config Version: 3
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Sat Sep 20 13:04:50 2025
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000002
Ring ID: 2.54
Quorate: No
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 1
Quorum: 2 Activity blocked
Flags: Qdevice
Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000002 1 A,NV,NMW 10.2.10.132 (local)
0x00000000 0 Qdevice (votes 1)
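The arithmetic behind "Activity blocked": under votequorum's default simple-majority rule, quorum is floor(expected_votes / 2) + 1, so 3 expected votes give a threshold of 2, matching "Quorum: 2" above. With node2 gone and the QDevice contributing 0 votes, only 1 vote remains. This is a minimal sketch assuming the default rule only; special votequorum options such as two_node or last_man_standing are ignored, and the function names are hypothetical.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Default votequorum rule: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(node_votes: list[int], qdevice_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the quorum threshold."""
    total = sum(node_votes) + qdevice_votes
    return total >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    expected = 3  # two nodes with 1 vote each, plus 1 vote for the QDevice
    print(quorum_threshold(expected))       # threshold, as in "Quorum: 2"
    print(is_quorate([1, 1], 0, expected))  # both nodes up, QDevice not voting
    print(is_quorate([1], 0, expected))     # node2 down, QDevice not voting
    print(is_quorate([1], 1, expected))     # node2 down, QDevice voting
```

The first case explains why the cluster looked healthy with both nodes up even though the QDevice was not voting: the two nodes alone already reach the threshold of 2.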
OK, some progress: on my QDevice host, the logs are full of this:
Code:
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate
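Since the log is one message repeated many times, a small summary can confirm there is only a single distinct error. Below is a minimal sketch that counts qnetd error codes and messages in lines like the ones above; the regex is an assumption based only on this message format, and `summarize_qnetd_errors` is a hypothetical name.

```python
import re
from collections import Counter

# Assumed format, taken from the log above:
# corosync-qnetd[878]: ... Disconnecting client (-12271): SSL peer cannot verify your certificate.
LINE_RE = re.compile(r"corosync-qnetd\[\d+\]: .*\((-\d+)\): (.+?)\.?$")


def summarize_qnetd_errors(lines: list[str]) -> Counter:
    """Count (error code, message) pairs; a trailing period is ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "corosync-qnetd[878]: Unhandled error when reading from client. "
        "Disconnecting client (-12271): SSL peer cannot verify your certificate.",
        "corosync-qnetd[878]: Unhandled error when reading from client. "
        "Disconnecting client (-12271): SSL peer cannot verify your certificate",
    ]
    for (code, message), n in summarize_qnetd_errors(sample).items():
        print(code, message, n)
```

The same function could be fed lines captured from `journalctl -u corosync-qnetd` on the QDevice host.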
So I tried removing and re-adding the QDevice:
Code:
# pvecm qdevice remove
Synchronizing state of corosync-qdevice.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install disable corosync-qdevice
Removed '/etc/systemd/system/multi-user.target.wants/corosync-qdevice.service'.
Synchronizing state of corosync-qdevice.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install disable corosync-qdevice
Removed '/etc/systemd/system/multi-user.target.wants/corosync-qdevice.service'.
Reloading corosync.conf...
Done
Code:
# pvecm qdevice setup 10.2.10.120
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@10.2.10.120's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh -i /root/.ssh/id_rsa 'root@10.2.10.120'"
and check to make sure that only the key(s) you wanted were added.
INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db
INFO: copying CA cert and initializing on all nodes
node 'node1': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'node1': Creating new key and cert db
node 'node1': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'node1': Importing CA
node 'node2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'node2': Creating new key and cert db
node 'node2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'node2': Importing CA
INFO: generating cert request
Creating new certificate request
Generating key. This may take a few moments...
Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq
INFO: copying exported cert request to qnetd server
INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-clu01.crt
INFO: copy exported CRT
INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12
INFO: copy and import pk12 cert to all nodes
node 'node1': Importing cluster certificate and key
node 'node1': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'node2': Importing cluster certificate and key
node 'node2': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration
INFO: start and enable corosync qdevice daemon on node 'node1'...
Synchronizing state of corosync-qdevice.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink '/etc/systemd/system/multi-user.target.wants/corosync-qdevice.service' -> '/usr/lib/systemd/system/corosync-qdevice.service'.
INFO: start and enable corosync qdevice daemon on node 'node2'...
Synchronizing state of corosync-qdevice.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
Executing: /usr/lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink '/etc/systemd/system/multi-user.target.wants/corosync-qdevice.service' -> '/usr/lib/systemd/system/corosync-qdevice.service'.
Reloading corosync.conf...
Done
As you can see, re-adding the QDevice reports no errors, but these errors persist:
Code:
corosync-qnetd[878]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
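One line in the otherwise clean setup output above deserves a second look: "Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db". That suggests the qnetd side kept its existing certificate database instead of creating a fresh one, which would fit the errors persisting after re-adding. Below is a minimal sketch that flags such notices in captured setup output; `find_reused_nssdbs` is a hypothetical name, and the pattern matches only the message text shown above.

```python
import re

# Matches the notice printed by `pvecm qdevice setup` when a certificate
# database is already present (text copied from the output above).
REUSED_DB_RE = re.compile(r"Certificate database \(([^)]+)\) already exists")


def find_reused_nssdbs(setup_output: str) -> list[str]:
    """Return every certificate-database path reported as already existing."""
    return REUSED_DB_RE.findall(setup_output)


if __name__ == "__main__":
    sample = (
        "INFO: initializing qnetd server\n"
        "Certificate database (/etc/corosync/qnetd/nssdb) already exists. "
        "Delete it to initialize new db\n"
        "INFO: copying CA cert and initializing on all nodes\n"
    )
    for path in find_reused_nssdbs(sample):
        print(f"reused, not recreated: {path}")
```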
OK, I finally resolved this. I ran the following on my QDevice host:
Code:
apt-get remove --purge corosync-qnetd corosync-qdevice corosync
(the `--purge` option also removes the packages' configuration files)
Then I reinstalled the packages:
Code:
apt install corosync-qnetd -y && apt install corosync-qdevice -y
Now quorum is retained when one node goes offline for maintenance or any other reason, and the QDevice can actually cast its vote. Note the "1" in the Votes column of the Qdevice row (previously it was 0) and the "V" flag on both nodes where they previously showed "NV":
Code:
# pvecm nodes
Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 A,V,NMW node1 (local)
2 1 A,V,NMW node2
0 1 Qdevice
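To check this without eyeballing columns, here is a minimal sketch that parses `pvecm nodes` output like the above and reports the QDevice's vote count. It assumes the layout shown in this thread (the QDevice row has Nodeid 0, flags look like A,V,NMW); it does not handle the hex node IDs printed by `pvecm status`, and the helper names are hypothetical.

```python
import re
from typing import Optional

# Flag column such as "A,V,NMW" or "A,NV,NMW" (layout assumed from the output above).
FLAGS_RE = re.compile(r"^N?A,N?V,N?MW$")


def parse_pvecm_nodes(output: str) -> list[dict]:
    """Parse the membership rows of `pvecm nodes` output into dicts."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows start with a decimal Nodeid; headers and separators do not.
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        rest = parts[2:]
        flags = rest.pop(0) if FLAGS_RE.match(rest[0]) else None
        rows.append({
            "nodeid": int(parts[0]),
            "votes": int(parts[1]),
            "flags": flags,
            "name": " ".join(rest),
        })
    return rows


def qdevice_votes(output: str) -> Optional[int]:
    """Return the QDevice's votes (the Nodeid 0 row), or None if absent."""
    for row in parse_pvecm_nodes(output):
        if row["nodeid"] == 0:
            return row["votes"]
    return None


if __name__ == "__main__":
    after_fix = (
        "Nodeid Votes Qdevice Name\n"
        "1 1 A,V,NMW node1 (local)\n"
        "2 1 A,V,NMW node2\n"
        "0 1 Qdevice\n"
    )
    print(qdevice_votes(after_fix))
```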
And the corosync-qnetd service is running and healthy:
Code:
# systemctl status corosync-qnetd
● corosync-qnetd.service - Corosync Qdevice Network daemon
Loaded: loaded (/lib/systemd/system/corosync-qnetd.service; enabled; preset: enabled)
Active: active (running) since Sat 2025-09-20 16:13:45 NZST; 3min 31s ago
Docs: man:corosync-qnetd
Main PID: 856 (corosync-qnetd)
Tasks: 1 (limit: 4646)
Memory: 6.6M
CPU: 56ms
CGroup: /system.slice/corosync-qnetd.service
└─856 /usr/bin/corosync-qnetd -f
Sep 20 16:13:45 carnelian systemd[1]: Starting corosync-qnetd.service - Corosync Qdevice Network daemon...
Sep 20 16:13:45 carnelian systemd[1]: Started corosync-qnetd.service - Corosync Qdevice Network daemon.
My guess is that something I did while experimenting earlier broke the configuration files, and that re-adding the QDevice does not overwrite them?