Adding QDevice fails

aweber1nj

Member
Dec 20, 2023
I'm attempting to add an "external" QDevice. The output is below. I'm not sure how to fix this, or on which machine the fix needs to happen.
Code:
root@pve2:~# pvecm qdevice setup 192.168.1.58 -f
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Creating new key and cert db
password file contains no data
Creating new noise file /etc/corosync/qnetd/nssdb/noise.txt
Creating new CA


Generating key.  This may take a few moments...

Is this a CA certificate [y/N]?
Enter the path length constraint, enter to skip [<0 for unlimited path]: > Is this a critical extension [y/N]?


Generating key.  This may take a few moments...

Notice: Trust flag u is set automatically if the private key is present.
QNetd CA certificate is exported as /etc/corosync/qnetd/nssdb/qnetd-cacert.crt

INFO: copying CA cert and initializing on all nodes
bash: line 1: corosync-qdevice-net-certutil: command not found
bash: line 1: corosync-qdevice-net-certutil: command not found

INFO: generating cert request
command 'corosync-qdevice-net-certutil -r -n MyCluster' failed: open3: exec of corosync-qdevice-net-certutil -r -n MyCluster failed: No such file or directory at /usr/share/perl5/PVE/Tools.pm line 517.
 
command 'corosync-qdevice-net-certutil -r -n MyCluster' failed: open3: exec of corosync-qdevice-net-certutil -r -n MyCluster failed: No such file or directory at /usr/share/perl5/PVE/Tools.pm line 517.
Does this command exist on your system? "find / -name corosync-qdevice-net-certutil"

Did you install the package, and if you did - was the installation successful?

You may want to reinstall the relevant package.
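
A lighter check than scanning the whole filesystem is to ask the shell whether the tool is on the PATH. This is only a minimal sketch: `have_cmd` and `report_cmd` are hypothetical helper names, not anything shipped with Proxmox.

```shell
# have_cmd (hypothetical helper): succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# report_cmd (hypothetical helper): prints one line saying whether it was found.
report_cmd() {
    if have_cmd "$1"; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Run this on each cluster node; "missing" matches the error in the log above.
report_cmd corosync-qdevice-net-certutil
```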

Note that packages need to be installed on both the external device and the PVE nodes:

https://pve.proxmox.com/wiki/Cluste...tup.-,QDevice-Net Setup,-We recommend running


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I just re-read 5.10.3...am I to install "corosync-qnetd" ONLY on external server(s), but I should install "corosync-qdevice" on ALL PVE Nodes and external server(s)???

That could be the problem for sure (I only installed the two packages on my external server before attempting the pvecm qdevice setup).
 
If you read the wiki, corosync-qdevice is installed only on the _cluster_ nodes, not on the external device.

5.10.3 instructs you to do the same: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_qdevice_net_setup
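
To spell out the split described above, here is a small sketch that maps each kind of machine to its package. The helper `packages_for_role` and the role names `external` and `node` are made up for this example; only the two package names come from the docs.

```shell
# packages_for_role is a hypothetical helper; the role names are assumptions.
packages_for_role() {
    case "$1" in
        external) echo "corosync-qnetd" ;;    # the QDevice host outside the cluster
        node)     echo "corosync-qdevice" ;;  # every PVE cluster node
        *)        echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Print the install command for each kind of machine (run it as root there).
for role in external node; do
    echo "on $role: apt install $(packages_for_role "$role")"
done
```

Once both sides have their package, `pvecm qdevice setup` can be re-run from one cluster node.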


 
Re-reading it, I see that. It's not written very clearly: it refers to two separate packages/installs and to two separate sets of machines, with the conjunction "and" in the middle (and no punctuation to end the sentence).

It would be far clearer if it were separated into two steps, more clearly delineating which servers/machines get which packages.

Thank you for helping clear it up. Appreciate it.
 
After adding the package to my cluster nodes, the command runs to completion, but the status output is not as expected:
Code:
Quorum information
------------------
Date:             Thu Feb 20 11:04:29 2025
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1.9
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
0x00000001          1   A,NV,NMW 192.168.1.5
0x00000002          1   A,NV,NMW 192.168.1.6 (local)
0x00000000          0            Qdevice (votes 1)

The docs indicate that the Qdevice should show a 1 in the Votes column?
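
For anyone who wants to check this from a script, here is a rough sketch that pulls the Votes column of the Qdevice row out of `pvecm status` output. `qdevice_votes` is a hypothetical helper, and it assumes the membership table keeps the column layout shown above.

```shell
# qdevice_votes (hypothetical helper): reads `pvecm status` output on stdin and
# prints the Votes column of the row whose third field is "Qdevice".
qdevice_votes() {
    awk '$1 ~ /^0x/ && $3 == "Qdevice" { print $2 }'
}

# Membership rows copied from the status output above.
sample='0x00000001          1   A,NV,NMW 192.168.1.5
0x00000002          1   A,NV,NMW 192.168.1.6 (local)
0x00000000          0            Qdevice (votes 1)'

printf '%s\n' "$sample" | qdevice_votes   # prints 0 for the rows above

# On a live node the same helper would be used as:
#   pvecm status | qdevice_votes
```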
 
Looks like both of my nodes are filling the System Log with entries like:
Code:
corosync-qdevice[#####]: Unhandled error when reading from server. Disconnecting from server
Every 2-3 seconds!

I can confirm that ssh works via key (no password) for root between nodes and from node->Qdevice. I have not added Qdevice's ssh key to the cluster nodes (not sure why that would be necessary).

Can someone help me troubleshoot, please?
 
There are a few existing threads that deal with similar symptoms. You should go through them to see if one of them is a match for your situation.
The search brings up:
etc

Cheers


 
OK, thanks for the links. I had found some of them, but the key was running the daemon on the pi device with the debug command, which worked fine (everything showed up).

That made me go back and restart the qnetd service, which then failed with an error saying it couldn't open the nssdb database. That pointed to a classic permissions issue. I checked, and the service was set to start as the "coroqnetd" user, but the package had been installed as root (and most of the directories and files were only accessible/writable by root).

After a chown -R on the /etc/corosync directory, the service started perfectly (and the status now looks good on the nodes).
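
Roughly, the check and fix look like the sketch below. Both function names are hypothetical, and the exact chown arguments are an assumption: this version changes only the owner and leaves the group alone. The service unit name corosync-qnetd is also assumed to match the package name.

```shell
# not_owned_by (hypothetical helper): lists everything under directory $1 that
# is NOT owned by user $2 (a user name or a numeric UID).
not_owned_by() {
    find "$1" ! -user "$2" -print
}

# fix_qnetd_ownership (hypothetical helper): run as root on the QDevice host.
# Assumption: changing only the owner is enough; the group is left untouched.
fix_qnetd_ownership() {
    chown -R coroqnetd /etc/corosync &&
        systemctl restart corosync-qnetd
}

# On the QDevice host, as root:
#   not_owned_by /etc/corosync coroqnetd   # anything listed is a suspect
#   fix_qnetd_ownership
```

Listing the mismatched files first makes it easier to see what the recursive chown is about to change.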

Putting that info here in case it helps someone else down-the-road!
 