[SOLVED] QDevice setup failure on a new 2-node cluster

hockeyjim07

Nov 13, 2024
I've expanded my homelab to two nodes, so I decided to set them up as a cluster with a QDevice.

The cluster setup had no issues; both nodes show up in the cluster just fine. For the QDevice I am using a Raspberry Pi 3B running Raspberry Pi OS Lite. After making sure the Pi was up to date, I installed corosync-qnetd and corosync-qdevice on it, and installed corosync-qdevice on both node1 and node2. So far so good.

I then ran pvecm qdevice setup 192.168.1.130 -f from node1 and got the following:

Code:
root@xxxxx-homelab-srvr01:/etc/ssh# pvecm qdevice setup 192.168.1.130 -f
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
bash: line 1: corosync-qdevice-net-certutil: command not found
Certificate database already exists. Delete it to continue

INFO: generating cert request
command 'corosync-qdevice-net-certutil -r -n xxxxx-cluster-1' failed: open3: exec of corosync-qdevice-net-certutil -r -n xxxxx-cluster-1 failed: No such file or directory at /usr/share/perl5/PVE/Tools.pm line 517.

I then did some digging here and read that, for whatever reason, PVE 8.2 needs some help: you have to SSH once between every pair of nodes/devices. So I did that (node1 -> node2 | node1 -> RPi | node2 -> node1 | node2 -> RPi | RPi -> node1 | RPi -> node2) and then restarted the SSH service on all 3 machines with service ssh restart.

I tried again on node1 and got the same error as above. Then I tried running the setup from node2; it's in the same cluster, so why not?


Code:
root@xxxxx-homelab-srvr02:~# pvecm qdevice setup 192.168.1.130 -f
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
bash: line 1: corosync-qdevice-net-certutil: command not found

node 'xxxxx-homelab-srvr02': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'xxxxx-homelab-srvr02': Creating new key and cert db
node 'xxxxx-homelab-srvr02': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'xxxxx-homelab-srvr02': Importing CA
INFO: generating cert request
Creating new certificate request


Generating key.  This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server

INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-xxxxx-cluster-1.crt

INFO: copy exported CRT

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes
bash: line 1: corosync-qdevice-net-certutil: command not found
command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=xxxxx-homelab-srvr01' -o 'UserKnownHostsFile=/etc/pve/nodes/xxxxx-homelab-srvr01/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.1.110 -- corosync-qdevice-net-certutil -m -c /etc/pve/qdevice-net-node.p12' failed: exit code 127

I'm stumped at this point and would love any assistance you all are able to provide.
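A note on the last error: exit code 127 is the shell's "command not found" status, so the remote node (root@192.168.1.110, HostKeyAlias xxxxx-homelab-srvr01) could not find corosync-qdevice-net-certutil on its PATH. A minimal diagnostic sketch to run on each cluster node is below. The function name check_cmd is hypothetical, and the install hint assumes the tool ships in the corosync-qdevice package (the package the Proxmox wiki says to install on every cluster node); treat both as assumptions rather than a confirmed fix.

```shell
#!/bin/sh
# Minimal diagnostic sketch: report whether a command is on this host's PATH.
# An ssh exit code of 127, as in the failed call above, means "command not found",
# so a "missing" line here points at the same problem.

# check_cmd is a hypothetical helper name used only for this illustration.
check_cmd() {
    # command -v is the POSIX way to look up a command without running it
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# pvecm calls corosync-qdevice-net-certutil on every cluster node during setup.
# Assumption: the tool is provided by the corosync-qdevice package.
check_cmd corosync-qdevice-net-certutil || echo "hint: check that corosync-qdevice is installed on this node"
```

To check a node remotely, something like ssh root@192.168.1.110 'command -v corosync-qdevice-net-certutil || echo missing' does the same lookup over SSH.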
 
You don't need corosync-qdevice on the QDevice host; corosync-qnetd is enough. See
https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support for reference.
 
Okay, understood, but that doesn't really address the issue at hand.

EDIT: I've uninstalled corosync-qdevice from the RPi and reinstalled corosync-qnetd as well. Same errors as above.
 
SOLVED: fifth (?) time's the charm!

My assumption is that it's related to PVE 8.2.7, since I had completely rebuilt the cluster 3 or 4 times (fresh OS installs on all nodes) with the same result.

This last time I again did a complete wipe, but I used an 8.1 ISO that I still had, and the setup went straight through with the exact same steps, no issues.
 