[SOLVED] qdevice setup failure for new 2 node cluster

hockeyjim07

New Member
Nov 13, 2024
I've expanded my homelab to 2 nodes, so I decided to run them as a cluster with a qdevice.

The cluster setup had no issues; both nodes show up in the cluster just fine. For the qdevice, I am using a Raspberry Pi 3B with RPi OS Lite. After making sure the qdevice was up to date, I installed corosync-qnetd and corosync-qdevice on the qdevice, and also installed corosync-qdevice on both node1 and node2. So far so good.

I then ran pvecm qdevice setup 192.168.1.130 -f from node1 and got the following:

HTML:
root@xxxxx-homelab-srvr01:/etc/ssh# pvecm qdevice setup 192.168.1.130 -f
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
bash: line 1: corosync-qdevice-net-certutil: command not found
Certificate database already exists. Delete it to continue

INFO: generating cert request
command 'corosync-qdevice-net-certutil -r -n xxxxx-cluster-1' failed: open3: exec of corosync-qdevice-net-certutil -r -n xxxxx-cluster-1 failed: No such file or directory at /usr/share/perl5/PVE/Tools.pm line 517.

I then did some digging here and read that, for whatever reason, PVE 8.2 needs some help and you need to SSH from every machine to every other one, so I did that (node1 -> node2 | node1 -> RPi | node2 -> node1 | node2 -> RPi | RPi -> node1 | RPi -> node2) and then restarted SSH on all 3 machines with service ssh restart to refresh it.

I tried again on node1 and got the same error as above. Then I tried setting it up from node2; it's in the same cluster, so why not?
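For anyone repeating the SSH step, here is a minimal sketch that lists every direction to test. The host names node1, node2 and rpi are placeholders taken from the list above, and ssh_pairs is a made-up helper name. The thread does not establish whether every direction is actually required.

```shell
#!/bin/sh
# Hypothetical helper: print one "source -> target" line for each
# ordered pair of distinct hosts passed as arguments.
ssh_pairs() {
    for src in "$@"; do
        for dst in "$@"; do
            [ "$src" = "$dst" ] || echo "$src -> $dst"
        done
    done
}

# Placeholder names; substitute the real host names or IPs.
ssh_pairs node1 node2 rpi

# On each source host, a non-interactive login to a target can then be
# tested by hand with something like:
#   ssh -o BatchMode=yes root@<target> true
```

With BatchMode=yes, ssh fails instead of prompting for a password or host key confirmation, which is closer to how the setup command connects (see the ssh options in the error output).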


HTML:
root@xxxxx-homelab-srvr02:~# pvecm qdevice setup 192.168.1.130 -f
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes
bash: line 1: corosync-qdevice-net-certutil: command not found

node 'xxxxx-homelab-srvr02': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'xxxxx-homelab-srvr02': Creating new key and cert db
node 'xxxxx-homelab-srvr02': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'xxxxx-homelab-srvr02': Importing CA
INFO: generating cert request
Creating new certificate request


Generating key.  This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server

INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-xxxxx-cluster-1.crt

INFO: copy exported CRT

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes
bash: line 1: corosync-qdevice-net-certutil: command not found
command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=xxxxx-homelab-srvr01' -o 'UserKnownHostsFile=/etc/pve/nodes/xxxxx-homelab-srvr01/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.1.110 -- corosync-qdevice-net-certutil -m -c /etc/pve/qdevice-net-node.p12' failed: exit code 127

I'm stumped at this point and would love any assistance you all are able to provide.
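
Exit code 127 is the shell's status for a command it could not find, so one thing worth checking on each node is whether corosync-qdevice-net-certutil is on root's PATH at all. As far as I know that helper ships with the corosync-qdevice package. A minimal sketch follows; has_command and check_helper are made-up names, not part of any Proxmox tooling.

```shell
#!/bin/sh
# Return success if the named command resolves on PATH in this shell.
has_command() {
    command -v "$1" >/dev/null 2>&1
}

# Print a one-line status for a helper; return 127 when it is missing,
# mirroring the shell's own "command not found" exit status.
check_helper() {
    if has_command "$1"; then
        echo "ok: $1 -> $(command -v "$1")"
    else
        echo "missing: $1"
        return 127
    fi
}

# Run as root on each cluster node.
check_helper corosync-qdevice-net-certutil || true

# The package state can be inspected separately with:
#   dpkg -s corosync-qdevice
```

If this prints "missing" on a node, that would match the "command not found" lines in the output above for that node, though it does not explain why the package would be absent or broken there.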
 
You don't need corosync-qdevice on the qdevice; corosync-qnetd is enough. See
https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support for reference.
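
As a rough sketch of that split: corosync-qnetd goes on the external host and corosync-qdevice on each cluster node. The role names and the install_for_role helper below are made up for illustration, and the script only prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Export RUN="" to actually run them (as root).
RUN=${RUN-echo}

# Hypothetical helper: install the package that matches a machine's role.
install_for_role() {
    case "$1" in
        qnetd-host)   $RUN apt install corosync-qnetd ;;    # external vote host (the RPi)
        cluster-node) $RUN apt install corosync-qdevice ;;  # every PVE node
        *) echo "unknown role: $1" >&2; return 1 ;;
    esac
}

install_for_role qnetd-host
install_for_role cluster-node
```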
 
Okay, understood, but this doesn't really address the issue at hand.

EDIT: I've uninstalled corosync-qdevice from the RPi and did a reinstall of corosync-qnetd as well. Same errors as above.
 
SOLVED: 5th (?) time's the charm!

My assumption is that it's related to PVE 8.2.7, as I completely rebuilt the cluster 3 or 4 times (new OS installs on all nodes) without success.

This last time I again did a complete wipe, but I used an 8.1 ISO that I still had... the setup went right through with the exact same steps, no issues.
 
