I recently rebuilt my Proxmox cluster. I have two Proxmox nodes, and this time I am using a Raspberry Pi as the qnetd server. I followed the guide.
For the purposes of this post, I am going to use these names:
1st proxmox node: NODE1
2nd proxmox node: NODE2
cluster name: CLUSTER
As soon as I try to run "pvecm qdevice setup <QDEVICE-IP>", I get nearly the same results on both nodes. I get this when I execute it on NODE1:
Code:
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
(if you think this is a mistake, you may want to use -f option)
INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db
INFO: copying CA cert and initializing on all nodes
Host key verification failed.
node 'NODE2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'NODE2': Creating new key and cert db
node 'NODE2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'NODE2': Importing CA
INFO: generating cert request
Certificate database doesn't exists. Use /sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n CLUSTER' failed: exit code 1
And I get this when I execute it on NODE2:
Code:
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
(if you think this is a mistake, you may want to use -f option)
INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db
INFO: copying CA cert and initializing on all nodes
node 'NODE1': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'NODE1': Creating new key and cert db
node 'NODE1': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'NODE1': Importing CAHost key verification failed.
INFO: generating cert request
Certificate database doesn't exists. Use /sbin/corosync-qdevice-net-certutil -i to create it
command 'corosync-qdevice-net-certutil -r -n CLUSTER' failed: exit code 1
- "Host key verification failed." appears in the output on both nodes. I have no idea what this could refer to, because both hosts can SSH into each other just fine, as well as into the Raspberry Pi qdevice. If I simply execute "ssh IP_ADDRESS" as root from one of the Proxmox nodes, where IP_ADDRESS is the address of either the other node or the qdevice, I am logged into the other host as root without being asked for verification.
- The guide says "If you receive an error such as Host key verification failed. at this stage, running pvecm updatecerts could fix the issue." I ran "pvecm updatecerts" on both nodes and it had no effect.
- I have turned the firewall logs up to debug level on both nodes and can see that the SSH connections from both hosts are being allowed by the firewall. The syslog on each node shows that it accepted the pubkey for root from the other node.
- The error "Certificate database doesn't exists. Use /sbin/corosync-qdevice-net-certutil -i to create it" appears in both as well. If I try to execute the "/sbin/corosync-qdevice-net-certutil -i" command as root on both nodes, I get the same error: "Can't open certificate file".
- If I try to start the corosync-qdevice service on either Proxmox node, it fails to start with the error "Can't read quorum.device.model cmap key".
- The corosync-qnetd service on the qdevice is running without error.
*EDIT* - I figured it out. I had all of my nodes and my qdevice configured correctly to use a non-standard SSH port (ssh_config files, sshd_config files, and firewalls), but some of the commands that are executed as part of "pvecm qdevice setup <QDEVICE-IP>" are hard-coded to use the default SSH port. No amount of configuring to use different ports can work around this. Once I switched all of the servers back to using port 22 for SSH, cleared out the nssdb path on the Proxmox nodes (rm -R /etc/corosync/qdevice/net/nssdb/), cleared out the nssdb path on the qdevice (rm -R /etc/corosync/qnetd/nssdb/), re-ran "pvecm qdevice setup <QDEVICE-IP>", restarted corosync-qdevice.service on the Proxmox nodes, and restarted corosync-qnetd.service on the qdevice, it all worked correctly.
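For anyone hitting the same thing, the recovery sequence above can be sketched as a small dry-run script. This is only a sketch under assumptions, not a tested procedure: the function names (print_recovery_plan, port_from_ssh_dump) are made up for illustration, the use of "systemctl restart" is assumed as the way the services were restarted, and running the setup from the first node listed is also an assumption. The script prints the commands grouped by host and executes nothing. The second helper reads the output of "ssh -G <host>" (which prints the SSH client configuration that would apply to that host) and extracts the port, as one way to spot a host that is still set to a non-default port.

```shell
#!/bin/sh
# Dry run: prints the recovery steps per host, executes nothing.
# QDEVICE-IP, NODE1 and NODE2 are placeholders, as in the post.

# Hypothetical helper: print the steps in the order described in the post.
# Usage: print_recovery_plan <qdevice> <node> [<node> ...]
print_recovery_plan() {
    qdevice="$1"
    shift
    first_node="$1"

    # Step 0 (manual): SSH must be back on port 22 on every host first.
    printf '[all hosts] switch SSH back to port 22\n'

    # Clear the half-initialized certificate databases.
    for node in "$@"; do
        printf '[%s] rm -R /etc/corosync/qdevice/net/nssdb/\n' "$node"
    done
    printf '[%s] rm -R /etc/corosync/qnetd/nssdb/\n' "$qdevice"

    # Re-run the setup (assumed here: from the first node listed).
    printf '[%s] pvecm qdevice setup %s\n' "$first_node" "$qdevice"

    # Restart the services (systemctl usage is an assumption).
    for node in "$@"; do
        printf '[%s] systemctl restart corosync-qdevice.service\n' "$node"
    done
    printf '[%s] systemctl restart corosync-qnetd.service\n' "$qdevice"
}

# Hypothetical helper: read "ssh -G <host>" output on stdin and print
# the port the SSH client would use for that host.
port_from_ssh_dump() {
    awk '$1 == "port" { print $2; exit }'
}

print_recovery_plan QDEVICE-IP NODE1 NODE2
```

To check a host before re-running the setup, something like "ssh -G NODE2 | port_from_ssh_dump" could be used; any result other than 22 would suggest that host is still configured for the non-standard port.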