I'm in the process of creating a two-node cluster with a QDevice running on TrueNAS. I already have VMs running on one of my nodes, node1, which is the one I created the cluster from via the GUI. The second node was a fresh install with some network settings already in place (LAN/management/corosync networks). I just joined node2 to my cluster via the GUI, and afterwards I can no longer access the web GUI from that node; I get an error saying "The site can't be reached: 10.90.100.41 took too long to respond." I can ping that IP just fine. I can access the web GUI from node1, and I see both nodes with green statuses. When I drill down to node2 within the GUI, I get the communication failure errors shown in the attached screenshots. I can access node2's shell via the GUI, and I can SSH into node2 over the management network just fine.
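Since the IP pings but the GUI port times out, a first diagnostic step (a sketch only; the service names are the standard Proxmox ones, but I'm assuming the symptoms point at pveproxy) would be to check from node2's shell whether the web proxy is up and listening:

```shell
# Run on node2 (shinano) via SSH or the GUI shell.
# Check the web proxy and API daemon:
systemctl status pveproxy pvedaemon --no-pager

# Confirm something is listening on the GUI port 8006:
ss -tlnp | grep 8006

# Look for recent errors; certificate problems are a plausible cause here,
# since joining a cluster regenerates the node's SSL certificates:
journalctl -u pveproxy -n 50 --no-pager

# If the certificates look to be the problem, regenerating them and
# restarting the proxy may resolve it:
pvecm updatecerts --force
systemctl restart pveproxy
```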
Some information:
Node1: yamato
Management: 10.90.100.42/24
LAN: 10.90.20.2/24
Corosync: 10.90.40.12/24
pvecm status:
Code:
root@yamato:~# pvecm status
Quorum information
------------------
Date:             Tue Nov 17 19:42:54 2020
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1/36
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.90.40.12 (local)
0x00000002          1 10.90.40.13
Node2: shinano
Management: 10.90.100.41/24
LAN: 10.90.20.4/24
Corosync: 10.90.40.13/24
pvecm status:
Code:
root@shinano:~# pvecm status
Quorum information
------------------
Date:             Tue Nov 17 19:42:29 2020
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1/36
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.90.40.12
0x00000002          1 10.90.40.13 (local)
I also attempted to add the QDevice from node1; I'm including the error output here in case it sheds some light on the issue. It errored out when it tried to set up node2.
Code:
root@yamato:~# pvecm qdevice setup 10.90.40.11
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@10.90.40.11's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@10.90.40.11'"
and check to make sure that only the key(s) you wanted were added.
INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db
INFO: copying CA cert and initializing on all nodes
Host key verification failed.
node 'yamato': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'yamato': Creating new key and cert db
node 'yamato': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'yamato': Importing CA
INFO: generating cert request
Creating new certificate request
Generating key. This may take a few moments...
Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq
INFO: copying exported cert request to qnetd server
INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-HomeCluster.crt
INFO: copy exported CRT
INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12
INFO: copy and import pk12 cert to all nodes
Host key verification failed.
command 'ssh -o 'BatchMode=yes' -lroot 10.90.100.41 corosync-qdevice-net-certutil -m -c /etc/pve/qdevice-net-node.p12' failed: exit code 255
My guess is this has to do with the SSH host key on node2 changing when it was added to the cluster.
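If that guess is right, the "Host key verification failed" lines in the QDevice output point the same way: root on yamato no longer trusts a stored host key for node2. A sketch of clearing a stale entry with ssh-keygen follows; the demo uses a throwaway file so it's safe to try anywhere, while on a real PVE node the cluster-wide file is /etc/pve/priv/known_hosts (the IPs below are the ones from this setup):

```shell
# Demo on a throwaway known_hosts file; on a PVE node you'd point -f at
# /etc/pve/priv/known_hosts (and/or /root/.ssh/known_hosts).
ssh-keygen -t ed25519 -N '' -f demo_key -q           # dummy key pair standing in for the old host key
echo "10.90.100.41 $(cat demo_key.pub)" > known_hosts.demo

# Remove the (now stale) entry for node2's IP:
ssh-keygen -f known_hosts.demo -R 10.90.100.41

grep -c '10.90.100.41' known_hosts.demo || true      # entry is gone

# On the real cluster, after clearing stale entries, re-accept the current
# keys by SSHing once to each address in use (management and corosync IPs
# are stored as separate known_hosts entries):
#   ssh root@10.90.100.41 true
#   ssh root@10.90.40.13 true
# then re-run:  pvecm qdevice setup 10.90.40.11
```

Separately, the "Certificate database (/etc/corosync/qnetd/nssdb) already exists" line suggests a previous setup attempt left state behind on the qnetd host; as the message itself says, that directory may need deleting there before re-running the setup.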
Thanks