SSH connecting to wrong host, began after a problem adding qdevice

on node [pbs-ifire] # apt install corosync-qnetd

That was to set up the qdevice on pbs-ifire. It's a PBS install, and the qdevice (qnetd) runs there, outside of the cluster. Its IP is 192.168.1.253.

Then, on both cluster nodes, I ran:
# apt install corosync-qdevice corosync-qnetd

First of all, remove corosync-qnetd from the nodes; they are not supposed to run it.

# pvecm qdevice setup 192.168.1.253

And that went ok?
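For reference, the intended split (qnetd only on the external host, qdevice on every cluster node, and the setup command run once from a single node) can be sketched as below. This is a minimal sketch assuming Debian-based hosts and root access; the function names are hypothetical, and the IP is the one used in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the QDevice layout; run each one on the host named in its comment.
set -euo pipefail

QNETD_IP="192.168.1.253"   # external qnetd host (pbs-ifire in this thread)

# Run on the external qnetd host only.
install_qnetd_host() {
    apt-get install -y corosync-qnetd
}

# Run on EVERY cluster node: nodes need corosync-qdevice, not corosync-qnetd.
install_cluster_node() {
    apt-get install -y corosync-qdevice
    # Remove qnetd if it was installed on the node by mistake (skipped if absent).
    if dpkg -s corosync-qnetd >/dev/null 2>&1; then
        apt-get remove -y corosync-qnetd
    fi
}

# Run ONCE, from any single cluster node, after the installs above.
setup_qdevice() {
    pvecm qdevice setup "$QNETD_IP"
}
```

The removal check in `install_cluster_node` is there because having qnetd on the nodes was one of the first problems spotted in this thread.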

I just ran journalctl on [pve] and I'm seeing this:

[Attachment 64062: journalctl output from pve (image not preserved)]

There is no firewall between .251 and .253 (or .252); they are on a private network at the datacenter. The 100.x.x.x IP address is Tailscale showing up. I'm not sure why, since there is no special routing, exit node, or subnet routing set up in Tailscale. But it seems Proxmox is somewhere hearing from a Tailscale IP instead of the expected one.

Can you also show what qnetd is logging in the journal on the qdevice host?
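Rather than grepping the whole journal, it may be easier to ask journald for the one relevant unit. A minimal sketch, assuming systemd/journald on both sides and the unit names installed by the packages (`corosync-qnetd.service` on the external host, `corosync-qdevice.service` on the nodes); the helper name `qdevice_log` is hypothetical, and it simply picks qnetd if that binary is present, so on a node where qnetd was installed by mistake it will pick the wrong unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print recent journal lines for the QDevice daemon on this host.
set -euo pipefail

qdevice_log() {
    local since="${1:-1 hour ago}"
    local unit="corosync-qdevice.service"   # cluster nodes run corosync-qdevice
    if command -v corosync-qnetd >/dev/null 2>&1; then
        unit="corosync-qnetd.service"       # the external host runs corosync-qnetd
    fi
    journalctl --no-pager -u "$unit" --since "$since"
}
```

Usage, e.g. on pbs-ifire: `qdevice_log "10 minutes ago"`.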
 
Tailscale has nothing to do with it; ignore that I brought it up.

ChatGPT suggests this:

From the error message `SSL peer cannot verify your certificate`, it appears that the issue is related to the SSL certificate used by the corosync-qnetd service.

Here are steps to diagnose this issue:

1. **Check that the certificates are present and in the correct location**. The default location should be `/etc/corosync/qdevice/net/nssdb` on the cluster nodes and `/etc/corosync/qnetd/nssdb` on the qnetd host.

2. **Check the permissions on the certs**: Make sure the permissions on the certs directory and the certs themselves are correct. Corosync should have read access to these files.

3. **Check the configuration file**: The error could be caused by a misconfiguration in the corosync setup on the node. If you have edited any of these files recently, try reverting the changes or check for any typographical errors. The configuration file usually resides at `/etc/corosync/corosync.conf`.

4. **Ensure the certificates are valid**: Use the `openssl` command to check whether the certificates are still valid.

```bash
openssl x509 -noout -text -in </path/to/your/certificate.pem>
```
The output will tell you when the certificate was issued and when it's due to expire.

5. **Regenerate the certificates**. If the certificates are in the wrong location, have incorrect permissions, or are expired, you may need to regenerate and redistribute them.

```bash
corosync-qnetd-certutil -r
corosync-qnetd-certutil -i
```
Afterwards, copy the newly generated certificate to all nodes, including the one running the qnetd, using

```bash
chronyc -a 'burst 4/4'; systemctl restart corosync; systemctl restart pve-cluster; pvecm updatecerts -f
```
Please note that the corosync and Proxmox versions you're using could slightly change these instructions.

If following these steps does not solve your issue, you may need to provide more detailed information about your setup for further assistance.
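The setup output further down in this thread shows that QDevice keeps its certificates in NSS databases (`/etc/corosync/qdevice/net/nssdb` on the nodes, `/etc/corosync/qnetd/nssdb` on the qnetd host) rather than as standalone PEM files, so an `openssl x509` check against a PEM path may not apply directly. A minimal sketch for listing what those databases hold, assuming NSS's own `certutil` (Debian package `libnss3-tools`) is installed; the helper name `list_qdevice_certs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list certificates in the QDevice NSS databases.
# Uses NSS's certutil (libnss3-tools), not corosync's certutil wrapper scripts.
set -euo pipefail

list_qdevice_certs() {
    local db
    # Default to the node-side and qnetd-side database paths seen in this thread.
    if [ "$#" -eq 0 ]; then
        set -- /etc/corosync/qdevice/net/nssdb /etc/corosync/qnetd/nssdb
    fi
    for db in "$@"; do
        [ -d "$db" ] || continue   # skip databases that do not exist on this host
        echo "== $db"
        certutil -L -d "$db"
    done
}
```

Running it with no arguments on each host shows which databases exist there and what they contain.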
 
First of all, remove corosync-qnetd from the nodes; they are not supposed to run it.

And that went ok?

Can you also show what qnetd is logging in the journal on the qdevice host?

OK, removed it from pve and pve2.

```
[pbs-ifire] # journalctl -xe|grep coro

Mar 03 16:31:56 pbs-ifire.ifire.net corosync-qnetd[1157]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
Mar 03 16:31:57 pbs-ifire.ifire.net corosync-qnetd[1157]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
Mar 03 16:31:58 pbs-ifire.ifire.net corosync-qnetd[1157]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
Mar 03 16:32:00 pbs-ifire.ifire.net corosync-qnetd[1157]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
Mar 03 16:32:01 pbs-ifire.ifire.net corosync-qnetd[1157]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
Mar 03 16:32:04 pbs-ifire.ifire.net corosync-qnetd[1157]: Unhandled error when reading from client. Disconnecting client (-12271): SSL peer cannot verify your certificate.
```
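To check later whether these handshake rejections have actually stopped, one option is to count them over a recent window. A minimal sketch, assuming the message text stays exactly as shown above; the helper name `count_ssl_rejects` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count qnetd certificate-rejection lines read from stdin.
set -euo pipefail

count_ssl_rejects() {
    # grep -c prints 0 but exits 1 when nothing matches; treat that as success.
    grep -c 'SSL peer cannot verify your certificate' || true
}
```

Usage on the qnetd host: `journalctl -u corosync-qnetd --since "10 minutes ago" | count_ssl_rejects`. A result of 0 after a fix suggests the nodes are presenting certificates qnetd accepts.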
 
And that went ok?

You avoided answering whether `pvecm qdevice setup 192.168.1.253` went ok. :D

The thing is, it was supposed to generate all the certs and put them where they belong. If it did not go ok, I would suggest you run `pvecm qdevice remove`, redo the setup, and show where it complained. I am too old for ChatGPT; you can go by it if you feel you know what you are doing, but be sure to include those pieces of information later if they are needed for further troubleshooting. :D
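The remove-and-redo suggestion can be sketched as a short sequence run from one cluster node. This is a minimal sketch, assuming a Proxmox VE node where the cluster config lives at `/etc/pve/corosync.conf`; the backup step and the helper name `redo_qdevice` are hypothetical precautions, not part of `pvecm` itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up corosync.conf, then remove and re-add the QDevice.
set -euo pipefail

redo_qdevice() {
    local ip="${1:?usage: redo_qdevice <qnetd-ip>}"
    local backup
    backup="/root/corosync.conf.$(date +%Y%m%d%H%M%S)"
    cp /etc/pve/corosync.conf "$backup"   # precaution before pvecm edits the config
    pvecm qdevice remove
    pvecm qdevice setup "$ip"             # regenerates and distributes the node certs
    pvecm status                          # check the Qdevice flag and vote counts
}
```

Usage: `redo_qdevice 192.168.1.253`. If the setup step fails, its output is what to post when asking for help.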
 
I thought it was fine, but I no longer remember, as that is when everything started breaking, lol...

OK, let me try removing and re-adding it.

```
04:40 PM [pve]~ root # pvecm qdevice remove
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable corosync-qdevice
Removed "/etc/systemd/system/multi-user.target.wants/corosync-qdevice.service".
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable corosync-qdevice
Reloading corosync.conf...
Done

Removed Qdevice.
04:40 PM [pve]~ root # pvecm qdevice setup 192.168.1.253
/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
(if you think this is a mistake, you may want to use -f option)


INFO: initializing qnetd server
Certificate database (/etc/corosync/qnetd/nssdb) already exists. Delete it to initialize new db

INFO: copying CA cert and initializing on all nodes

node 'pve': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve': Creating new key and cert db
node 'pve': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve': Importing CA
node 'pve2': Creating /etc/corosync/qdevice/net/nssdb
password file contains no data
node 'pve2': Creating new key and cert db
node 'pve2': Creating new noise file /etc/corosync/qdevice/net/nssdb/noise.txt
node 'pve2': Importing CA
INFO: generating cert request
Creating new certificate request


Generating key. This may take a few moments...

Certificate request stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.crq

INFO: copying exported cert request to qnetd server

INFO: sign and export cluster cert
Signing cluster certificate
Certificate stored in /etc/corosync/qnetd/nssdb/cluster-ifire.crt

INFO: copy exported CRT

INFO: import certificate
Importing signed cluster certificate
Notice: Trust flag u is set automatically if the private key is present.
pk12util: PKCS12 EXPORT SUCCESSFUL
Certificate stored in /etc/corosync/qdevice/net/nssdb/qdevice-net-node.p12

INFO: copy and import pk12 cert to all nodes

node 'pve': Importing cluster certificate and key
node 'pve': pk12util: PKCS12 IMPORT SUCCESSFUL
node 'pve2': Importing cluster certificate and key
node 'pve2': pk12util: PKCS12 IMPORT SUCCESSFUL
INFO: add QDevice to cluster configuration

INFO: start and enable corosync qdevice daemon on node 'pve'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.

INFO: start and enable corosync qdevice daemon on node 'pve2'...
Synchronizing state of corosync-qdevice.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable corosync-qdevice
Created symlink /etc/systemd/system/multi-user.target.wants/corosync-qdevice.service -> /lib/systemd/system/corosync-qdevice.service.
Reloading corosync.conf...
Done

04:41 PM [pve]~ root # pvecm nodes

Membership information
----------------------
Nodeid Votes Qdevice Name
1 1 A,V,NMW pve (local)
2 1 A,V,NMW pve2
0 1 Qdevice
04:41 PM [pve]~ root # pvecm status
Cluster information
-------------------
Name: ifire
Config Version: 13
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Sun Mar 3 16:41:43 2024
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1.4bd
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice

Membership information
----------------------
Nodeid Votes Qdevice Name
0x00000001 1 A,V,NMW 192.168.1.251 (local)
0x00000002 1 A,V,NMW 192.168.1.252
0x00000000 1 Qdevice
```
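The healthy state above can be checked mechanically, e.g. from a monitoring script. A minimal sketch that reads `pvecm status` output on stdin and succeeds only if the cluster is quorate, the `Qdevice` flag is present, and total votes equal expected votes (both nodes plus the QDevice, as in the output above); the helper name `qdevice_quorum_ok` is hypothetical, and note it will also fail when a node is simply down.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check `pvecm status` output read from stdin.
set -euo pipefail

qdevice_quorum_ok() {
    local status expected total
    status=$(cat)
    expected=$(awk -F': *' '/^Expected votes:/ {print $2}' <<<"$status")
    total=$(awk -F': *' '/^Total votes:/ {print $2}' <<<"$status")
    # Quorate, QDevice registered, and every expected vote actually present.
    grep -q '^Quorate:[[:space:]]*Yes' <<<"$status" &&
        grep -q '^Flags:.*Qdevice' <<<"$status" &&
        [ -n "$total" ] && [ "$total" = "$expected" ]
}
```

Usage: `pvecm status | qdevice_quorum_ok && echo "all votes present, QDevice registered"`.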

Thank God!
 
See, that's a case against ChatGPT. :D

Have fun!
 
(Note on Tailscale: as you probably realised, that was all port 8006 traffic, i.e. the pveproxy web interface, mixed in there. One more thing: if you ever need to put up a firewall or run the qdevice over Tailscale, keep in mind that it uses TCP port 5403, unlike the rest of the corosync traffic, which is UDP.)
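Before (or after) adding a firewall, a quick way to confirm each node can open TCP 5403 to the qnetd host over the intended path is a plain connect test. A minimal sketch, assuming bash (for its `/dev/tcp` redirection) and coreutils' `timeout`; the helper name `tcp_port_open` is hypothetical, and 5403 is the port mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port opens within 3 seconds.
set -euo pipefail

tcp_port_open() {
    local host="$1" port="${2:-5403}"   # default to the qnetd port
    # Host and port are passed as arguments, not spliced into the script text.
    timeout 3 bash -c ': </dev/tcp/"$1"/"$2"' _ "$host" "$port" 2>/dev/null
}
```

Usage from pve and pve2: `tcp_port_open 192.168.1.253 && echo "qnetd reachable"`. This only proves the TCP port is reachable; certificate problems like the ones earlier in this thread would still show up in the qnetd journal.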