[SOLVED] SSL errors in cluster between certain nodes - can use the web gui SHELL feature but not migrate VMs

JamesT
Sep 10, 2020
Hi, I've searched extensively on Google and in the forums and found a couple of similar posts, but nothing that specifically matches my issue.
I've already tried running pvecm updatecerts -f and rebooting nodes. pvecm updatecerts fixes the ssh shell between hosts for a while, but doesn't fix the error when migrating a VM. After a while, even the ssh shell stops working.

My issue details:
I have a cluster with 3 nodes. Node1 has all the VMs on it. I was able to offline migrate a VM before adding the 3rd node; now I can't anymore.
From node1 (before pvecm updatecerts)
Action: "ssh node2" < ssh possible dns spoofing detected
Action: "ssh node3" < ssh possible dns spoofing detected

From node1 (after pvecm updatecerts)
Action: "ssh node2" < OK
Action: "ssh node3" < OK
Action: web gui, select node2, browse to system to view hosts file or syslog etc < OK
Action: web gui, select node3, browse to system to view hosts file or syslog etc < connection error 596

From any node web gui
Action: initiate online OR offline migration from node1 to any node: ssh possible dns spoofing detected.

When trying to view items under the System menu of node3 in the web console, doesn't matter if I'm doing this from the web console on node1 or node2, I get this error: "Connection error 596: tls_process_server_certificate: certificate verify failed". At the same time, I can click the shell menu and it works ok. At the same time, if I do "ssh node3" it gives the standard ssh key mismatch error "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!"

At this point, running pvecm updatecerts (with or without --force makes no difference) fixes "ssh nodeX" for a while, but it seems to revert after some time. I'm not sure how long: maybe 30 minutes, maybe an hour or a couple of hours.

I've tried deleting /etc/ssh/ssh_known_hosts and letting pvecm updatecerts regenerate it, and I've tried pvecm updatecerts -f on all 3 nodes followed by rebooting node2 and node3.
I've compared the public key from the affected target node against what's in the ssh_known_hosts file on the source host, and both are the same.
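
For reference, the comparison described above could be scripted roughly as follows. This is only a sketch: the function name pubkey_recorded_for is made up for illustration, and it assumes the known_hosts file holds plain (unhashed) entries and that a copy of the target node's host public key (for example its /etc/ssh/ssh_host_rsa_key.pub) is available on the source host.

```shell
# Return 0 if the key stored in a .pub file is recorded for the given alias.
# Assumption: unhashed known_hosts entries of the form
#   name1,name2 keytype base64key [comment]
# pubkey_recorded_for is a hypothetical helper name, not a Proxmox tool.
pubkey_recorded_for() {
    local known_hosts="$1" alias="$2" pubfile="$3"
    local ktype="" kdata=""
    # A .pub file has the form: keytype base64key [comment]
    read -r ktype kdata _ < "$pubfile" || true
    awk -v want="$alias" -v t="$ktype" -v d="$kdata" '
        $1 ~ /^#/ || NF < 3 { next }        # skip comments and short lines
        {
            n = split($1, names, ",")       # first field may list several names
            for (i = 1; i <= n; i++)
                if (names[i] == want && $2 == t && $3 == d) found = 1
        }
        END { exit !found }                 # exit status 0 only on a match
    ' "$known_hosts"
}
```

It could be called as pubkey_recorded_for /etc/ssh/ssh_known_hosts node2 /tmp/node2_rsa.pub, where /tmp/node2_rsa.pub is a placeholder for wherever the copied key was saved. Matching on key type as well as key data matters here, because a host normally has several host keys (RSA, ECDSA, and so on).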

Need an expert and experienced hand here - many thanks in advance.
 
After restarting a few services on node3, it is now accessible fine via the web console. I still cannot do online migration for VMs, getting the following error:
Code:
2020-09-10 14:37:51 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node2' root@192.168.0.33 /bin/true
2020-09-10 14:37:51 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2020-09-10 14:37:51 @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
2020-09-10 14:37:51 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2020-09-10 14:37:51 IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
2020-09-10 14:37:51 Someone could be eavesdropping on you right now (man-in-the-middle attack)!
2020-09-10 14:37:51 It is also possible that a host key has just been changed.
2020-09-10 14:37:51 The fingerprint for the ECDSA key sent by the remote host is
2020-09-10 14:37:51 SHA256:CDD0dEayuqVl+aov5dtRlMhPs3JtRmM7zUrl38Bf5e4.
2020-09-10 14:37:51 Please contact your system administrator.
2020-09-10 14:37:51 Add correct host key in /root/.ssh/known_hosts to get rid of this message.
2020-09-10 14:37:51 Offending RSA key in /etc/ssh/ssh_known_hosts:5
2020-09-10 14:37:51   remove with:
2020-09-10 14:37:51   ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "node2"
2020-09-10 14:37:51 ECDSA host key for node2 has changed and you have requested strict checking.
2020-09-10 14:37:51 Host key verification failed.
2020-09-10 14:37:51 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted
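
One detail in the log above: the remote host presents an ECDSA key, while the offending entry on line 5 of /etc/ssh/ssh_known_hosts is an RSA key. It may therefore be worth checking which key types are actually recorded for the alias. A minimal sketch, assuming unhashed known_hosts entries (known_key_types is a hypothetical helper name, not part of any Proxmox or OpenSSH tooling):

```shell
# Print the key type of every entry recorded for a host alias.
# Assumption: unhashed known_hosts entries of the form
#   name1,name2 keytype base64key [comment]
known_key_types() {
    local file="$1" alias="$2"
    awk -v want="$alias" '
        $1 ~ /^#/ || NF < 3 { next }        # skip comments and short lines
        {
            n = split($1, names, ",")       # first field may list several names
            for (i = 1; i <= n; i++)
                if (names[i] == want) { print $2; break }
        }
    ' "$file"
}

# Example call (not executed here):
#   known_key_types /etc/ssh/ssh_known_hosts node2
```

If only ssh-rsa shows up for node2, the mismatch in the log is between key types and does not necessarily mean the RSA key itself changed; that reading is a guess from the log output, not something confirmed in this thread.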
 
Hello,

Can you run the following commands on each node and then restart?

Bash:
cd /root/.ssh
mv id_rsa id_rsa.old
mv id_rsa.pub id_rsa.pub.old
mv config config.old
pvecm updatecerts

If that does not help, please post the output of pveversion -v.
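
The same steps could be wrapped in a small helper that skips files that are not present, so it can be run unchanged on every node. This is a sketch only: backup_ssh_files is a made-up name, and like the plain mv commands it will overwrite an existing .old copy.

```shell
# Move root's SSH key pair and client config aside (suffix .old).
# backup_ssh_files is a hypothetical helper; the directory defaults to /root/.ssh.
backup_ssh_files() {
    local dir="${1:-/root/.ssh}" f
    for f in id_rsa id_rsa.pub config; do
        if [ -e "$dir/$f" ]; then
            mv -- "$dir/$f" "$dir/$f.old"
            echo "moved $f -> $f.old"
        else
            echo "skipped $f (not present)"
        fi
    done
}
```

On a node it would be used as backup_ssh_files /root/.ssh && pvecm updatecerts, which runs pvecm updatecerts only once the files have been moved.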
 
Great! Please mark the thread as [SOLVED] to help other people who have the same problem. Thanks!

Have a nice day :)
 
