New nodes joined to existing cluster: after 8.2 updates, SSH key trust is not auto-fixed and manual steps are needed

BloodyIron

So I have two new PVE nodes that were recently provisioned with PVE v8.2 and then updated to the latest version as of today (v8.2.4?).

Anyway, I'm now hitting the silly SSH key management junk that's been going around since the PVE v8.1/v8.2 changes. The other cluster nodes are on older versions of PVE (v8.1-ish) since they still need some updating love (tell me about it), but threads like this one show me there's no proper solution, nor documentation, for scenarios like this: https://forum.proxmox.com/threads/host-key-verification-failed-when-trying-to-migrate-vm.146299/

running "pve updatecerts -f" (or without -f) on the "older" nodes does nothing to fix the situation.

So I'm probably going to have to go to each "old" PVE node and SSH to the new PVE nodes just to skirt this silly situation. This is exactly the kind of thing the cluster automation should handle; this change shouldn't have shipped in a "RELEASE" state, and considering it's been a known issue for months now, it blows me away that there still isn't a proper fix in the non-subscription repos.

Now I'm going to do the hacky workaround for now because I have work to do, and if there's something I should do instead to correct this, please let me know.
 
OMFG, and that method doesn't even actually solve the problem; this is just such a shit show... I just wanted to add two new PVE nodes to this cluster, and I'm burning way too much time on SSH nonsense that should've been solved months ago >:|
 
Okay, I actually need to SSH FROM every node in the cluster TO every node in the cluster to generate the known_hosts trust. Guh, this is a cluster-truck.
 
For future human purposes:

I created a list of commands containing all the nodes and executed it on every node via the CLI. THIS IS NOT THE IDEAL WAY TO DO THIS, AND I KNOW IT IS BAD PRACTICE, BUT FOR NOW IT IS GOOD ENOUGH:

ssh -o StrictHostKeyChecking=accept-new -t HostName1 'exit'
ssh -o StrictHostKeyChecking=accept-new -t HostName2 'exit'
ssh -o StrictHostKeyChecking=accept-new -t HostName3 'exit'

etc. (one line per node in the cluster)


What this does is force-accept the host key fingerprint of whatever it connects to, then issue the exit command to disconnect. Pasting all the lines at once means this happens in rapid succession. IF YOU DO NOT UNDERSTAND THE SECURITY RAMIFICATIONS OF WHY THIS MAY NOT BE A GOOD IDEA, PLEASE SLOW DOWN AND MAYBE DON'T USE THIS METHOD! Blindly accepting SSH server key fingerprints has security implications: if you connect to a system that is compromised or untrusted, you might blindly trust something you don't want to!
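If you'd rather not paste a wall of lines, a small shell loop does the same thing. This is just a rough sketch: the HostName1/2/3 names are placeholders for your own cluster nodes, and the same blind-trust warning applies. Run it as root on every node so each node's known_hosts picks up every other node.

# placeholders: replace with your actual node names
for node in HostName1 HostName2 HostName3; do
    ssh -o StrictHostKeyChecking=accept-new -t "$node" 'exit'
done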

I'm posting this to help other humans and hopefully light more of a fire under the devs to correct this silliness, because PVE is awesome and I'd love for this to be fixed.
 
Ugh, this didn't even properly fix it anyway... I logged into the web GUI of a random node, tried to open a web CLI shell to one of the new nodes, and it still asked me for the fingerprint... fuck, this is so stupid.
 
Yeah, doing the same method by IP address for all nodes, on all nodes, did the trick. Hoping this gets a proper, cluster-automated solution soon. Hopefully this workaround helps someone in the meantime.
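In other words, the full pass that finally worked was hitting both the hostnames and the cluster IPs, from every node. A sketch of what that looks like (the names and 10.0.0.x addresses below are placeholders for your own nodes):

# placeholders: replace with your actual node names and their cluster IP addresses
for target in HostName1 HostName2 HostName3 10.0.0.11 10.0.0.12 10.0.0.13; do
    ssh -o StrictHostKeyChecking=accept-new -t "$target" 'exit'
done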
 
