Can anyone please help me! I am burning out! Issues after deleting and upgrading a node.

jtwicker23

New Member
Mar 9, 2024
TLDR: I deleted and upgraded a node, and now I get "Host key verification failed" on the old nodes and cannot view the console of my VMs due to a "failed to connect to server" error.

Back story:
I set up my config many moons ago, but always wanted to have 3 nodes for some replication/HA. Worst timing ever, but I came down with COVID, so I get to spend the holidays mostly alone, which is beyond depressing for me. Now that I am feeling well enough to get out of bed, I am taking this alone time to get everything configured correctly, and I have run into issue after issue, further depressing me. I realized I should have slowed down and read more documentation before going forward, but I am hopeful someone is jolly enough to help! :)

What I did was back up my VMs to an external NAS and restore them onto one of my ZFS nodes. It took me a while to figure that out and I am not 100% sure I did it by the book, but all the VMs are working correctly and, most importantly, my Pi-hole is running.

Next, I took the original node offline and "reimaged" the computer with the latest version of Proxmox via USB. Once I got that set up, I realized I hadn't removed the old node fully and had to research that. I finally got that removed.

Now when I try to open the Shell via the GUI I get "Host key verification failed". I also cannot see my VMs' console, as I get a "failed to connect to server" error. I can reach the nodes via SSH, though. I have tried all the commands I can find online to edit the host files and whatnot, but nothing seems to work. I then thought it was because of a mismatch of the node versions, since the newest one works fine; it's just the oldest two that are not.

I kept trying to get updates, but they would fail. I think it might have been my Pi-hole, so I set the network interface gateway on all nodes to 8.8.8.8 and ran apt-get dist-upgrade, and this thing has been going forever. Hopefully this is the fix, but I am starting to get dangerously tired and need to rest before my symptoms kick my ass even more, so I figured I would make this post and hope someone can tell me if I am going down the right path, or if there are other suggestions I should try when I get up tomorrow.

It would really mean a lot. Happy Holidays!
 
So I assume:

1. You have already added the new node correctly to the cluster.
2. You have finally deleted/removed that old node.
3. All nodes have been rebooted.

Have you tried the following:

Code:
pvecm updatecerts
systemctl restart pveproxy

As a further step, on every node, you could try:
Code:
ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "<new.node.ip>"

so I set the network interface gateway on all nodes to 8.8.8.8
I don't understand - you should choose the actual gateway of that network, e.g. 192.168.1.1 etc. 8.8.8.8 is a DNS server, not a gateway, and putting it in the gateway field is likely to break routing for anything outside your LAN.
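
For reference, a minimal sketch of how the bridge is usually configured in /etc/network/interfaces on a PVE node (addresses and interface names here are made up - use your own):
Code:
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10/24
    gateway 192.168.1.1        # your router, not 8.8.8.8
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0

# DNS servers belong in /etc/resolv.conf (or the node's DNS tab in the GUI), e.g.:
# nameserver 192.168.1.1
# nameserver 8.8.8.8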
 
Sounds like your newly installed host has DNS resolution issues.

Check that (a quick example follows the list):
- the /etc/hosts entry for your hostname matches your hostname, and that the hostnames of the rest of the cluster resolve to the IP addresses you use for corosync
- your /etc/resolv.conf has valid DNS servers (use dig to verify)
- both pvestatd and pveproxy show as active and running in systemctl status
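
For example, something along these lines (hostnames and addresses are placeholders for your own):
Code:
# /etc/hosts should map every cluster node's hostname to its corosync IP, e.g.:
#   192.168.1.10 pve1.local pve1
#   192.168.1.11 pve2.local pve2
#   192.168.1.12 pve3.local pve3

cat /etc/resolv.conf                 # should list reachable DNS servers
dig proxmox.com                      # verify name resolution actually works
systemctl status pvestatd pveproxy   # both should be active (running)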