Remote Host ID (pvecm updatecerts doesn't resolve)

Loxion

New Member
Feb 9, 2024
Hi all

Been using Proxmox for a while and never had any issues, at least none I couldn't find a solution to online or on these forums.

Unfortunately, this time I have been unsuccessful in finding anything that helps.

I have 3 nodes in my cluster; let's call them Node 1, Node 2 and Node 3 for simplicity. Until now I have had no issues and have removed nodes, rebuilt the servers, re-added nodes etc. with no problems.

So, I had the good old "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED" message on Node 2 when accessing the Shell via the Proxmox web GUI. I could SSH into the server fine using MobaXterm. Searched around and found the updatecerts command (pvecm updatecerts -f), ran that on Node 2 and could get back into the Shell via the web GUI.

However, after doing that I then got the error on Node 1. If I run pvecm updatecerts -f on Node 1 I can get back in, but then I get the error again on Node 2.
Node 3 seems to be unaffected.

Points to note that could be factors, but I don't know enough to resolve:
- I have rebuilt Node 2 today to try and resolve this. There were no VMs on it previously and it is blank now.
- I had rebuilt Node 1 a couple of days ago; everything was working before and after, I just wanted to completely rebuild my drives and config setup.
- Node 3 has been up and running for some time, running a single VM.
- Getting the error only stops me accessing the Shell from the web GUI; SSH via MobaXterm works fine.

Any info, ideas or things to have a look at would be very much appreciated.

Thanks
 
Thanks, I obviously didn't look deep enough. Found various posts about single-node updatecerts but not those.

Noted for future reference.
 
No worries. Actually it's good that it keeps popping up on the forum with more and more information. Hopefully it helps to get it fixed sooner rather than later.

The symptoms you describe are a very good reference for others to recognise as well.

Other times this goes unnoticed, showing up only as missing symlinks, failing VNC relay connections or disrupted replications.
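
For anyone who wants to check that side of it on their own nodes, a quick read-only look at the usual paths (nothing here changes anything):

ls -l /etc/ssh/ssh_known_hosts      # on a clustered node this is normally a symlink to /etc/pve/priv/known_hosts
ls -l /root/.ssh/authorized_keys    # likewise, normally a symlink to /etc/pve/priv/authorized_keys

If either of those is missing or has been replaced by a plain file, that's the "missing symlinks" variant of the same problem.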
 
I was getting failed VNC relay connections also.

The workaround from my perspective is to just use SSH via MobaXterm or PuTTY. It just means having to manually run the apt update / apt upgrade commands, which isn't a big issue as I do that from my Ubuntu VMs anyway. All mine are CLI installs so no GUI to worry about.
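
For completeness, the manual routine over SSH is just the usual:

apt update
apt dist-upgrade    # the PVE docs suggest dist-upgrade/full-upgrade on the host rather than plain upgrade
pveversion -v       # quick check of the package versions afterwards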
 

It's basically because of the corruption of known_hosts that the SSH-dependent [1] features are impacted. Unless you have shared storage for everything, a migration would fail as well. If you have ZFS replication, that would fail, so HA is impacted too. Anything else added on by the user that utilises the built-in SSH host keys would be failing too (that is, connecting from one node to another; connecting to the nodes from a non-node machine will keep working). You cannot just run ssh-keygen -R -f against it without causing even more damage [2].

[1] https://pve.proxmox.com/wiki/Cluster_Manager#_role_of_ssh_in_proxmox_ve_clusters
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=4252
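
Inspecting it read-only is safe if anyone wants to see what is actually recorded. Node names below are placeholders, and keep in mind the file lives under /etc/pve, so it is the same on every node:

ssh-keygen -l -F node2 -f /etc/pve/priv/known_hosts   # what the cluster has recorded for node2
ssh-keygen -lf /etc/ssh/ssh_host_rsa_key.pub          # run on node2 itself: the key it actually serves (or ssh_host_ed25519_key.pub)
ssh -o BatchMode=yes root@node2 true                  # rough connectivity check; not the exact call PVE makes internally

Just don't point ssh-keygen -R at those files, per [2] above.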
 
Thanks.

So a direct migration would fail (I haven't tried that), but I can restore fine from backups of machines that were running on a different host. My VMs are backed up daily to a NAS.

For me, I can work around it, as the only direct impact is doing updates via the Proxmox GUI, and I can do those fine via direct SSH to the host.

Each host has its own local storage, with backups being done daily to a NAS using Proxmox Backup Server. I can restore those backups to any host (I did one today) if needed, so all good. I am a pretty light home user to be honest :), and don't utilise or have any real need for HA.
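
For anyone in the same boat, the restores can also be done from a node's CLI. Roughly (the storage name, VMID and volume ID below are made up; copy the real volume ID from the list output):

pvesm list pbs-nas                                               # list the backup volumes on the PBS storage
qmrestore pbs-nas:backup/vm/100/2024-02-09T00:00:00Z 100 --storage local-lvm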

Thanks for all the info.
 
