Remote Host ID (pvecm updatecerts doesn't resolve)

Loxion

New Member
Feb 9, 2024
Hi all

Been using Proxmox for a while and never had any issues, at least none I couldn't find a solution to online or on these forums.

Unfortunately, this time I have been unsuccessful in finding anything that helps.

I have 3 nodes in my cluster; let's call them Node 1, Node 2 and Node 3 for simplicity. Until now I have had no issues and have removed nodes, rebuilt the servers, re-added nodes etc. with no problems.

So, I had the good old "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED" message on Node 2 when accessing the Shell via the Proxmox web GUI. I could SSH into the server fine using MobaXterm. Searched around and found the updatecerts command (pvecm updatecerts -f), ran that on Node 2 and could get back into the Shell via the web GUI.

However, after doing that I then got the error on Node 1. If I run pvecm updatecerts -f on Node 1 I can get back in, but then I get the error again on Node 2.
Node 3 seems to be unaffected.

Points to note that could be factors, but I don't know enough to resolve:
- I have rebuilt Node 2 today to try and resolve this. There were no VMs on it previously and it is blank now.
- I had rebuilt Node 1 a couple of days ago; everything was working before and after, I just wanted to completely rebuild my drives and config setup.
- Node 3 has been up and running for some time, running a single VM.
- Getting the error only stops me accessing the Shell from the web GUI; SSH via MobaXterm works fine.

Any info, ideas or things to have a look at would be very much appreciated.

Thanks
 
Thanks, I obviously didn't look deep enough. Found various posts about single-node updatecerts but not those.

Noted for future reference.
 
No worries. Actually it's good that it keeps popping up on the forum with more and more information. Hopefully it helps to get it fixed sooner rather than later.

The symptoms you describe are a very good reference for others to recognise as well.

Other times this goes unnoticed, showing up only as missing symlinks, failing VNC relay connections or disrupted replications.
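
For anyone who wants to check that side of it on their own nodes, a quick read-only look at the usual paths (nothing here changes anything):

ls -l /etc/ssh/ssh_known_hosts      # on a clustered node this is normally a symlink to /etc/pve/priv/known_hosts
ls -l /root/.ssh/authorized_keys    # likewise, normally a symlink to /etc/pve/priv/authorized_keys

If either of those is missing or has been replaced by a plain file, that's the "missing symlinks" variant of the same problem.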
 
I was getting failed VNC relay connections also.

The workaround from my perspective is to just use SSH via MobaXterm or PuTTY. It just means having to manually run the apt update / apt upgrade commands, which isn't a big issue as I do that from my Ubuntu VMs anyway. All mine are CLI installs so no GUI to worry about.
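
For completeness, the manual routine over SSH is just the usual:

apt update
apt dist-upgrade    # the PVE docs suggest dist-upgrade/full-upgrade on the host rather than plain upgrade
pveversion -v       # quick check of the package versions afterwards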
 

It's basically because of the corruption of known_hosts that the SSH-dependent [1] features are impacted. Unless you have shared storage for everything, a migration would fail as well. If you have ZFS replication, that would fail, so HA is impacted too. Anything else added on by the user that utilises the built-in SSH host keys would be failing too (that is, connecting from one node to another; connecting to the nodes from a non-node machine will keep working). You cannot just run ssh-keygen -R -f against it without causing even more damage [2].

[1] https://pve.proxmox.com/wiki/Cluster_Manager#_role_of_ssh_in_proxmox_ve_clusters
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=4252
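
Inspecting it read-only is safe if anyone wants to see what is actually recorded. Node names below are placeholders, and keep in mind the file lives under /etc/pve, so it is the same on every node:

ssh-keygen -l -F node2 -f /etc/pve/priv/known_hosts   # what the cluster has recorded for node2
ssh-keygen -lf /etc/ssh/ssh_host_rsa_key.pub          # run on node2 itself: the key it actually serves (or ssh_host_ed25519_key.pub)
ssh -o BatchMode=yes root@node2 true                  # rough connectivity check; not the exact call PVE makes internally

Just don't point ssh-keygen -R at those files, per [2] above.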
 
Thanks.

So a direct migration would fail (I haven't tried that), but I can restore fine from backups of machines that were running on a different host. My VMs are backed up daily to a NAS.

For me, I can work around it, as the only direct impact is doing updates via the Proxmox GUI, and I can do those fine via direct SSH to the host.

Each host has its own local storage, with backups being done daily to a NAS using Proxmox Backup Server. I can restore those backups to any host (I did one today) if needed, so all good. I am a pretty light home user to be honest :), and don't utilise or have any real need for HA.
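
For anyone in the same boat, the restores can also be done from a node's CLI. Roughly (the storage name, VMID and volume ID below are made up; copy the real volume ID from the list output):

pvesm list pbs-nas                                               # list the backup volumes on the PBS storage
qmrestore pbs-nas:backup/vm/100/2024-02-09T00:00:00Z 100 --storage local-lvm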

Thanks for all the info.
 
