VM noVNC failures traceable to machines built from 8.1-1 November 23 ISO

TimRyan

Member
Aug 24, 2022
Kaslo BC Canada
comxpertise.ca
I have been chasing this problem for some time. I have an 8-node Proxmox cluster built from Dell R610 and R710 servers. Some of these were built from the PVE 7 ISOs, but the majority have now been built from a clean proxmox-ve_8.1-1 disc image written to a USB device using Rufus 4.3.

The servers upgraded from the PVE 7 ISOs reliably deliver a noVNC console of the hosted VMs. The PVE 8.1-1 builds do not. Occasionally I have been able to make this work by way of external SSH sessions to a defined IP address, but never from the PVE central GUI.

The primary host node prox-1 and all nodes running the PVE/Debian OS are able to deliver noVNC connections to their own CLI. However, the hosted VM instances on the nodes built from the PVE 8.1-1 ISOs all fail with this sort of error message:

https://192.168.0.112:8006/?console=kvm&novnc=1&vmid=126&vmname=ubuserve-9&node=prox-8&resize=off&cmd=
The cluster host node prox-1, a Dell R610 built from the PVE 7 ISO, is the source of the error message.

Nodes prox-1 through prox-5 were built from PVE 6 and 7 ISOs and upgraded to the latest releases (kernel 6.5.11-6-pve, PVE 8.1.3). Nodes prox-6 to prox-8 have been wiped and rebuilt completely from the PVE 8.1-1 ISO and updated as well, but none of them has been able to deliver noVNC console sessions to the VMs installed from the Ubuntu 22.04.3 ISO.

Yesterday, after a two-disk SAS failure, I was forced to rebuild the prox-3 node with a new SAS controller and disks using the new 8.1-1 ISO, and the reinstalled prox-3 exhibits the same noVNC console session failures.

What am I missing here? Proxmox is a wonderful solution but for this single issue.
 
Nodes prox-1 through prox-5 were built from PVE 6 and 7 ISOs and upgraded to the latest releases (kernel 6.5.11-6-pve, PVE 8.1.3). Nodes prox-6 to prox-8 have been wiped and rebuilt completely from the PVE 8.1-1 ISO and updated as well, but none of them has been able to deliver noVNC console sessions to the VMs installed from the Ubuntu 22.04.3 ISO.

Yesterday, after a two-disk SAS failure, I was forced to rebuild the prox-3 node with a new SAS controller and disks using the new 8.1-1 ISO, and the reinstalled prox-3 exhibits the same noVNC console session failures.
what do you mean exactly by 'noVNC console session failures'?

because this:
However, the hosted VM instances on the nodes built from the PVE 8.1-1 ISOs all fail with this sort of error message:

https://192.168.0.112:8006/?console=kvm&novnc=1&vmid=126&vmname=ubuserve-9&node=prox-8&resize=off&cmd=
is not an error message?

did you apply updates on the new 8.1 installations after installing from the ISO? there might be newer packages available in the repos already
if yes, can you please post the output of 'pveversion -v' on an upgraded machine vs a newly installed one? (if all are up-to-date, there should not really be a difference, aside maybe from installed kernels)
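for example, a quick way to capture the output for comparison (just a sketch; run it locally on each node and diff the two files on one machine afterwards):

Code:
# on an upgraded node
pveversion -v > /tmp/pveversion-upgraded.txt

# on a freshly installed node
pveversion -v > /tmp/pveversion-fresh.txt

# copy both files to one machine, then compare
diff /tmp/pveversion-upgraded.txt /tmp/pveversion-fresh.txt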
 
Let me clarify this. I have an 8-node Proxmox cluster built on two Dell R610s, prox-1 and prox-2, with prox-1 as the cluster host. The other six nodes, prox-3 to prox-8, are all Dell R710s. All nodes have BBU SAS RAID storage arrays, a pair of Xeon CPUs, and 144 GB of RAM.
All nodes have been upgraded to the same kernel, e.g. on prox-1: Linux prox-1 6.5.11-7-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-7 (2023-12-05T09:44Z) x86_64

The problem I have is that KVM VMs installed from Ubuntu 22.04.3 work with vncproxy when built on some of the nodes and not on others. These are examples drawn from the prox-1 cluster host node's syslog:


Dec 08 15:16:05 prox-1 pvedaemon[305194]: starting vnc proxy UPID:prox-1:0004A82A:0080985F:6573A3B5:vncproxy:100:root@pam:
Dec 08 15:16:05 prox-1 pvedaemon[1191]: <root@pam> starting task UPID:prox-1:0004A82A:0080985F:6573A3B5:vncproxy:100:root@pam:

Dec 08 15:11:58 prox-1 pvedaemon[303639]: starting vnc proxy UPID:prox-1:0004A217:008037CC:6573A2BE:vncproxy:131:root@pam:
Dec 08 15:11:58 prox-1 pvedaemon[303639]: Failed to run vncproxy.
Dec 08 15:11:58 prox-1 pvedaemon[1190]: <root@pam> end task UPID:prox-1:0004A217:008037CC:6573A2BE:vncproxy:131:root@pam: Failed to run vncproxy.

VM 131 is an Ubuntu 22.04.3 server VM that was built on the prox-8 node, which was installed from the PVE 8.0.1 ISO and added to the cluster before the VM was constructed.

VM 100 is an Ubuntu 22.04.2 server VM that was built on the prox-2 node, which was installed from the PVE 7 ISO and added to the cluster before the VM was constructed.

Since the original cluster was built, all nodes have been updated to the same release at the same time. The challenge is that only some of the VMs built on the earlier nodes are able to use vncproxy successfully. I can work around the problem by opening a root session on the actual physical host node, instead of the cluster host, to complete the VM installation and configure the VM to handle external SSH sessions, but it's a workaround at best.

I need to understand why this occurs and what is required to eliminate the problem.
One more question: why does the string seen below show a smiley substituting for ':p' when I preview?
UPID:prox-1
 
is the ssh access between nodes working correctly (e.g. from node1 to node2 'ssh -e none -o BatchMode=yes -o HostKeyAlias=node2nodename /bin/true' ) from all nodes to all nodes?
can the nodes resolve each other correctly with the nodename?
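for example, something along these lines on each node (a sketch; getent just shows what the resolver returns for a name):

Code:
# does the peer's name resolve, and to the address you expect?
getent hosts prox-2

# the static name/IP entries for the cluster nodes usually live here
cat /etc/hosts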
 
Please let me know if I am using this properly. I am assuming that I execute this from the node 1 shell as root ?

ssh -e none -o BatchMode=yes -o HostKeyAlias=node2nodename /bin/true

And I get this response;

ssh: Could not resolve hostname /bin/true: Name or service not known
 
ah sorry, i forgot to add the ip of the node of course ^^
so the correct command would be:

Code:
ssh -e none -o BatchMode=yes -o HostKeyAlias=node2nodename <ip-of-node2> /bin/true
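and if you want to check every node in one go from the current one, a small loop like this should do (a sketch; the name/IP pairs below are placeholders for your own nodes):

Code:
# hypothetical node list as "<nodename> <ip>" pairs - replace with your own
for entry in "prox-2 192.168.0.112" "prox-3 192.168.0.113"; do
    set -- $entry
    if ssh -e none -o BatchMode=yes -o HostKeyAlias="$1" "$2" /bin/true; then
        echo "$1: ok"
    else
        echo "$1: FAILED"
    fi
done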
 
All the nodes with the failing noVNC display issue return this response from the host node shell.

Command string:

ssh -e none -o BatchMode=yes -o HostKeyAlias=prox-8 192.168.0.124 /bin/true

Result:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:m1VTmhISFnc7EMWZSruLZ0n7sXELG3ng6pa1/C3E51c.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending RSA key in /etc/ssh/ssh_known_hosts:15
remove with:
ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "prox-8"
Host key for prox-8 has changed and you have requested strict checking.
Host key verification failed.

In my case, the host node is prox-1 and the failing nodes are prox-3, prox-6, and prox-8. All three of these nodes show this error and fail to load noVNC sessions from the console.

Executing ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "prox-8" on prox-8, for example, returned evidence of a fixed known_hosts entry, but did not resolve the failing noVNC session issue in the console.

Running that command on the prox-1 shell produced this result for each of the failing nodes:

root@prox-1:~# ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "prox-8"
Host prox-8 not found in /etc/ssh/ssh_known_hosts
root@prox-1:~# ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "prox-6"
Host prox-6 not found in /etc/ssh/ssh_known_hosts
root@prox-1:~# ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "prox-3"
Host prox-3 not found in /etc/ssh/ssh_known_hosts

Examination of the prox-1 /etc/ssh/ssh_known_hosts file revealed that there were no entries at all for the prox-3, prox-6, and prox-8 nodes exhibiting the problem and returning the error message.

What would cause this, and how do I fix it? I have worked around this problem by using PowerShell SSH sessions to the failing nodes' VMs, but it's not a pretty fix.

After a close examination of the prox-1 ssh_known_hosts file, there are no entries for the three nodes whose sessions fail. How can I force a rebuild of the file?
 

You are chasing a 10-year-old bug, for which there is a patch and a workaround, and it is being ignored by the PVE team as "a niche case".

See (and many others): https://forum.proxmox.com/threads/cannot-migrate-vm-ct-due-to-ssh-key-error.133560/#post-614183
 
/etc/ssh/ssh_known_hosts on PVE hosts is normally a symlink to /etc/pve/priv/known_hosts, so you should probably use that as the path (and restore the symlink if it's not one anymore)

How can I force a rebuild of the file?

if the known_hosts file is a symlink to /etc/pve/priv/known_hosts this should be automatic, since that file is clusterwide.
otherwise, simply connect via ssh with the command above, but without '-o BatchMode=yes'
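a quick way to check the symlink (and put it back if something replaced it with a plain file) would be something like this (a sketch, not an official procedure):

Code:
# is it still the clusterwide symlink?
ls -l /etc/ssh/ssh_known_hosts
# expected: /etc/ssh/ssh_known_hosts -> /etc/pve/priv/known_hosts

# if a plain file replaced it, keep a copy and restore the link
mv /etc/ssh/ssh_known_hosts /root/ssh_known_hosts.bak
ln -s /etc/pve/priv/known_hosts /etc/ssh/ssh_known_hosts

# then let PVE refresh the shared entries
pvecm updatecerts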
 
/etc/ssh/ssh_known_hosts on PVE hosts is normally a symlink to /etc/pve/priv/known_hosts, so you should probably use that as the path (and restore the symlink if it's not one anymore)

if the known_hosts file is a symlink to /etc/pve/priv/known_hosts this should be automatic, since that file is clusterwide.
otherwise, simply connect via ssh with the command above, but without '-o BatchMode=yes'

@dcsapak His symlink gets corrupted when he runs the ssh-keygen -R [1]. It does get reinstated after pvecm updatecerts, but that tool uses a vhash when traversing the known_hosts in such a way that it drops everything but the oldest entry (it uses just the alias as the Perl hash key).

How can one get an ETA on a patch [2]?

https://github.com/proxmox/pve-clus...a11b0f864f5b9dc/src/PVE/Cluster/Setup.pm#L223

Perl:
--- Setup.pm    2023-07-01
+++ Setup.pm    2023-11-18
@@ -263,11 +263,13 @@
     return if $line =~ m/^#/; # skip comments
 
     if ($line =~ m/^(\S+)\s(ssh-rsa\s\S+)(\s.*)?$/) {
-        my $key = $1;
+
+        my $pattern = $1;
         my $rsakey = $2;
+        my $key = $1.$2;
         if (!$vhash->{$key}) {
         $vhash->{$key} = 1;
-        if ($key =~ m/\|1\|([^\|\s]+)\|([^\|\s]+)$/) {
+        if ($pattern =~ m/\|1\|([^\|\s]+)\|([^\|\s]+)$/) {
             my $salt = decode_base64($1);
             my $digest = $2;
             my $hmac = Digest::HMAC_SHA1->new($salt);
@@ -291,10 +293,10 @@
             return;
             }
         } else {
-            $key = lc($key); # avoid duplicate entries, ssh compares lowercased
-            if ($key eq $ip_address) {
+            $pattern = lc($pattern); # avoid duplicate entries, ssh compares lowercased
+            if ($pattern eq $ip_address) {
             $found_local_ip = 1 if $rsakey eq $hostkey;
-            } elsif ($key eq $nodename) {
+            } elsif ($pattern eq $nodename) {
             $found_nodename = 1 if $rsakey eq $hostkey;
             }
         }

It's been waiting there as an attachment since November 18.

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=4252
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=4886
 
if you want your patches to be applied, please adhere to https://pve.proxmox.com/wiki/Developer_Documentation (CLA, mailing list, etc.) or wait until one of us implements it

That's the problem: we had some disagreement in the initial discussion of the bug reports (the two linked and related ones). I have since nailed down the bug - it's 5 lines to edit, literally. Once you see it, you cannot unsee it: it traverses the known_hosts line by line but uses just the alias/IP part of the pattern token. I gave up right there. I don't need anything; I have the patch, it's been posted into the bug, and I post it whenever someone comes up with SSH-related issues such as migrations, replications, proxying, qdevice setup... which is weekly, because like it or not, every time someone "reinstalls" any node they will go on to assign the same IP to it and will run into this bug...
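To illustrate the effect with plain shell rather than Perl (a toy demonstration only, not the actual Setup.pm logic): deduplicating on the alias alone keeps only the first, stale line, while deduplicating on the whole line keeps both:

Code:
printf '%s\n' \
  'prox-8 ssh-rsa AAAA...old-key-from-before-the-reinstall' \
  'prox-8 ssh-rsa AAAA...new-key-of-the-reinstalled-node' > /tmp/kh-demo

awk '!seen[$1]++' /tmp/kh-demo   # keyed on the alias only -> the new key is dropped
awk '!seen[$0]++' /tmp/kh-demo   # keyed on the whole line -> both entries survive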

I then went on to also have PVE use SSH certs as a POC:
https://forum.proxmox.com/threads/s...ass-ssh-known_hosts-bug-s.137809/#post-614017

Thomas himself had told me before that it was too disruptive to implement (2+2 lines, as it turns out), so when one is turned down on obvious things because we did not start on a good note, the motivation to join the dev list is sub-zero.
 
Hi Folks

I had no idea I was going to trigger a shitstorm of this magnitude, but let me add these comments and suggestions:

First, to the Proxmox staff:

I have spent the last two years in an R&D effort to develop a strategy for edge cloud operations in rural fiber networks. I have tested and discarded a number of platforms. When I encountered Proxmox I had just given openSUSE the heave after wasting the better part of two years with OpenInfra and then openSUSE as platforms. I believe that Proxmox is, and is capable of remaining, a critical and strategic platform for developing edge cloud systems to support rural fiber networks and community clouds. Are there snags? Of course.

This particular snag is one that straddles PVE 8, its supported VM and container OSes, and the entire TLS security layer that underlies all of today's internet. It's not an edge or niche problem; it's pervasive.

What's more, it's compounded by many less-than-fully-defined DHCP and routing practices and, like much of what has been put together by the global IETF community, by a messy multi-party community development process.

I am going to persevere with Proxmox. It's not perfect, but it's the best I have seen in several years of looking. For the record, I read the docs piecemeal over the last six months and built an 8-node cluster behind a 1G fiber network point with a single static IP address. To manage this network I have a fiber router holding the service SFP and providing a gigabit network with routed access and DHCP services.

One of the first items I built out was a Caddy 2 reverse proxy host to provide TLS and Let's Encrypt certs for a number of PVE 7, and now PVE 8, hosted VM servers and web services. The snags in this support request appeared when, after adding three more 2U Dell servers and upgrading to PVE 8, the cert headaches prevented use of the noVNC services on some of the cluster nodes.

For the record, this forum is one of the things that the Proxmox proprietors have done very well. What I have come away with is a much better understanding of the issues, and of how the Proxmox staff have managed to deal with many of them, with or without paid support.

What today's traffic has revealed is both a better look at the certificate complexities as they present inside the cluster, and the limitations of the cert process both internally in PVE 8 and externally in networks at large. Both the IETF and the Proxmox systems could do with a comprehensive review of best practices, technical requirements, and user-accessible documentation of the design goals as deployed and applied within PVE 8.

To dcsapak from Vienna:

Your shell script was a very enlightening tool which, while it didn't solve the problem, was instrumental in defining for me the specifics of the cause of the problem and in giving me a much better picture of what needs to be accomplished. Is there specific documentation of the practices and procedures for PEMs and certs and their implementation in PVE 8 that I can use to chase this down?

Thanks and my regards to the ProxMox crew.
 
This particular snag is one that straddles PVE 8, its supported VM and container OSes, and the entire TLS security layer that underlies all of today's internet. It's not an edge or niche problem; it's pervasive.

The issue is ALSO that no one who comes across the same bug and finds out here in the forum, even after having been advised, goes on to +1 it in the Bugzilla reports. So it may indeed appear, from that perspective, that there's almost no one experiencing this issue.

Your shell script was a very enlightening tool which, while it didn't solve the problem, was instrumental in defining for me the specifics of the cause of the problem and in giving me a much better picture of what needs to be accomplished. Is there specific documentation of the practices and procedures for PEMs and certs and their implementation in PVE 8 that I can use to chase this down?

Which script do you mean?
 
Which script do you mean?
ssh -e none -o BatchMode=yes -o HostKeyAlias=node2nodename <ip-of-node2> /bin/true

That made it possible for me to quickly identify both the immediate issue and the nodes it affected. I suggest that while there is an issue, and it is pervasive, it is also one with compounding problems across the entirety of the global internet.

I like a lot of what I have seen so far with PVE. I started just as PVE 7.1 was released, and as it was fully FOSS I gave it a test. I was expanding my cluster as PVE 8 was released, and that's when issues began to show up. A platform of these capabilities is inevitably complex, and I am only now fully aware of the complexity of the certs issue. What is obvious to me now is that this has been somewhat obfuscated, probably for security reasons. Many software and systems providers and developers are using what is a complex stack of cert systems and applications to secure them, and because these are prone to attack if publicly defined, we get to live in a grey area.

I now understand how I triggered the issues, and I have a functional workaround that allows me to use the platform, but I need to fully understand where the problem triggers are and how to avoid causing issues.

To give you an example, run from the host shell on prox-1:
root@prox-1:~# ssh -e none -o BatchMode=yes -o HostKeyAlias=prox-8 192.168.0.124 /bin/true
Host key verification failed.

This is one of the three nodes that exhibit the noVNC session behaviour, and this is the reason why.

I assume that you have done this deep dive on SSH https://en.wikipedia.org/wiki/Secure_Shell
 
ssh -e none -o BatchMode=yes -o HostKeyAlias=node2nodename <ip-of-node2> /bin/true

I think Dominik simply gave you what he knew is being done when proxying VNC connections, so you could see the actual error message. Eventually it would be for @dcsapak to answer you; I think it's well past the staff's regular working hours in Vienna now, though.

The nodes rely on pveproxy for a good portion of their comms, which is SSL based, but features such as proxying VNC rely on SSH. SSH uses the HostKeyAlias flag simply as an indication of which host key to check against the known_hosts file, which, as the idea goes, should be shared by all the nodes as a symlink through the RAM-held cluster filesystem.

Did you not consider using SSH certs instead? That completely bypasses the bug (as I figured you do not want to apply the patch):
https://forum.proxmox.com/threads/s...ass-ssh-known_hosts-bug-s.137809/#post-614017
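Roughly, the underlying mechanism looks like this (a generic sketch of OpenSSH host certificates under assumed names and paths, not the exact steps from that thread):

Code:
# once: create a CA keypair and keep the private half somewhere safe
ssh-keygen -t ed25519 -f /root/cluster_host_ca -C "cluster host CA"

# per node: sign its host key (-h marks a host certificate, -n lists the principals)
ssh-keygen -s /root/cluster_host_ca -I prox-8 -h -n prox-8,192.168.0.124 \
    /etc/ssh/ssh_host_rsa_key.pub

# per node: have sshd present the certificate
echo "HostCertificate /etc/ssh/ssh_host_rsa_key-cert.pub" >> /etc/ssh/sshd_config
systemctl restart ssh

# on the client side: trust anything signed by the CA instead of pinning individual keys
# (on PVE the known_hosts is clusterwide, so adding the line once is enough)
echo "@cert-authority * $(cat /root/cluster_host_ca.pub)" >> /etc/ssh/ssh_known_hosts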
 
Did you not consider using SSH certs instead? That completely bypasses the bug (as I figured you do not want to apply the patch):
https://forum.proxmox.com/threads/s...ass-ssh-known_hosts-bug-s.137809/#post-614017

If you feel comfortable with Perl, you may simply patch it yourself - it does work with 8.1. Basically it's just a script there that should be cleaning up duplicates from the shared known_hosts; more details here: https://forum.proxmox.com/threads/pvecm-updatecert-f-not-working.135812/page-3#post-606413

But if the PVE team does not patch it for the next version, you'd lose this fix after an update. Not that it would break anything; the bug would just be back.

The last option (preferred by some) is to simply wipe out the known_hosts and re-populate it by running pvecm updatecerts on every single node manually. But again, this does not prevent the bug in the future.
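Spelled out, that last option is roughly this (a sketch of the manual procedure, not an official one):

Code:
# the file is clusterwide, so emptying it once is enough; keep a copy first
cp /etc/pve/priv/known_hosts /root/known_hosts.backup
: > /etc/pve/priv/known_hosts

# then, on every node in turn, regenerate/redistribute the entries
pvecm updatecerts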

Note: The SSH connections are only at play with VNC when you are connecting through another node's GUI, i.e. when the connection needs proxying to the host node (the one actually hosting the VM/CT).
 
My observations at this point:

I have one node of the eight that seems to generate a panic like this:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:m1VTmhISFnc7EMWZSruLZ0n7sXELG3ng6pa1/C3E51c.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending RSA key in /etc/ssh/ssh_known_hosts:15
remove with:
ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "prox-7"
Host key for prox-7 has changed and you have requested strict checking.
Host key verification failed.

When it does, all of the later nodes fail their session requests from the host node. It can be quickly fixed with an SSH session from my Win11 workstation and this CLI call:
root@prox-7:~# ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "192.168.0.122"
# Host 192.168.0.122 found: line 24
/etc/ssh/ssh_known_hosts updated.
Original contents retained as /etc/ssh/ssh_known_hosts.old

And then pvecm updatecerts on the otherwise functional nodes to restore noVNC auths from the cluster host.

What triggers these issues is the question I need to resolve.

All of the VMs in service, roughly 40 Ubuntu 22.04.3 servers and my Caddy 2.7.6 reverse proxy server, are fully functional and being served without issue, including the web server I am working on at https://comxpertise.ca/

Why this SSH issue arises is, at this point, a question and a small irritation.
 
My observations at this point:

I have one node of the eight that seems to generate a panic like this:
Code:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:m1VTmhISFnc7EMWZSruLZ0n7sXELG3ng6pa1/C3E51c.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending RSA key in /etc/ssh/ssh_known_hosts:15
 remove with:
 ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "prox-7"
Host key for prox-7 has changed and you have requested strict checking.
Host key verification failed.

Not sure if you meant one node that you get issues connecting to (likely) or from. This is the same issue discussed in bug 4886:
https://bugzilla.proxmox.com/show_bug.cgi?id=4886

When it does, all of the later nodes fail their session requests from the host node. It can be quickly fixed with an SSH session from my Win11 workstation and this CLI call:
Code:
root@prox-7:~# ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "192.168.0.122"
# Host 192.168.0.122 found: line 24
/etc/ssh/ssh_known_hosts updated.
Original contents retained as /etc/ssh/ssh_known_hosts.old

The problem is, at this point you have inadvertently killed the symlink, as described in the bug:
https://bugzilla.proxmox.com/show_bug.cgi?id=4252

And then pvecm updatecerts on the otherwise functional nodes to restore noVNC auths from the cluster host.

Well, try to run the same (unpatched) command also on the offending node; it will likely break it down again. It really depends on the order in which you run it on the nodes, which nodes (if done selectively), and which nodes may be re-added later, which is what causes this to come back. This is why this advice is not for everyone, and the problem will return later, "surprisingly".

What triggers these issues is the question I need to resolve.

See also my other reply coming shortly.

All of the VMs in service, roughly 40 Ubuntu 22.04.3 servers and my Caddy 2.7.6 reverse proxy server, are fully functional and being served without issue, including the web server I am working on at https://comxpertise.ca/

This is not really guest dependent; it's about which hosts the VMs that had issues run on, since the connection has to be proxied across hosts to reach them.

Why this SSH issue arises is, at this point, a question and a small irritation.

As I believe it's because the patch has not been applied, I will leave this response (as to why) up to @dcsapak (or @t.lamprecht, or even @dietmar, whose code this originally was, as in 10+ years ago, and whose email address was notified on the patch), as I can't check my code into their repository, obviously. So indeed, they would have to take it in themselves.
 
To dcsapak from Vienna:

Your shell script was a very enlightening tool which, while it didn't solve the problem, was instrumental in defining for me the specifics of the cause of the problem and in giving me a much better picture of what needs to be accomplished. Is there specific documentation of the practices and procedures for PEMs and certs and their implementation in PVE 8 that I can use to chase this down?

Thanks and my regards to the ProxMox crew.

@dcsapak I just wanted to bump this up so that I am not the cause of this part of the OP's query being completely hijacked.
 
