VM noVNC failures traceable to machines built from 8.1-1 November 23 ISO

dcsapak · Dec 15, 2023

@TimRyan
do i understand correctly that it still does not work?

i'd make sure that on every node the file /etc/ssh/ssh_known_hosts is a symlink to /etc/pve/priv/known_hosts
e.g. with

Code:

ln -s /etc/pve/priv/known_hosts /etc/ssh/ssh_known_hosts

(you might need to remove the original file first)
and then update the known_hosts file by executing

Code:

pvecm updatecerts

on every node

this should fix the known_hosts file and allow proper ssh tunneling again
if in the future you remove a node from the cluster, please do as the documentation says:

After removal of the node, its SSH fingerprint will still reside in the known_hosts of the other nodes. If you receive an SSH error after rejoining a node with the same IP or hostname, run pvecm updatecerts once on the re-added node to update its fingerprint cluster wide.
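The symlink step above can be sketched as a small shell function that is safe to run more than once. This is only an illustration: the function name `ensure_known_hosts_symlink` and the `.bak` backup suffix are made up for this example and are not part of any Proxmox tooling. It takes the link path and the target path as arguments, so it can be tried on scratch files before touching a real node.

```shell
#!/bin/sh
# Hypothetical helper (not a Proxmox command): make LINK a symlink to TARGET.
# An existing file or wrong symlink at LINK is moved aside to LINK.bak first,
# which covers the "you might need to remove the original file" case.
ensure_known_hosts_symlink() {
    link=$1
    target=$2

    # Nothing to do if the symlink is already correct.
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]; then
        echo "ok: $link already points to $target"
        return 0
    fi

    # Keep a backup instead of deleting the old file outright.
    if [ -e "$link" ] || [ -L "$link" ]; then
        mv "$link" "$link.bak"
        echo "moved existing $link to $link.bak"
    fi

    ln -s "$target" "$link"
    echo "linked $link -> $target"
}
```

On a cluster node this would presumably be called as `ensure_known_hosts_symlink /etc/ssh/ssh_known_hosts /etc/pve/priv/known_hosts`, followed by `pvecm updatecerts`, and repeated on every node.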

TimRyan · Dec 15, 2023

Thanks
You have restored my faith in ProxMox. Thanks for the Moximum Effort!

esi_y · Dec 15, 2023

TimRyan said:
Thanks
You have restored my faith in ProxMox. Thanks for the Moximum Effort!

Tim, did you get it working before or with the last piece of advice?

TimRyan · Dec 15, 2023

The problem was a messy and confused set of ssh keys, compounded by the pattern of upgrades and my lack of understanding of the mechanisms used internally at the cluster level. When you create confused garbage and it is replicated all over the cluster, it becomes obvious that you have to houseclean the existing rubbish to restore order. It's a tough way to learn and a lot of lost time, but now that I know how the cluster host and key replication works, even if I break it again I will know how to fix it.

TimRyan · Dec 15, 2023

Let's face it, the construction and "logical operational plumbing" of a cluster is anything but simple. When that is compounded by the complexities of key encrypted communications amongst the pile of components in a well populated cluster, maintaining secure and reliable functionality is anything but simple.

I have read almost all of the documentation once, and the challenge is getting a really good picture of how all the pieces fit together and interact, compounded by the continuous stream of updates and changes. I have been hacking at this platform for about 8 months now, from the PVE 6 to 7 transition, which is where I started, to PVE 8 today. I have seen a number of complications, including this last one, which was triggered by my expansion build-out coinciding with PVE 8. Now that I fully understand how the host node and all the others interact as far as communications and secure sessions go, I am pretty sure that it's a viable platform. The tests I need to do now to satisfy myself of the platform's viability are these:

How easy is it to secure while using it as a reverse proxy routed server platform to maximize static IP efficiency with Caddy 2.7.6?

Where might there be security issues with segmented user groups using parts of a single physical site cluster?

What will have to be done to make this viable as a bridged cluster with physically separated sites on secured fiber backbones? Will it still be manageable from a single location?

Once I have dealt with these issues it will be ready for prime time and commercial services offerings. When I get there I will write up the build and system so others can benefit from my effort. I think I have what I need for my Proof of Concept first site.

esi_y · Dec 16, 2023

I just find it incredible that you got advised the very same thing that is in the docs, which is the very same thing that got you to this situation in the first place, but because it got too complex to reproduce, this so-called advice (with the implication that you must have done something wrong) will live on.

The reason you were getting those errors is that pvecm updatecerts corrupts the known_hosts file. Of course, if you remove those files entirely and then run pvecm updatecerts, there is nothing to corrupt, so it adds the missing entries and everything starts working. But that is exactly why threads like yours have kept appearing and reappearing for 10 years.
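One way to check whether a known_hosts file actually holds conflicting entries is to look for hosts listed with more than one distinct key of the same type. The sketch below is a rough diagnostic only: the function name `find_conflicting_hosts` is made up for this example, and it assumes plain (unhashed) host names in the first field, which may not match every setup.

```shell
#!/bin/sh
# Hypothetical diagnostic (not a Proxmox command): print every
# "host keytype" pair that appears in FILE with two or more different keys.
# Assumes unhashed host names; hashed (|1|...) entries will not be matched up.
find_conflicting_hosts() {
    awk '
        /^[ \t]*(#|$)/ { next }     # skip comments and blank lines
        $1 ~ /^@/      { next }     # skip @cert-authority / @revoked lines
        {
            # The first field may list several names separated by commas.
            n = split($1, hosts, ",")
            for (i = 1; i <= n; i++) {
                id = hosts[i] " " $2
                if ((id in seen) && (seen[id] != $3))
                    conflict[id] = 1
                seen[id] = $3
            }
        }
        END { for (id in conflict) print id }
    ' "$1" | sort
}
```

Run against a copy of the file, for example `find_conflicting_hosts /etc/pve/priv/known_hosts`. Empty output means no host has two different keys of the same type under these assumptions.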

Meanwhile, filed bug reports get ignored, available patches get ignored, and staff are (some knowingly, Dominik probably not) cajoling unsuspecting users into believing that all is well (or that something was wrong between the keyboard and the screen), while otherwise doing a good PR job. All this instead of editing 5 lines of code for everyone. It is a pretty scary realisation of how the PR effort works.

esi_y · Dec 16, 2023

If anyone follows this thread in the future, please see also: https://forum.proxmox.com/threads/incorrect-docs-on-node-removal.138159/
