[SOLVED] Proxmox VE Cluster: noVNC console does not work for other servers

Jan 7, 2022
Good evening,
I've set up a cluster with three Proxmox VE servers and joined them into one datacenter. To secure SSH, I've restricted root login via SSH with the following setting

Code:
PermitRootLogin no
in combination with
Code:
Match Address 10.0.0.0/24
    PermitRootLogin yes
    PasswordAuthentication yes
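
For reference, OpenSSH's extended test mode can show which of these settings are effective for a connection from a given source address, which is a quick way to confirm whether the Match block applies (a sketch; the host name and addresses below are placeholders, not the ones from this setup):

Code:
# Effective settings for root logging in from the cluster-internal subnet
sshd -T -C user=root,host=pve2,addr=10.0.0.2 | grep -Ei 'permitrootlogin|passwordauthentication'

# Compare with a login coming from the external address
sshd -T -C user=root,host=pve2,addr=203.0.113.10 | grep -Ei 'permitrootlogin|passwordauthentication'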

As a result, I'm unable to use the noVNC console for the other servers; however, SSHing as root from one host to another on the console still works.

My question is how I have to adjust this setting so that noVNC works across the nodes.

Thank you in advance

Solution:
  • The Proxmox services resolve the hostname on each node itself
  • each node had both the hostname and the FQDN configured for the external IP
  • Proxmox therefore always used the external IP.
  • This issue can be prevented by setting the IP addresses explicitly: see here (a quick check for the mismatch is sketched below)
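
A quick way to spot the mismatch on an affected node (a sketch; getent and hostname are standard tools, the comments reflect what happened in this thread):

Code:
# Which address does the node's own hostname resolve to?
hostname -i                    # here this returned the external address
getent hosts "$(hostname)"     # what /etc/hosts (and DNS) actually resolve
getent hosts "$(hostname -f)"  # same check for the FQDN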
 
do you have multiple addresses for your nodes? which one does the hostname resolve to? likely PVE doesn't use an address from your matched subnet..
 
PVE uses the resolved hostname as the SSH connection target, e.g. what the following will return:

perl -e 'use strict; use warnings; use PVE::Cluster; my $ip = PVE::Cluster::remote_node_ip("HOST"); print "$ip\n";' (replace HOST with the actual target hostname)
 
Thanks for your prompt reply. I'm getting the local IP within the 10.0.0.0/24 subnet here for both the hostname and the FQDN.
 
could you post the full sshd config and the exact error you get? any log messages visible on either node when you attempt to open the console?
 
The /etc/ssh/sshd_config is the following:
Code:
PermitRootLogin no
ChallengeResponseAuthentication no
UsePAM yes
X11Forwarding yes
PrintMotd no
PrintLastLog yes
TCPKeepAlive yes
AcceptEnv LANG LC_*
Subsystem    sftp    /usr/lib/openssh/sftp-server

Match Address 10.0.0.0/24
    PermitRootLogin yes
    PasswordAuthentication yes


The error shown in noVNC is "Failure to connect to server".

in the syslog on host1
Code:
pvedaemon[14064]: starting vnc proxy UPID::000036F0:0B19F30A:61F26125:vncproxy:163:root@pam:
pvedaemon[22069]: <root@pam> starting task UPID::000036F0:0B19F30A:61F26125:vncproxy:163:root@pam:
pvedaemon[14064]: Failed to run vncproxy.
pvedaemon[22069]: <root@pam> end task UPID::000036F0:0B19F30A:61F26125:vncproxy:163:root@pam: Failed to run vncproxy.

on host2
Code:
sshd[3021]: ROOT LOGIN REFUSED FROM $EXTERNAL_IP_HOST1 port 36204
sshd[3021]: ROOT LOGIN REFUSED FROM $EXTERNAL_IP_HOST1 port 36204 [preauth]
sshd[3021]: Connection closed by authenticating user root $EXTERNAL_IP_HOST1 port 36204 [preauth]
 
sounds like it does use the external IP for some reason (any routing or ssh client config peculiarities that might explain it?).. you could try dumping the full command by adding

Code:
use Data::Dumper;
warn Dumper($cmd), "\n";

before the run_command here: https://git.proxmox.com/?p=qemu-ser...6af48cc51d04ac408895d01d9f9594a;hb=HEAD#l1904 (in /usr/share/perl5/PVE/API2/Qemu.pm) and reloading pveproxy/pvedaemon afterwards (systemctl reload pveproxy pvedaemon). re-installing qemu-server will revert to the stock code again (apt install --reinstall qemu-server).
 
There is no special routing. I've already rebooted the server to clear any caches.

Regarding the dump: the SSH command itself looks fine, but it contains the external IP.

Code:
          '/usr/bin/ssh',
          '-e',
          'none',
          '-o',
          'BatchMode=yes',
          '-o',
          'HostKeyAlias=host2',
          '-T',
          'root@EXTERNAL_IP',
          '/usr/sbin/qm',
          'vncproxy',
          '167'
        ];
 
could you try this updated command:

perl -e 'use strict; use warnings; use PVE::Cluster; PVE::Cluster::cfs_update(); my $ip = PVE::Cluster::remote_node_ip("HOST"); print "$ip\n";'
 
I guess that means your hostname doesn't resolve to the internal, but the external IP.. if you change that (and possibly restart pve-cluster) it should work.
 
Yeah, but on the host level it resolves correctly. I've restarted pve-cluster explicitly, and as a result I'm now unable to use the web console either, since it connects to the external IP.

The updated perl command shows the local IP in both cases.
 
Thanks for your answer - I think I've found the problem: since each host resolves the IP on its own* and that result is used once it's joined, /etc/hosts is basically ignored…

* hostname -i returns an external IP
 
After some investigation, I've resolved the issue in the following way (a sketch of the result is below):
  • explicit mappings in /etc/hosts on each node for the local node with its local address
  • explicit mappings for the FQDNs of all nodes in /etc/hosts
  • distribution of /etc/hosts to all nodes
  • restarting pve-cluster.service on all nodes
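
For illustration, the resulting /etc/hosts looked roughly like this (node names and addresses are placeholders, not the actual ones from this setup):

Code:
127.0.0.1   localhost
# every cluster member mapped to its cluster-internal address: FQDN and hostname
10.0.0.1    pve1.example.com pve1
10.0.0.2    pve2.example.com pve2
10.0.0.3    pve3.example.com pve3

and the distribution/restart along the lines of (scp is just one way to copy the file):

Code:
# copy the file to the other nodes, then restart pve-cluster on every node
scp /etc/hosts root@10.0.0.2:/etc/hosts
scp /etc/hosts root@10.0.0.3:/etc/hosts
systemctl restart pve-cluster.service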
 
I know this thread is a little older.
But I stumbled upon this problem as well after changing my /etc/ssh/sshd_config.
I tracked it down by checking the output of the following command:
Code:
/usr/bin/ssh -e none -T -o BatchMode=yes <IP of other node> /usr/sbin/qm vncproxy <id of VM on that machine>
It returned "Host key verification failed".

So I checked the file /root/.ssh/known_hosts.
This file had all the correct entries, but it was ignored.
After trying to SSH directly to one of the nodes, I noticed that the file /etc/ssh/ssh_known_hosts was used instead of /root/.ssh/known_hosts.
I made sure that the entries in /root/.ssh/known_hosts were indeed correct and that it included entries for all nodes of the cluster, including the node where the file is stored.
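
As a side note, the SSH client's verbose output shows which known_hosts file a host key was actually found in, which is how this kind of mismatch can be spotted (a generic OpenSSH check; the target IP is a placeholder):

Code:
# look for lines like "Found key in /etc/ssh/ssh_known_hosts:3"
ssh -v -o BatchMode=yes root@<IP of proxmox-node> true 2>&1 | grep -Ei 'found key|known_hosts'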

I then fixed it by copying the file /root/.ssh/known_hosts to /etc/ssh/ssh_known_hosts and then distributed that file to all the other nodes with rsync:
Code:
rsync -avh --info=progress2 /etc/ssh/ssh_known_hosts <IP of proxmox-node>:/etc/ssh/ssh_known_hosts
rsync -avh --info=progress2 /etc/ssh/ssh_known_hosts <IP of proxmox-node>:/root/.ssh/known_hosts

Then I restarted the sshd service with systemctl restart sshd.service on every node and ran the first command again.
Now I got the correct output, and when checking the web GUI on every node, everything worked normally.
 
