VNC does not work in cluster

One thing here: there's no "main" node. Even if it was the first node you used to create the cluster and the others then "joined" in, they are all equal. The only thing is, if you are on one node (accessing its GUI), for instance Auriga, and you want to VNC to some VM running on Yautja, it will have to proxy through that node, which is why I originally thought it must have to do with the SSH.
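
For reference, a rough manual check of that hop (node name and VMID below are just examples taken from this thread; adjust to your cluster) is to run something like this from the node whose GUI you are using:

Code:
# e.g. on Yautja: ask Newton over SSH to start a VNC proxy for VM 501
ssh root@newton qm vncproxy 501

If the SSH hop is healthy, the first thing printed should be VNC protocol output (an "RFB ..." version string); any extra text in front of that would be a red flag.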

But you are saying that from the GUI of Auriga, you actually can see the SHELL of e.g. Yautja or Newton just fine? Just the VM access is toast?
Exactly!
As you'll see in the log, it does try to use the IP assigned to Yautja, which is *.69; Newton's is *.155 and Auriga's is *.166.
Syslog on Newton:

Code:
Nov 21 13:23:26 newton sshd[1182281]: pam_env(sshd:session): deprecated reading of user environment enabled
Nov 21 13:23:27 newton sshd[1182281]: Received disconnect from 192.168.69.69 port 33974:11: disconnected by user
Nov 21 13:23:27 newton sshd[1182281]: Disconnected from user root 192.168.69.69 port 33974
Nov 21 13:23:27 newton sshd[1182281]: pam_unix(sshd:session): session closed for user root
Nov 21 13:23:27 newton systemd[1]: session-47.scope: Deactivated successfully.
Nov 21 13:23:27 newton systemd[1]: session-47.scope: Consumed 1.316s CPU time.
Nov 21 13:23:27 newton systemd-logind[1446]: Session 47 logged out. Waiting for processes to exit.
Nov 21 13:23:27 newton systemd-logind[1446]: Removed session 47.
Nov 21 13:23:27 newton pmxcfs[1838]: [status] notice: received log
Nov 21 13:23:37 newton systemd[1]: Stopping user@0.service - User Manager for UID 0...
Nov 21 13:23:37 newton systemd[1182284]: Activating special unit exit.target...
Nov 21 13:23:37 newton systemd[1182284]: Stopped target default.target - Main User Target.
Nov 21 13:23:37 newton systemd[1182284]: Stopped target basic.target - Basic System.
Nov 21 13:23:37 newton systemd[1182284]: Stopped target paths.target - Paths.
Nov 21 13:23:37 newton systemd[1182284]: Stopped target sockets.target - Sockets.
Nov 21 13:23:37 newton systemd[1182284]: Stopped target timers.target - Timers.
Nov 21 13:23:37 newton systemd[1182284]: Closed dirmngr.socket - GnuPG network certificate management daemon.
Nov 21 13:23:37 newton systemd[1182284]: Closed gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 21 13:23:37 newton systemd[1182284]: Closed gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Nov 21 13:23:37 newton systemd[1182284]: Closed gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Nov 21 13:23:37 newton systemd[1182284]: Closed gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Nov 21 13:23:37 newton systemd[1182284]: Removed slice app.slice - User Application Slice.
Nov 21 13:23:37 newton systemd[1182284]: Reached target shutdown.target - Shutdown.
Nov 21 13:23:37 newton systemd[1182284]: Finished systemd-exit.service - Exit the Session.
Nov 21 13:23:37 newton systemd[1182284]: Reached target exit.target - Exit the Session.
Nov 21 13:23:37 newton systemd[1]: user@0.service: Deactivated successfully.
Nov 21 13:23:37 newton systemd[1]: Stopped user@0.service - User Manager for UID 0.
Nov 21 13:23:37 newton systemd[1]: Stopping user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Nov 21 13:23:37 newton systemd[1]: run-user-0.mount: Deactivated successfully.
Nov 21 13:23:37 newton systemd[1]: user-runtime-dir@0.service: Deactivated successfully.
Nov 21 13:23:37 newton systemd[1]: Stopped user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Nov 21 13:23:37 newton systemd[1]: Removed slice user-0.slice - User Slice of UID 0.
Nov 21 13:23:37 newton systemd[1]: user-0.slice: Consumed 1.449s CPU time.
Nov 21 13:36:21 newton pmxcfs[1838]: [status] notice: received log
Nov 21 13:42:21 newton pmxcfs[1838]: [dcdb] notice: data verification successful
Nov 21 13:51:29 newton pmxcfs[1838]: [status] notice: received log
Nov 21 13:53:28 newton pvedaemon[1095257]: worker exit
Nov 21 13:53:28 newton pvedaemon[2158]: worker 1095257 finished
Nov 21 13:53:28 newton pvedaemon[2158]: starting 1 worker(s)
Nov 21 13:53:28 newton pvedaemon[2158]: worker 1216161 started
Nov 21 13:54:31 newton pveproxy[1152931]: worker exit
Nov 21 13:54:31 newton pveproxy[2240]: worker 1152931 finished
Nov 21 13:54:31 newton pveproxy[2240]: starting 1 worker(s)
Nov 21 13:54:31 newton pveproxy[2240]: worker 1217235 started
 
Hang on a second. :) Let me tidy it up in my mind too. I looked back at the screenshots: the TASK screenshot was from Yautja, the task was to VNC into a VM on Newton, and the syslog you are now showing is from Newton. Can you check what's in Yautja's syslog for this time?
 
Also, in the Newton syslog it would be interesting to see what happened before:
Received disconnect from 192.168.69.69 port 33974:11: disconnected by user
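
For reference, one quick way to pull that window out of the journal on Newton (the timestamps below are just placeholders; adjust them to the event):

Code:
journalctl --since "2023-11-21 13:22:00" --until "2023-11-21 13:24:00"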
 
Hang on a second. :) Let me tidy it up in my mind too. I looked back at the screenshots: the TASK screenshot was from Yautja, the task was to VNC into a VM on Newton, and the syslog you are now showing is from Newton. Can you check what's in Yautja's syslog for this time?
This is Yautja's output when I try to access Newton's VM VNC:


Code:
Nov 21 14:16:05 yautja pvedaemon[672252]: starting vnc proxy UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:05 yautja pvedaemon[1479]: <root@pam> starting task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:07 yautja pveproxy[663129]: Use of uninitialized value $statuscode in concatenation (.) or string at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 648.
Nov 21 14:16:07 yautja pvedaemon[1479]: <root@pam> end task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam: OK
 
This is Yautja's output when I try to access Newton's VM VNC:


Code:
Nov 21 14:16:05 yautja pvedaemon[672252]: starting vnc proxy UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:05 yautja pvedaemon[1479]: <root@pam> starting task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:07 yautja pveproxy[663129]: Use of uninitialized value $statuscode in concatenation (.) or string at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 648.
Nov 21 14:16:07 yautja pvedaemon[1479]: <root@pam> end task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam: OK

This is weird; it basically means there's no status code available for that task (I suppose) at the moment it wants to show it. Are all nodes on PVE 7.4-17?

What about the log on Newton? Anything interesting before the actual "Received disconnect"?
 
This is weird; it basically means there's no status code available for that task (I suppose) at the moment it wants to show it. Are all nodes on PVE 7.4-17?
Yautja is on the older 7.4, the rest are v8.
Same thing in the Newton log:

Code:
Nov 21 14:35:27 newton systemd[1]: Started user@0.service - User Manager for UID 0.
Nov 21 14:35:27 newton systemd[1]: Started session-52.scope - Session 52 of User root.
Nov 21 14:35:27 newton sshd[1262974]: pam_env(sshd:session): deprecated reading of user environment enabled
Nov 21 14:35:29 newton sshd[1262974]: Received disconnect from 192.168.69.69 port 44702:11: disconnected by user
Nov 21 14:35:29 newton sshd[1262974]: Disconnected from user root 192.168.69.69 port 44702
Nov 21 14:35:29 newton sshd[1262974]: pam_unix(sshd:session): session closed for user root
Nov 21 14:35:29 newton systemd[1]: session-52.scope: Deactivated successfully.
Nov 21 14:35:29 newton systemd[1]: session-52.scope: Consumed 1.328s CPU time.
Nov 21 14:35:29 newton systemd-logind[1446]: Session 52 logged out. Waiting for processes to exit.
Nov 21 14:35:29 newton systemd-logind[1446]: Removed session 52.
Nov 21 14:35:29 newton pmxcfs[1838]: [status] notice: received log
Nov 21 14:35:39 newton systemd[1]: Stopping user@0.service - User Manager for UID 0...
Nov 21 14:35:39 newton systemd[1262977]: Activating special unit exit.target...
Nov 21 14:35:39 newton systemd[1262977]: Stopped target default.target - Main User Target.
Nov 21 14:35:39 newton systemd[1262977]: Stopped target basic.target - Basic System.
Nov 21 14:35:39 newton systemd[1262977]: Stopped target paths.target - Paths.
Nov 21 14:35:39 newton systemd[1262977]: Stopped target sockets.target - Sockets.
Nov 21 14:35:39 newton systemd[1262977]: Stopped target timers.target - Timers.
Nov 21 14:35:39 newton systemd[1262977]: Closed dirmngr.socket - GnuPG network certificate management daemon.
Nov 21 14:35:39 newton systemd[1262977]: Closed gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 21 14:35:39 newton systemd[1262977]: Closed gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Nov 21 14:35:39 newton systemd[1262977]: Closed gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Nov 21 14:35:39 newton systemd[1262977]: Closed gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Nov 21 14:35:39 newton systemd[1262977]: Removed slice app.slice - User Application Slice.
Nov 21 14:35:39 newton systemd[1262977]: Reached target shutdown.target - Shutdown.
Nov 21 14:35:39 newton systemd[1262977]: Finished systemd-exit.service - Exit the Session.
Nov 21 14:35:39 newton systemd[1262977]: Reached target exit.target - Exit the Session.
Nov 21 14:35:39 newton systemd[1]: user@0.service: Deactivated successfully.
Nov 21 14:35:39 newton systemd[1]: Stopped user@0.service - User Manager for UID 0.
Nov 21 14:35:39 newton systemd[1]: Stopping user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Nov 21 14:35:39 newton systemd[1]: run-user-0.mount: Deactivated successfully.
Nov 21 14:35:39 newton systemd[1]: user-runtime-dir@0.service: Deactivated successfully.
Nov 21 14:35:39 newton systemd[1]: Stopped user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Nov 21 14:35:39 newton systemd[1]: Removed slice user-0.slice - User Slice of UID 0.
Nov 21 14:35:39 newton systemd[1]: user-0.slice: Consumed 1.459s CPU time.
Nov 21 14:35:52 newton pveproxy[1178257]: worker exit
Nov 21 14:35:52 newton pveproxy[2240]: worker 1178257 finished
Nov 21 14:35:52 newton pveproxy[2240]: starting 1 worker(s)
Nov 21 14:35:52 newton pveproxy[2240]: worker 1263755 started
 
Yautja is on the older 7.4, the rest are v8.

So I suppose at some point you upgraded the other two (or you joined them the first time already fresh as v8)?

Same thing in the Newton log:

Code:
Nov 21 14:35:27 newton systemd[1]: Started user@0.service - User Manager for UID 0.
Nov 21 14:35:27 newton systemd[1]: Started session-52.scope - Session 52 of User root.
Nov 21 14:35:27 newton sshd[1262974]: pam_env(sshd:session): deprecated reading of user environment enabled
Nov 21 14:35:29 newton sshd[1262974]: Received disconnect from 192.168.69.69 port 44702:11: disconnected by user
Nov 21 14:35:29 newton sshd[1262974]: Disconnected from user root 192.168.69.69 port 44702
Nov 21 14:35:29 newton sshd[1262974]: pam_unix(sshd:session): session closed for user root

...

Nov 21 14:35:52 newton pveproxy[1178257]: worker exit
Nov 21 14:35:52 newton pveproxy[2240]: worker 1178257 finished
Nov 21 14:35:52 newton pveproxy[2240]: starting 1 worker(s)
Nov 21 14:35:52 newton pveproxy[2240]: worker 1263755 started

:D Man, I always try to match the times to see what happened on one machine and what happened on the other at the same moment. This is from 14:35, so I assume you tried to connect again and caught the Newton side again. But okay, whatever is happening, it's likely the same event.

Have you at any time been changing the IP addresses or names of the nodes (ever, since first creating the cluster)?
 
Ok, how about you check this: not using Yautja (as it is the only one on 7.4 and might have some older code that doesn't show us more detailed error output), can you load up the GUI of e.g. Auriga and try to VNC from there into a VM on Newton?
 
So I suppose at some point you upgraded the other two (or you joined them the first time already fresh as v8)?



:D Man, I always try to match the times to see what happened on one machine and what happened on the other at the same moment. This is from 14:35, so I assume you tried to connect again and caught the Newton side again. But okay, whatever is happening, it's likely the same event.

Have you at any time been changing the IP addresses or names of the nodes (ever, since first creating the cluster)?
The two other nodes are newly built servers that have been clustered with this older one. Should've maybe upgraded that one beforehand lol.
I haven't changed IPs since clustering.

Ok, how about you check this: not using Yautja (as it is the only one on 7.4 and might have some older code that doesn't show us more detailed error output), can you load up the GUI of e.g. Auriga and try to VNC from there into a VM on Newton?
Unfortunately I can't do that from work atm, as I haven't set up DNS for the other two to access them remotely. I will try this when I get home; it's actually a good shout haha
 
This is Yautja's output when I try to access Newton's VM VNC:


Code:
Nov 21 14:16:05 yautja pvedaemon[672252]: starting vnc proxy UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:05 yautja pvedaemon[1479]: <root@pam> starting task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:07 yautja pveproxy[663129]: Use of uninitialized value $statuscode in concatenation (.) or string at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 648.


NOTE TO SELF: So this should have been "$self->dprint("websocket received close. status code: '$statuscode'");"
https://github.com/proxmox/pve-http...0e787dac7a/src/PVE/APIServer/AnyEvent.pm#L648

Code:
Nov 21 14:16:07 yautja pvedaemon[1479]: <root@pam> end task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam: OK
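
For reference, a quick way to check which http-server code a node is actually running and to look at that line locally (the package name below is the usual one; adjust if yours differs):

Code:
# the relevant package is typically libpve-http-server-perl
dpkg -l | grep -i http-server
# show the code around the line the warning points at
sed -n '640,650p' /usr/share/perl5/PVE/APIServer/AnyEvent.pm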
 
The two other nodes are newly built servers that have been clustered with this older one. Should've maybe upgraded that one beforehand lol.
I haven't changed IPs since clustering.


Unfortunately I can't do that from work atm, as I haven't set up DNS for the other two to access them remotely. I will try this when I get home; it's actually a good shout haha

No worries, let's do that when you can. The error (actually, there's no status code at that point to show, which would have been able to tell us more) has to do with how the proxy connections go (or not), and they definitely go from the node you access the GUI on to the node where the VM is running. When you always access from Yautja, if the issue is just there, we can't rule it out. Also, since there have been some code changes in this proxy code between 7.4 and 8, if the error happens between the other two as well, then we are more likely to see proper error output.
 
Unfortunately I can't do that from work atm, as I haven't set up DNS for the other two to access them remotely. I will try this when I get home; it's actually a good shout haha
Just a second here: are you saying you are accessing the Yautja GUI across the internet (no IPsec, no tunnels), or via a reverse proxy / Cloudflare etc.?

(I won't get into, on this thread, whether that's a good idea or what kind of setup it is. ;))

But did you have this same issue when locally connected to Yautja (from the same network)?
 
I would be curious to see if it works from e.g. Auriga to Newton; post the result. If that is working, then of course your primary suspect is the one odd node running a different version.

I did not find anything documented in terms of clustering compatibility between 7 and 8:
https://pve.proxmox.com/wiki/Upgrade_from_7_to_8

That said, in this sort of situation, if I were the developer, I would not guarantee anything. :) So your next step would be to get all three nodes onto the same version. If your migrations work, I suppose you might migrate the VMs away to the "new" nodes and then upgrade. If you were to reinstall, I would avoid reusing names and even IPs (given the known issues) for that "new" cluster node being added.

If this was "all on v8" situation, my shot in the dark would be - given the proxy failing (detailed error unknown at this point) to run pvecm updatecerts -f (the -f regenerates the SSL certificates) on all nodes, but since I am aware of the other issue (we thought this one originally was just that), I would be careful about that because it might go break otherwise working migrations (which you want before upgrade).
 
I would be curious to see if it works from e.g. Auriga to Newton; post the result. If that is working, then of course your primary suspect is the one odd node running a different version.
Yeah, just tested this out. Same result from Auriga -> Newton/Yautja or from Newton -> Auriga/Yautja.

I'll get started on clearing it again and give that another try. Fingers crossed.
 
Yeah, just tested this out. Same result from Auriga -> Newton/Yautja or from Newton -> Auriga/Yautja.

I'll get started on clearing it again and give that another try. Fingers crossed.
But was it the same in the syslog? I had hoped v8 would give a better error message between two v8 nodes.
 
Hello all,

I've noticed the same issue with accessing a guest console for any VMs (noVNC) from within the GUI. It only works if I am logged in to the respective host's UI console (I can then view the VNC console for the guest on that host); otherwise, for any other hosts in the cluster, I get the same error, failed to connect to server.

Digging a bit further in both Chrome and Firefox, I opened the browser console (Ctrl+Shift+I) and then, in the Proxmox web GUI, tried to access any of the VMs' noVNC consoles. In both browsers I noticed an error indicating "failed when connecting: invalid server version" (see screenshot). All hosts are updated to Proxmox 8.

Happy to provide any additional info that may assist with further troubleshooting.

Ren
 

Attachments

  • novnc-error.png
Hello all,

I've noticed the same issue with accessing a guest console for any VMs (noVNC) from within the GUI. It only works if I am logged in to the respective host's UI console (I can then view the VNC console for the guest on that host); otherwise, for any other hosts in the cluster, I get the same error, failed to connect to server.
Hey, this is interesting. It looks like it's the same issue; can you also check your TASK output and syslogs when this is happening?
 
Digging a bit further in both Chrome and Firefox, I opened the browser console (Ctrl+Shift+I) and then, in the Proxmox web GUI, tried to access any of the VMs' noVNC consoles. In both browsers I noticed an error indicating "failed when connecting: invalid server version" (see screenshot). All hosts are updated to Proxmox 8.
Do you happen to have any changes to the default ~/.bashrc?
 
Do you happen to have any changes to the default ~/.bashrc?
Yes, actually I did add the following to the bottom of each host's .bashrc file:

Code:
clear
neofetch

It's just to show basic stats for each host when I SSH in. Could this be causing the issue? I'll comment it out and retry!
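
For reference, a common way to keep that kind of output without it leaking into non-interactive SSH sessions (which is what the cluster's console/VNC proxying relies on) is to guard it so it only runs for interactive shells; a minimal sketch for the bottom of ~/.bashrc:

Code:
# only run cosmetic output for interactive shells; non-interactive SSH
# sessions (e.g. the VNC/console proxy hop) must stay silent, otherwise
# the extra bytes corrupt the expected protocol handshake
case $- in
    *i*)
        clear
        neofetch
        ;;
esac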
 
