VNC does not work in cluster

One thing here: there's no "main" node. Even if it was the first node you used to create the cluster and the others then "joined" in, they are all equal. The only thing is, if you are on one node (accessing its GUI), for instance Auriga, and you want to VNC to some VM running on Yautja, it will have to proxy through that node, which is why I originally thought it must have to do with the SSH.
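
For reference, a rough manual check of that hop (node name and VMID below are just examples taken from this thread; adjust to your cluster) is to run something like this from the node whose GUI you are using:

Code:
# e.g. on Yautja: ask Newton over SSH to start a VNC proxy for VM 501
ssh root@newton qm vncproxy 501

If the SSH hop is healthy, the first thing printed should be VNC protocol output (an "RFB ..." version string); any extra text in front of that would be a red flag.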

But you are saying that from the GUI of Auriga, you actually can see the SHELL of e.g. Yautja or Newton just fine? Just the VM access is toast?
Exactly!
As you'll see in the log, it does try to use the IP assigned to Yautja, which is *.69; Newton's is *.155 and Auriga's is *.166.
Syslog on Newton:

Code:
Nov 21 13:23:26 newton sshd[1182281]: pam_env(sshd:session): deprecated reading of user environment enabled
Nov 21 13:23:27 newton sshd[1182281]: Received disconnect from 192.168.69.69 port 33974:11: disconnected by user
Nov 21 13:23:27 newton sshd[1182281]: Disconnected from user root 192.168.69.69 port 33974
Nov 21 13:23:27 newton sshd[1182281]: pam_unix(sshd:session): session closed for user root
Nov 21 13:23:27 newton systemd[1]: session-47.scope: Deactivated successfully.
Nov 21 13:23:27 newton systemd[1]: session-47.scope: Consumed 1.316s CPU time.
Nov 21 13:23:27 newton systemd-logind[1446]: Session 47 logged out. Waiting for processes to exit.
Nov 21 13:23:27 newton systemd-logind[1446]: Removed session 47.
Nov 21 13:23:27 newton pmxcfs[1838]: [status] notice: received log
Nov 21 13:23:37 newton systemd[1]: Stopping user@0.service - User Manager for UID 0...
Nov 21 13:23:37 newton systemd[1182284]: Activating special unit exit.target...
Nov 21 13:23:37 newton systemd[1182284]: Stopped target default.target - Main User Target.
Nov 21 13:23:37 newton systemd[1182284]: Stopped target basic.target - Basic System.
Nov 21 13:23:37 newton systemd[1182284]: Stopped target paths.target - Paths.
Nov 21 13:23:37 newton systemd[1182284]: Stopped target sockets.target - Sockets.
Nov 21 13:23:37 newton systemd[1182284]: Stopped target timers.target - Timers.
Nov 21 13:23:37 newton systemd[1182284]: Closed dirmngr.socket - GnuPG network certificate management daemon.
Nov 21 13:23:37 newton systemd[1182284]: Closed gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 21 13:23:37 newton systemd[1182284]: Closed gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Nov 21 13:23:37 newton systemd[1182284]: Closed gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Nov 21 13:23:37 newton systemd[1182284]: Closed gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Nov 21 13:23:37 newton systemd[1182284]: Removed slice app.slice - User Application Slice.
Nov 21 13:23:37 newton systemd[1182284]: Reached target shutdown.target - Shutdown.
Nov 21 13:23:37 newton systemd[1182284]: Finished systemd-exit.service - Exit the Session.
Nov 21 13:23:37 newton systemd[1182284]: Reached target exit.target - Exit the Session.
Nov 21 13:23:37 newton systemd[1]: user@0.service: Deactivated successfully.
Nov 21 13:23:37 newton systemd[1]: Stopped user@0.service - User Manager for UID 0.
Nov 21 13:23:37 newton systemd[1]: Stopping user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Nov 21 13:23:37 newton systemd[1]: run-user-0.mount: Deactivated successfully.
Nov 21 13:23:37 newton systemd[1]: user-runtime-dir@0.service: Deactivated successfully.
Nov 21 13:23:37 newton systemd[1]: Stopped user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Nov 21 13:23:37 newton systemd[1]: Removed slice user-0.slice - User Slice of UID 0.
Nov 21 13:23:37 newton systemd[1]: user-0.slice: Consumed 1.449s CPU time.
Nov 21 13:36:21 newton pmxcfs[1838]: [status] notice: received log
Nov 21 13:42:21 newton pmxcfs[1838]: [dcdb] notice: data verification successful
Nov 21 13:51:29 newton pmxcfs[1838]: [status] notice: received log
Nov 21 13:53:28 newton pvedaemon[1095257]: worker exit
Nov 21 13:53:28 newton pvedaemon[2158]: worker 1095257 finished
Nov 21 13:53:28 newton pvedaemon[2158]: starting 1 worker(s)
Nov 21 13:53:28 newton pvedaemon[2158]: worker 1216161 started
Nov 21 13:54:31 newton pveproxy[1152931]: worker exit
Nov 21 13:54:31 newton pveproxy[2240]: worker 1152931 finished
Nov 21 13:54:31 newton pveproxy[2240]: starting 1 worker(s)
Nov 21 13:54:31 newton pveproxy[2240]: worker 1217235 started
 
Hang on a second. :) Let me tidy it up in my mind too. I looked back at the screenshots: the TASK screenshot was from Yautja, the task was to VNC into a VM on Newton, and the syslog you are now showing is from Newton. Can you check what's in Yautja's syslog for this time?
 
Also, in the Newton syslog it would be interesting to see what happened before:
Received disconnect from 192.168.69.69 port 33974:11: disconnected by user
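
For reference, one quick way to pull that window out of the journal on Newton (the timestamps below are just placeholders; adjust them to the event):

Code:
journalctl --since "2023-11-21 13:22:00" --until "2023-11-21 13:24:00"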
 
Hang on a second. :) Let me tidy it up in my mind too. I looked back at the screenshots: the TASK screenshot was from Yautja, the task was to VNC into a VM on Newton, and the syslog you are now showing is from Newton. Can you check what's in Yautja's syslog for this time?
This is Yautja's output when I try to access Newton's VM VNC:


Code:
Nov 21 14:16:05 yautja pvedaemon[672252]: starting vnc proxy UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:05 yautja pvedaemon[1479]: <root@pam> starting task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:07 yautja pveproxy[663129]: Use of uninitialized value $statuscode in concatenation (.) or string at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 648.
Nov 21 14:16:07 yautja pvedaemon[1479]: <root@pam> end task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam: OK
 
This is Yautja's output when I try to access Newton's VM VNC:


Code:
Nov 21 14:16:05 yautja pvedaemon[672252]: starting vnc proxy UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:05 yautja pvedaemon[1479]: <root@pam> starting task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:07 yautja pveproxy[663129]: Use of uninitialized value $statuscode in concatenation (.) or string at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 648.
Nov 21 14:16:07 yautja pvedaemon[1479]: <root@pam> end task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam: OK

This is weird; it basically means there's no status code available for that task (I suppose) at the moment it wants to show it. Are all nodes on PVE 7.4-17?

What about the log on Newton? Anything interesting before the actual "Received disconnect"?
 
This is weird; it basically means there's no status code available for that task (I suppose) at the moment it wants to show it. Are all nodes on PVE 7.4-17?
Yautja is on the older 7.4, the rest are v8.
Same thing in the Newton log:

Code:
Nov 21 14:35:27 newton systemd[1]: Started user@0.service - User Manager for UID 0.
Nov 21 14:35:27 newton systemd[1]: Started session-52.scope - Session 52 of User root.
Nov 21 14:35:27 newton sshd[1262974]: pam_env(sshd:session): deprecated reading of user environment enabled
Nov 21 14:35:29 newton sshd[1262974]: Received disconnect from 192.168.69.69 port 44702:11: disconnected by user
Nov 21 14:35:29 newton sshd[1262974]: Disconnected from user root 192.168.69.69 port 44702
Nov 21 14:35:29 newton sshd[1262974]: pam_unix(sshd:session): session closed for user root
Nov 21 14:35:29 newton systemd[1]: session-52.scope: Deactivated successfully.
Nov 21 14:35:29 newton systemd[1]: session-52.scope: Consumed 1.328s CPU time.
Nov 21 14:35:29 newton systemd-logind[1446]: Session 52 logged out. Waiting for processes to exit.
Nov 21 14:35:29 newton systemd-logind[1446]: Removed session 52.
Nov 21 14:35:29 newton pmxcfs[1838]: [status] notice: received log
Nov 21 14:35:39 newton systemd[1]: Stopping user@0.service - User Manager for UID 0...
Nov 21 14:35:39 newton systemd[1262977]: Activating special unit exit.target...
Nov 21 14:35:39 newton systemd[1262977]: Stopped target default.target - Main User Target.
Nov 21 14:35:39 newton systemd[1262977]: Stopped target basic.target - Basic System.
Nov 21 14:35:39 newton systemd[1262977]: Stopped target paths.target - Paths.
Nov 21 14:35:39 newton systemd[1262977]: Stopped target sockets.target - Sockets.
Nov 21 14:35:39 newton systemd[1262977]: Stopped target timers.target - Timers.
Nov 21 14:35:39 newton systemd[1262977]: Closed dirmngr.socket - GnuPG network certificate management daemon.
Nov 21 14:35:39 newton systemd[1262977]: Closed gpg-agent-browser.socket - GnuPG cryptographic agent and passphrase cache (access for web browsers).
Nov 21 14:35:39 newton systemd[1262977]: Closed gpg-agent-extra.socket - GnuPG cryptographic agent and passphrase cache (restricted).
Nov 21 14:35:39 newton systemd[1262977]: Closed gpg-agent-ssh.socket - GnuPG cryptographic agent (ssh-agent emulation).
Nov 21 14:35:39 newton systemd[1262977]: Closed gpg-agent.socket - GnuPG cryptographic agent and passphrase cache.
Nov 21 14:35:39 newton systemd[1262977]: Removed slice app.slice - User Application Slice.
Nov 21 14:35:39 newton systemd[1262977]: Reached target shutdown.target - Shutdown.
Nov 21 14:35:39 newton systemd[1262977]: Finished systemd-exit.service - Exit the Session.
Nov 21 14:35:39 newton systemd[1262977]: Reached target exit.target - Exit the Session.
Nov 21 14:35:39 newton systemd[1]: user@0.service: Deactivated successfully.
Nov 21 14:35:39 newton systemd[1]: Stopped user@0.service - User Manager for UID 0.
Nov 21 14:35:39 newton systemd[1]: Stopping user-runtime-dir@0.service - User Runtime Directory /run/user/0...
Nov 21 14:35:39 newton systemd[1]: run-user-0.mount: Deactivated successfully.
Nov 21 14:35:39 newton systemd[1]: user-runtime-dir@0.service: Deactivated successfully.
Nov 21 14:35:39 newton systemd[1]: Stopped user-runtime-dir@0.service - User Runtime Directory /run/user/0.
Nov 21 14:35:39 newton systemd[1]: Removed slice user-0.slice - User Slice of UID 0.
Nov 21 14:35:39 newton systemd[1]: user-0.slice: Consumed 1.459s CPU time.
Nov 21 14:35:52 newton pveproxy[1178257]: worker exit
Nov 21 14:35:52 newton pveproxy[2240]: worker 1178257 finished
Nov 21 14:35:52 newton pveproxy[2240]: starting 1 worker(s)
Nov 21 14:35:52 newton pveproxy[2240]: worker 1263755 started
 
Yautja is on the older 7.4, the rest are v8.

So I suppose at some point you upgraded the other two (or you joined them the first time already fresh as v8)?

Same thing in the Newton log:

Code:
Nov 21 14:35:27 newton systemd[1]: Started user@0.service - User Manager for UID 0.
Nov 21 14:35:27 newton systemd[1]: Started session-52.scope - Session 52 of User root.
Nov 21 14:35:27 newton sshd[1262974]: pam_env(sshd:session): deprecated reading of user environment enabled
Nov 21 14:35:29 newton sshd[1262974]: Received disconnect from 192.168.69.69 port 44702:11: disconnected by user
Nov 21 14:35:29 newton sshd[1262974]: Disconnected from user root 192.168.69.69 port 44702
Nov 21 14:35:29 newton sshd[1262974]: pam_unix(sshd:session): session closed for user root

...

Nov 21 14:35:52 newton pveproxy[1178257]: worker exit
Nov 21 14:35:52 newton pveproxy[2240]: worker 1178257 finished
Nov 21 14:35:52 newton pveproxy[2240]: starting 1 worker(s)
Nov 21 14:35:52 newton pveproxy[2240]: worker 1263755 started

:D Man, I always try to match the times to see what happened on one machine and what happened on the other at the same moment. This is from 14:35, so I assume you tried to connect again and caught the Newton side again. But okay, whatever is happening, it's likely the same event.

Have you at any time been changing the IP addresses or names of the nodes (ever, since first creating the cluster)?
 
Ok, how about you check this: not using Yautja (as it is the only one on 7.4 and might have some older code that doesn't show us more detailed error output), can you load up the GUI of e.g. Auriga and try to VNC from there into a VM on Newton?
 
So I suppose at some point you upgraded the other two (or you joined them the first time already fresh as v8)?



:D Man, I always try to match the times to see what happened on one machine and what happened on the other at the same moment. This is from 14:35, so I assume you tried to connect again and caught the Newton side again. But okay, whatever is happening, it's likely the same event.

Have you at any time been changing the IP addresses or names of the nodes (ever, since first creating the cluster)?
The two other nodes are newly built servers that have been clustered with this older one. Should've maybe upgraded that one beforehand lol.
I haven't changed IPs since clustering.

Ok, how about you check this: not using Yautja (as it is the only one on 7.4 and might have some older code that doesn't show us more detailed error output), can you load up the GUI of e.g. Auriga and try to VNC from there into a VM on Newton?
Unfortunately I can't do that from work atm, as I haven't set up DNS for the other two to access them remotely. I will try this when I get home; it's actually a good shout haha
 
This is Yautja's output when I try to access Newton's VM VNC:


Code:
Nov 21 14:16:05 yautja pvedaemon[672252]: starting vnc proxy UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:05 yautja pvedaemon[1479]: <root@pam> starting task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam:
Nov 21 14:16:07 yautja pveproxy[663129]: Use of uninitialized value $statuscode in concatenation (.) or string at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 648.


NOTE TO SELF: So this should have been "$self->dprint("websocket received close. status code: '$statuscode'");"
https://github.com/proxmox/pve-http...0e787dac7a/src/PVE/APIServer/AnyEvent.pm#L648

Code:
Nov 21 14:16:07 yautja pvedaemon[1479]: <root@pam> end task UPID:yautja:000A41FC:007BF3D0:655CBBA5:vncproxy:501:root@pam: OK
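
For reference, a quick way to check which http-server code a node is actually running and to look at that line locally (the package name below is the usual one; adjust if yours differs):

Code:
# the relevant package is typically libpve-http-server-perl
dpkg -l | grep -i http-server
# show the code around the line the warning points at
sed -n '640,650p' /usr/share/perl5/PVE/APIServer/AnyEvent.pm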
 
The two other nodes are newly built servers that have been clustered with this older one. Should've maybe upgraded that one beforehand lol.
I haven't changed IPs since clustering.


Unfortunately I can't do that from work atm, as I haven't set up DNS for the other two to access them remotely. I will try this when I get home; it's actually a good shout haha

No worries, let's do that when you can. The error (actually, there's no status code at that point to show, which would have been able to tell us more) has to do with how the proxy connections go (or not), and they definitely go from the node you access the GUI on to the node where the VM is running. When you always access from Yautja, if the issue is just there, we can't rule it out. Also, since there have been some code changes in this proxy code between 7.4 and 8, if the error happens between the other two as well, then we are more likely to see proper error output.
 
Unfortunately I can't do that from work atm, as I haven't set up DNS for the other two to access them remotely. I will try this when I get home; it's actually a good shout haha
Just a second here: are you saying you are accessing the Yautja GUI across the internet (no IPsec, no tunnels), or via a reverse proxy / Cloudflare etc.?

(I won't get into, on this thread, whether that's a good idea or what kind of setup it is. ;))

But did you have this same issue when locally connected to Yautja (from the same network)?
 
I would be curious to see if it works from e.g. Auriga to Newton; post the result. If that is working, then of course your primary suspect is the one odd node running a different version.

I did not find anything documented in terms of clustering compatibility between 7 and 8:
https://pve.proxmox.com/wiki/Upgrade_from_7_to_8

That said, in this sort of situation, if I were the developer, I would not guarantee anything. :) So your next step would be to get all three nodes onto the same version. If your migrations work, I suppose you might migrate the VMs away to the "new" nodes and then upgrade. If you were to reinstall, I would avoid reusing names and even IPs (given the known issues) for that "new" cluster node being added.

If this was "all on v8" situation, my shot in the dark would be - given the proxy failing (detailed error unknown at this point) to run pvecm updatecerts -f (the -f regenerates the SSL certificates) on all nodes, but since I am aware of the other issue (we thought this one originally was just that), I would be careful about that because it might go break otherwise working migrations (which you want before upgrade).
 
I would be curious to see if it works from e.g. Auriga to Newton; post the result. If that is working, then of course your primary suspect is the one odd node running a different version.
Yeah, just tested this out. Same result from Auriga -> Newton/Yautja or from Newton -> Auriga/Yautja.

I'll get started on clearing it again and give that another try. Fingers crossed.
 
Yeah, just tested this out. Same result from Auriga -> Newton/Yautja or from Newton -> Auriga/Yautja.

I'll get started on clearing it again and give that another try. Fingers crossed.
But was it the same in the syslog? I had hoped v8 would give a better error message between two v8 nodes.
 
Hello all,

I've noticed the same issue with accessing a guest console for any VMs (noVNC) from within the GUI. It only works if I am logged in to the respective host's UI console (I can then view the VNC console for the guest on that host); otherwise, for any other hosts in the cluster, I get the same error, failed to connect to server.

Digging a bit further in both Chrome and Firefox, I opened the browser console (Ctrl+Shift+I) and then, in the Proxmox web GUI, tried to access any of the VMs' noVNC consoles. In both browsers I noticed an error indicating "failed when connecting: invalid server version" (see screenshot). All hosts are updated to Proxmox 8.

Happy to provide any additional info that may assist with further troubleshooting.

Ren
 

Attachments

  • novnc-error.png
Hello all,

I've noticed the same issue with accessing a guest console for any VMs (noVNC) from within the GUI. It only works if I am logged in to the respective host's UI console (I can then view the VNC console for the guest on that host); otherwise, for any other hosts in the cluster, I get the same error, failed to connect to server.
Hey, this is interesting. It looks like it's the same issue; can you also check your TASK output and syslogs when this is happening?
 
Digging a bit further in both Chrome and Firefox, I opened the browser console (Ctrl+Shift+I) and then, in the Proxmox web GUI, tried to access any of the VMs' noVNC consoles. In both browsers I noticed an error indicating "failed when connecting: invalid server version" (see screenshot). All hosts are updated to Proxmox 8.
Do you happen to have any changes to the default ~/.bashrc?
 
Do you happen to have any changes to the default ~/.bashrc?
Yes, actually I did add the following to the bottom of each host's .bashrc file:

Code:
clear
neofetch

It's just to show basic stats for each host when I SSH in. Could this be causing the issue? I'll comment it out and retry!
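
For reference, a common way to keep that kind of output without it leaking into non-interactive SSH sessions (which is what the cluster's console/VNC proxying relies on) is to guard it so it only runs for interactive shells; a minimal sketch for the bottom of ~/.bashrc:

Code:
# only run cosmetic output for interactive shells; non-interactive SSH
# sessions (e.g. the VNC/console proxy hop) must stay silent, otherwise
# the extra bytes corrupt the expected protocol handshake
case $- in
    *i*)
        clear
        neofetch
        ;;
esac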
 
