[SOLVED] No VNC (code 1006) and no quorum... how to recover?

soupdiver

Member
Feb 24, 2021
I have a delicate situation.

I have a Proxmox server at home and one on a dedicated server in a data center.
I run pfSense inside the remote Proxmox and from there connect to my home pfSense via WireGuard.
I also use the WireGuard tunnel to connect my Proxmox instances as a cluster.
This worked fine ... so far.
Now it seems the whole system is trashed to some degree.

Some days ago I misconfigured a route on the remote pfSense and therefore broke the VPN connection between my home and remote Proxmox instances. This has happened before... I just logged into the web GUI, opened a shell to my pfSense, fixed the config, rebooted and was done.

Today I tried to do the same... but nothing works anymore.
I can't open a web shell on any Proxmox host or any VM or container. I always just get VNC "undefined (code 1006)". Great. I tried on 3 computers with Safari, Firefox and Brave. All show the same behavior. The web inspector shows "websocket connection error".

It seems lots of people report that this could have something to do with SSL. Not sure... it always worked, but now suddenly, without any package update or config change, it's broken.

Someone suggested running "pvecm updatecerts --force". But I can't do this because I have no quorum. Great.
I can still access the web UI on both Proxmox instances, but I can't start/stop/restart any VM or container. The VMs that are still running work fine, but I'm totally locked out.

A) Why does the missing quorum block every other operation? I mean can't the node just work on itself and then re-join the cluster? What's the reason for making the whole node unusable?
B) Any idea on how I can recover from this situation? Is there a way that I can open a shell into a VM from the Proxmox host on CLI? Then I could fix the pfSense instance and the quorum should be fine again.

Running version 6.3-4
 
Last edited:
> A) Why does the missing quorum block every other operation? I mean can't the node just work on itself and then re-join the cluster? What's the reason for making the whole node unusable?

A PVE cluster works on the majority principle. In a two-node cluster each node has one vote by default, so a majority requires both votes. As soon as the nodes cannot reach each other, neither side has a majority, and each node blocks operations that would change cluster state.
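
To make the arithmetic concrete, here is a minimal sketch of a simple-majority check. The function names quorum_needed and has_quorum are made up for illustration and are not part of any Proxmox tool; the sketch also assumes one vote per node and no special two-node settings.

```shell
#!/bin/sh
# Illustrative only: models a simple-majority rule, assuming one vote per node.

# Votes required for quorum: strictly more than half of the expected votes.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit status 0) if the votes a node can see reach quorum.
# $1 = votes currently visible, $2 = expected votes
has_quorum() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}

# Two-node cluster with the link down: each node sees only its own vote.
if has_quorum 1 2; then echo "quorate"; else echo "no quorum"; fi

# Same node after the expected votes have been lowered to 1.
if has_quorum 1 1; then echo "quorate"; else echo "no quorum"; fi
```

With two expected votes the threshold is two, so a single isolated node falls short; with one expected vote the same node meets the threshold on its own.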

> B) Any idea on how I can recover from this situation? Is there a way that I can open a shell into a VM from the Proxmox host on CLI? Then I could fix the pfSense instance and the quorum should be fine again.

If you can SSH from the PVE node onward to pfSense, you can fix the configuration that way.
You could also manually configure VNC for a VM, see https://pve.proxmox.com/wiki/VNC_Client_Access
To access the monitor via the CLI, run qm monitor <VMID>.

You could also try to set the expected corosync votes to 1 for the time being with pvecm expected 1.
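
A rough sketch of how that could look on the isolated node follows. The run and recover_quorum helpers are hypothetical wrappers added here for safety, not Proxmox commands; only pvecm status and pvecm expected 1 are actual pvecm invocations. By default the script only prints what it would do.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set APPLY=1 on the isolated PVE node to actually run them (as root).
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

recover_quorum() {
    run pvecm status        # inspect the current vote and quorum state
    run pvecm expected 1    # temporarily lower the expected votes to 1
    run pvecm status        # check whether the node now reports quorum
}

recover_quorum
```

This is meant as a temporary measure: once the link between the nodes is repaired and both are back in the cluster, the normal two-vote majority applies again.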

In the future, I would avoid such a delicate setup that depends on services running inside PVE which themselves depend on PVE working.
 
  • Like
Reactions: soupdiver
> You could also try to set the expected corosync votes to 1 for the time being with pvecm expected 1.

Yes, that's what I did, and afterwards I had my VNC access back. Then I could fix pfSense as planned, and everything is fine again. Thanks!