Console timeout (again), in cluster. Bug?

Jan 28, 2025
This seems to be a recurring issue with many different solutions, but I haven't found this particular case yet:

Situation:
Two fully up-to-date PVE installations (8.4.14), joined in a "Datacenter" cluster, but without any HA setup (HA would require 3 hosts anyway).

Whenever one of the two machines is shut down and the sole remaining host starts to complain "Cluster not quorate - extending auth key lifetime!", it is no longer possible to open any console. The web UI and SSH still work, but the "Shell" and VM "Console" web interfaces all fail with the same error:

Code:
failed waiting for client: timed out
TASK ERROR: command '/usr/bin/termproxy 5901 --path /nodes/pve1 --perm Sys.Console -- /bin/login -f root' failed: exit code 1

Once the other host joins, all is OK again.
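
For reference, the "not quorate" state on a two-node cluster follows from simple majority arithmetic. Below is a minimal sketch, assuming the plain majority rule (more than half of the expected votes) with one vote per node and no special corosync options that change the rule. The function names are made up for illustration and are not part of any Proxmox or corosync API.

```python
# Sketch of majority-based quorum arithmetic (assumed rule: strict majority
# of expected votes). Function names are hypothetical, for illustration only.

def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is more than half of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(expected_votes: int, votes_present: int) -> bool:
    """True if the votes currently present reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # (expected votes, votes present): both nodes up, one of two nodes up,
    # and two of three votes present (a third node or other third vote).
    for expected, present in [(2, 2), (2, 1), (3, 2)]:
        print(
            f"expected={expected} present={present} "
            f"threshold={quorum_threshold(expected)} "
            f"quorate={is_quorate(expected, present)}"
        )
```

Under these assumptions, a two-vote cluster needs both votes, so losing either host leaves the survivor without quorum, while a cluster with three votes stays quorate when one vote is missing.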

This seems like an unwanted "feature" to me. When one of the two hosts is down due to a problem, that is exactly the moment when one should retain full control of the remaining host.

Maybe it is more complicated, but I have been able to reproduce this several times.
 
HA or cluster is not the point here. The point is:

Why would anyone need a quorum to use functionality that is limited to a single machine?

For me, critical and local functionality, like console access to the machine to which you are connected, should be possible irrespective of quorum.

If a quorum is needed for console, then why do you allow web site or SSH access when the quorum is not reached?

By the way, lowering the expected votes to 1 doesn't work, even though the pvecm man page says it should:

> pvecm expected 1
Unable to set expected votes: CS_ERR_INVALID_PARAM

https://pve.proxmox.com/pve-docs/pvecm.1.html says:

pvecm expected <expected>
Tells corosync a new value of expected votes.
<expected>: <integer> (1 - N)
 
Sorry to insist:

If a quorum is needed for console, then why is a quorum not needed for the web site or SSH access?

To me, that is exactly the same.

But yes, I will set up a qdevice; I was already in the process of doing so.
It would have been great if the error messages or a help page mentioned this limitation, though.
 