When only a few cluster nodes are up, there is no possibility to use the shell at all

AndroGen

Hi, I need your help, as I am a bit lost searching for an answer.
The situation is as follows:
4 nodes are in the cluster.
The node settings say the cluster quorum requires 3 nodes.
(Even when all nodes were up, I could not change the quorum parameter to a lower number, but that is a topic of its own.)
When only a few of the nodes are running, there is no possibility to connect via shell to the nodes that are up.

Why can I not connect to the nodes via shell when quorum is not reached?
 
Hi,
if you lose quorum, the cluster will switch into a read-only state: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_quorum
Nevertheless, you should be able to SSH into each of your nodes independent of quorum. Check whether the nodes can reach each other via `ping`, and check the status of corosync for further clues: `systemctl status corosync`.
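For reference, a minimal check sequence along those lines could look like this (the node address is a placeholder):

Code:
# check basic network reachability between the cluster nodes
ping -c 3 <other-node-ip>
# check whether corosync is running and look at its recent log lines
systemctl status corosync
# show cluster membership and quorum state
pvecm status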
 
When I try to connect via the Shell dialog, I get this error:
Code:
failed waiting for client: timed out
TASK ERROR: command '/usr/bin/termproxy 5900 --path /nodes/pve1002 --perm Sys.Console -- /bin/login -f root' failed: exit code 1

My understanding is that the shell should be reachable without cluster quorum. Is this correct?

Regarding read-only, my understanding is that it only concerns changes at the cluster level; everything else, e.g. VM creation and the VM-related functionality, should keep working without interruption. Am I missing something?

My scenario / use case: only a few nodes are up and running permanently; the rest are up only when needed. It is a home lab, where I do not need all machines to be up and running all the time.

Do I need to decrease the quorum number in this case, e.g. to 1?
 
I've started 2 additional nodes to reach quorum, and the Shell started working again.

The synchronization across nodes seems to be working fine.

Code:
root@pve1002:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2022-12-25 21:05:19 CET; 1 weeks 0 days ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 974 (corosync)
      Tasks: 9 (limit: 18820)
     Memory: 139.8M
        CPU: 1h 27min 55.382s
     CGroup: /system.slice/corosync.service
             └─974 /usr/sbin/corosync -f

Jan 02 14:34:33 pve1002 corosync[974]:   [KNET  ] rx: host: 3 link: 0 is up
Jan 02 14:34:33 pve1002 corosync[974]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
Jan 02 14:34:33 pve1002 corosync[974]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Jan 02 14:34:33 pve1002 corosync[974]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Jan 02 14:34:35 pve1002 corosync[974]:   [QUORUM] Sync members[3]: 1 2 3
Jan 02 14:34:35 pve1002 corosync[974]:   [QUORUM] Sync joined[1]: 3
Jan 02 14:34:35 pve1002 corosync[974]:   [TOTEM ] A new membership (1.1ab) was formed. Members joined: 3
Jan 02 14:34:35 pve1002 corosync[974]:   [QUORUM] This node is within the primary component and will provide service.
Jan 02 14:34:35 pve1002 corosync[974]:   [QUORUM] Members[3]: 1 2 3
Jan 02 14:34:35 pve1002 corosync[974]:   [MAIN  ] Completed service synchronization, ready to provide service.

Back to my initial question:
Why is the Shell not working when cluster quorum is not reached?
 
Why is the Shell not working when cluster quorum is not reached?
It seems that by "shell" you mean the GUI applet that opens a terminal window inside the browser.
As was mentioned before, when there is no quorum and parts of the system are in an undetermined state, the GUI/shell applet is not guaranteed to work, as there are many interdependencies in play.
You should use a 3rd-party SSH client to connect to the nodes. If you are on Windows, you can download PuTTY. If you are using a *nix or *BSD based system (Mac), you can open a local terminal and use the `ssh` toolset to connect.
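For example, a plain SSH login to a node normally works independent of cluster quorum (the hostname/IP is a placeholder):

Code:
# connect directly to one node, bypassing the web GUI shell applet
ssh root@<node-ip-or-hostname>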

If you meant something else by "shell" - please elaborate.


 
@bbgeek17 thanks for the reply.
What I understand from your reply: no cluster quorum, no properly working GUI / Shell...
OK, I take that as a given, even though it sounds a bit strange to me.
This ultimately means SSH ports need to be open in order to manage the cluster; "just the GUI" is not enough. This is not a pleasant discovery.

What SSH is and how to use it is known to me.
 
Quorum is required for the WebUI and the whole cluster to work correctly: https://forum.proxmox.com/threads/proxmox-ve-login-failed-please-try-again.55488/post-461690
You might consider setting up an external voter for your cluster, so you can shut down unused nodes, see: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_corosync_external_vote_support
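As a rough sketch of what the linked documentation describes (the IP address is a placeholder; follow the docs for the exact procedure), setting up such an external voter (QDevice) boils down to:

Code:
# on the external machine that will provide the extra vote
apt install corosync-qnetd

# on every cluster node
apt install corosync-qdevice

# on one cluster node: register the external voter with the cluster
pvecm qdevice setup <QDEVICE-IP>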
OK, I hope I understand it correctly: I need to have quorum even when only a few (or even just one) of my nodes are running.
This would mean either one of the servers needs to have a higher "voting power", or I need to decrease the quorum to the required minimum.
That's what I was trying to achieve with no success:

Code:
root@pve1002:~# pvecm expected 1
Unable to set expected votes: CS_ERR_INVALID_PARAM

This command was performed when 3 out of 4 nodes were up and quorum was in place.

How can I change the voting power, or set the quorum level to 1?

This is not a production environment (no plans for HA or auto-migration of individual VMs, especially those where passthrough is used), so this would be perfectly OK for me.

EDITED:
I've tried editing /etc/pve/corosync.conf to increase the votes per node. The system instantly recalculated the quorum.
Is there a possibility to define the quorum level "manually"?
 
pvecm expected <value> will only allow you to set values that would give you quorum, e.g. when you have a single node online in a 3-node cluster.
Further, you can set the voting power in `/etc/pve/corosync.conf` (don't forget to increase the config version), while making sure all nodes are up and the cluster is quorate (so that the changes are propagated to all nodes).
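As an illustration only (node names, IDs and addresses below are placeholders, and the documented practice of editing a copy of the file before moving it into place still applies), such a vote change could look like this:

Code:
# /etc/pve/corosync.conf (excerpt)
nodelist {
  node {
    name: pve1001
    nodeid: 1
    quorum_votes: 4    # raised from 1; with 4+1+1+1 = 7 total votes,
                       # quorum is floor(7/2)+1 = 4, so this node is quorate alone
    ring0_addr: 192.168.1.11
  }
  node {
    name: pve1002
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.12
  }
  # ... two more node { } entries with quorum_votes: 1 ...
}

totem {
  cluster_name: homelab
  config_version: 8    # must be increased on every change
  ...
}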

Nevertheless, using an additional external voter is the recommended way to go.
 
I've tried to set pvecm expected 1 - this should be enough to reach quorum - but the system rejected it; see the error above.

When it comes to the vote changes in the config file: everything worked as expected and the changes were propagated to all nodes. I've checked the config files on all systems; all were updated instantly.

As I understood from the documentation, only one external voter is possible, and it allows only one additional node to be down. It would also mean an additional piece of hardware to manage and would not really address my (home lab) needs, so I would like to avoid it.

My wish would be to define the quorum level manually, so that when one or two of the four nodes are up, the cluster is fully functional.

Do I understand it correctly that this should be possible to achieve just by running pvecm expected 1 - is that a correct assumption?
If yes, how can I fix the situation with this error?
 
pvecm expected 1 is intended to lower the expected votes in cases where you have no quorum (so for maintenance when the cluster is broken).

You can give those nodes you want to keep online while shutting down the others a higher quorum_votes value in the config, so that the cluster remains quorate. E.g. a 3-node cluster with one node having quorum_votes: 3, while the others are offline:

Bash:
# pvecm status
...
Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      3
Quorum:           3 
Flags:            Quorate
...
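To spell out the arithmetic behind that output (assuming votequorum's usual simple-majority threshold):

Code:
expected votes:  3 + 1 + 1 = 5
quorum:          floor(5 / 2) + 1 = 3
total votes:     3   (only the node with quorum_votes: 3 is online)
=> total votes >= quorum, so the remaining node is quorate on its own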
 
pvecm expected 1 is intended to lower the expected votes in cases where you have no quorum (so for maintenance when the cluster is broken).

Indeed, it worked now, when only one node was on the network, but it did not when 3 of 4 were up and running.
OK, at least this helps a bit.

You can give those nodes you want to keep online while shutting down the others a higher quorum_votes value in the config, so that the cluster remains quorate. E.g. a 3-node cluster with one node having quorum_votes: 3, while the others are offline:...

OK, I'll try to assign a much higher number to one of the nodes and see how this impacts the situation; however, this would work for one node only.
If I need to keep "another" node up and running and shut down the "first" one, this would no longer work.

Is there a possibility to set the target quorum level manually, e.g. in any of the config files? If not, could it be implemented in one of the next releases?

Edited:
BTW, yesterday two (GUI) packages were updated, and since then the shell is reachable via the web GUI even without quorum - thanks for the fix!
 
Is there a possibility to set the target quorum level manually, e.g. in any of the config files? If not, could it be implemented in one of the next releases?
The whole point of corosync and its quorum-based cluster state management is to avoid inconsistent state changes caused by network partitioning, which could leave some nodes in a different state than others. So this is not something you want to implement, as it counteracts what one actually wants to achieve.
 
The whole point of corosync and its quorum-based cluster state management is to avoid inconsistent state changes caused by network partitioning, which could leave some nodes in a different state than others. So this is not something you want to implement, as it counteracts what one actually wants to achieve.
I am not sure I understand the statement "inconsistent state changes".

The use case is rather clear:
The cluster has multiple nodes, and not all of them need to be up and running all the time (also due to energy constraints). For example, a node is kept in "cold reserve", brought up for certain scenarios, and then shut down again when it is no longer needed.

Adding the possibility to set the target quorum level manually would not change the overall concept; it would just enable "manual" fine-tuning of the cluster.
At the moment the quorum is calculated automatically, which is not the most convenient way for a non-"industrial" environment or for special scenarios.
 
I am not sure I understand the statement "inconsistent state changes".
I'm referring to a possible split-brain situation, meaning that because of some sort of connection issue the cluster is segmented into two sub-clusters.

Then only nodes in the sub-cluster with the majority of nodes (with quorum) should be allowed to perform changes to shared state (e.g. the Proxmox Cluster Filesystem). The other sub-cluster should not be allowed to do that, unless it becomes part of the quorate partition of the cluster again and can therefore catch up with all the state changes which might have happened on the quorate sub-cluster in the meantime.
 
Thanks for the explanation. This makes sense, especially in a production environment.

In a small lab environment, the ability to set the quorum level manually would be a nice option.
 
This happened again: the web GUI shell is not reachable, as there is no quorum.
This means SSH has to be exposed to the network all the time, to be able to do something with the system when there is no quorum.
And it also means that for a home lab, when not all systems are up and running, the cluster becomes practically useless.
And all of that could be fixed easily...

My use case:
  • multiple Proxmox servers are in the cluster
  • one or two of them are up and running
  • clusters might be in different locations (with no permanent connection)
  • another server (or servers) is used for standby / backup / other functionality that is not needed all the time, e.g. a file server where data is stored as a backup but which is not always up, or another system that does specific things and is used only on specific occasions. The rest of the time these systems are off, and the Proxmox servers are also off. These systems can be switched on via wake-on-LAN - this would be perfectly fine (sketched below).
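A small sketch of the wake-on-LAN part mentioned above (the MAC address is a placeholder, and this assumes the `wakeonlan` tool is installed on a machine that stays powered on):

Code:
# send a magic packet to power on a standby node over the LAN
wakeonlan AA:BB:CC:DD:EE:FF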
The need:
  • to be able to set the minimum quorum level for the cluster.
Understandable constraints:
  • cluster setup and changes should be done on the one or two machines that are up and running all the time; all others should be some sort of "read-only" (not technically, but organizationally restricted)

It should be so simple to implement...
Is there any formal process for feature requests?
 
It should be so simple to implement...
Well, this is a more complex topic than you might think; it is not so straightforward.

clusters might be in different locations (with no permanent connection)
So here you are referring to a multi-cluster setup, not a single cluster of nodes. Multi-cluster management is being actively developed at the moment.

another server (or servers) is used for standby / backup / other functionality that is not needed all the time, ...
The main question is: why cluster your nodes in the first place, when you are not really using them as a cluster?

Is there any formal process for feature requests?
Please file feature requests at https://bugzilla.proxmox.com/
 
The main question is: why cluster your nodes in the first place, when you are not really using them as a cluster?
I feel this is indeed the main question, and answering it might help to understand the situation.
My home lab has a few servers, and I want to keep them organized. I move VMs from server to server when needed; again, it is a true lab, not a production environment (apart from a very few VMs, which need to be up and running almost all the time).
Being in a cluster allows me to keep it all together and manage the servers from one web GUI window instead of many individual systems with individual GUIs. The migration of VMs becomes an (almost) single-click exercise; with separate systems that is not possible today.
As this is a home lab, I do not have the luxury of keeping all servers up and running all the time: it costs a lot in electricity, and I simply do not need all servers to be up all the time.
There is a separate system (which I am considering deploying), and it will need the same capabilities (to add, remove, and migrate VMs from system to system). If this has to be 100% another cluster, OK, I could live with that part.
But for the home lab, the restriction of keeping quorum in order to be able to do anything, even connect to the system via the web GUI, is an issue.
My desire would be a setting in the system that allows me to define the minimum quorum, so that I can reach it with these one or two systems, without needing to keep the entire landscape up and running.
Today, in order to create another lab VM on one of the servers, I either have to SSH in and declare that this system may operate alone (which is possible only when a single system is up and running), or fire up the remaining servers in the cluster to reach quorum (the only option if more than one server is up). The remaining option (and my least favorite one) is to forget about the cluster and take on all the pain of individual systems.
I hope this helps to understand the situation. I guess I am not alone in this situation.
And honestly, I love the cluster setup and wish to keep it.
 
