unable to login to webui with 2 node cluster

Jun 23, 2022
I have two servers with PVE 7.2 installed.
One of the two is always powered on, the other one just sporadically (when the VMs on the node are actually required).

I created a cluster consisting of those two servers.
The idea was to move VMs from one server to another from time to time. Nothing spectacular, no HA required.

However, I soon noticed (as many others have) that a Proxmox cluster with only two servers does not seem like a good idea.

After I shut down one node, I was NOT able to log in to the running node anymore.
It did not even give a proper error message that would indicate what was going on, just a very useless "login failed".

I found multiple threads on several sites from people with the same issue, complaining about the same behavior.
So I issued the pvecm expected 1 command and was able to log in to the running node again.

HOWEVER: after powering on the second node the next day, the expected votes automatically changed back to 2.
So after shutting down the "sporadic" node, I had the SAME LOGIN ISSUE AGAIN.

So before I start a rant here explaining how this is, from start to finish, the most unintuitive and worst way a 2-node cluster could be handled (even if I think hard, I cannot come up with a way to implement this any worse, from the nondescript error message to preventing login to a running node in the first place), I have some questions (because maybe I'm missing something here, and maybe there is a great idea behind it that I don't see yet).

1. (and most importantly) How do I make pvecm expected 1 PERSISTENT, so I don't have to SSH into the running node again after powering up and shutting down the second node (which will happen on an almost daily basis)?
2. Why should my cluster care about the number of votes if no HA is configured?
3. Why would you even consider blocking access to the webui of a running node? The correct way would be to allow login and show a message that this node is in read-only mode due to a missing node.

Please change this behavior as soon as possible.
 
1. (and most importantly) How do I make pvecm expected 1 PERSISTENT
You wouldn't really do that in a cluster; rather, you would reconfigure or disband the cluster. The admin guide has a chapter on removing a node; please take your time to read it carefully so you are aware of the risks involved.

2. Why should my cluster care about the number of votes if no HA is configured?
Because clustering comes with a multitude of other features that do not depend on HA, e.g. easier management (access via a single web interface), replication, live migration, etc.
If you don't want your cluster to care about the number of nodes in it and their health status, then you should not be using a cluster ;)

3. Why would you even consider blocking access to the webui of a running node? The correct way would be to allow login and show a message that this node is in read-only mode due to a missing node.
This is a side effect of the login ticket logic in the backend. If your cluster is not quorate (i.e. 50% or fewer of the nodes are healthy), the cluster filesystem is set to read-only, as you could otherwise run into split-brain problems. However, the ticket call needs to acquire a lock in the cluster filesystem, which means writing to it. Therefore the call fails.
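A minimal sketch of the majority rule described above, assuming a simplified model of corosync's votequorum: every node has one vote and the cluster is quorate when the present votes reach expected_votes // 2 + 1. The function names are hypothetical and this is not Proxmox code; it only shows why one running node out of two loses quorum, and why pvecm expected 1 restores it.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(present_votes: int, expected_votes: int) -> bool:
    """True if enough votes are present for the cluster filesystem to stay writable."""
    return present_votes >= quorum_threshold(expected_votes)


def cluster_fs_mode(present_votes: int, expected_votes: int) -> str:
    # Without quorum the cluster filesystem goes read-only to avoid split brain;
    # a call that needs to take a lock there (like the login ticket) then fails.
    return "read-write" if is_quorate(present_votes, expected_votes) else "read-only"


if __name__ == "__main__":
    # Two-node cluster, second node powered off: 1 of 2 votes present.
    print(cluster_fs_mode(present_votes=1, expected_votes=2))  # read-only
    # After lowering expected votes to 1, the running node alone is enough.
    print(cluster_fs_mode(present_votes=1, expected_votes=1))  # read-write
```

Note that with two expected votes the threshold is 2, so a two-node cluster needs both nodes up to stay writable.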

Not telling the user why a login call failed is just a security principle. You usually don't want to give an unprivileged attacker any information, e.g. which users exist on a system...


There is an open discussion on possibly changing the handling of the user login, making an exception for the case of a non-quorate cluster and to better communicate what's wrong. However, there are no concrete plans for this yet.
 
1. I want neither to disband the cluster nor to remove any node.
I just want to have a plain and simple cluster to manually move VMs from one node to another.
You wouldn't really do so in cluster.
Well, the number of people having a problem with this approach in this forum and on reddit alone should tell you that users really do want to do so in a cluster.

2.
Then you should not be using a cluster ;)
Ok, then let me know the preferred / recommended setup in this case.
I want to connect two nodes with each other for very basic tasks like manually moving VMs from one host to another. No HA required.

Also: I have not yet heard a valid reason why the cluster should care about the votes.
Could you please provide a specific example?
The worst thing that I could think of is a network split with both nodes being online but not able to talk to each other.

Even then, in this worst case, you could still only manage one of the nodes at a time (the one whose web interface you connected to, if that was possible).
So it would not even be possible to get to an inconsistent state.

When the nodes are able to reach each other again, each node could propagate the changes made on it to the other node.
Read-only would not even be required.
I mean, every single file-sync program has been doing this for 30 years now.
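The network-split scenario can be made concrete with a small simulation. It assumes a simplified model in which each node has one vote, each side of the split counts only the votes it can still see, and a side accepts writes when those votes reach a strict majority of expected_votes. The function and node names are hypothetical.

```python
def writable_sides(partitions: list[list[str]], expected_votes: int) -> list[list[str]]:
    """Return the partitions that would consider themselves quorate (writable).

    partitions: groups of node names that can still talk to each other.
    Each node contributes one vote; a partition is writable if its vote
    count reaches a strict majority of expected_votes.
    """
    threshold = expected_votes // 2 + 1
    return [p for p in partitions if len(p) >= threshold]


if __name__ == "__main__":
    split = [["node1"], ["node2"]]  # network split, both nodes still running
    # Default two-node setup: neither side may write.
    print(writable_sides(split, expected_votes=2))  # []
    # Both nodes forced to expected_votes=1: each side accepts writes on its own.
    print(writable_sides(split, expected_votes=1))  # [['node1'], ['node2']]
```

With the majority rule at most one side of any split can be writable; lowering the expected votes on both nodes allows both sides to accept changes independently, which then have to be reconciled afterwards.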



If you choose not to provide a solution for making the number of votes persistent, then I will
1. set up a cronjob running the pvecm expected 1 command every 15 minutes as a workaround for now
2. look elsewhere on the internet; surely there must be configuration files that can be adjusted
3. (worst case) look into using another virtualization solution

If anyone reads this thread at some point in the future (as there will be many other users who run into this exact same issue), feel free to leave a comment on how you solved the situation.
Thank you.
 
Hi,

For anyone seeing this in the future, this seems to be the correct way of solving the issue:
https://www.youtube.com/watch?v=sjS9oDEw9EQ
Since this seems to be aimed specifically at homelab-style users, who usually don't run anything too critical on their clusters, this might be fine. As QDevices are already mentioned in the video, I assume you know about them, but for future reference: a QDevice is the preferred solution for actual two-node clusters [1].
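To see why a QDevice [1] helps, the vote arithmetic can be sketched as follows. This assumes one vote per node plus one vote from a QDevice running on an external host, and the same strict-majority threshold as above. It is a simplification: in a real network split the QDevice decides which partition receives its vote, which this sketch does not model. The function name is hypothetical.

```python
def is_quorate(nodes_up: int, total_nodes: int,
               has_qdevice: bool, qdevice_reachable: bool) -> bool:
    """Simplified vote count: one vote per node, plus one for a reachable QDevice."""
    expected = total_nodes + (1 if has_qdevice else 0)
    present = nodes_up + (1 if has_qdevice and qdevice_reachable else 0)
    return present >= expected // 2 + 1


if __name__ == "__main__":
    # Two nodes, no QDevice, one node powered off: 1 of 2 votes.
    print(is_quorate(1, 2, has_qdevice=False, qdevice_reachable=False))  # False
    # Same situation with a reachable QDevice: 2 of 3 votes.
    print(is_quorate(1, 2, has_qdevice=True, qdevice_reachable=True))  # True
    # Both nodes up, QDevice unreachable: still 2 of 3 votes.
    print(is_quorate(2, 2, has_qdevice=True, qdevice_reachable=False))  # True
```

The third vote lets the always-on node keep quorum while the sporadic node is powered off, without lowering the expected votes by hand.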

However, if you need any kind of real reliability, this will cause trouble eventually. For example, your second node in this scenario cannot function at all without the first unless you adjust the configuration again. And for good reason: clustering is not just a "file-sync program". In a "simple" file-sync scenario you can always keep version histories of the files (as most of these solutions do, because issues arise there too) and can simply ask the user to choose between two copies (because, again, conflicts happen there too).

The idea of a cluster, however, is to increase reliability, which means the cluster needs to be able to make these decisions by itself. Otherwise VMs might not be available for a while until a user that has enough authority to make a decision can be found. And even then: humans make mistakes, this just isn't ideal.

Keeping version histories of entire VMs isn't trivial either. Depending on the storage/file format of the VM's image you might use snapshots, but that would fill disk space rather quickly. Since you don't know when a failure occurs, you'd need to pick a rather small interval at which to create snapshots in order to not lose too much information.
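The trade-off can be expressed with two simple quantities, assuming snapshots are taken at a fixed interval and kept for a fixed retention window: the worst-case amount of lost changes is up to one interval, and the number of snapshots to keep grows as the interval shrinks. The function name and the numbers below are illustrative assumptions, not measurements.

```python
def snapshot_tradeoff(interval_minutes: int, retention_hours: int) -> tuple[int, int]:
    """Return (worst-case loss in minutes, number of snapshots retained)."""
    # A failure just before the next snapshot loses up to one full interval.
    worst_case_loss = interval_minutes
    retained = (retention_hours * 60) // interval_minutes
    return worst_case_loss, retained


if __name__ == "__main__":
    for interval in (5, 15, 60):
        loss, count = snapshot_tradeoff(interval, retention_hours=24)
        print(f"every {interval} min: lose up to {loss} min, keep {count} snapshots")
```

Over a 24-hour window this gives 288, 96 and 24 snapshots respectively, and each one holds on to changed blocks on disk.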

If you want to get a sense of how complex finding consensus between nodes can be, you can read the wikipedia article on the Paxos algorithm [2]. I can also recommend checking out its reference section.

Ok, then let me know the preferred / recommended setup in this case.
I want to connect two nodes with each other for very basic tasks like manually moving VMs from one host to another. No HA required.
If this is all you want to do, then just use backup and restore [3]. Don't cluster your nodes; just set up some kind of shared storage, e.g. on node 1, to store the backups on. Then add that storage to node 2 and restore there when needed. A small follow-up question: how often do you move VMs?
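A minimal sketch of that workflow which only builds the command lines instead of running them. It assumes the standard vzdump and qmrestore tools for VMs (containers would use pct restore instead); the storage names, the VM ID and the archive path are hypothetical placeholders, and the real archive file name is whatever vzdump wrote to the shared storage.

```python
import shlex


def backup_cmd(vmid: int, storage: str) -> list[str]:
    # Run on the source node: back up the VM to the shared storage.
    return ["vzdump", str(vmid), "--storage", storage, "--mode", "snapshot"]


def restore_cmd(archive: str, vmid: int, target_storage: str) -> list[str]:
    # Run on the target node: restore the archive as a VM on its local storage.
    # The VM ID must not already be in use on the target node.
    return ["qmrestore", archive, str(vmid), "--storage", target_storage]


if __name__ == "__main__":
    # Hypothetical names; adjust storage IDs, VM ID and archive path to your setup.
    print(shlex.join(backup_cmd(100, "shared-backup")))
    print(shlex.join(restore_cmd(
        "/mnt/pve/shared-backup/dump/vzdump-qemu-100-EXAMPLE.vma.zst",
        100, "local-lvm")))
```

Keeping the command construction separate from execution makes it easy to review the exact commands before running them on each node.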

As a side note: we are working on a solution for moving VMs between clusters. This would likely also solve your problem.

[1]: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
[2]: https://en.wikipedia.org/wiki/Paxos_(computer_science)
[3]: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_vzdump
 