2-Node Cluster: which activity is permitted while one node is down?

italian01

Member
Feb 23, 2012
Italy
Hello,

What activity is permitted in a 2-node cluster environment when one node is down? In particular, can I create a new VM on the node that is still up?

I'm asking because I tried it without success. Proxmox told me: the cluster has no quorum!

Thanks in advance.
 
Basically, since the config is replicated across the nodes, a "quorum" is needed to tell each node which config is the right one (its own or the cluster's). With 2 nodes, think of the case where you disconnect them from each other and start making incompatible modifications on each of them... what would happen once you reconnected them? Probably a disaster, so the configuration is "locked".
If you really know what you are doing, you can change the expected votes to reach quorum and "unlock" the node, see:
Code:
   pvecm expected 1
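To make the arithmetic behind this concrete, here is a minimal sketch of a plain majority rule (more than half of the expected votes must be present). The function names `quorum_needed` and `is_quorate` are made up for illustration, and the sketch ignores special modes such as two_node, so take it as an approximation of the idea rather than what cman does internally:

```shell
#!/bin/sh
# Majority quorum: more than half of the expected votes must be present.
quorum_needed() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

# Prints "quorate" or "not quorate" for the given expected and present votes.
is_quorate() {
    expected=$1
    present=$2
    if [ "$present" -ge "$(quorum_needed "$expected")" ]; then
        echo "quorate"
    else
        echo "not quorate"
    fi
}

is_quorate 2 1   # two nodes expected, one node down: prints "not quorate"
is_quorate 1 1   # expected votes lowered to 1: prints "quorate"
```

With 2 expected votes the threshold is 2, so a single surviving node is locked; lowering the expected votes to 1 drops the threshold to 1.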
If you want to start the other node's VM on your current node, you have to move its config from one path to the other, e.g.:
Code:
root@prox01:~# mv /etc/pve/nodes/prox02/qemu-server/102.conf  /etc/pve/nodes/prox01/qemu-server/102.conf
Beware: if you are on shared storage and the same VMs are then also run from the other node, you will DESTROY them!
 
Thank you for your reply.

Reading your answer, MMenaz, I see that I need to add more information; first of all, some more detail about what I did.

My goal was (and is) to build a 2-node cluster, starting from an underperforming machine. I thought the following four steps were needed to get there:

1. build a 2-node cluster including the old (proxmox) and new (proxmox2) workstations;
2. migrate all VMs from the old workstation to the new one;
3. decommission the old workstation;
4. integrate the second new workstation.

Up to the 3rd step everything went fine!

I have therefore ordered the 2nd new workstation, and I keep working with the new "orphan node" while I wait for it to arrive.

The issue came up when, in this environment, I needed to create a new VM. Proxmox then told me that there was no quorum.

I searched for more information about it on this forum and on the internet, but without success. The only thing I noticed concerns CMAN: it is stopped! And when I try to start it, it says:

Code:
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ] 
   Checking Network Manager... [  OK  ] 
   Global setup... [  OK  ] 
   Loading kernel modules... [  OK  ] 
   Mounting configfs... [  OK  ] 
   Starting cman... two_node set but there are more than 2 nodes
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED] 
TASK ERROR: command '/etc/init.d/cman start' failed: exit code 1

So I cannot run any pvecm* command in this state...

What can I do now?

Thanks in advance.
 
Does this mean that I cannot create any new VM until I dismantle the cluster?

Thanks in advance for your reply.
 
Issue Solved!

...and I'm going to explain how I did it.

First of all, I have to admit that the cause was my own mistake: after the third step listed above, I erroneously changed the cluster.conf file from

Code:
<?xml version="1.0"?>
<cluster name="ProxMCls" config_version="4">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu" expected_votes="1" two_node="1"/>

  <clusternodes>

  <clusternode name="proxmox" votes="1" nodeid="1"/>
  <clusternode name="proxmox2" votes="1" nodeid="2"/>

</clusternodes>

</cluster>

to

Code:
<?xml version="1.0"?>
<cluster name="ProxMCls" config_version="5">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu" expected_votes="1" two_node="1"/>

  <clusternodes>

  <clusternode name="proxmox2" votes="1" nodeid="2"/></clusternodes>

</cluster>

I did it because I thought this configuration better represented the current physical state of the cluster. It did not: since I was still keeping a cluster environment, I should have kept the original configuration in the cluster.conf file.
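
A small sanity check before saving such an edit might have caught this. The sketch below is only an illustration: the helper name `check_two_node` is made up, and it assumes (based on the error shown earlier in this thread, not on cman documentation) that two_node="1" is only accepted together with exactly two clusternode entries.

```shell
#!/bin/sh
# Hypothetical helper: reports whether two_node="1" matches the node count.
# Assumption: two_node="1" requires exactly two <clusternode .../> entries.
check_two_node() {
    conf=$1
    # The trailing space in the pattern avoids matching <clusternodes>.
    nodes=$(grep -o '<clusternode ' "$conf" | wc -l | tr -d ' ')
    if grep -q 'two_node="1"' "$conf" && [ "$nodes" -ne 2 ]; then
        echo "mismatch: two_node=1 with $nodes node(s)"
        return 1
    fi
    echo "ok: $nodes node(s)"
}

# Demo on a temporary copy of the broken configuration shown above.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<cluster name="ProxMCls" config_version="5">
  <cman expected_votes="1" two_node="1"/>
  <clusternodes>
  <clusternode name="proxmox2" votes="1" nodeid="2"/></clusternodes>
</cluster>
EOF
check_two_node "$tmp" || true   # prints "mismatch: two_node=1 with 1 node(s)"
rm -f "$tmp"
```

The check only greps the text, so it does not validate the XML itself; it is meant as a quick guard, not as a replacement for the cluster tools.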

That said, my mistake put the surviving node into a protected mode, with the following two consequences:

1) the CMAN service was stopped and could not be started;
2) the /etc/pve folder was not writable.

In this condition I was stuck, because there seemed to be no way to change the wrong configuration and restore the right cluster environment.

But I was lucky and found a valuable piece of information on this forum about pmxcfs.

Pmxcfs is the binary that manages the cluster file system (that is, /etc/pve, the folder which is replicated among the nodes), and it is NOT A WELL DOCUMENTED binary!! (note to the Proxmox developer group). It runs as a service, started by the /etc/init.d/pve-cluster script. It accepts command-line parameters, which can be set in the following configuration file: /etc/default/pve-cluster. In this file there is a variable, DAEMON_OPTS="", which can be customized.

There is a switch, "-l", which makes the node run like a standalone machine. Hence, here was the trick:

1) turn the node into a standalone machine;
2) make the appropriate modifications;
3) restore the node as a member of the original cluster.

Step by step, this means:

Code:
# /etc/init.d/pve-cluster stop                 #stop cluster service
# vi /etc/default/pve-cluster
DAEMON_OPTS="-l"
# /etc/init.d/pve-cluster start                #start cluster service

{ here, make the appropriate modifications to any config file inside the /etc/pve/ directory }

# /etc/init.d/pve-cluster stop                 #stop cluster service
# vi /etc/default/pve-cluster
DAEMON_OPTS=""
# /etc/init.d/pve-cluster start                #start cluster service
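
For anyone who prefers not to edit the file by hand with vi, the two DAEMON_OPTS changes above can be sketched as a small shell function. The name `set_daemon_opts` is made up for this example, and it assumes the file holds DAEMON_OPTS on a single line of its own:

```shell
#!/bin/sh
# Hypothetical helper: set DAEMON_OPTS in a defaults file to the given value.
set_daemon_opts() {
    file=$1
    value=$2
    tmp=$(mktemp)
    # Drop any existing DAEMON_OPTS line (grep -v exits 1 if nothing is left).
    grep -v '^DAEMON_OPTS=' "$file" > "$tmp" || true
    # Append the new setting.
    printf 'DAEMON_OPTS="%s"\n' "$value" >> "$tmp"
    # Write back through the original file so its permissions are preserved.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Intended use, with the service stopped before and started after each call:
#   set_daemon_opts /etc/default/pve-cluster "-l"   # standalone (local) mode
#   set_daemon_opts /etc/default/pve-cluster ""     # back to cluster mode
```

The function only rewrites the file; stopping and starting pve-cluster around it stays exactly as in the steps above.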

At this point I was able to start the CMAN service (with some non-blocking errors), and I solved everything by running this last command:

Code:
# pvecm expected 1

That's all, folks.
 
Excellent!
I'm glad you were able to devise a solution, but it troubles me that something isn't well documented. Would you care to expand on what isn't well documented, and what the end result of documenting it well should be? I will research what I can to get something started once I have an overall picture. Hopefully others will help guide what I start, so that we can have a Good Final Solution.

Also, can you mark this as SOLVED? Helps to keep some order
 
OK Raymond: I'm going to answer both of your requests!

1.

The "not well documented" thing I pointed out is the "pmxcfs" service. In fact, nowhere is it written how to set the DAEMON_OPTS variable inside the /etc/default/pve-cluster file, and the command "# man pmxcfs" doesn't give any answer. Pmxcfs is mentioned only at the following link:

http://pve.proxmox.com/wiki/Proxmox_Cluster_file_system_(pmxcfs)

and, as you can see, there is no mention of command-line arguments.


2.

About "marking it solved": how do I do that?

Regards.