2-Node Cluster: activity permitted when one node is down.

italian01 · Jul 10, 2013

Hello,

What activity is permitted in a 2-node cluster environment when one node is down? Specifically, can I create a new VM on the node that is up?

I'm asking because I tried it without success. Proxmox told me: the cluster has no quorum!

Thanks in advance.

mmenaz · Jul 10, 2013

Basically, since the config is replicated across the nodes, a "quorum" is needed to tell each node which config is the right one (its own or the cluster's). With 2 nodes, consider the case where you disconnect them from each other and start making incompatible modifications on each... what would happen once you reconnected them? Probably a disaster, so the configuration is "locked".
If you really know what you are doing, you can lower the expected votes so that the node reaches quorum and is "unlocked":

Code:

   pvecm expected 1
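
To make the quorum rule concrete, here is a rough sketch of the arithmetic, assuming the usual majority rule (strictly more than half of the expected votes must be present). The function names `quorum_needed` and `has_quorum` are made up for illustration; they are not part of Proxmox or cman.

```shell
#!/bin/sh
# Illustrative only: majority-vote arithmetic, not a Proxmox tool.

# Votes needed for quorum: strictly more than half of the expected votes.
quorum_needed() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

# Succeeds (exit status 0) if the votes present reach quorum.
has_quorum() {
    present=$1
    expected=$2
    [ "$present" -ge "$(quorum_needed "$expected")" ]
}

# 2-node cluster, one node down: 1 vote present, 2 expected.
if has_quorum 1 2; then echo "quorate"; else echo "not quorate"; fi

# Same node after the expected votes were lowered to 1.
if has_quorum 1 1; then echo "quorate"; else echo "not quorate"; fi
```

With two expected votes a lone node can never reach the majority, which is why lowering the expected votes to 1 "unlocks" it.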

If you want to start the other node's VM on your current node, you have to move the config from one path to the other, e.g.:

Code:

root@prox01:~# mv /etc/pve/nodes/prox02/qemu-server/102.conf  /etc/pve/nodes/prox01/qemu-server/102.conf

Beware: if you are on shared storage and the same VMs are then also run from the other node, you will DESTROY them!
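
As a small safety net around that `mv`, a wrapper along these lines could refuse to overwrite an existing config. This is only a sketch: `move_vm_conf` and the `PVE_ROOT` variable are hypothetical names (the variable exists so the function can be tried on a scratch directory), and it cannot tell whether the VM is still running on the other node, so the warning above still applies.

```shell
#!/bin/sh
# Hypothetical helper, not a Proxmox command.
# PVE_ROOT is an assumption so the sketch can be tested outside /etc/pve.
PVE_ROOT=${PVE_ROOT:-/etc/pve}

# move_vm_conf VMID FROM_NODE TO_NODE
move_vm_conf() {
    vmid=$1
    from=$2
    to=$3
    src="$PVE_ROOT/nodes/$from/qemu-server/$vmid.conf"
    dst="$PVE_ROOT/nodes/$to/qemu-server/$vmid.conf"

    # The source config must exist.
    if [ ! -f "$src" ]; then
        echo "no such config: $src" >&2
        return 1
    fi
    # Never clobber a config that is already on the target node.
    if [ -e "$dst" ]; then
        echo "refusing to overwrite: $dst" >&2
        return 1
    fi
    mv "$src" "$dst"
}
```

For the example above the call would be `move_vm_conf 102 prox02 prox01`.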

italian01 · Jul 10, 2013

Thank you for your reply.

Reading your answer, mmenaz, I see that I need to add more information; first of all, some more detail about what I did.

So, my goal was (and is) to build a 2-node cluster, starting from an under-performing machine. These are the four steps I thought were needed to get there:

1. set up a 2-node cluster including the old (proxmox) and new (proxmox2) workstations;
2. migrate all VMs from the old workstation to the new one;
3. decommission the old workstation;
4. integrate the second new workstation.

Up to the 3rd step, everything went fine!

So I've ordered the second new workstation, and I'm continuing to work with the new "orphan node" while I wait for it to arrive.

The issue came up when, in this environment, I needed to create a new VM. Proxmox then told me that there was no quorum.

I searched for more information about it on this forum and on the internet, without success. The only thing I've noticed concerns CMAN: it is stopped! And when I try to start it, it says:

Code:

Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ] 
   Checking Network Manager... [  OK  ] 
   Global setup... [  OK  ] 
   Loading kernel modules... [  OK  ] 
   Mounting configfs... [  OK  ] 
   Starting cman... two_node set but there are more than 2 nodes
cman_tool: corosync daemon didn't start Check cluster logs for details
[FAILED] 
TASK ERROR: command '/etc/init.d/cman start' failed: exit code 1

So, I cannot run any pvecm* command in this state...

What can I do now?

Thanks in advance.

Raymond Burns · Jul 10, 2013

I believe the OP is looking for a way to dismantle the cluster. He used the cluster to transfer all the VMs to new hardware, and now wants to operate as a single node again, I think.

italian01 · Jul 11, 2013

Does it mean that I cannot create any new VM until I dismantle the cluster?

Thanks in advance for your reply.

Raymond Burns · Jul 11, 2013

Try this.
Removing the cluster and/or master should allow your single node to operate as a regular PVE and not a Cluster PVE or PVE HA.
http://forum.proxmox.com/threads/7278-how-to-delete-uninstall-the-whole-cluster

italian01 · Jul 12, 2013

Issue Solved!

...and I'm going to explain how I did it.

First of all, I have to admit that the cause was my own mistake: after the third step listed above, I erroneously changed the cluster.conf file from

Code:

<?xml version="1.0"?>
<cluster name="ProxMCls" config_version="4">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu" expected_votes="1" two_node="1"/>

  <clusternodes>

  <clusternode name="proxmox" votes="1" nodeid="1"/>
  <clusternode name="proxmox2" votes="1" nodeid="2"/>

</clusternodes>

</cluster>

to

Code:

<?xml version="1.0"?>
<cluster name="ProxMCls" config_version="5">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu" expected_votes="1" two_node="1"/>

  <clusternodes>

  <clusternode name="proxmox2" votes="1" nodeid="2"/></clusternodes>

</cluster>

I did it because I thought it was the best configuration to represent the current physical state of the cluster. It was not: since I was keeping a cluster environment, I should have kept the original configuration in the cluster.conf file.
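
A quick sanity check like the following might have caught the mistake before cman refused to start. It is a sketch under one assumption: that `two_node="1"` is only valid together with exactly two `clusternode` entries, which is what the cman error quoted earlier in the thread suggests. `check_two_node` is a made-up name, and the check is a plain text match, not a real XML parse.

```shell
#!/bin/sh
# Hypothetical check, not part of Proxmox or cman.
# Assumption: two_node="1" requires exactly two <clusternode .../> entries.

# check_two_node CLUSTER_CONF
check_two_node() {
    conf=$1
    # Count "<clusternode " occurrences; the trailing space keeps
    # <clusternodes> from being counted.
    nodes=$(grep -o '<clusternode ' "$conf" | wc -l)
    nodes=$((nodes))   # normalise any padding added by wc

    if grep -q 'two_node="1"' "$conf" && [ "$nodes" -ne 2 ]; then
        echo "two_node=\"1\" but $nodes clusternode entries found" >&2
        return 1
    fi
    echo "ok: $nodes clusternode entries"
}
```

Run against the first file above it reports two entries; run against the edited file it fails, because only proxmox2 is left while `two_node="1"` is still set.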

That said, my error put the surviving node into a protected mode, with the following two consequences:

1) the CMAN service was stopped and could not be started;
2) the /etc/pve folder was not writable.

In this condition I was stuck, because there seemed to be no way to change the wrong configuration and restore the correct cluster environment.

But I was lucky and found a valuable piece of information on this forum about pmxcfs.

Pmxcfs is the binary that manages the cluster file system (that is, /etc/pve, the folder that is replicated among the nodes), and it is NOT WELL DOCUMENTED!! (note to the Proxmox developer group). It runs as a service, started by the /etc/init.d/pve-cluster script. It takes command-line parameters, which can be set in the following configuration file: /etc/default/pve-cluster. In this file there is a variable, DAEMON_OPTS="", which can be customized.

There is a switch, "-l", that makes the node run as a standalone machine. Hence, here was the trick:

1) switch the node to standalone mode;
2) make the appropriate modifications;
3) restore the node as a regular cluster member.

Step by step, this means:

Code:

# /etc/init.d/pve-cluster stop                 #stop cluster service
# vi /etc/default/pve-cluster                  #edit the file and set:
DAEMON_OPTS="-l"
# /etc/init.d/pve-cluster start                #start cluster service (standalone)

{ here, make the appropriate modifications to any config file inside the /etc/pve/ dir }

# /etc/init.d/pve-cluster stop                 #stop cluster service
# vi /etc/default/pve-cluster                  #edit the file and set it back to:
DAEMON_OPTS=""
# /etc/init.d/pve-cluster start                #start cluster service (cluster mode)
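
If you would rather not edit the file by hand in vi, the DAEMON_OPTS change could be scripted roughly like this. It is a sketch only: `set_daemon_opts` is a hypothetical helper, and it assumes the file holds a plain `DAEMON_OPTS=...` line as described above.

```shell
#!/bin/sh
# Hypothetical helper for the steps above, not shipped with Proxmox.

# set_daemon_opts VALUE [FILE]
# Rewrites FILE so that it contains exactly one DAEMON_OPTS="VALUE" line.
set_daemon_opts() {
    value=$1
    file=${2:-/etc/default/pve-cluster}

    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    tmp=$(mktemp) || return 1

    # Keep every other line, drop the old DAEMON_OPTS line(s).
    grep -v '^DAEMON_OPTS=' "$file" > "$tmp" || true
    # Append the new setting.
    printf 'DAEMON_OPTS="%s"\n' "$value" >> "$tmp"

    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Intended use on the node (left commented out on purpose):
#   /etc/init.d/pve-cluster stop
#   set_daemon_opts "-l"
#   /etc/init.d/pve-cluster start
#   ... fix the files under /etc/pve ...
#   /etc/init.d/pve-cluster stop
#   set_daemon_opts ""
#   /etc/init.d/pve-cluster start
```

The second argument is optional and only there so the function can be tried on a copy of the file first.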

At this point I was able to start the CMAN service (with some non-blocking errors), and I solved everything by running this last command:

Code:

# pvecm expected 1

That's all, folks.

Raymond Burns · Jul 12, 2013

Excellent!
I'm glad you were able to devise a solution, but it troubles me that something isn't well documented. Would you care to expound on what isn't well documented, and what the end result of documenting it well should be? I will research what I can to get something started once I have an overall picture. Hopefully others will help guide what I start, so that we can end up with a good final solution.

Also, can you mark this as SOLVED? Helps to keep some order

italian01 · Jul 14, 2013

OK Raymond: I'm going to answer both of your requests!

1.

The "not well documented" thing that I pointed out is the "pmxcfs" service. In fact, nowhere is it written how to set the DAEMON_OPTS variable inside the /etc/default/pve-cluster file, and the command "# man pmxcfs" doesn't give any answer. Pmxcfs is mentioned only at the following link:

http://pve.proxmox.com/wiki/Proxmox_Cluster_file_system_(pmxcfs)

and, as you can see, there is no mention of command-line arguments.

2.

About "marking it solved": how do I do that?

Regards.
