Restore cluster configuration

decibel83

Renowned Member
Oct 15, 2008
Hi.
I'm doing some tests with a PVE 2.1 cluster to learn it.

I created a new cluster configuration on the master server:

Code:
root@vetest1:/var/lib/vz/template/iso# pvecm create testcluster
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
79:f1:5a:d4:4f:d6:3d:7a:2a:ab:a3:ae:30:14:48:73 root@vetest1
The key's randomart image is:
+--[ RSA 2048]----+
| o E             |
|. +          .  o|
| . .      . . ..=|
|    .    . +  .+.|
|   .    S . o. ..|
|  .      . o  o  |
|   o      .. .   |
|    o    .  o    |
|     .oo..o.     |
+-----------------+
Restarting pve cluster filesystem: pve-cluster[dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Unfencing self... [  OK  ]

Then I enabled unicast communication in the cluster configuration:

Code:
<?xml version="1.0"?>
<cluster name="testcluster" config_version="1">


  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>


  <clusternodes>
  <clusternode name="vetest1" votes="1" nodeid="1"/>
  </clusternodes>


</cluster>
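
An aside from the editor, not from this thread: for a two-node cman cluster it is common to also set the documented `two_node` attribute together with `expected_votes="1"`, so a single node can keep quorum when its peer is down (whether this is appropriate alongside `transport="udpu"` on PVE 2.x is an assumption; verify before using it). A sketch of the fragment, and remember to bump `config_version` whenever cluster.conf changes:

```xml
<cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"
      two_node="1" expected_votes="1">
</cman>
```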

I added the nodes to /etc/hosts:

Code:
127.0.0.1       localhost
123.123.123.1       vetest1.domain.com    vetest1
123.123.123.2    vetest2.domain.com    vetest2
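
A quick way to double-check the /etc/hosts entries is to count how often each node name appears among the hostname fields (an editorial sketch; `HOSTS_FILE` and `check_host` are names I introduce so the check can be pointed at any file):

```shell
# Count exact-match occurrences of a node name among the hostname fields
# of a hosts file (fields 2..NF; field 1 is the IP address).
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}

check_host() {
    n=$(awk -v h="$1" '{for (i = 2; i <= NF; i++) if ($i == h) c++} END {print c + 0}' "$HOSTS_FILE")
    echo "$1: $n entry(ies)"
}

check_host vetest1
check_host vetest2
```

Each node name should appear exactly once; zero means a missing entry, two or more means conflicting lines.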

The cluster was created correctly:

Code:
root@vetest1:/var/lib/vz/template/iso# pvecm status
Version: 6.2.0
Config Version: 1
Cluster Name: testcluster
Cluster Id: 31540
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: vetest1
Node ID: 1
Multicast addresses: 239.192.123.175 
Node addresses: 123.123.123.1

Now to the second node:

Code:
root@vetest2:~# pvecm add vetest1.domain.com
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1d:bb:f6:0b:39:a3:62:41:6e:5b:a6:92:9b:d8:b2:46 root@vetest1
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|          .      |
|      .  . o     |
|     o  S o      |
|  E   + o  o     |
| .   o *  B      |
|  ooo.=  o =     |
| .oo++ ..   o.   |
+-----------------+
The authenticity of host 'vetest1.domain.com (123.123.123.1)' can't be established.
RSA key fingerprint is cb:76:18:7d:b9:01:cc:43:90:5c:cb:e7:a4:14:ea:3a.
Are you sure you want to continue connecting (yes/no)? yes
root@vetest1.domain.com's password: 
node vetest1 already defined
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... 
Timed-out waiting for cluster Check cluster logs for details
[FAILED]
waiting for quorum...

No quorum is obtained from the master server.
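
The timeout makes sense once you look at the quorum arithmetic: as soon as cluster.conf lists two nodes, quorum requires a majority, i.e. floor(votes/2)+1 = 2 votes, and a joining node that cannot reach its peer over corosync never gets there. A minimal sketch of that rule (my own illustration, not PVE code):

```shell
# Majority quorum as cman computes it by default: floor(expected_votes/2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

echo "1 expected vote  -> quorum $(quorum_needed 1)"   # lone node: has quorum
echo "2 expected votes -> quorum $(quorum_needed 2)"   # both nodes must be up
echo "3 expected votes -> quorum $(quorum_needed 3)"   # any 2 of 3 suffice
```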

Now I cannot retry:

Code:
root@vetest2:~# pvecm add vetest1.domain.com
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
authentication key already exists

How can I restore the cluster configuration, reverting it to the state before the "pvecm add" command, so that I can retry adding the node to the master server?
I see that the /etc/pve/priv folder does not exist and I cannot create it:

Code:
root@vetest2:~# mkdir /etc/pve/priv
mkdir: cannot create directory `/etc/pve/priv': Permission denied
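
For reverting a half-joined node, one approach is to stop the cluster stack, remove the cluster config and the pmxcfs database, and restart. This is an untested editorial sketch of the usual "make the node standalone again" steps: the paths are PVE 2.x defaults, `DRY_RUN` and `run` are safety helpers I added (by default it only prints each command), and removing /var/lib/pve-cluster discards the node's files under /etc/pve, which is only acceptable because this node was freshly installed:

```shell
#!/bin/sh
# Sketch: revert a node to standalone after a failed `pvecm add`.
# DRY_RUN=1 (the default) only prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

run /etc/init.d/pve-cluster stop
run /etc/init.d/cman stop
run rm -f /etc/cluster/cluster.conf        # cman/corosync config
run rm -rf /var/lib/pve-cluster            # pmxcfs database (backs /etc/pve)
run /etc/init.d/pve-cluster start          # comes back up in standalone mode
```

Back up /var/lib/pve-cluster before running this for real (`DRY_RUN=0`), and treat it as a last resort.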

The network does not support multicast because the servers are not on a local network (they are colocated at Hetzner).

Thank you very much!
Bye.
 
did you add all your hosts to /etc/hosts (as described in the wiki)?
 
ok, I missed it in your post. did you reboot each node after editing /etc/hosts?
 
I reinstalled and retried, this time using the IP address instead of the DNS name:

Code:
root@vetest2:~# pvecm add 123.123.123.2
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
ff:4a:99:9a:a2:22:32:36:66:1f:d7:50:ce:5e:55:b9 root@vetest2
The key's randomart image is:
+--[ RSA 2048]----+
|             ..  |
|            ..   |
|       .   .  .  |
|      +   .  E   |
|     . oS.       |
|      + .. o     |
|   . . o  =      |
|+=. o .  + .     |
|=ooo.. .o ...    |
+-----------------+
The authenticity of host '123.123.123.2 (123.123.123.2)' can't be established.
RSA key fingerprint is cb:76:18:7d:b9:01:cc:43:90:5c:cb:e7:a4:14:ea:3a.
Are you sure you want to continue connecting (yes/no)? yes
root@123.123.123.2's password: 
root@123.123.123.2's password: 
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...

I waited for about 10 minutes but nothing happened...
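
When "Waiting for quorum" hangs like this, the reason is usually spelled out in the cluster logs. A small helper to pull the recent lines (the log paths are the cman/corosync defaults on PVE 2.x, an assumption on my part; `show_log_tail` is a name I introduce):

```shell
# Show the tail of whichever cluster-related logs exist on this node.
show_log_tail() {
    for log in "$@"; do
        if [ -r "$log" ]; then
            echo "== $log =="
            tail -n 20 "$log"
        fi
    done
}

show_log_tail /var/log/cluster/corosync.log /var/log/syslog
```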

Could you help me please?
 
did you now reboot both nodes after changing to unicast and setting /etc/hosts?
 
Yes.

Now another strange behavior.

I can see the second node from the master node:

Code:
root@vetest1:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M   2160   2012-06-11 17:07:09  vetest1
   2   X      0                        vetest2

And I see it in the vetest1 web interface, but I cannot use it: it asks me for the username and password every time I try to access the second node's configuration, and it does not accept the password even when I connect directly to the second node's web interface.

This is the status of the cluster on the second node:

Code:
root@vetest2:~# pvecm status
Version: 6.2.0
Config Version: 2
Cluster Name: testcluster
Cluster Id: 31540
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: vetest2
Node ID: 2
Multicast addresses: 255.255.255.255 
Node addresses: 123.123.123.2

Code:
root@vetest2:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   X      0                        vetest1
   2   M      4   2012-06-11 17:12:04  vetest2
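
One detail above looks suspicious: vetest2 reports `Multicast addresses: 255.255.255.255`, which may mean it came up in broadcast mode rather than the configured unicast transport (my reading, not confirmed). A quick check that the transport attribute survived on each node (path as used earlier in this thread; `check_udpu` is a helper name I introduce):

```shell
# Grep the active cluster.conf for the udpu transport attribute.
check_udpu() {
    if grep -q 'transport="udpu"' "$1" 2>/dev/null; then
        echo "udpu transport configured"
    else
        echo "udpu transport NOT configured in $1"
    fi
}

check_udpu "${CLUSTER_CONF:-/etc/cluster/cluster.conf}"
```

Run it on both nodes; a node missing the attribute would try multicast and never see its peer.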

This is the screenshot of the web interface on vetest1 (the cluster master node): http://imageshack.us/photo/my-images/254/schermata062456090alle1.png/
 
Hi,
how are the nodes connected to each other? Do you get quorum if you connect them via a crossover cable?

Udo
 
I cannot do that, because the servers are hosted in a Hetzner datacenter, so they are only connected over the Internet.
 
The output of 'pvecm status' on vetest2 looks wrong - still only one node? Try to restart cman/pve-cluster on both nodes:

# /etc/init.d/cman start
# /etc/init.d/pve-cluster restart
 
Now I have a different behaviour:

Code:
root@vetest1:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M  37896   2012-06-12 01:29:13  vetest1
   2   M  37908   2012-06-12 01:29:16  vetest2

Code:
root@vetest2:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M  37908   2012-06-12 01:29:16  vetest1
   2   M  37904   2012-06-12 01:29:16  vetest2

So they seem to be connected to each other.

But when I access the web interface on vetest1 I see vetest2 offline, and when I try to open it, the root password is not accepted.
Here you can find a screenshot: http://imageshack.us/photo/my-images/802/schermata062456091alle1.png/

I tried to start cman and restart pve-cluster on both nodes:

Code:
root@vetest1:~# /etc/init.d/cman start
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Unfencing self... [  OK  ]
root@vetest1:~# /etc/init.d/pve-cluster restart
Restarting pve cluster filesystem: pve-cluster.

Code:
root@vetest2:~# /etc/init.d/cman start
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Unfencing self... [  OK  ]
root@vetest2:~# /etc/init.d/pve-cluster restart
Restarting pve cluster filesystem: pve-cluster.

Now vetest2 is shown online in the web interface, but the root password is not accepted (screenshot: http://imageshack.us/photo/my-images/849/schermata062456091alle1.png/).

When I enter the root password for vetest2, the authentication is logged as successful on vetest1 in /var/log/syslog:

Code:
Jun 12 11:59:03 vetest1 pvedaemon[8439]: <root@pam> successful auth for user 'root@pam'
Jun 12 11:59:19 vetest1 pvedaemon[8440]: <root@pam> successful auth for user 'root@pam'

Please note that the root password is the same on vetest1 and vetest2.
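
An editorial note: the "successful auth in syslog but rejected in the browser" pattern fits a key mismatch. The web UI issues a login ticket signed with the cluster-wide authentication key kept under /etc/pve, and if pmxcfs is not fully in sync, the other node rejects the ticket. Comparing checksums on both nodes is a cheap test (a sketch; the file paths are my assumption of the PVE 2.x defaults, and `checksum_auth_files` is a helper name I introduce - run it on each node and compare the output by eye):

```shell
# Print checksums of the files the web UI uses to sign/verify login tickets.
checksum_auth_files() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            md5sum "$f"
        else
            echo "missing: $f"
        fi
    done
}

checksum_auth_files /etc/pve/authkey.pub /etc/pve/pve-root-ca.pem
```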
 
I suggest you go for a support subscription and we take a look via remote login.
 
After updating to PVE 2.2 I could not manage my VMs; two of my servers were shown as turned off.
I followed these instructions:
http://pve.proxmox.com/wiki/Downloads
I found that my sources.list was slightly different, pointing to a Russian FTP mirror, and that

Code:
pveversion -v
pve-manager: 2.2-24 (pve-manager/2.2/7f9cfa4c)
qemu-server: 2.0-62
libpve-common-perl: 1.0-36

showed older versions. After

Code:
apt-get update && apt-get upgrade

everything works!
 