Hi.
I just tried to set up a cluster on PVE 2.1 and failed, and I don't know what to do next.
Here is the situation:
1) 3 servers: vmserv (10.101.0.10), vmserv1 (10.101.0.11), vmserv2 (10.101.0.12).
2) "vmserv" has VMs; the other servers have none.
3) I started the setup according to the wiki http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster . But something went wrong and I can't finish setting up the cluster. I can't even revert the system to its previous state. I thought it should be possible to revert (e.g. by cleaning up the configs), but I can't see a way to do it.
root@vmserv:~# cman_tool nodes -a
Node Sts Inc Joined Name
1 M 12 2012-07-27 03:20:26 vmserv1
Addresses: 10.101.0.10
------- This is wrong, because the IP of "vmserv1" is 10.101.0.11. I can't say how that happened. I can't remove vmserv1 from the cluster because, I think, a node can't remove itself. And I can't do anything else about it, because the cluster is not fully configured.
Tried at "vmserv1":
root@vmserv:~# pvecm e 1
root@vmserv:~# pvecm delnode vmserv1
cluster not ready - no quorum?
------- It seems the quorum problem appeared during cluster setup. I tried "service cman restart" and "service pve-cluster restart" on every server, but it didn't help.
Here is my /etc/hosts (all servers have the same one):
root@vmserv:~# cat /etc/hosts
127.0.0.1 localhost
10.101.0.10 vmserv.atz.dmza.bogus vmserv
10.101.0.11 vmserv1.atz.dmza.bogus vmserv1
10.101.0.12 vmserv1.atz.dmza.bogus vmserv2
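A minimal sketch for sanity-checking a hosts file like the one above: it flags any name that is mapped to more than one IP address. The function name find_duplicate_names is hypothetical, and the check only inspects the text; it does not establish whether such a duplicate is the cause of the cluster problem.

```python
from collections import defaultdict


def find_duplicate_names(hosts_text):
    """Return {name: [ips]} for every host name mapped to more than one IP."""
    ips_by_name = defaultdict(list)
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace; skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            if ip not in ips_by_name[name]:
                ips_by_name[name].append(ip)
    return {name: ips for name, ips in ips_by_name.items() if len(ips) > 1}


# The hosts file exactly as posted above.
HOSTS = """\
127.0.0.1 localhost
10.101.0.10 vmserv.atz.dmza.bogus vmserv
10.101.0.11 vmserv1.atz.dmza.bogus vmserv1
10.101.0.12 vmserv1.atz.dmza.bogus vmserv2
"""

print(find_duplicate_names(HOSTS))
```

For the file above it reports vmserv1.atz.dmza.bogus as mapped to both 10.101.0.11 and 10.101.0.12.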
Here is my cluster config from "vmserv":
root@vmserv:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster name="cluster" config_version="3">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<clusternodes>
<clusternode name="vmserv1" votes="1" nodeid="1"/></clusternodes>
</cluster>
In /var/log/syslog on "vmserv" we can see that it is trying to do something on the failed cluster (I think):
Jul 30 10:10:30 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:30 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:30 vmserv pmxcfs[457959]: [dcdb] notice: cpg_join retry 144420
Jul 30 10:10:31 vmserv pmxcfs[457959]: [dcdb] notice: cpg_join retry 144430
Jul 30 10:10:32 vmserv pmxcfs[457959]: [dcdb] notice: cpg_join retry 144440
Jul 30 10:10:33 vmserv dlm_controld[2184]: daemon cpg_leave error retrying
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [dcdb] notice: cpg_join retry 144450
Jul 30 10:10:34 vmserv pmxcfs[457959]: [dcdb] notice: cpg_join retry 144460
In /var/log/syslog at "vmserv1":
Jul 30 10:10:30 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:30 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:30 vmserv pmxcfs[457959]: [dcdb] notice: cpg_join retry 144420
Jul 30 10:10:31 vmserv pmxcfs[457959]: [dcdb] notice: cpg_join retry 144430
Jul 30 10:10:32 vmserv pmxcfs[457959]: [dcdb] notice: cpg_join retry 144440
Jul 30 10:10:33 vmserv dlm_controld[2184]: daemon cpg_leave error retrying
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [status] crit: cpg_send_message failed: 9
Jul 30 10:10:33 vmserv pmxcfs[457959]: [dcdb] notice: cpg_join retry 144450
Jul 30 10:10:34 vmserv pmxcfs[457959]: [dcdb] notice: cpg_join retry 144460
The problem is that "vmserv" already has VMs and I worry about them, so I decided to ask the community first. What should I do? How can I stop the cluster and put things back as they were before?
Thanks.