Two-Node High Availability Cluster Problem with RGManager

ap87

New Member
Feb 8, 2012
Russia, Kirov
Hi!
I have a problem configuring an HA cluster.
On boot:
[attached screenshot: 1331121096-clip-9kb.png]

cluster.conf
Code:
<?xml version="1.0"?>
<cluster config_version="34" name="cluster">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.0.120" login="LOGIN" name="ipmi0" passwd="PASS"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.0.121" login="LOGIN" name="ipmi1" passwd="PASS"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node-0" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="ipmi0"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-1" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="114"/>
  </rm>
</cluster>

RGManager Start
Code:
Starting Cluster Service Manager: [FAILED]

TASK ERROR: command '/etc/init.d/rgmanager start' failed: exit code 1

syslog
Code:
Mar  7 15:56:29 node-0 pvedaemon[1952]: <root@pam> starting task UPID:node-0:00000896:00008F53:4F574CED:srvstart:rgmanager:root@pam:
Mar  7 15:56:29 node-0 pvedaemon[2198]: starting service rgmanager: UPID:node-0:00000896:00008F53:4F574CED:srvstart:rgmanager:root@pam:
Mar  7 15:56:29 node-0 kernel: dlm: Using TCP for communications
Mar  7 15:56:29 node-0 dlm_controld[1833]: dlm_join_lockspace no fence domain
Mar  7 15:56:29 node-0 dlm_controld[1833]: process_uevent online@ error -1 errno 2
Mar  7 15:56:29 node-0 kernel: dlm: rgmanager: group join failed -1 -1
Mar  7 15:56:29 node-0 pvedaemon[2198]: command '/etc/init.d/rgmanager start' failed: exit code 1
Mar  7 15:56:29 node-0 pvedaemon[1952]: <root@pam> end task UPID:node-0:00000896:00008F53:4F574CED:srvstart:rgmanager:root@pam: command '/etc/init.d/rgmanager start' failed: exit code 1

/var/log/cluster/fence_na.log
Code:
Node Assassin: . [].
TCP Port: ...... [238].
Node: .......... [00].
Login: ......... [].
Password: ...... [].
Action: ........ [metadata].
Version Request: [no].
Done reading args.
Connection to Node Assassin: [] failed.
Error was: [unknown remote host: ]
Username and/or password invalid. Did you use the command line switches properly?
:confused: Apparently the error is here, but the manual says nothing about Node Assassin (fence_na) fencing.

fenced.log
Code:
Mar 07 15:50:38 fenced fenced 1324544458 started

clustat
Code:
Cluster Status for cluster @ Wed Mar  7 16:50:32 2012
Member Status: Quorate


 Member Name                             ID   Status
 ------ ----                             ---- ------
 node-0                                      1 Online, Local
 node-1                                      2 Online

cman_tool status
Code:
Version: 6.2.0
Config Version: 34
Cluster Name: cluster
Cluster Id: 13364
Cluster Member: Yes
Cluster Generation: 360
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 5
Flags: 2node
Ports Bound: 0
Node name: node-0
Node ID: 1
Multicast addresses: 239.192.52.104
Node addresses: 192.168.0.10
pveversion -v
Code:
pve-manager: 2.0-38 (pve-manager/2.0/af81df02)
running kernel: 2.6.32-7-pve
proxmox-ve-2.6.32: 2.0-60
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve1
clvm: 2.02.88-2pve1
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-1
pve-cluster: 1.0-23
qemu-server: 2.0-25
pve-firmware: 1.0-15
libpve-common-perl: 1.0-17
libpve-access-control: 1.0-17
libpve-storage-perl: 2.0-12
vncterm: 1.0-2
vzctl: 3.0.30-2pve1
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-5
ksm-control-daemon: 1.1-1
Did I make a mistake in the configuration?
Fencing works.
DRBD works.
 

hercous

Guest
Hi!

I have the same problem with starting RGmanager:

May 29 14:40:20 polina kernel: dlm: Using TCP for communications
May 29 14:40:20 polina dlm_controld[3027]: dlm_join_lockspace no fence domain
May 29 14:40:20 polina dlm_controld[3027]: process_uevent online@ error -1 errno 2
May 29 14:40:20 polina kernel: dlm: rgmanager: group join failed -1 -1

I have a 2-node cluster (Dell M600), and it seems the problem started after installing Dell OpenManage (http://linux.dell.com/repo/community/deb/latest/).

After a reboot I have to join the fence domain (fence_tool join) and start RGManager (service rgmanager start) manually. Then the cluster works fine.
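The manual recovery above can be sketched as a tiny script. This is only a sketch: the FENCE_TOOL and RGMANAGER variables are not part of any real tool, they exist solely so the ordering can be dry-run; on a real node the commands are just fence_tool and the rgmanager init script.

```shell
# Sketch of the manual workaround above: the fence domain must be
# joined BEFORE rgmanager starts, otherwise dlm_join_lockspace fails
# with "no fence domain". Tool paths are overridable for dry runs.
FENCE_TOOL=${FENCE_TOOL:-fence_tool}
RGMANAGER=${RGMANAGER:-/etc/init.d/rgmanager}

join_and_start() {
    "$FENCE_TOOL" join || return 1    # join the default fence domain first
    "$RGMANAGER" start                # then rgmanager can create its dlm lockspace
}
```

On a real node this amounts to fence_tool join followed by /etc/init.d/rgmanager start, matching the manual steps described above.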

Does anyone have any suggestion?

Thanks in advance
 

basanisi

Active Member
Apr 15, 2011
Hello,

I have the same problem: everything worked perfectly until I installed Dell OMSA. When I restart the cluster, rgmanager fails with this error:

kernel: dlm: rgmanager: group join failed -1 -1

When I remove Dell OMSA and restart the cluster, everything starts normally, without touching any config files.
 

jforeman

New Member
Mar 14, 2012
Hello,

I have the same problem: everything worked perfectly until I installed Dell OMSA. When I restart the cluster, rgmanager fails with this error:

kernel: dlm: rgmanager: group join failed -1 -1

When I remove Dell OMSA and restart the cluster, everything starts normally, without touching any config files.


Please see the bug that I've reported against cman; I've just noted there that you are being affected by the same issue. There is an easy fix, and I've attached a patch to the bug report. The init script for cman (the cluster manager) uses an unreliable method for determining the Linux distribution and does not start properly on PVE when the directory /etc/sysconfig is present.

Please let me know if I can help further with this issue.

https://bugzilla.proxmox.com/show_bug.cgi?id=112
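To illustrate the failure mode (this is a hedged sketch of the pattern described in the bug, not the verbatim cman init script): the script infers the distribution from the mere presence of /etc/sysconfig, which Dell OMSA creates even on Debian-based PVE; checking Debian's own marker file instead is the sort of fix the patch applies.

```shell
# Illustrative only -- not the actual cman init script. The fragile
# check infers "Red Hat" from the mere existence of /etc/sysconfig.
detect_distro() {
    root="$1"                              # filesystem root, a parameter for testing
    if [ -d "$root/etc/sysconfig" ]; then
        echo redhat                        # OMSA creates /etc/sysconfig on Debian,
    else                                   # so PVE is misdetected
        echo debian
    fi
}

# A more robust check keys on Debian's own marker file instead:
detect_distro_fixed() {
    root="$1"
    if [ -f "$root/etc/debian_version" ]; then
        echo debian
    else
        echo redhat
    fi
}
```

This also explains why removing OMSA (which removes /etc/sysconfig) makes the cluster start normally again.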
 

johjoh

Member
Feb 4, 2011
Same problem with fence_ilo: rgmanager doesn't start and gives me this error:
"kernel: dlm: rgmanager: group join failed -1 -1"

I haven't applied a .patch file before; how does it work? Could you send me a patched cman package instead?

Thank you
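For reference, a unified diff is applied with the standard patch utility. A self-contained toy example (demo.txt and demo.patch are invented for illustration; the real patch is the one attached to bug 112):

```shell
# Toy demonstration of applying a .patch file in a throwaway directory.
cd "$(mktemp -d)"
printf 'line one\nline two\n' > demo.txt

# A unified diff that changes "line two" to "line 2":
cat > demo.patch <<'EOF'
--- demo.txt
+++ demo.txt
@@ -1,2 +1,2 @@
 line one
-line two
+line 2
EOF

patch -p0 < demo.patch    # -p0: use the file paths in the diff as-is
```

For the cman fix, the same idea applies against the init script with the patch file from the bug report (keeping a backup of the original first).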
 
