Two-Node High Availability Cluster Problem with RGManager

ap87

New Member
Feb 8, 2012
Russia, Kirov
Hi!
I have a problem configuring an HA cluster.
On boot:
[attached screenshot: 1331121096-clip-9kb.png]

cluster.conf
Code:
<?xml version="1.0"?>
<cluster config_version="34" name="cluster">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.0.120" login="LOGIN" name="ipmi0" passwd="PASS"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.0.121" login="LOGIN" name="ipmi1" passwd="PASS"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="node-0" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="ipmi0"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-1" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="114"/>
  </rm>
</cluster>

RGManager Start
Code:
Starting Cluster Service Manager: [FAILED]

TASK ERROR: command '/etc/init.d/rgmanager start' failed: exit code 1

syslog
Code:
Mar  7 15:56:29 node-0 pvedaemon[1952]: <root@pam> starting task UPID:node-0:00000896:00008F53:4F574CED:srvstart:rgmanager:root@pam:
Mar  7 15:56:29 node-0 pvedaemon[2198]: starting service rgmanager: UPID:node-0:00000896:00008F53:4F574CED:srvstart:rgmanager:root@pam:
Mar  7 15:56:29 node-0 kernel: dlm: Using TCP for communications
Mar  7 15:56:29 node-0 dlm_controld[1833]: dlm_join_lockspace no fence domain
Mar  7 15:56:29 node-0 dlm_controld[1833]: process_uevent online@ error -1 errno 2
Mar  7 15:56:29 node-0 kernel: dlm: rgmanager: group join failed -1 -1
Mar  7 15:56:29 node-0 pvedaemon[2198]: command '/etc/init.d/rgmanager start' failed: exit code 1
Mar  7 15:56:29 node-0 pvedaemon[1952]: <root@pam> end task UPID:node-0:00000896:00008F53:4F574CED:srvstart:rgmanager:root@pam: command '/etc/init.d/rgmanager start' failed: exit code 1

/var/log/cluster/fence_na.log
Code:
Node Assassin: . [].
TCP Port: ...... [238].
Node: .......... [00].
Login: ......... [].
Password: ...... [].
Action: ........ [metadata].
Version Request: [no].
Done reading args.
Connection to Node Assassin: [] failed.
Error was: [unknown remote host: ]
Username and/or password invalid. Did you use the command line switches properly?
:confused: Apparently the error is here, but the manual says nothing about Node Assassin (fence_na) fencing.

fenced.log
Code:
Mar 07 15:50:38 fenced fenced 1324544458 started

clustat
Code:
Cluster Status for cluster @ Wed Mar  7 16:50:32 2012
Member Status: Quorate


 Member Name                             ID   Status
 ------ ----                             ---- ------
 node-0                                      1 Online, Local
 node-1                                      2 Online

cman_tool status
Code:
Version: 6.2.0
Config Version: 34
Cluster Name: cluster
Cluster Id: 13364
Cluster Member: Yes
Cluster Generation: 360
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 5
Flags: 2node
Ports Bound: 0
Node name: node-0
Node ID: 1
Multicast addresses: 239.192.52.104
Node addresses: 192.168.0.10
pveversion -v
Code:
pve-manager: 2.0-38 (pve-manager/2.0/af81df02)
running kernel: 2.6.32-7-pve
proxmox-ve-2.6.32: 2.0-60
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve1
clvm: 2.02.88-2pve1
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-1
pve-cluster: 1.0-23
qemu-server: 2.0-25
pve-firmware: 1.0-15
libpve-common-perl: 1.0-17
libpve-access-control: 1.0-17
libpve-storage-perl: 2.0-12
vncterm: 1.0-2
vzctl: 3.0.30-2pve1
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-5
ksm-control-daemon: 1.1-1
Did I make a mistake in the configuration?
Fencing works.
DRBD works.
 

hercous

Guest
Hi!

I have the same problem with starting RGmanager:

May 29 14:40:20 polina kernel: dlm: Using TCP for communications
May 29 14:40:20 polina dlm_controld[3027]: dlm_join_lockspace no fence domain
May 29 14:40:20 polina dlm_controld[3027]: process_uevent online@ error -1 errno 2
May 29 14:40:20 polina kernel: dlm: rgmanager: group join failed -1 -1

I have a 2-node cluster (Dell M600), and it seems the problem started after installing Dell OpenManage (http://linux.dell.com/repo/community/deb/latest/).

After a reboot I have to join the fence domain (fence_tool join) and start RGManager (service rgmanager start) manually. Then the cluster works fine.
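The manual recovery above can be sketched as a tiny script. This is only a sketch: the FENCE_TOOL and RGMANAGER variables are not part of any real tool, they exist solely so the ordering can be dry-run; on a real node the commands are just fence_tool and the rgmanager init script.

```shell
# Sketch of the manual workaround above: the fence domain must be
# joined BEFORE rgmanager starts, otherwise dlm_join_lockspace fails
# with "no fence domain". Tool paths are overridable for dry runs.
FENCE_TOOL=${FENCE_TOOL:-fence_tool}
RGMANAGER=${RGMANAGER:-/etc/init.d/rgmanager}

join_and_start() {
    "$FENCE_TOOL" join || return 1    # join the default fence domain first
    "$RGMANAGER" start                # then rgmanager can create its dlm lockspace
}
```

On a real node this amounts to fence_tool join followed by /etc/init.d/rgmanager start, matching the manual steps described above.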

Does anyone have any suggestion?

Thanks in advance
 

basanisi

Active Member
Apr 15, 2011
Hello,

I have the same problem: everything worked perfectly until I installed Dell OMSA. When I restart the cluster, rgmanager fails with this error:

kernel: dlm: rgmanager: group join failed -1 -1

When I remove Dell OMSA and restart the cluster, everything starts normally, without touching any config files.
 

jforeman

New Member
Mar 14, 2012
Hello,

I have the same problem: everything worked perfectly until I installed Dell OMSA. When I restart the cluster, rgmanager fails with this error:

kernel: dlm: rgmanager: group join failed -1 -1

When I remove Dell OMSA and restart the cluster, everything starts normally, without touching any config files.


Please see the bug that I've reported against cman; I've just noted there that you are being affected by the same issue. There is an easy fix, and I've attached a patch to the bug report. The init script for cman (the cluster manager) uses an unreliable method for determining the Linux distribution and does not start properly on PVE when the directory /etc/sysconfig is present.

Please let me know if I can help further with this issue.

https://bugzilla.proxmox.com/show_bug.cgi?id=112
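To illustrate the failure mode (this is a hedged sketch of the pattern described in the bug, not the verbatim cman init script): the script infers the distribution from the mere presence of /etc/sysconfig, which Dell OMSA creates even on Debian-based PVE; checking Debian's own marker file instead is the sort of fix the patch applies.

```shell
# Illustrative only -- not the actual cman init script. The fragile
# check infers "Red Hat" from the mere existence of /etc/sysconfig.
detect_distro() {
    root="$1"                              # filesystem root, a parameter for testing
    if [ -d "$root/etc/sysconfig" ]; then
        echo redhat                        # OMSA creates /etc/sysconfig on Debian,
    else                                   # so PVE is misdetected
        echo debian
    fi
}

# A more robust check keys on Debian's own marker file instead:
detect_distro_fixed() {
    root="$1"
    if [ -f "$root/etc/debian_version" ]; then
        echo debian
    else
        echo redhat
    fi
}
```

This also explains why removing OMSA (which removes /etc/sysconfig) makes the cluster start normally again.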
 

johjoh

Member
Feb 4, 2011
Same problem with fence_ilo: rgmanager doesn't start and gives me this error:
"kernel: dlm: rgmanager: group join failed -1 -1"

I haven't applied a .patch file before; how does it work? Could you send me a patched cman package instead?

Thank you
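For reference, a unified diff is applied with the standard patch utility. A self-contained toy example (demo.txt and demo.patch are invented for illustration; the real patch is the one attached to bug 112):

```shell
# Toy demonstration of applying a .patch file in a throwaway directory.
cd "$(mktemp -d)"
printf 'line one\nline two\n' > demo.txt

# A unified diff that changes "line two" to "line 2":
cat > demo.patch <<'EOF'
--- demo.txt
+++ demo.txt
@@ -1,2 +1,2 @@
 line one
-line two
+line 2
EOF

patch -p0 < demo.patch    # -p0: use the file paths in the diff as-is
```

For the cman fix, the same idea applies against the init script with the patch file from the bug report (keeping a backup of the original first).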
 
