Cluster running, but failing to connect to resource group manager

dallen1

New Member
Jun 28, 2012
I'm getting the following message for all of my HA-managed VMs. I've tried restarting all the cluster services (roughly the sequence sketched below) as well as rebooting each node. It's not just migration that triggers it; every HA command fails with the same inability to connect to the resource group manager. The weird thing is that everything about the cluster itself seems to be reporting okay. Where do I go from here to get HA running again?
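
For reference, this is roughly the restart sequence I tried on each node (PVE 3.x init scripts; a sketch from memory only, the exact order may need adjusting):
Code:
service rgmanager stop
service cman stop
service pve-cluster restart
service cman start
service rgmanager start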


error message:
Code:
Executing HA migrate for VM 112 to node pve-02
Trying to migrate pvevm:112 to pve-02...Could not connect to resource group manager
TASK ERROR: command 'clusvcadm -M pvevm:112 -m pve-02' failed: exit code 1


Here's the cluster info:


PVE Version info
Code:
root@pve-01:/etc/pve# pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-33-pve: 2.6.32-138
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-2.6.32-37-pve: 2.6.32-147
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1


root@pve-01:/etc/pve# ssh pve-02 pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-33-pve: 2.6.32-138
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-2.6.32-37-pve: 2.6.32-147
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1


root@pve-01:/etc/pve# ssh pve-03 pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-33-pve: 2.6.32-138
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-2.6.32-37-pve: 2.6.32-147
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@pve-01:/etc/pve#


PVECM status
Code:
root@pve-01:/etc/pve# pvecm status
Version: 6.2.0
Config Version: 76
Cluster Name: CLUSTER
Cluster Id: 9300
Cluster Member: Yes
Cluster Generation: 1376
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2  
Active subsystems: 6
Flags: 
Ports Bound: 0  
Node name: pve-01
Node ID: 1
Multicast addresses: 239.192.36.120 
Node addresses: 10.10.164.9 


root@pve-01:/etc/pve# ssh pve-02 pvecm status
Version: 6.2.0
Config Version: 76
Cluster Name: CLUSTER
Cluster Id: 9300
Cluster Member: Yes
Cluster Generation: 1376
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2  
Active subsystems: 6
Flags: 
Ports Bound: 0  
Node name: pve-02
Node ID: 2
Multicast addresses: 239.192.36.120 
Node addresses: 10.10.164.10 


root@pve-01:/etc/pve# ssh pve-03 pvecm status
Version: 6.2.0
Config Version: 76
Cluster Name: CLUSTER
Cluster Id: 9300
Cluster Member: Yes
Cluster Generation: 1376
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2  
Active subsystems: 6
Flags: 
Ports Bound: 0  
Node name: pve-03
Node ID: 3
Multicast addresses: 239.192.36.120 
Node addresses: 10.10.164.11 
root@pve-01:/etc/pve#


/etc/pve/cluster.conf:
Code:
<?xml version="1.0"?>
<cluster config_version="76" name="CLUSTER">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.10.164.19" lanplus="1" login="pve" name="node1" passwd="secret" power_wait="5" shell_timeout="300"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.10.164.20" lanplus="1" login="pve" name="node2" passwd="secret" power_wait="5" shell_timeout="300"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.10.164.18" lanplus="1" login="pve" name="node3" passwd="secret" power_wait="5" shell_timeout="300"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="pve-01" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pve-02" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pve-03" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="node3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="102"/>
    <pvevm autostart="1" vmid="110"/>
    <pvevm autostart="1" vmid="113"/>
    <pvevm autostart="1" vmid="112"/>
  </rm>
</cluster>


rgmanager status:
Code:
root@pve-01:/etc/pve# service rgmanager status
rgmanager (pid 3426 3425) is running...
root@pve-01:/etc/pve# ssh pve-02 "service rgmanager status"
rgmanager (pid 3449 3448) is running...
root@pve-01:/etc/pve# ssh pve-03 "service rgmanager status"
rgmanager (pid 3437 3436) is running...
root@pve-01:/etc/pve#


fence_tool output:
Code:
root@pve-01:/etc/pve# fence_tool ls
fence domain
member count  3
victim count  0
victim now    0
master nodeid 2
wait state    none
members       1 2 3 


root@pve-01:/etc/pve# ssh pve-02 fence_tool ls
fence domain
member count  3
victim count  0
victim now    0
master nodeid 2
wait state    none
members       1 2 3 


root@pve-01:/etc/pve# ssh pve-03 fence_tool ls
fence domain
member count  3
victim count  0
victim now    0
master nodeid 2
wait state    none
members       1 2 3




Cluster Status:
Code:
root@pve-01:/etc/pve# clustat 
Cluster Status for CLUSTER @ Wed Feb 25 08:20:16 2015
Member Status: Quorate


 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 pve-01                                                              1 Online, Local
 pve-02                                                              2 Online
 pve-03                                                              3 Online


root@pve-01:/etc/pve# ssh pve-02 clustat 
Cluster Status for CLUSTER @ Wed Feb 25 08:20:24 2015
Member Status: Quorate


 Member Name                             ID   Status
 ------ ----                             ---- ------
 pve-01                                      1 Online
 pve-02                                      2 Online, Local
 pve-03                                      3 Online


root@pve-01:/etc/pve# ssh pve-03 clustat 
Cluster Status for CLUSTER @ Wed Feb 25 08:20:30 2015
Member Status: Quorate


 Member Name                             ID   Status
 ------ ----                             ---- ------
 pve-01                                      1 Online
 pve-02                                      2 Online
 pve-03                                      3 Online, Local
 
Okay, I think I figured it out. In their infinite wisdom, my org's networking team must have disabled multicast and not told me... After reading this thread (http://comments.gmane.org/gmane.linux.redhat.cluster/21911) I tested multicast and found that none of my nodes were replying.

On a side note, my cluster config was wrong as well: I needed to add cipher=1 for node fencing to work. After I sort out multicast I'll post back.
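
In case it helps anyone else, this is roughly how I tested multicast (omping has to be installed and started on all three nodes at about the same time; adjust the node names to your own):
Code:
apt-get install omping
omping -c 10000 -i 0.001 -F -q pve-01 pve-02 pve-03

And the fencing fix was just adding the cipher attribute to each fencedevice entry in cluster.conf, along these lines:
Code:
<fencedevice agent="fence_ipmilan" cipher="1" ipaddr="10.10.164.19" lanplus="1" login="pve" name="node1" passwd="secret" power_wait="5" shell_timeout="300"/>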
 
The issue was with the cluster firewall. When I turned it off, multicast started working. I thought membership in the local_network group automatically configured all the necessary services? Regardless, once I added an accept-all rule to the cluster firewall for the local_network group (roughly as sketched below) and restarted the cluster services, HA started working again.
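
The rule I ended up with in /etc/pve/firewall/cluster.fw was along these lines (quoting from memory, so double-check the syntax against your own file before relying on it):
Code:
[RULES]
IN ACCEPT -source local_network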