I'm getting the following error for all my HA-managed VMs. I've tried restarting all the relevant services as well as rebooting each node. It's not just migration that triggers it: every HA command fails with the same inability to connect to the resource group manager. The strange part is that everything about the cluster itself reports as healthy. Where do I go from here to get HA working again?
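In case the restart order matters, this is roughly what I ran on each node, one node at a time (stock PVE 3.x init scripts; reconstructed from memory, so treat it as approximate):
Code:
# stop the stack top-down: rgmanager depends on cman, and pmxcfs (pve-cluster) is a corosync client
service rgmanager stop
service pve-cluster stop
service cman stop
# bring it back up in the reverse order
service cman start
service pve-cluster start
service rgmanager start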
Error message:
Code:
Executing HA migrate for VM 112 to node pve-02
Trying to migrate pvevm:112 to pve-02...Could not connect to resource group manager
TASK ERROR: command 'clusvcadm -M pvevm:112 -m pve-02' failed: exit code 1
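The failure is not specific to the web UI task, either. Running the same clusvcadm command directly from a root shell (same VM and target node as above) prints the identical message:
Code:
root@pve-01:~# clusvcadm -M pvevm:112 -m pve-02
Trying to migrate pvevm:112 to pve-02...Could not connect to resource group manager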
Here's the cluster info:
PVE Version info
Code:
root@pve-01:/etc/pve# pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-33-pve: 2.6.32-138
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-2.6.32-37-pve: 2.6.32-147
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@pve-01:/etc/pve# ssh pve-02 pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-33-pve: 2.6.32-138
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-2.6.32-37-pve: 2.6.32-147
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@pve-01:/etc/pve# ssh pve-03 pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-33-pve: 2.6.32-138
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-2.6.32-37-pve: 2.6.32-147
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
root@pve-01:/etc/pve#
PVECM status
Code:
root@pve-01:/etc/pve# pvecm status
Version: 6.2.0
Config Version: 76
Cluster Name: CLUSTER
Cluster Id: 9300
Cluster Member: Yes
Cluster Generation: 1376
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 6
Flags:
Ports Bound: 0
Node name: pve-01
Node ID: 1
Multicast addresses: 239.192.36.120
Node addresses: 10.10.164.9
root@pve-01:/etc/pve# ssh pve-02 pvecm status
Version: 6.2.0
Config Version: 76
Cluster Name: CLUSTER
Cluster Id: 9300
Cluster Member: Yes
Cluster Generation: 1376
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 6
Flags:
Ports Bound: 0
Node name: pve-02
Node ID: 2
Multicast addresses: 239.192.36.120
Node addresses: 10.10.164.10
root@pve-01:/etc/pve# ssh pve-03 pvecm status
Version: 6.2.0
Config Version: 76
Cluster Name: CLUSTER
Cluster Id: 9300
Cluster Member: Yes
Cluster Generation: 1376
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 6
Flags:
Ports Bound: 0
Node name: pve-03
Node ID: 3
Multicast addresses: 239.192.36.120
Node addresses: 10.10.164.11
root@pve-01:/etc/pve#
/etc/pve/cluster.conf:
Code:
<?xml version="1.0"?>
<cluster config_version="76" name="CLUSTER">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.10.164.19" lanplus="1" login="pve" name="node1" passwd="secret" power_wait="5" shell_timeout="300"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.10.164.20" lanplus="1" login="pve" name="node2" passwd="secret" power_wait="5" shell_timeout="300"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.10.164.18" lanplus="1" login="pve" name="node3" passwd="secret" power_wait="5" shell_timeout="300"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="pve-01" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pve-02" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pve-03" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="node3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="102"/>
    <pvevm autostart="1" vmid="110"/>
    <pvevm autostart="1" vmid="113"/>
    <pvevm autostart="1" vmid="112"/>
  </rm>
</cluster>
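If the XML itself is suspect, it can be sanity-checked with the stock cman tooling (as I understand it, these validate the running config and report the config version each node is actually using):
Code:
ccs_config_validate   # schema-validates the active cluster.conf
cman_tool version     # shows the cluster and config version in use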
rgmanager status:
Code:
root@pve-01:/etc/pve# service rgmanager status
rgmanager (pid 3426 3425) is running...
root@pve-01:/etc/pve# ssh pve-02 "service rgmanager status"
rgmanager (pid 3449 3448) is running...
root@pve-01:/etc/pve# ssh pve-03 "service rgmanager status"
rgmanager (pid 3437 3436) is running...
root@pve-01:/etc/pve#
fence_tool output:
Code:
root@pve-01:/etc/pve# fence_tool ls
fence domain
member count 3
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2 3
root@pve-01:/etc/pve# ssh pve-02 fence_tool ls
fence domain
member count 3
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2 3
root@pve-01:/etc/pve# ssh pve-03 fence_tool ls
fence domain
member count 3
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2 3
Cluster Status:
Code:
root@pve-01:/etc/pve# clustat
Cluster Status for CLUSTER @ Wed Feb 25 08:20:16 2015
Member Status: Quorate
 Member Name                             ID   Status
 ------ ----                             ---- ------
 pve-01                                      1 Online, Local
 pve-02                                      2 Online
 pve-03                                      3 Online
root@pve-01:/etc/pve# ssh pve-02 clustat
Cluster Status for CLUSTER @ Wed Feb 25 08:20:24 2015
Member Status: Quorate
 Member Name                             ID   Status
 ------ ----                             ---- ------
 pve-01                                      1 Online
 pve-02                                      2 Online, Local
 pve-03                                      3 Online
root@pve-01:/etc/pve# ssh pve-03 clustat
Cluster Status for CLUSTER @ Wed Feb 25 08:20:30 2015
Member Status: Quorate
 Member Name                             ID   Status
 ------ ----                             ---- ------
 pve-01                                      1 Online
 pve-02                                      2 Online
 pve-03                                      3 Online, Local
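Unless someone has a better idea, my next step is to look at what rgmanager itself logs when a command fails to connect. These are the places I'd expect to find that on a stock PVE 3.x install (log paths are my assumption):
Code:
# rgmanager normally logs to the cluster log directory and to syslog
tail -n 50 /var/log/cluster/rgmanager.log
grep -i rgmanager /var/log/syslog | tail -n 20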