Hi all,
I am in the testing stage with Proxmox, running 3 servers in a cluster with HA.
It was working fine for a few days, but then I made some changes to my switches.
Now I can no longer use HA with a test VM; for example, a migration fails with:
"Executing HA migrate for VM 100 to node proxmox1
Trying to migrate pvevm:100 to proxmox1...Could not connect to resource group manager
TASK ERROR: command 'clusvcadm -M pvevm:100 -m proxmox1' failed: exit code 1"
rgmanager is running on each server.
What is odd is that when I run clustat, the HA service for the VM does not show up.
I tried rebooting each server, one by one, but that did not help.
Once I remove the test VM from HA management, everything works fine, including migration.
Here is some information from proxmox1; the other two nodes are identical.
root@proxmox1:~# pvecm status
Version: 6.2.0
Config Version: 8
Cluster Name: xx-Srv-Cluster
Cluster Id: 28852
Cluster Member: Yes
Cluster Generation: 76
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 6
Flags:
Ports Bound: 0
Node name: proxmox1
Node ID: 1
Multicast addresses: 239.192.112.37
Node addresses: 10.180.1.100
root@proxmox1:~# clustat
Cluster Status for xx-Srv-Cluster @ Tue Dec 4 19:28:26 2012
Member Status: Quorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 proxmox1                              1 Online, Local
 proxmox2                              2 Online
 proxmox3                              3 Online
root@proxmox1:~# clustat -x
<?xml version="1.0"?>
<clustat version="4.1.1">
  <cluster name="xx-Srv-Cluster" id="28852" generation="76"/>
  <quorum quorate="1" groupmember="0"/>
  <nodes>
    <node name="proxmox1" state="1" local="1" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000001"/>
    <node name="proxmox2" state="1" local="0" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000002"/>
    <node name="proxmox3" state="1" local="0" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000003"/>
  </nodes>
</clustat>
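One thing that stands out in the clustat -x output above: every node reports rgmanager="0", which as far as I understand means clustat sees no node as a joined member of the resource group manager, even though the rgmanager processes are alive. A quick sketch (plain Python, just parsing the XML pasted above) to pull that flag out:

```python
import xml.etree.ElementTree as ET

# The clustat -x output from above, trimmed to the relevant elements.
CLUSTAT_XML = """<?xml version="1.0"?>
<clustat version="4.1.1">
  <cluster name="xx-Srv-Cluster" id="28852" generation="76"/>
  <quorum quorate="1" groupmember="0"/>
  <nodes>
    <node name="proxmox1" state="1" local="1" rgmanager="0" nodeid="0x00000001"/>
    <node name="proxmox2" state="1" local="0" rgmanager="0" nodeid="0x00000002"/>
    <node name="proxmox3" state="1" local="0" rgmanager="0" nodeid="0x00000003"/>
  </nodes>
</clustat>"""

root = ET.fromstring(CLUSTAT_XML)
# rgmanager="1" would mean the node has joined the resource group manager;
# here all three nodes report 0 despite the rgmanager daemons running.
missing = [n.get("name") for n in root.iter("node") if n.get("rgmanager") != "1"]
print(missing)  # → ['proxmox1', 'proxmox2', 'proxmox3']
```

So the symptom is consistent on all three nodes, which matches the "Could not connect to resource group manager" error from the migrate task.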
root@proxmox1:~# fence_tool ls
fence domain
member count 3
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2 3
root@proxmox1:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="8" name="xx-Srv-Cluster">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.100" lanplus="1" login="ADMIN" name="ipmi1" passwd="xxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.101" lanplus="1" login="ADMIN" name="ipmi2" passwd="xxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.102" lanplus="1" login="ADMIN" name="ipmi3" passwd="xxx" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="proxmox1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm/>
</cluster>
root@proxmox1:~# pveversion -v
pve-manager: 2.2-31 (pve-manager/2.2/e94e95e9)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-33
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1
When I try to use /etc/init.d/rgmanager, it just hangs, unless I kill rgmanager first and then start it fresh.
root@proxmox1:~# /etc/init.d/rgmanager status
rgmanager (pid 67073 67071) is running...
root@proxmox1:~# ps auxwww| grep rgmanager
root 67071 0.0 0.0 32320 5716 ? S<Ls 19:08 0:00 rgmanager
root 67073 0.0 0.0 41096 1780 ? S<l 19:08 0:00 rgmanager
root@proxmox1:~# /etc/init.d/rgmanager restart
Stopping Cluster Service Manager: <<<<<<<<<<<<<<--- hangs
Please let me know if anyone needs more information to help, or take a stab at what I should try.
The logs in /var/log give no clue at all. It is as if rgmanager is running but not doing a darn thing.
I realize running HA requires a redundant and stable network, and since this is the test stage,
now is exactly the time to mess around with it. And yes, my network is redundant except for
multipath, which I have installed but not yet configured. Otherwise: fencing is working, two
switches, multicast tested on its own VLAN with a dedicated NIC, EqualLogic iSCSI with two
controllers in the chassis, a minimum of 3 servers, dual power supplies in each server, etc.
I am at the point where I need to post here for advice. It seems a couple of others have run into a similar problem.
See this post: http://forum.proxmox.com/threads/9962-rgmanager-running-per-cli-but-not-pve?p=55904#post55904
thanks,
matt