Hello guys,
I'm having some issues with a fresh cluster.
I've set up a 3-node cluster using Supermicro servers and iSCSI/LVM storage.
Fencing seems to be working fine.
KVM live migration works fine until I configure HA on the VMs. Something is wrong and I cannot seem to locate the problem.
It's all chaotic: HA migration fails with code 250, VMs are automatically "started" by the resource manager but never come online, etc.
HA doesn't work at all unless I add the VM to HA in the "stopped" state and let HA autostart it.
I think it's all tied to "service:ha_test_ip". I used the standard IPMI fencing config from the wiki.
Code:
<?xml version="1.0"?>
<cluster config_version="54" name="HM">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.101" lanplus="1" login="ADMIN" name="ipmi1" passwd="ADMIN" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.102" lanplus="1" login="ADMIN" name="ipmi2" passwd="ADMIN" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.100.0.103" lanplus="1" login="ADMIN" name="ipmi3" passwd="ADMIN" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="pm01" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pm02" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="pm03" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="ha_test_ip" recovery="relocate">
      <ip address="8.8.8.8"/>
    </service>
  </rm>
</cluster>
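In case it helps, this is roughly how I've been checking the fence devices and dry-running the service definition outside of rgmanager. As far as I know rg_test ships with rgmanager; treat these commands as a sketch of what I tried, not an exact transcript:
Code:
# check that an IPMI fence device responds (same credentials as in cluster.conf)
fence_ipmilan -a 10.100.0.101 -l ADMIN -p ADMIN -P -o status

# dry-run the ha_test_ip service against the current config
rg_test test /etc/cluster/cluster.conf start service ha_test_ip
rg_test test /etc/cluster/cluster.conf stop service ha_test_ip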
This is my clustat output:
Code:
Cluster Status for HM @ Mon Jun 23 20:40:32 2014
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 pm01                                  1 Online, Local, rgmanager
 pm02                                  2 Online, rgmanager
 pm03                                  3 Online, rgmanager

 Service Name                  Owner (Last)                  State
 ------- ----                  ----- ------                  -----
 service:ha_test_ip            (pm03)                        failed
It says failed.
I've tried "clusvcadm -R service:ha_test_ip", but it just says "Local machine trying to restart service:ha_test_ip...Failure", no matter which of the 3 nodes I run it on.
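From what I've read, a service that is already in the "failed" state has to be disabled before rgmanager will start it again, so my next attempt will be along these lines:
Code:
clusvcadm -d service:ha_test_ip        # disable, clearing the failed state
clusvcadm -e service:ha_test_ip        # enable and let rgmanager pick a node
clusvcadm -e service:ha_test_ip -m pm01  # or enable on a specific node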
Nothing in the logs.
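For completeness, these are the places I've been looking, assuming the default rgmanager log locations on this setup:
Code:
# rgmanager's own log file (default path, as far as I understand)
tail -n 50 /var/log/cluster/rgmanager.log

# anything rgmanager sent to syslog
grep -i rgmanager /var/log/syslog | tail -n 50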
I'm out of ideas.
If anyone has any input I would very much appreciate it.
Best regards,
Marius