After a lot of reboots, config changes, randomly fenced nodes, and deleting/reinstalling nodes, I have ended up with a 4-node cluster with HA issues.
I would love to avoid any further stopping and starting of KVMs.
Let me count the problems.
1. clustat shows different results on different nodes. (I removed the IDs where all nodes agree on the state and owner node.) The outputs pair up as follows:
node3 = node7
node2 = node4
Code:
root@node3:~# clustat
Cluster Status for cluster @ Fri Aug 1 11:52:13 2014
Member Status: Quorate
node7 1 Online, rgmanager
node2 2 Online, rgmanager
node3 3 Online, Local, rgmanager
node4 4 Online
pvevm:101 (node7) failed
pvevm:115 node7 started
pvevm:608 (unknown) disabled
Code:
root@node4:~# clustat
Cluster Status for cluster @ Fri Aug 1 11:54:30 2014
Member Status: Quorate
node7 1 Online, rgmanager
node2 2 Online, rgmanager
node3 3 Online, rgmanager
node4 4 Online, Local, rgmanager
node4 (and likewise node2) shows no info about 101, 115, or 608.
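To see the split more clearly, this is roughly how I compare the rgmanager view on all four nodes (just a sketch; it relies on the passwordless root SSH the cluster nodes already share):
Code:
# compare what rgmanager reports for the disputed IDs on every node
for n in node7 node2 node3 node4; do
    echo "=== $n ==="
    ssh root@$n "clustat | grep -E 'pvevm:(101|115|608)'"
done
node3 and node7 return the three lines shown above; node2 and node4 return nothing.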
2. I can migrate HA KVMs between node3 and node7, and between node2 and node4; all nodes then pick up which node is running the KVM. But I cannot, for example, migrate from node7 to node4:
Code:
Executing HA migrate for VM 196 to node node4
Trying to migrate pvevm:196 to node4...Target node dead / nonexistent
TASK ERROR: command 'clusvcadm -M pvevm:196 -m node4' failed: exit code 244
rgmanager is running on all nodes, and there are no NTP problems.
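This is how I checked that (a sketch; it assumes ntpd is the time daemon on these nodes and uses clustat's -m option to query a single member):
Code:
# rgmanager process and NTP peers on every node
for n in node7 node2 node3 node4; do
    echo "=== $n ==="
    ssh root@$n "pgrep -l rgmanager; ntpq -p"
done

# how rgmanager on node7 sees node4 before the failed migration
ssh root@node7 clustat -m node4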
3. I have problems starting a new KVM with HA: two nodes pick up the change, the other two do not. On top of that, the HA start takes around 4 minutes.
4. Many times, after a stop and start of an HA KVM, I found two nodes running the same KVM, which led to ext4 corruption on its disk (see the check sketched after this list).
5. When I update the config (just adding a new HA KVM), the changes made to the <rm> section fail to activate on node3 and node7 (the steps I follow are also sketched below).
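For problems 3 and 5, this is the procedure I follow when adding a new HA KVM, as I understand the standard Proxmox 3.x workflow (a sketch only: vmid 700 and config_version 238 are example values, and I am assuming ccs_config_validate accepts -f to check an alternate file):
Code:
# 1) work on a copy; /etc/pve is the clustered pmxcfs, so the copy is visible on all nodes
cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new

# 2) edit the copy: bump config_version (237 -> 238) and add the new service, e.g.
#      <pvevm autostart="1" vmid="700"/>

# 3) validate before activating
ccs_config_validate -f /etc/pve/cluster.conf.new

# 4) activate via the GUI (Datacenter -> HA -> Activate)
And for problem 4, this is how I now check whether a VM is really running on only one node before touching it (another sketch; 101 is one of the affected IDs, and I am relying on the -id argument that the PVE kvm processes carry on their command line):
Code:
# look for a kvm process for VM 101 on every node
for n in node7 node2 node3 node4; do
    echo "=== $n ==="
    ssh root@$n "pgrep -lf '[k]vm.* -id 101 '"
done
My package versions and the full cluster.conf follow.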
Code:
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1
Code:
root@node4:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="237" name="cluster">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <totem token="54000" window_size="150"/>
  <clusternodes>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fence010"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fence012"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node4" nodeid="4" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fence014"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node7" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fence008"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm status_child_max="20">
    <pvevm autostart="1" vmid="110"/>
    <!-- there is no conf for 101, 115, or 608 here -->
    <pvevm autostart="1" vmid="600"/>
  </rm>
</cluster>
There is no configuration for IDs 101, 115, and 608. node2 and node4 have the correct HA config.
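To pin down where the <rm> changes get lost, this is the comparison I run (a sketch, same SSH assumption as above). Since /etc/pve is the clustered pmxcfs, the file should be byte-identical everywhere, so what I really want to see is whether cman on node3 and node7 actually loaded config version 237:
Code:
# the file on disk vs. the config version cman has actually loaded, per node
for n in node7 node2 node3 node4; do
    echo "=== $n ==="
    ssh root@$n "md5sum /etc/pve/cluster.conf; cman_tool version"
done
node4's full cman_tool status output is below.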
Code:
root@node4:~# cman_tool status
Version: 6.2.0
Config Version: 237
Cluster Name: cluster
Cluster Id: 13364
Cluster Member: Yes
Cluster Generation: 612
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Node votes: 1
Quorum: 3
Active subsystems: 6
Flags:
Ports Bound: 0 177
Node name: node4
Node ID: 4
Multicast addresses: 239.192.52.104
Node addresses: 10.0.0.4
Code:
root@node4:~# fence_tool ls
fence domain
member count 4
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 3 4