A lot of HA problems

Sakis

Active Member
Aug 14, 2013
After a lot of reboots, config changes, randomly fenced nodes, and deleting and reinstalling nodes, I have ended up with a 4-node cluster with HA issues.
I would love to avoid any further stopping and starting of KVMs.

Let me count the problems.
1. clustat on the nodes shows different results. (I removed the IDs where all nodes agree on state and node.)

The clustat output is the same for:
node3 = node7
node2 = node4

Code:
root@node3:~# clustat 
Cluster Status for cluster @ Fri Aug  1 11:52:13 2014
Member Status: Quorate

 node7                                         1 Online, rgmanager
 node2                                         2 Online, rgmanager
 node3                                         3 Online, Local, rgmanager
 node4                                         4 Online

 pvevm:101                      (node7)                        failed        
 pvevm:115                      node7                          started       
 pvevm:608                      (unknown)                      disabled


Code:
root@node4:~# clustat 
Cluster Status for cluster @ Fri Aug  1 11:54:30 2014
Member Status: Quorate

 node7                                             1 Online, rgmanager
 node2                                             2 Online, rgmanager
 node3                                             3 Online, rgmanager
 node4                                             4 Online, Local, rgmanager

No info about 101, 115, 608.
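To make the divergence concrete, here is a small illustrative Python sketch (not a Proxmox tool; the captured outputs below are trimmed stand-ins for the listings above) that diffs the pvevm service table between two nodes' clustat output:

```python
# Illustrative sketch: diff the service table from `clustat` output captured
# on two nodes, to spot entries one node is missing entirely.

def parse_services(clustat_text):
    """Return {service_name: state} for the pvevm lines in clustat output."""
    services = {}
    for line in clustat_text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("pvevm:"):
            services[parts[0]] = parts[-1]  # last column is the state
    return services

node3_out = """
 pvevm:101                      (node7)                        failed
 pvevm:115                      node7                          started
 pvevm:608                      (unknown)                      disabled
"""
node4_out = ""  # node4's clustat showed no service lines at all

a, b = parse_services(node3_out), parse_services(node4_out)
missing_on_node4 = sorted(set(a) - set(b))
print(missing_on_node4)  # -> ['pvevm:101', 'pvevm:115', 'pvevm:608']
```

Fed real captures (e.g. via `ssh nodeX clustat`), this would show all three service entries missing on node2 and node4.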

2. I can migrate HA KVMs between node3 and node7, and between node2 and node4. Then all nodes pick up which node is running the KVM.
I can't, for example, migrate from node7 to node4:

Code:
Executing HA migrate for VM 196 to node node4
Trying to migrate pvevm:196 to node4...Target node dead / nonexistent
TASK ERROR: command 'clusvcadm -M pvevm:196 -m node4' failed: exit code 244

rgmanager is running on all nodes. No NTP problems.

3. I have problems starting a new KVM with HA. Two nodes pick up the change, the other two do not. On top of that, the HA start takes around 4 minutes.

4. Many times, after a stop and start of an HA KVM, I found two nodes running the same KVM, leading to ext4 corruption on the disk.

5. When updating the conf (just adding a new HA KVM), changes made to the rm section fail to activate on node3 and node7.

Code:
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1


Code:
root@node4:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="237" name="cluster">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <totem token="54000" window_size="150"/>
  <clusternodes>
    <clusternode name="node2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fence010"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fence012"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node4" nodeid="4" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fence014"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node7" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="off" name="fence008"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm status_child_max="20">
    <pvevm autostart="1" vmid="110"/>
    <!-- there is no conf for 101, 115, 608 -->
    <pvevm autostart="1" vmid="600"/>
  </rm>
</cluster>

There is no configuration for the IDs 101, 115, 608. node2 and node4 have the correct HA conf.
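One way to check this mechanically is to parse each node's copy of cluster.conf and compare the vmid lists. A hedged sketch using Python's standard XML parser (the inline XML is a trimmed stand-in for the real /etc/pve/cluster.conf):

```python
# Illustrative sketch: extract the HA-managed vmids from the <rm> section of
# a cluster.conf so the lists can be compared across nodes.
import xml.etree.ElementTree as ET

conf = """<?xml version="1.0"?>
<cluster config_version="237" name="cluster">
  <rm status_child_max="20">
    <pvevm autostart="1" vmid="110"/>
    <pvevm autostart="1" vmid="600"/>
  </rm>
</cluster>"""

root = ET.fromstring(conf)
ha_vmids = sorted(p.get("vmid") for p in root.iter("pvevm"))
print(ha_vmids)  # -> ['110', '600']
```

Running this against the file from each node would immediately show whether 101, 115, and 608 are present on some nodes and absent on others.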
Code:
root@node4:~# cman_tool status
Version: 6.2.0
Config Version: 237
Cluster Name: cluster
Cluster Id: 13364
Cluster Member: Yes
Cluster Generation: 612
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Node votes: 1
Quorum: 3  
Active subsystems: 6
Flags: 
Ports Bound: 0 177  
Node name: node4
Node ID: 4
Multicast addresses: 239.192.52.104 
Node addresses: 10.0.0.4

Code:
root@node4:~# fence_tool ls
fence domain
member count  4
victim count  0
victim now    0
master nodeid 1
wait state    none
members       1 2 3 4
 
There are passwords and server info in there that I don't want to reveal, so I should say I skipped those lines.
Fencing works like a charm when it is triggered.

Let me also add some logs for issue no. 5, from when I add or delete an HA service and activate the new conf.

node2 and node4

Code:
Aug  1 13:15:47 node2 pmxcfs[3786]: [dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
Aug  1 13:15:47 node2 corosync[4048]:   [QUORUM] Members[4]: 1 2 3 4
Aug  1 13:15:47 node2 pmxcfs[3786]: [status] notice: update cluster info (cluster name  cluster, version = 238)
Aug  1 13:15:47 node2 rgmanager[4412]: Status Child Max set to 20
Aug  1 13:15:47 node2 rgmanager[4412]: Reconfiguring
Aug  1 13:15:47 node2 rgmanager[4412]: Loading Service Data
Aug  1 13:15:52 node2 rgmanager[4412]: Stopping changed resources.
Aug  1 13:15:52 node2 rgmanager[4412]: Restarting changed resources.
Aug  1 13:15:52 node2 rgmanager[4412]: Starting changed resources.

Code:
Aug  1 13:15:47 node4 pmxcfs[4026]: [dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
Aug  1 13:15:47 node4 corosync[4552]:   [QUORUM] Members[4]: 1 2 3 4
Aug  1 13:15:47 node4 pmxcfs[4026]: [status] notice: update cluster info (cluster name  cluster, version = 238)
Aug  1 13:15:47 node4 rgmanager[31885]: Status Child Max set to 20
Aug  1 13:15:47 node4 rgmanager[31885]: Reconfiguring
Aug  1 13:15:47 node4 rgmanager[31885]: Loading Service Data
Aug  1 13:15:51 node4 rgmanager[31885]: Stopping changed resources.
Aug  1 13:15:51 node4 rgmanager[31885]: Restarting changed resources.
Aug  1 13:15:51 node4 rgmanager[31885]: Starting changed resources.


On the other hand node3 and node7

Code:
Aug  1 13:15:47 node3 pmxcfs[3775]: [dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
Aug  1 13:15:47 node3 corosync[4349]:   [QUORUM] Members[4]: 1 2 3 4
Aug  1 13:15:47 node3 pmxcfs[3775]: [status] notice: update cluster info (cluster name  cluster, version = 238)
Code:
Aug  1 13:15:47 node7 pmxcfs[3783]: [dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
Aug  1 13:15:47 node7 corosync[4026]:   [QUORUM] Members[4]: 1 2 3 4
Aug  1 13:15:47 node7 pmxcfs[3783]: [status] notice: update cluster info (cluster name  cluster, version = 238)
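The asymmetry in these logs can be checked mechanically: rgmanager logs "Reconfiguring" only on the nodes that actually applied the new rm section. A small illustrative sketch (the log strings are trimmed stand-ins for the excerpts above):

```python
# Illustrative sketch: nodes whose rgmanager never logs "Reconfiguring"
# did not apply the new conf, even though pmxcfs wrote it everywhere.
logs = {
    "node2": "Aug  1 13:15:47 node2 rgmanager[4412]: Reconfiguring",
    "node4": "Aug  1 13:15:47 node4 rgmanager[31885]: Reconfiguring",
    "node3": "Aug  1 13:15:47 node3 pmxcfs[3775]: [status] notice: update cluster info",
    "node7": "Aug  1 13:15:47 node7 pmxcfs[3783]: [status] notice: update cluster info",
}
stale = sorted(node for node, text in logs.items() if "Reconfiguring" not in text)
print(stale)  # -> ['node3', 'node7']
```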

Any thoughts, apart from restarting rgmanager?
 
I think you have some type of cluster communication issue. Can you provide more detail about how your cluster network is set up? Are all these nodes on the same switch? It would also be worth posting your hosts file from each node.
 
These questions seem quite general and, as far as I can tell, not relevant.
The hosts files are all the same, so I can join nodes without needing to reboot each time.

This is a sample; 10.0.0.1 node1 through 10.0.0.33 node33 is identical on all nodes.

Code:
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1    localhost.localdomain localhost

10.0.0.1 node1
10.0.0.2 node2
10.0.0.3 node3
10.0.0.4 node4
10.0.0.5 node5
10.0.0.6 node6
10.0.0.7 node7
10.0.0.8 node8
10.0.0.9 node9
10.0.0.10 node10
10.0.0.11 node11
10.0.0.12 node12
10.0.0.13 node13
10.0.0.14 node14
10.0.0.15 node15
10.0.0.16 node16
10.0.0.17 node17
10.0.0.18 node18
10.0.0.19 node19
10.0.0.20 node20
10.0.0.21 node21
10.0.0.22 node22
10.0.0.23 node23
10.0.0.24 node24
10.0.0.25 node25
10.0.0.26 node26
10.0.0.27 node27
10.0.0.28 node28
10.0.0.29 node29
10.0.0.30 node30
10.0.0.31 node31
10.0.0.32 node32
10.0.0.33 node33

public.ip    hostname.domain    hostname
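If it helps, the "all identical" claim can be verified mechanically. This sketch is an assumption (only two entries shown, standing in for the full 33-node files) of how one might diff the name-to-IP mappings across nodes:

```python
# Illustrative sketch: compare parsed /etc/hosts name-to-IP maps from two
# nodes to confirm they agree; any mismatch would break cluster name lookup.
def parse_hosts(text):
    """Return {hostname: ip} from hosts-file text, ignoring comments."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#")[0].strip()
        if line:
            ip, *names = line.split()
            for name in names:
                entries[name] = ip
    return entries

# Two-entry stand-ins for the full files shown above.
node3_hosts = parse_hosts("10.0.0.3 node3\n10.0.0.4 node4\n")
node4_hosts = parse_hosts("10.0.0.3 node3\n10.0.0.4 node4\n")
mismatches = sorted(n for n in node3_hosts if node4_hosts.get(n) != node3_hosts[n])
print(mismatches)  # -> [] (empty: the files agree)
```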
Code:
/etc/hostname = nodeXX

Code:
/etc/resolv.conf
nameserver 127.0.0.1
nameserver providers.ip
nameserver 8.8.8.8


I can't know whether we are on the same switch; I guess not. These are rented dedicated servers, but we have a private VLAN to communicate over.

The network setup is like this:

Code:
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
        address public-ip
        netmask x.x.x.x
        network x.x.x.x
        broadcast x.x.x.x
        gateway x.x.x.x

auto eth1
iface eth1 inet static
address 0.0.0.0
netmask 0.0.0.0

auto vmbr1
iface vmbr1 inet manual
        bridge_ports dummy0
        bridge_stp off
        bridge_fd 0
        post-up /etc/pve/kvm-networking.sh

auto vmbr0
iface vmbr0 inet static
address 0.0.0.0
netmask 0.0.0.0
bridge_ports eth1
bridge_stp off
bridge_fd 0

auto vmbr0:1
iface vmbr0:1 inet static
        address  10.0.0.3
        netmask  255.255.255.0
        broadcast  10.0.0.255
        network  10.0.0.0

I use vmbr0 for the KVMs. On the eth1 VLAN, my datacenter routes the RIPE blocks that I assign to the KVMs.

I think these have nothing to do with the HA problems.
 
Code:
auto vmbr0
iface vmbr0 inet static
address 0.0.0.0
netmask 0.0.0.0
bridge_ports eth1
bridge_stp off
bridge_fd 0

auto vmbr0:1
iface vmbr0:1 inet static
        address  10.0.0.3
        netmask  255.255.255.0
        broadcast  10.0.0.255
        network  10.0.0.0

I think the above is the problem. I do not think using logical NICs is a good idea. I also notice that the host's interface shares an IP with one of the nodes. Try replacing the above with this:

Code:
auto vmbr0
iface vmbr0 inet static
address 10.0.0.254
netmask 255.255.255.0
bridge_ports eth1
bridge_stp off
bridge_fd 0
 

I agree completely; I would avoid virtual or logical NICs as much as possible. I would also want to confirm that multicast is indeed allowed on all of their switches.

All of these questions are very relevant to your issue.
 
In the end I stopped the problematic KVMs, restarted rgmanager, and everything went well. It was the last option I had, so that is what I did. All is perfect again; no problems at all.

I am now testing the different network configurations you proposed.

If I delete vmbr0:1 and change vmbr0 as you suggest, this host will lose its connection to the other nodes. That is the private IP the node uses to communicate with the cluster; it is node3, just as I have it in the hosts file.
On eth1 I need both the vmbr0:1 10.0.0.x cluster IPs and vmbr0 to assign public IPs to the KVMs through the private VLAN.
I don't see another way of doing this than the way I did it, because I originally ran the cluster with broadcast, and this was the only way I found to achieve it with one physical Ethernet port.

But now that I use a multicast network, this configuration may be causing my problems. I will change it as you proposed. Thank you.