Cannot get second node to join cluster anymore due to "stateful merge" attempt

SoaringMonchi

Hi all,

I have recently set up a Proxmox-based HA cluster with two nodes, IPMI-based fencing and no quorum disk.
While things worked nicely during my first tests, much to my annoyance it all blew up last night when I ran another test of shutting down the network interface on my secondary node.

The node was fenced as expected and came back online. Unfortunately, this resulted in an immediate fencing of the other node.
This went back and forth until I manually powered off node 2 and gave node 1 a few minutes to settle down.

Now when I switch node 2 back on, it still won't join the cluster, but is instead fenced by node 1 whenever it tries to.
I have purposely set the post_join_delay to a high value, but it didn't help.

Here are my cluster.conf and logs. I really hope someone has an idea. Any help is very much appreciated.
My own guess is that the problem is related to the node attempting a stateful merge when it really should be joining without state after a clean reboot (see the fence_tool dump, line 9).

Code:
root@rmg-de-1:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="14" name="rmg-de-cl1">
  <cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.xx.xx.11" login="FENCING" name="fenceNode1" passwd="abc"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.xx.xx.12" login="FENCING" name="fenceNode2" passwd="abc"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="rmg-de-1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceNode1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="rmg-de-2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device action="reboot" name="fenceNode2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fence_daemon post_join_delay="360" />
  <rm>
    <pvevm autostart="1" vmid="101"/>
    <pvevm autostart="1" vmid="100"/>
    <pvevm autostart="1" vmid="104"/>
    <pvevm autostart="1" vmid="103"/>
    <pvevm autostart="1" vmid="102"/>
  </rm>
</cluster>

Code:
root@rmg-de-1:~# fence_tool dump | tail -n 40
1378890849 daemon node 1 max 1.1.1.0 run 1.1.1.1
1378890849 daemon node 1 join 1378855487 left 0 local quorum 1378855487
1378890849 receive_start 1:12 len 152
1378890849 match_change 1:12 matches cg 12
1378890849 wait_messages cg 12 need 1 of 2
1378890850 receive_protocol from 2 max 1.1.1.0 run 1.1.1.1
1378890850 daemon node 2 max 0.0.0.0 run 0.0.0.0
1378890850 daemon node 2 join 1378890849 left 1378859110 local quorum 1378855487
1378890850 daemon node 2 stateful merge
1378890850 daemon node 2 kill due to stateful merge
1378890850 telling cman to remove nodeid 2 from cluster
1378890862 cluster node 2 removed seq 832
1378890862 fenced:daemon conf 1 0 1 memb 1 join left 2
1378890862 fenced:daemon ring 1:832 1 memb 1
1378890862 fenced:default conf 1 0 1 memb 1 join left 2
1378890862 add_change cg 13 remove nodeid 2 reason 3
1378890862 add_change cg 13 m 1 j 0 r 1 f 1
1378890862 add_victims node 2
1378890862 check_ringid cluster 832 cpg 1:828
1378890862 fenced:default ring 1:832 1 memb 1
1378890862 check_ringid done cluster 832 cpg 1:832
1378890862 check_quorum done
1378890862 send_start 1:13 flags 2 started 6 m 1 j 0 r 1 f 1
1378890862 cpg_mcast_joined retried 1 start
1378890862 receive_start 1:13 len 152
1378890862 match_change 1:13 skip cg 12 already start
1378890862 match_change 1:13 matches cg 13
1378890862 wait_messages cg 13 got all 1
1378890862 set_master from 1 to complete node 1
1378890862 delay post_join_delay 360 quorate_from_last_update 0
1378891222 delay of 360s leaves 1 victims
1378891222 rmg-de-2 not a cluster member after 360 sec post_join_delay
1378891222 fencing node rmg-de-2
1378891236 fence rmg-de-2 dev 0.0 agent fence_ipmilan result: success
1378891236 fence rmg-de-2 success
1378891236 send_victim_done cg 13 flags 2 victim nodeid 2
1378891236 send_complete 1:13 flags 2 started 6 m 1 j 0 r 1 f 1
1378891236 receive_victim_done 1:13 flags 2 len 80
1378891236 receive_victim_done 1:13 remove victim 2 time 1378891236 how 1
1378891236 receive_complete 1:13 len 152

Code:
root@rmg-de-1:~# tail -n 100 /var/log/cluster/corosync.log
Sep 11 11:14:09 corosync [CLM   ] CLM CONFIGURATION CHANGE
Sep 11 11:14:09 corosync [CLM   ] New Configuration:
Sep 11 11:14:09 corosync [CLM   ]     r(0) ip(10.xx.xx.1)
Sep 11 11:14:09 corosync [CLM   ] Members Left:
Sep 11 11:14:09 corosync [CLM   ] Members Joined:
Sep 11 11:14:09 corosync [CLM   ] CLM CONFIGURATION CHANGE
Sep 11 11:14:09 corosync [CLM   ] New Configuration:
Sep 11 11:14:09 corosync [CLM   ]     r(0) ip(10.xx.xx.1)
Sep 11 11:14:09 corosync [CLM   ]     r(0) ip(10.xx.xx.2)
Sep 11 11:14:09 corosync [CLM   ] Members Left:
Sep 11 11:14:09 corosync [CLM   ] Members Joined:
Sep 11 11:14:09 corosync [CLM   ]     r(0) ip(10.xx.xx.2)
Sep 11 11:14:09 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 11 11:14:09 corosync [QUORUM] Members[2]: 1 2
Sep 11 11:14:09 corosync [QUORUM] Members[2]: 1 2
Sep 11 11:14:09 corosync [CPG   ] chosen downlist: sender r(0) ip(10.xx.xx.1) ; members(old:1 left:0)
Sep 11 11:14:09 corosync [MAIN  ] Completed service synchronization, ready to provide service.
Sep 11 11:14:20 corosync [TOTEM ] A processor failed, forming new configuration.
Sep 11 11:14:22 corosync [CLM   ] CLM CONFIGURATION CHANGE
Sep 11 11:14:22 corosync [CLM   ] New Configuration:
Sep 11 11:14:22 corosync [CLM   ]     r(0) ip(10.xx.xx.1)
Sep 11 11:14:22 corosync [CLM   ] Members Left:
Sep 11 11:14:22 corosync [CLM   ]     r(0) ip(10.xx.xx.2)
Sep 11 11:14:22 corosync [CLM   ] Members Joined:
Sep 11 11:14:22 corosync [QUORUM] Members[1]: 1
Sep 11 11:14:22 corosync [CLM   ] CLM CONFIGURATION CHANGE
Sep 11 11:14:22 corosync [CLM   ] New Configuration:
Sep 11 11:14:22 corosync [CLM   ]     r(0) ip(10.xx.xx.1)
Sep 11 11:14:22 corosync [CLM   ] Members Left:
Sep 11 11:14:22 corosync [CLM   ] Members Joined:
Sep 11 11:14:22 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 11 11:14:22 corosync [CPG   ] chosen downlist: sender r(0) ip(10.xx.xx.1) ; members(old:2 left:1)
Sep 11 11:14:22 corosync [MAIN  ] Completed service synchronization, ready to provide service.

EDIT:
Since the stateful/non-stateful check is apparently performed by dlm_controld, I figured this could be interesting as well, so here is the output of dlm_tool ls:
Unfortunately I only have the output from the currently operational node, as the other one is fenced very quickly and its logs are hard to retrieve. If someone has an idea, however, I'll do my best to provide those as well.

Code:
root@rmg-de-1:~# dlm_tool ls
dlm lockspaces
name          rgmanager
id            0x5231f3eb
flags         0x00000000
change        member 1 joined 0 remove 1 failed 1 seq 12,13
members       1


Many thanks in advance!
 
The node was fenced as expected and came back online. Unfortunately, this resulted in an immediate fencing of the other node.

You set "expected_votes=1", and that can result in above behavior.

I would never use such a setup for a production server.
 
Hi Dietmar,

Thanks for the quick reply.

Would you mind quickly explaining why this can result in such behavior and what my options are without a third physical device available?
 
Would you mind quickly explaining why this can result in such behavior

A single node is allowed to gain quorum. So both nodes can have quorum at the same time, and fence each other.
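You can check this yourself on either node (a quick illustrative check, assuming the stock cman tools that ship with Proxmox VE): with two_node="1" the number of votes needed for quorum drops to one, so an isolated node still reports the cluster as quorate.

Code:
# run on either node while the other one is down or isolated
root@rmg-de-1:~# cman_tool status | grep -E 'Expected votes|Total votes|Quorum'
# with two_node="1" the Quorum value is 1, i.e. the lone node considers
# itself quorate - which is exactly what lets both halves fence each other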


and what my options are without a third physical device available?

Do not allow the nodes to boot automatically, and shut them down as the fence action (not reboot).
It also helps if you have a central fence device (UPS) which serializes fence requests.
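For example, a sketch of what the fence method for node 2 could look like with the action changed to "off" (untested, based on the cluster.conf you posted):

Code:
<clusternode name="rmg-de-2" nodeid="2" votes="1">
  <fence>
    <method name="1">
      <!-- "off" powers the node down instead of rebooting it, so a fenced
           node stays down until you deliberately start it again -->
      <device action="off" name="fenceNode2"/>
    </method>
  </fence>
</clusternode>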

I would not use such a setup for any production server (use at least 3 nodes, and use a reliable fence device instead of IPMI).
 
Hi Dietmar,

Just wanted to thank you for the incredibly quick and helpful response.
I decided to use an off-site quorum disk now, connected via VPN. It works very well and I'm glad that I went with this approach in the end.
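For anyone finding this later, here is a sketch of what the cluster.conf side of such a setup can look like (not my exact config; labels and timing values are illustrative):

Code:
<!-- two_node/expected_votes="1" removed; the quorum disk contributes the tie-breaking vote -->
<cman expected_votes="3" keyfile="/var/lib/pve-cluster/corosync.authkey"/>
<quorumd allow_kill="0" interval="1" label="rmg-de-qdisk" tko="10" votes="1"/>
<!-- a longer totem token helps tolerate the extra latency of the VPN link -->
<totem token="54000"/>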

I would still be interested in hearing why Red Hat / the cluster stack developers provide the two_node option if it can result in a fence loop that easily, rendering it effectively useless.
If anyone has a minute to explain this, it would be very much appreciated.

Cheers!
 
I would still be interested in hearing why Red Hat / the cluster stack developers provide the two_node option if it can result in a fence loop that easily, rendering it effectively useless.

It's a hack, and it works for many failure scenarios if you use a central fence device like a UPS.
 
