RGManager won't start

It seems that rgmanager is hanging, waiting for messages from the other nodes.

That's why the /var/run/cluster/rgmanager.sk file has not been created yet.

Code:
Apr 03 18:02:47 dlm_controld dlm_controld 1364188437 started
Apr 03 18:02:47 dlm_controld found /dev/misc/dlm-control minor 57
Apr 03 18:02:47 dlm_controld found /dev/misc/dlm-monitor minor 56
Apr 03 18:02:47 dlm_controld found /dev/misc/dlm_plock minor 55
Apr 03 18:02:47 dlm_controld /dev/misc/dlm-monitor fd 12
Apr 03 18:02:47 dlm_controld /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
Apr 03 18:02:47 dlm_controld /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
Apr 03 18:02:47 dlm_controld cluster node 1 added seq 316
Apr 03 18:02:47 dlm_controld set_configfs_node 1 192.168.100.230 local 0
Apr 03 18:02:47 dlm_controld cluster node 2 added seq 316
Apr 03 18:02:47 dlm_controld set_configfs_node 2 192.168.100.232 local 0
Apr 03 18:02:47 dlm_controld cluster node 3 added seq 316
Apr 03 18:02:47 dlm_controld set_configfs_node 3 192.168.100.231 local 1
Apr 03 18:02:47 dlm_controld totem/rrp_mode = 'none'
Apr 03 18:02:47 dlm_controld set protocol 0
Apr 03 18:02:47 dlm_controld group_mode 3 compat 0
Apr 03 18:02:47 dlm_controld setup_cpg_daemon 14
Apr 03 18:02:47 dlm_controld dlm:controld conf 3 1 0 memb 1 2 3 join 3 left
Apr 03 18:02:47 dlm_controld run protocol from nodeid 1
Apr 03 18:02:47 dlm_controld daemon run 1.1.1 max 1.1.1 kernel run 1.1.1 max 1.1.1
Apr 03 18:02:47 dlm_controld plocks 16
Apr 03 18:02:47 dlm_controld plock cpg message size: 104 bytes
Apr 03 18:04:20 dlm_controld uevent: add@/kernel/dlm/rgmanager
Apr 03 18:04:20 dlm_controld kernel: add@ rgmanager
Apr 03 18:04:20 dlm_controld uevent: online@/kernel/dlm/rgmanager
Apr 03 18:04:20 dlm_controld kernel: online@ rgmanager
Apr 03 18:04:20 dlm_controld dlm:ls:rgmanager conf 3 1 0 memb 1 2 3 join 3 left
Apr 03 18:04:20 dlm_controld rgmanager add_change cg 1 joined nodeid 3
Apr 03 18:04:20 dlm_controld rgmanager add_change cg 1 we joined
Apr 03 18:04:20 dlm_controld rgmanager add_change cg 1 counts member 3 joined 1 remove 0 failed 0
Apr 03 18:04:20 dlm_controld rgmanager check_fencing done
Apr 03 18:04:20 dlm_controld rgmanager check_quorum disabled
Apr 03 18:04:20 dlm_controld rgmanager check_fs none registered
Apr 03 18:04:20 dlm_controld rgmanager send_start cg 1 flags 1 data2 0 counts 0 3 1 0 0
Apr 03 18:04:20 dlm_controld rgmanager receive_start 3:1 len 84
Apr 03 18:04:20 dlm_controld rgmanager match_change 3:1 matches cg 1
Apr 03 18:04:20 dlm_controld rgmanager wait_messages cg 1 need 2 of 3
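
The last line of that log is the one that matters here: as far as I can tell, "need 2 of 3" means the rgmanager lockspace is still waiting for start messages from the two other nodes. A minimal sketch for pulling that state out of a dlm_controld log, assuming the line format shown above (the function name and regex are my own, not part of any cluster tool):

```python
import re

# Matches dlm_controld lines such as:
#   "... dlm_controld rgmanager wait_messages cg 1 need 2 of 3"
# The format is taken from the log above and is an assumption for other versions.
WAIT_RE = re.compile(
    r"dlm_controld (\S+) wait_messages cg (\d+) need (\d+) of (\d+)"
)

def pending_starts(log_text):
    """Return the last wait_messages state seen per lockspace.

    Result maps lockspace name -> (change group, messages needed, members).
    """
    state = {}
    for line in log_text.splitlines():
        m = WAIT_RE.search(line)
        if m:
            name, cg, need, total = m.groups()
            state[name] = (int(cg), int(need), int(total))
    return state

sample = "Apr 03 18:04:20 dlm_controld rgmanager wait_messages cg 1 need 2 of 3"
print(pending_starts(sample))  # {'rgmanager': (1, 2, 3)}
```

If the "need" count never drops while the log stays quiet, the other nodes' messages are probably not reaching this node.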

Multicast works fine, and corosync has quorum.
I'll check with tcpdump/wireshark.
 
OK, it finally seems to be a multicast problem.

I see multicast traffic on my bond0 interface (LACP bonding), but not on vmbr0 (where the IP is configured).

multicast_snooping was disabled on vmbr0.
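
For anyone who wants to check the same thing: on Linux the bridge setting is exposed in sysfs under /sys/class/net/<bridge>/bridge/multicast_snooping (1 = enabled, 0 = disabled). A small sketch that reads it; the helper name and the sysfs_root parameter are just for illustration:

```python
from pathlib import Path

def multicast_snooping(bridge, sysfs_root="/sys/class/net"):
    """Return True/False for the bridge's multicast_snooping flag.

    Returns None if the file cannot be read (no such bridge, not a bridge,
    or not running on Linux).
    """
    path = Path(sysfs_root) / bridge / "bridge" / "multicast_snooping"
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None

# vmbr0 is the bridge name used in this thread; adjust to your setup.
print("vmbr0 multicast_snooping:", multicast_snooping("vmbr0"))
```

Running it on each node makes it easy to spot a node whose bridge is configured differently from the others.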

I don't know why I still have corosync quorum, though.

I finally moved the IP directly onto bond0, and the problem is solved.
 
I seem to remember some kind of bug related to multicast_snooping on bridges. Maybe you have been hit by this bug?
I have exactly the same setup as you, but with working multicast over the bridge (also vmbr0).
 
Well, it was with multicast_snooping disabled on vmbr0...
I wonder if it's a problem with the combination of LACP + bridge + multicast.

But I still don't understand why I had quorum with corosync...

I'll put some debug notes in the wiki; they could be useful for other people :)