cman keeps crashing

Do you have a spare machine to set up a third cluster? I wonder whether the same happens on a freshly installed system.
 
Here are the logs:

Apr 11 12:42:52 terrance corosync[5298]: [TOTEM ] FAILED TO RECEIVE
Apr 11 12:42:54 terrance pmxcfs[5214]: [quorum] crit: quorum_dispatch failed: 2
Apr 11 12:42:54 terrance fenced[5352]: cluster is down, exiting
Apr 11 12:42:54 terrance fenced[5352]: daemon cpg_dispatch error 2
Apr 11 12:42:54 terrance dlm_controld[5366]: cluster is down, exiting
Apr 11 12:42:54 terrance dlm_controld[5366]: daemon cpg_dispatch error 2
Apr 11 12:42:54 terrance pmxcfs[5214]: [libqb] warning: epoll_ctl(del): Bad file descriptor (9)
Apr 11 12:42:54 terrance pmxcfs[5214]: [confdb] crit: confdb_dispatch failed: 2
Apr 11 12:42:56 terrance pmxcfs[5214]: [libqb] warning: epoll_ctl(del): Bad file descriptor (9)
Apr 11 12:42:56 terrance pmxcfs[5214]: [dcdb] crit: cpg_dispatch failed: 2
Apr 11 12:42:56 terrance kernel: dlm: closing connection to node 2
Apr 11 12:42:56 terrance kernel: dlm: closing connection to node 1
Apr 11 12:42:58 terrance pmxcfs[5214]: [dcdb] crit: cpg_leave failed: 2
Apr 11 12:43:00 terrance pmxcfs[5214]: [libqb] warning: epoll_ctl(del): Bad file descriptor (9)
Apr 11 12:43:00 terrance pmxcfs[5214]: [dcdb] crit: cpg_dispatch failed: 2
Apr 11 12:43:02 terrance pmxcfs[5214]: [dcdb] crit: cpg_leave failed: 2
Apr 11 12:43:04 terrance pmxcfs[5214]: [libqb] warning: epoll_ctl(del): Bad file descriptor (9)
Apr 11 12:43:04 terrance pmxcfs[5214]: [quorum] crit: quorum_initialize failed: 6
Apr 11 12:43:04 terrance pmxcfs[5214]: [quorum] crit: can't initialize service
Apr 11 12:43:04 terrance pmxcfs[5214]: [confdb] crit: confdb_initialize failed: 6
Apr 11 12:43:04 terrance pmxcfs[5214]: [quorum] crit: can't initialize service
Apr 11 12:43:04 terrance pmxcfs[5214]: [dcdb] notice: start cluster connection
Apr 11 12:43:04 terrance pmxcfs[5214]: [dcdb] crit: cpg_initialize failed: 6
Apr 11 12:43:04 terrance pmxcfs[5214]: [quorum] crit: can't initialize service
Apr 11 12:43:06 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 2
Apr 11 12:43:06 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 2
Apr 11 12:43:06 terrance pmxcfs[5214]: [dcdb] notice: start cluster connection
Apr 11 12:43:06 terrance pmxcfs[5214]: [dcdb] crit: cpg_initialize failed: 6
Apr 11 12:43:06 terrance pmxcfs[5214]: [quorum] crit: can't initialize service
Apr 11 12:43:06 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:06 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:06 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:06 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:12 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:12 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:12 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:12 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:12 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:12 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:22 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:22 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:22 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:22 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:22 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:22 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:32 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:32 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:32 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:32 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:32 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
Apr 11 12:43:32 terrance pmxcfs[5214]: [status] crit: cpg_send_message failed: 9
 
OK, got it: if I disable iptables, it seems to work as expected...

But I can't find working rules for iptables. Here is what I have:

iptables -P INPUT DROP
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
# Corosync
iptables -A INPUT -p udp --dport 5404 -j ACCEPT
iptables -A INPUT -p udp --dport 5405 -j ACCEPT
iptables -A INPUT -p udp --dst 239.192.12.206 --dport 5405 -j ACCEPT

Am I missing something?
 
>iptables -A INPUT -p udp --dst 239.192.12.206 --dport 5405 -j ACCEPT

OK, it works with the above line. I was wrong :)
 
Mar 25 00:22:00 novaprospekt corosync[210647]: [TOTEM ] FAILED TO RECEIVE

mdevilz, I read in a corosync mailing list post that this line implies a problem with multicast on your network... For me, it was iptables...
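
To see how often this message shows up and on which node, something like the following could be run against the syslog. This is only a minimal sketch: `count_totem_failures` is a hypothetical helper name (not part of corosync), and it assumes the classic syslog layout shown in the logs above, where the hostname is the fourth field.

```shell
#!/bin/sh
# Count corosync "FAILED TO RECEIVE" messages per host in a syslog-style file.
# count_totem_failures is a hypothetical helper, not a corosync tool.
count_totem_failures() {
    # $1 is the log file to scan. Field 4 of a classic syslog line is the hostname.
    awk 'index($0, "FAILED TO RECEIVE") { n[$4]++ }
         END { for (h in n) print h, n[h] }' "$1" | sort
}
```

Usage would be along the lines of `count_totem_failures /var/log/syslog` (the path is an assumption, adjust it to wherever your corosync messages end up).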
 
OK, it failed again this morning. I have just disabled iptables to see whether it still fails without it...

Here are my rules:

#####################################################################
# Reset
iptables -F
iptables -t nat -F


iptables -P INPUT DROP
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT


iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT


iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT

# ACCEPT connections between cluster nodes
iptables -A INPUT -s 194.253.148.3 -j ACCEPT
iptables -A INPUT -s 194.253.148.17 -j ACCEPT
iptables -A INPUT -s 194.253.148.36 -j ACCEPT

# proxmox admin web
iptables -A INPUT -p tcp -s 194.253.148.0/24 --dport 443 -j ACCEPT
iptables -A INPUT -p tcp -s 194.253.148.0/24 --dport 80 -j ACCEPT
iptables -A INPUT -p tcp -s 194.253.148.0/24 --dport 8006 -j ACCEPT
# ssh
iptables -A INPUT -p tcp -s 194.253.148.0/24 --dport 22 -j ACCEPT
# Corosync
iptables -A INPUT -p udp --dst 239.192.12.206 --dport 5405 -j ACCEPT

#####################################################################

Am I missing something?
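
For what it's worth, the per-node ACCEPT rules above could be generated from a list, so the node addresses live in one place. A minimal sketch: `node_accept_rules` is a hypothetical helper, and it only prints the commands (a dry run) so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print one ACCEPT rule per cluster node address (dry run; nothing is applied).
# node_accept_rules is a hypothetical helper name used for illustration.
node_accept_rules() {
    for ip in "$@"; do
        printf 'iptables -A INPUT -s %s -j ACCEPT\n' "$ip"
    done
}

# Node addresses taken from the rules posted above.
node_accept_rules 194.253.148.3 194.253.148.17 194.253.148.36
```

Once the printed rules look right, the output could be piped to `sh` as root to apply them.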
 
Did you test with the iptables command I suggested?

iptables -I INPUT -p udp -m state --state NEW -m multiport --dports 5404,5405 -j ACCEPT

is, I think, equivalent to:

# ACCEPT connections between cluster nodes
iptables -A INPUT -s 194.253.148.3 -j ACCEPT
iptables -A INPUT -s 194.253.148.17 -j ACCEPT
iptables -A INPUT -s 194.253.148.36 -j ACCEPT

I'm looking at this line:
iptables -A INPUT -p udp --dst 239.192.12.206 --dport 5405 -j ACCEPT

I think it is wrong...

I have replaced it with:
iptables -A INPUT -m pkttype --pkt-type multicast -j ACCEPT

No problems so far. I will post a comment if it really fixes my iptables issue.
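
In case it helps anyone comparing the two variants: the `pkttype` rule accepts every multicast packet, not only corosync traffic. A narrower variant might combine it with the UDP ports used earlier in the thread. This is an untested sketch, not a confirmed fix. `corosync_multicast_rules` is a hypothetical helper that only prints the commands, and the IGMP rule is an assumption on my part (switches doing IGMP snooping may need the membership traffic to get through), not something established in this thread.

```shell
#!/bin/sh
# Print candidate rules for corosync multicast traffic (dry run; nothing is applied).
# corosync_multicast_rules is a hypothetical helper name used for illustration.
corosync_multicast_rules() {
    # Accept multicast UDP packets on the corosync ports only.
    echo 'iptables -A INPUT -p udp -m pkttype --pkt-type multicast -m multiport --dports 5404,5405 -j ACCEPT'
    # Assumption: also let IGMP through so multicast group membership keeps working.
    echo 'iptables -A INPUT -p igmp -j ACCEPT'
}

corosync_multicast_rules
```

The packet counters shown by `iptables -L INPUT -v -n` should then tell which rule the corosync traffic actually hits.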