cluster node offline, pmxcfs error

Jorem

New Member
May 28, 2009
The cluster worked fine until I tried new firewall settings, at which point the second node went offline in the control panel. After I removed the iptables rules, the node came back online. I tried again with different firewall settings and the node went offline again. I removed the firewall settings once more, but this time the second node does not come back online.

I can still see the VMs on the second node from the master node's control panel, and I can check their load and settings. So the node is not really offline, and I can work with it as if it were online.

What I tried:
- restart cman
- restart pve-cluster
- restart pve-manager

All of them restart without errors.

These are the .members files (cluster name and IPs replaced with *):

{
"nodename": "c1",
"version": 3,
"cluster": { "name": "****-cluster", "version": 4, "nodes": 2, "quorate": 1 },
"nodelist": {
"c1": { "id": 1, "online": 1, "ip": "**.**.***.***"},
"c2": { "id": 2, "online": 0}
}
}

On the second node:

{
"nodename": "c2",
"version": 3,
"cluster": { "name": "****-cluster", "version": 4, "nodes": 2, "quorate": 1 },
"nodelist": {
"c1": { "id": 1, "online": 0},
"c2": { "id": 2, "online": 1, "ip": "**.**.***.***"}
}
}
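
To make the mismatch between the two files easier to see, here is a minimal Python sketch that reads both views and lists which nodes each side marks as offline. The names C1_VIEW, C2_VIEW, offline_nodes and is_split_view are made up for illustration, and the masked IP fields are simply left out.

```python
import json

# Trimmed copies of the two .members views above (masked IPs omitted).
C1_VIEW = """{
  "nodename": "c1",
  "nodelist": {"c1": {"id": 1, "online": 1}, "c2": {"id": 2, "online": 0}}
}"""
C2_VIEW = """{
  "nodename": "c2",
  "nodelist": {"c1": {"id": 1, "online": 0}, "c2": {"id": 2, "online": 1}}
}"""


def offline_nodes(members_json):
    """Return the sorted names of nodes that this view marks as offline."""
    members = json.loads(members_json)
    return sorted(
        name
        for name, info in members["nodelist"].items()
        if info.get("online") != 1
    )


def is_split_view(view_a, view_b):
    """True when each node reports the other one as offline."""
    name_a = json.loads(view_a)["nodename"]
    name_b = json.loads(view_b)["nodename"]
    return name_b in offline_nodes(view_a) and name_a in offline_nodes(view_b)


if __name__ == "__main__":
    print("c1 sees offline:", offline_nodes(C1_VIEW))
    print("c2 sees offline:", offline_nodes(C2_VIEW))
    print("split view:", is_split_view(C1_VIEW, C2_VIEW))
```

Each node sees only itself as online, which suggests the status traffic between the nodes is not getting through in either direction.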

In the logs I get this error: c1 pmxcfs[686435]: [status] crit: cpg_send_message failed: 9

What do I have to do to get the second node online in the control panel again?
 
I fixed the "problem".

I deleted all the iptables rules and restarted cman, pve-manager and pve-cluster. Now both servers are up again.

There were probably still some rules left in the iptables files that were blocking the sync.

These are the iptables rules that I used. Maybe someone has an idea why they do not work or what is blocking the sync:

*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A POSTROUTING -s **.**.***.0/24 ! -d **.**.***.0/24 -j MASQUERADE #the ip address of the slave node
COMMIT
*mangle
:FORWARD ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
*filter
:FORWARD ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p udp -m udp -i vmbr0 --dport 53 -j ACCEPT
-A INPUT -p tcp -m tcp -i vmbr0 --dport 53 -j ACCEPT
-A INPUT -p udp -m udp -i vmbr0 --dport 22 -j ACCEPT
-A INPUT -p tcp -m tcp -i vmbr0 --dport 22 -j ACCEPT
-A INPUT -p udp -m udp -i vmbr0 --dport 5900:5999 -j ACCEPT
-A INPUT -p tcp -m tcp -i vmbr0 --dport 5900:5999 -j ACCEPT
-A FORWARD -m state -d 79.99.133.0/24 -o vmbr0 --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -s 79.99.133.0/24 -i vmbr0 -j ACCEPT
-A FORWARD -i vmbr0 -o vmbr0 -j ACCEPT
-A FORWARD -o vmbr0 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i vmbr0 -j REJECT --reject-with icmp-port-unreachable
-A INPUT -p udp -m udp -i vmbr0 --dport 443 -j ACCEPT
-A INPUT -p tcp -m tcp -i vmbr0 --dport 443 -j ACCEPT
-A INPUT -p udp -m udp -i vmbr0 --dport 8006 -j ACCEPT
-A INPUT -p tcp -m tcp -i vmbr0 --dport 8006 -j ACCEPT
-A INPUT -p tcp -m tcp -s 192.168.100.0/24 -i vmbr1 -j ACCEPT # LAN ip range
-A INPUT -p udp -m udp -s 192.168.100.0/24 -i vmbr1 -j ACCEPT # LAN ip range
-A INPUT -s **.**.***.*** -j ACCEPT # IP of Master server
-A INPUT -j REJECT
COMMIT
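
One possible way to narrow this down is to check which accepts are missing in front of the final REJECT. The Python sketch below is only a rough check, not a diagnosis. It assumes that the cman/corosync cluster traffic uses UDP ports 5404 and 5405 (the usual defaults) and that loopback traffic should be accepted as well. The function names and the placeholder address 192.0.2.10 are made up for illustration. Rules that match only on a source IP are not counted, even though they may cover part of this traffic.

```python
# Assumption: cman/corosync cluster traffic uses UDP ports 5404 and 5405.
COROSYNC_PORTS = (5404, 5405)

# Shortened sample in the same format as the rules above (placeholder IP).
SAMPLE_RULES = """*filter
:INPUT ACCEPT [0:0]
-A INPUT -p udp -m udp -i vmbr0 --dport 53 -j ACCEPT
-A INPUT -p tcp -m tcp -i vmbr0 --dport 22 -j ACCEPT
-A INPUT -s 192.0.2.10 -j ACCEPT # IP of Master server
-A INPUT -j REJECT
COMMIT"""


def option_value(tokens, flag):
    """Return the token that follows flag, or None if flag is absent."""
    if flag in tokens and tokens.index(flag) + 1 < len(tokens):
        return tokens[tokens.index(flag) + 1]
    return None


def input_accepts_before_reject(ruleset):
    """Collect INPUT ACCEPT rules that come before the first catch-all REJECT/DROP."""
    accepts = []
    for line in ruleset.splitlines():
        rule = line.split("#", 1)[0].strip()  # drop the trailing annotation
        if not rule.startswith("-A INPUT"):
            continue
        if rule in ("-A INPUT -j REJECT", "-A INPUT -j DROP"):
            break
        if rule.endswith("-j ACCEPT"):
            accepts.append(rule.split())
    return accepts


def covers_udp_port(tokens, port):
    """True if the rule is a UDP rule whose --dport value or range includes port."""
    spec = option_value(tokens, "--dport")
    if option_value(tokens, "-p") != "udp" or spec is None:
        return False
    low, _, high = spec.partition(":")
    return int(low) <= port <= int(high or low)


def missing_accepts(ruleset):
    """List the assumed cluster requirements that have no explicit ACCEPT rule."""
    accepts = input_accepts_before_reject(ruleset)
    missing = []
    if not any(option_value(tokens, "-i") == "lo" for tokens in accepts):
        missing.append("loopback")
    for port in COROSYNC_PORTS:
        if not any(covers_udp_port(tokens, port) for tokens in accepts):
            missing.append("udp/%d" % port)
    return missing


if __name__ == "__main__":
    print(missing_accepts(SAMPLE_RULES))
```

If the check reports missing entries, adding accepts for the loopback interface and for UDP 5404:5405 ahead of the final REJECT would be the first thing to try.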