Last night, all nodes logged messages like these:
Code:
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: client command is 5
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: About to process command
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] memb: command to process is 5
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: Returning command data. length = 0
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: sending reply 40000005 to fd 28
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: read 20 bytes from fd 28
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: client command is 7
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: About to process command
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] memb: command to process is 7
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] memb: get_all_members: allocated new buffer (retsize=1024)
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] memb: get_all_members: retlen = 6600
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] memb: command return code is 15
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: Returning command data. length = 6600
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: sending reply 40000007 to fd 28
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: read 20 bytes from fd 28
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: client command is 91
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: About to process command
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] memb: command to process is 91
Nov 15 06:28:46 cluster-1-1 corosync[3821]: cman killed by node 7 because we were killed by cman_tool or other application
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] memb: command return code is 0
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: Returning command data. length = 24
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] daemon: sending reply 40000091 to fd 28
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] ais: deliver_fn source nodeid = 7, len=34, endian_conv=0
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] ais: deliver_fn source nodeid = 7, len=24, endian_conv=0
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] memb: Message on port 0 is 6
Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] memb: got KILL for node 1
root@cluster-1-1:~#
or
Code:
Nov 16 06:25:32 cluster-1-9 pvedailycron[29872]: <root@pam> starting task UPID:cluster-1-9:000074BD:10B16646:56495ABC:aptupdate::root@pam:
Nov 16 06:25:32 cluster-1-9 pmxcfs[3369]: [status] crit: cpg_send_message failed: 9
Nov 16 06:25:32 cluster-1-9 pmxcfs[3369]: [status] crit: cpg_send_message failed: 9
Nov 16 06:25:32 cluster-1-9 pmxcfs[3369]: [status] crit: cpg_send_message failed: 9
Nov 16 06:25:32 cluster-1-9 pmxcfs[3369]: [status] crit: cpg_send_message failed: 9
Nov 16 06:25:34 cluster-1-9 pvedailycron[29885]: update new package list: /var/lib/pve-manager/pkgupdates
Nov 16 06:25:36 cluster-1-9 pmxcfs[3369]: [status] crit: cpg_send_message failed: 9
Nov 16 06:25:36 cluster-1-9 pmxcfs[3369]: [status] crit: cpg_send_message failed: 9
Nov 16 06:25:36 cluster-1-9 pvedailycron[29872]: <root@pam> end task UPID:cluster-1-9:000074BD:10B16646:56495ABC:aptupdate::root@pam: OK
Nov 16 06:25:36 cluster-1-9 pmxcfs[3369]: [status] crit: cpg_send_message failed: 9
Nov 16 06:25:36 cluster-1-9 pmxcfs[3369]: [status] crit: cpg_send_message failed: 9
Nov 16 06:25:36 cluster-1-9 postfix/pickup[22710]: EDC4932641F: uid=0 from=<root>
Nov 16 06:25:37 cluster-1-9 postfix/cleanup[29930]: EDC4932641F: message-id=<20151116042536.EDC4932641
I don't understand what happened, but afterwards every node reported:
Code:
root@cluster-1-3:~# clustat
Could not connect to CMAN: No such file or directory
root@cluster-1-3:~#
When I ran /etc/init.d/cman start, all nodes were rebooted by the fence daemon.
Could you please explain what could have happened to the cluster? The network switch shows no error messages on any of its ports.
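For anyone who wants to check their own nodes for the same pattern, here is a minimal Python sketch that counts the three kinds of messages quoted above. The names `PATTERNS`, `summarize` and `SAMPLE` are made up for this illustration, and the regular expressions are based only on the log lines in this post, so they may need adjusting for other versions.

```python
import re
from collections import Counter

# Patterns derived only from the log excerpts in this post (an assumption:
# other corosync/pmxcfs versions may word these messages differently).
PATTERNS = {
    "cman_killed": re.compile(r"cman killed by node (\d+)"),
    "kill_received": re.compile(r"got KILL for node (\d+)"),
    "cpg_send_failed": re.compile(r"cpg_send_message failed: (\d+)"),
}


def summarize(lines):
    """Count matching events, keyed by (event name, number captured from the line)."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(name, int(match.group(1)))] += 1
    return counts


# A few lines copied from the excerpts above, used as sample input.
SAMPLE = [
    "Nov 15 06:28:46 cluster-1-1 corosync[3821]: cman killed by node 7 "
    "because we were killed by cman_tool or other application",
    "Nov 15 06:28:46 cluster-1-1 corosync[3821]: [CMAN ] memb: got KILL for node 1",
    "Nov 16 06:25:32 cluster-1-9 pmxcfs[3369]: [status] crit: cpg_send_message failed: 9",
    "Nov 16 06:25:32 cluster-1-9 pmxcfs[3369]: [status] crit: cpg_send_message failed: 9",
]

if __name__ == "__main__":
    for (event, number), count in sorted(summarize(SAMPLE).items()):
        print(event, number, count)
```

To run it against a real log, pass an open file object instead of `SAMPLE`, since `summarize` accepts any iterable of lines. The log location differs between setups, so no path is assumed here.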