Recommended way to reboot a node in a cluster with HA enabled?

konaya

Very occasionally when we reboot a node, the whole cluster reboots. Is there a recommended shutdown procedure we're missing? Is there a way to tell corosync “this node will go offline for a while now, so please don't panic”?
 
Hi,

If this happens, the cluster has dropped below the minimum number of votes it requires.
That means the cluster has lost quorum.
You have to fix that before you reboot a node.
HA has requirements, and if they are not met, HA cannot work.
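
As a rough illustration of the vote arithmetic, here is a minimal sketch. It assumes the default votequorum rule (quorum is a strict majority of the expected votes); the function names and the seven-node example values are made up for illustration and are not part of any Proxmox or corosync API.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(expected_votes: int, votes_present: int) -> bool:
    """True if the votes currently present reach the quorum threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Example values (assumed): 7 nodes with one vote each.
expected = 7
print(quorum_threshold(expected))   # 4
print(is_quorate(expected, 6))      # True: a single node is down
print(is_quorate(expected, 3))      # False: four nodes are down
```

Under that assumption, a cluster that is fully up before the reboot should stay quorate when a single node leaves; quorum is only lost once a majority of the votes is gone.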
 
The cluster in question has seven nodes. Each node has one vote. They were all seemingly in quorum before one node was rebooted and the rest decided to take a dive. Here are the corosync logs for each of the nodes around the incident. That gap between 09:52 and 09:55 is when everything rebooted. The node rebooting is member 4. It looks like its leaving triggers some sort of chain reaction, but I'm no expert. Hoping to find some help here.

Code:
Apr 22 23:32:25 node1 corosync[4668]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 04:14:59 node1 corosync[4668]:   [KNET  ] link: host: 2 link: 0 is down
Apr 23 04:14:59 node1 corosync[4668]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
Apr 23 04:15:01 node1 corosync[4668]:   [KNET  ] rx: host: 2 link: 0 is up
Apr 23 04:15:01 node1 corosync[4668]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 23 09:50:59 node1 corosync[4668]:   [TOTEM ] Retransmit List: dcfa2
Apr 23 09:51:00 node1 corosync[4668]:   [KNET  ] link: host: 4 link: 0 is down
Apr 23 09:51:00 node1 corosync[4668]:   [KNET  ] host: host: 4 (passive) best link: 1 (pri: 1)
Apr 23 09:51:00 node1 corosync[4668]:   [TOTEM ] Retransmit List: dcfab
Apr 23 09:51:00 node1 corosync[4668]:   [TOTEM ] Retransmit List: dcfad
Apr 23 09:51:00 node1 corosync[4668]:   [TOTEM ] Retransmit List: dcfb1
Apr 23 09:51:12 node1 corosync[4668]:   [KNET  ] rx: host: 4 link: 0 is up
Apr 23 09:51:12 node1 corosync[4668]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 23 09:51:38 node1 corosync[4668]:   [TOTEM ] A new membership (1.176) was formed. Members left: 4
Apr 23 09:51:38 node1 corosync[4668]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node1 corosync[4668]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node1 corosync[4668]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node1 corosync[4668]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node1 corosync[4668]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node1 corosync[4668]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node1 corosync[4668]:   [QUORUM] Members[6]: 1 2 3 5 6 7
Apr 23 09:51:38 node1 corosync[4668]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:51:40 node1 corosync[4668]:   [KNET  ] link: host: 4 link: 0 is down
Apr 23 09:51:40 node1 corosync[4668]:   [KNET  ] link: host: 4 link: 1 is down
Apr 23 09:51:40 node1 corosync[4668]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 23 09:51:40 node1 corosync[4668]:   [KNET  ] host: host: 4 has no active links
Apr 23 09:51:40 node1 corosync[4668]:   [KNET  ] host: host: 4 (passive) best link: 0 (pri: 1)
Apr 23 09:51:40 node1 corosync[4668]:   [KNET  ] host: host: 4 has no active links
Apr 23 09:52:40 node1 corosync[4668]:   [KNET  ] link: host: 3 link: 0 is down
Apr 23 09:52:40 node1 corosync[4668]:   [KNET  ] link: host: 3 link: 1 is down
Apr 23 09:52:40 node1 corosync[4668]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 23 09:52:40 node1 corosync[4668]:   [KNET  ] host: host: 3 has no active links
Apr 23 09:52:40 node1 corosync[4668]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 23 09:52:40 node1 corosync[4668]:   [KNET  ] host: host: 3 has no active links
Apr 23 09:52:41 node1 corosync[4668]:   [TOTEM ] Token has not been received in 154 ms
Apr 23 09:52:42 node1 corosync[4668]:   [KNET  ] link: host: 2 link: 0 is down
Apr 23 09:52:42 node1 corosync[4668]:   [KNET  ] link: host: 2 link: 1 is down
Apr 23 09:52:42 node1 corosync[4668]:   [KNET  ] link: host: 7 link: 1 is down
Apr 23 09:52:42 node1 corosync[4668]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 23 09:52:42 node1 corosync[4668]:   [KNET  ] host: host: 2 has no active links
Apr 23 09:52:42 node1 corosync[4668]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 23 09:52:42 node1 corosync[4668]:   [KNET  ] host: host: 2 has no active links
Apr 23 09:52:42 node1 corosync[4668]:   [KNET  ] host: host: 7 (passive) best link: 1 (pri: 1)
Apr 23 09:52:42 node1 corosync[4668]:   [KNET  ] host: host: 7 has no active links
Apr 23 09:55:33 node1 corosync[4772]:   [KNET  ] rx: host: 7 link: 1 is up
Apr 23 09:55:33 node1 corosync[4772]:   [KNET  ] host: host: 7 (passive) best link: 1 (pri: 1)
Apr 23 09:55:33 node1 corosync[4772]:   [TOTEM ] A new membership (1.192) was formed. Members joined: 7
Apr 23 09:55:33 node1 corosync[4772]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node1 corosync[4772]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node1 corosync[4772]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node1 corosync[4772]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node1 corosync[4772]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node1 corosync[4772]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node1 corosync[4772]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node1 corosync[4772]:   [QUORUM] Members[7]: 1 2 3 4 5 6 7
Apr 23 09:55:33 node1 corosync[4772]:   [MAIN  ] Completed service synchronization, ready to provide service.

Code:
Apr 22 23:32:25 node2 corosync[4622]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 04:14:59 node2 corosync[4622]:   [TOTEM ] Retransmit List: 659c4 
Apr 23 04:14:59 node2 corosync[4622]:   [TOTEM ] Retransmit List: 659c5 
Apr 23 08:14:53 node2 corosync[4622]:   [TOTEM ] Retransmit List: bacff 
Apr 23 09:50:59 node2 corosync[4622]:   [TOTEM ] Retransmit List: dcfa2 
Apr 23 09:51:00 node2 corosync[4622]:   [TOTEM ] Retransmit List: dcfab 
Apr 23 09:51:00 node2 corosync[4622]:   [TOTEM ] Retransmit List: dcfad 
Apr 23 09:51:00 node2 corosync[4622]:   [TOTEM ] Retransmit List: dcfb1 
Apr 23 09:51:38 node2 corosync[4622]:   [TOTEM ] A new membership (1.176) was formed. Members left: 4
Apr 23 09:51:38 node2 corosync[4622]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node2 corosync[4622]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node2 corosync[4622]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node2 corosync[4622]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node2 corosync[4622]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node2 corosync[4622]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node2 corosync[4622]:   [QUORUM] Members[6]: 1 2 3 5 6 7
Apr 23 09:51:38 node2 corosync[4622]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:51:40 node2 corosync[4622]:   [KNET  ] link: host: 4 link: 1 is down
Apr 23 09:51:40 node2 corosync[4622]:   [KNET  ] host: host: 4 (passive) best link: 1 (pri: 1)
Apr 23 09:51:40 node2 corosync[4622]:   [KNET  ] host: host: 4 has no active links
Apr 23 09:52:41 node2 corosync[4622]:   [KNET  ] link: host: 3 link: 1 is down
Apr 23 09:52:41 node2 corosync[4622]:   [KNET  ] host: host: 3 (passive) best link: 1 (pri: 1)
Apr 23 09:52:41 node2 corosync[4622]:   [KNET  ] host: host: 3 has no active links
Apr 23 09:55:32 node2 corosync[4479]:   [KNET  ] rx: host: 7 link: 1 is up
Apr 23 09:55:32 node2 corosync[4479]:   [KNET  ] host: host: 7 (passive) best link: 1 (pri: 1)
Apr 23 09:55:33 node2 corosync[4479]:   [KNET  ] pmtud: PMTUD link change for host: 7 link: 1 from 469 to 1397
Apr 23 09:55:33 node2 corosync[4479]:   [TOTEM ] A new membership (1.192) was formed. Members joined: 7
Apr 23 09:55:33 node2 corosync[4479]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node2 corosync[4479]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node2 corosync[4479]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node2 corosync[4479]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node2 corosync[4479]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node2 corosync[4479]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node2 corosync[4479]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node2 corosync[4479]:   [QUORUM] Members[7]: 1 2 3 4 5 6 7
Apr 23 09:55:33 node2 corosync[4479]:   [MAIN  ] Completed service synchronization, ready to provide service.
 
Code:
Apr 22 19:13:39 node3 corosync[4140]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 22 21:46:19 node3 corosync[4140]:   [TOTEM ] Retransmit List: 377fa 
Apr 22 23:28:51 node3 corosync[4140]:   [KNET  ] link: host: 1 link: 0 is down
Apr 22 23:28:51 node3 corosync[4140]:   [KNET  ] host: host: 1 (passive) best link: 1 (pri: 1)
Apr 22 23:28:59 node3 corosync[4140]:   [KNET  ] rx: host: 1 link: 0 is up
Apr 22 23:28:59 node3 corosync[4140]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 22 23:51:07 node3 corosync[4262]:   [TOTEM ] Retransmit List: 6e99 
Apr 23 02:51:06 node3 corosync[4262]:   [TOTEM ] Retransmit List: 47a2c 
Apr 23 04:40:39 node3 corosync[4262]:   [TOTEM ] Retransmit List: 6ec18 
Apr 23 08:42:00 node3 corosync[4262]:   [TOTEM ] Retransmit List: c474a c474b 
Apr 23 09:11:18 node3 corosync[4262]:   [TOTEM ] Retransmit List: cee51 
Apr 23 09:51:38 node3 corosync[4262]:   [TOTEM ] A new membership (1.176) was formed. Members left: 4
Apr 23 09:51:38 node3 corosync[4262]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node3 corosync[4262]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node3 corosync[4262]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node3 corosync[4262]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node3 corosync[4262]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node3 corosync[4262]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node3 corosync[4262]:   [QUORUM] Members[6]: 1 2 3 5 6 7
Apr 23 09:51:38 node3 corosync[4262]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:51:40 node3 corosync[4262]:   [KNET  ] link: host: 4 link: 1 is down
Apr 23 09:51:40 node3 corosync[4262]:   [KNET  ] host: host: 4 (passive) best link: 1 (pri: 1)
Apr 23 09:51:40 node3 corosync[4262]:   [KNET  ] host: host: 4 has no active links
Apr 23 09:55:31 node3 corosync[4378]:   [KNET  ] rx: host: 7 link: 1 is up
Apr 23 09:55:31 node3 corosync[4378]:   [KNET  ] host: host: 7 (passive) best link: 1 (pri: 1)
Apr 23 09:55:32 node3 corosync[4378]:   [KNET  ] pmtud: PMTUD link change for host: 7 link: 1 from 469 to 1397
Apr 23 09:55:33 node3 corosync[4378]:   [TOTEM ] A new membership (1.192) was formed. Members joined: 7
Apr 23 09:55:33 node3 corosync[4378]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node3 corosync[4378]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node3 corosync[4378]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node3 corosync[4378]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node3 corosync[4378]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node3 corosync[4378]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node3 corosync[4378]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node3 corosync[4378]:   [QUORUM] Members[7]: 1 2 3 4 5 6 7
Apr 23 09:55:33 node3 corosync[4378]:   [MAIN  ] Completed service synchronization, ready to provide service.

Code:
Apr 22 23:32:25 node4 corosync[4280]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 00:19:58 node4 corosync[4280]:   [TOTEM ] Retransmit List: 11434 
Apr 23 00:19:58 node4 corosync[4280]:   [TOTEM ] Retransmit List: 11435 
Apr 23 09:51:01 node4 corosync[4280]:   [KNET  ] link: host: 1 link: 0 is down
Apr 23 09:51:01 node4 corosync[4280]:   [KNET  ] host: host: 1 (passive) best link: 1 (pri: 1)
Apr 23 09:51:12 node4 corosync[4280]:   [KNET  ] rx: host: 1 link: 0 is up
Apr 23 09:51:12 node4 corosync[4280]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 23 09:54:42 node4 corosync[4511]:   [KNET  ] rx: host: 5 link: 1 is up
Apr 23 09:54:42 node4 corosync[4511]:   [KNET  ] host: host: 5 (passive) best link: 1 (pri: 1)
Apr 23 09:54:42 node4 corosync[4511]:   [KNET  ] pmtud: PMTUD link change for host: 5 link: 1 from 469 to 1397
Apr 23 09:54:42 node4 corosync[4511]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Apr 23 09:54:42 node4 corosync[4511]:   [TOTEM ] A new membership (4.17e) was formed. Members joined: 5
Apr 23 09:54:42 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:54:42 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:54:42 node4 corosync[4511]:   [QUORUM] Members[2]: 4 5
Apr 23 09:54:42 node4 corosync[4511]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:55:01 node4 corosync[4511]:   [KNET  ] rx: host: 3 link: 1 is up
Apr 23 09:55:01 node4 corosync[4511]:   [KNET  ] host: host: 3 (passive) best link: 1 (pri: 1)
Apr 23 09:55:01 node4 corosync[4511]:   [KNET  ] pmtud: PMTUD link change for host: 3 link: 1 from 469 to 1397
Apr 23 09:55:01 node4 corosync[4511]:   [TOTEM ] A new membership (3.182) was formed. Members joined: 3
Apr 23 09:55:01 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:01 node4 corosync[4511]: message repeated 2 times: [   [CPG   ] downlist left_list: 0 received]
Apr 23 09:55:01 node4 corosync[4511]:   [QUORUM] Members[3]: 3 4 5
Apr 23 09:55:01 node4 corosync[4511]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:55:04 node4 corosync[4511]:   [KNET  ] rx: host: 2 link: 1 is up
Apr 23 09:55:04 node4 corosync[4511]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
Apr 23 09:55:04 node4 corosync[4511]:   [KNET  ] pmtud: PMTUD link change for host: 2 link: 1 from 469 to 1397
Apr 23 09:55:04 node4 corosync[4511]:   [TOTEM ] A new membership (2.186) was formed. Members joined: 2
Apr 23 09:55:04 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:04 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:04 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:04 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:04 node4 corosync[4511]:   [QUORUM] This node is within the primary component and will provide service.
Apr 23 09:55:04 node4 corosync[4511]:   [QUORUM] Members[4]: 2 3 4 5
Apr 23 09:55:04 node4 corosync[4511]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:55:08 node4 corosync[4511]:   [KNET  ] rx: host: 1 link: 1 is up
Apr 23 09:55:08 node4 corosync[4511]:   [KNET  ] rx: host: 1 link: 0 is up
Apr 23 09:55:08 node4 corosync[4511]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 23 09:55:08 node4 corosync[4511]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 23 09:55:08 node4 corosync[4511]:   [KNET  ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Apr 23 09:55:08 node4 corosync[4511]:   [KNET  ] pmtud: PMTUD link change for host: 1 link: 1 from 469 to 1397
Apr 23 09:55:08 node4 corosync[4511]:   [TOTEM ] A new membership (1.18a) was formed. Members joined: 1
Apr 23 09:55:08 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:08 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:08 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:08 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:08 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:08 node4 corosync[4511]:   [QUORUM] Members[5]: 1 2 3 4 5
Apr 23 09:55:08 node4 corosync[4511]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:55:09 node4 corosync[4511]:   [TOTEM ] A new membership (1.18e) was formed. Members joined: 6
Apr 23 09:55:09 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:09 node4 corosync[4511]: message repeated 3 times: [   [CPG   ] downlist left_list: 0 received]
Apr 23 09:55:09 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:09 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:09 node4 corosync[4511]:   [QUORUM] Members[6]: 1 2 3 4 5 6
Apr 23 09:55:09 node4 corosync[4511]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:55:09 node4 corosync[4511]:   [KNET  ] rx: host: 6 link: 1 is up
Apr 23 09:55:09 node4 corosync[4511]:   [KNET  ] host: host: 6 (passive) best link: 1 (pri: 1)
Apr 23 09:55:09 node4 corosync[4511]:   [KNET  ] pmtud: PMTUD link change for host: 6 link: 1 from 469 to 1397
Apr 23 09:55:32 node4 corosync[4511]:   [KNET  ] rx: host: 7 link: 1 is up
Apr 23 09:55:32 node4 corosync[4511]:   [KNET  ] host: host: 7 (passive) best link: 1 (pri: 1)
Apr 23 09:55:32 node4 corosync[4511]:   [KNET  ] pmtud: PMTUD link change for host: 7 link: 1 from 469 to 1397
Apr 23 09:55:33 node4 corosync[4511]:   [TOTEM ] A new membership (1.192) was formed. Members joined: 7
Apr 23 09:55:33 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node4 corosync[4511]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node4 corosync[4511]:   [QUORUM] Members[7]: 1 2 3 4 5 6 7
Apr 23 09:55:33 node4 corosync[4511]:   [MAIN  ] Completed service synchronization, ready to provide service.
 
Code:
Apr 22 23:32:25 node5 corosync[2241]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf8c 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf8d 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf8e 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf8f 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf90 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf91 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf92 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf93 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf94 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf95 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf96 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf97 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf98 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf99 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf9a 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf9b 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf9c 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf9d 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcf9e 
Apr 23 09:50:59 node5 corosync[2241]:   [TOTEM ] Retransmit List: dcfa3 
Apr 23 09:51:38 node5 corosync[2241]:   [TOTEM ] A new membership (1.176) was formed. Members left: 4
Apr 23 09:51:38 node5 corosync[2241]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node5 corosync[2241]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node5 corosync[2241]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node5 corosync[2241]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node5 corosync[2241]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node5 corosync[2241]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node5 corosync[2241]:   [QUORUM] Members[6]: 1 2 3 5 6 7
Apr 23 09:51:38 node5 corosync[2241]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:51:40 node5 corosync[2241]:   [KNET  ] link: host: 4 link: 1 is down
Apr 23 09:51:40 node5 corosync[2241]:   [KNET  ] host: host: 4 (passive) best link: 1 (pri: 1)
Apr 23 09:51:40 node5 corosync[2241]:   [KNET  ] host: host: 4 has no active links
Apr 23 09:52:40 node5 corosync[2241]:   [KNET  ] link: host: 3 link: 1 is down
Apr 23 09:52:40 node5 corosync[2241]:   [KNET  ] host: host: 3 (passive) best link: 1 (pri: 1)
Apr 23 09:52:40 node5 corosync[2241]:   [KNET  ] host: host: 3 has no active links
Apr 23 09:52:41 node5 corosync[2241]:   [TOTEM ] Token has not been received in 3187 ms 
Apr 23 09:52:42 node5 corosync[2241]:   [KNET  ] link: host: 2 link: 1 is down
Apr 23 09:52:42 node5 corosync[2241]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
Apr 23 09:52:42 node5 corosync[2241]:   [KNET  ] host: host: 2 has no active links
Apr 23 09:52:42 node5 corosync[2241]:   [TOTEM ] A processor failed, forming new configuration.
Apr 23 09:52:43 node5 corosync[2241]:   [KNET  ] link: host: 7 link: 0 is down
Apr 23 09:52:43 node5 corosync[2241]:   [KNET  ] link: host: 7 link: 1 is down
Apr 23 09:52:43 node5 corosync[2241]:   [KNET  ] host: host: 7 (passive) best link: 0 (pri: 1)
Apr 23 09:52:43 node5 corosync[2241]:   [KNET  ] host: host: 7 has no active links
Apr 23 09:52:43 node5 corosync[2241]:   [KNET  ] host: host: 7 (passive) best link: 0 (pri: 1)
Apr 23 09:52:43 node5 corosync[2241]:   [KNET  ] host: host: 7 has no active links
Apr 23 09:55:04 node5 corosync[2536]:   [KNET  ] rx: host: 2 link: 1 is up
Apr 23 09:55:04 node5 corosync[2536]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
Apr 23 09:55:04 node5 corosync[2536]:   [KNET  ] pmtud: PMTUD link change for host: 2 link: 1 from 469 to 1397
Apr 23 09:55:05 node5 corosync[2536]:   [TOTEM ] A new membership (2.186) was formed. Members joined: 2
Apr 23 09:55:05 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:05 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:05 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:05 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:05 node5 corosync[2536]:   [QUORUM] This node is within the primary component and will provide service.
Apr 23 09:55:05 node5 corosync[2536]:   [QUORUM] Members[4]: 2 3 4 5
Apr 23 09:55:05 node5 corosync[2536]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:55:08 node5 corosync[2536]:   [KNET  ] rx: host: 1 link: 0 is up
Apr 23 09:55:08 node5 corosync[2536]:   [KNET  ] rx: host: 1 link: 1 is up
Apr 23 09:55:08 node5 corosync[2536]:   [KNET  ] rx: host: 6 link: 1 is up
Apr 23 09:55:08 node5 corosync[2536]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 23 09:55:08 node5 corosync[2536]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 23 09:55:08 node5 corosync[2536]:   [KNET  ] host: host: 6 (passive) best link: 1 (pri: 1)
Apr 23 09:55:08 node5 corosync[2536]:   [KNET  ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Apr 23 09:55:08 node5 corosync[2536]:   [KNET  ] pmtud: PMTUD link change for host: 1 link: 1 from 469 to 1397
Apr 23 09:55:08 node5 corosync[2536]:   [KNET  ] pmtud: PMTUD link change for host: 6 link: 1 from 469 to 1397
Apr 23 09:55:08 node5 corosync[2536]:   [TOTEM ] A new membership (1.18a) was formed. Members joined: 1
Apr 23 09:55:08 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:08 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:08 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:08 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:08 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:08 node5 corosync[2536]:   [QUORUM] Members[5]: 1 2 3 4 5
Apr 23 09:55:08 node5 corosync[2536]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:55:09 node5 corosync[2536]:   [TOTEM ] A new membership (1.18e) was formed. Members joined: 6
Apr 23 09:55:09 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:09 node5 corosync[2536]: message repeated 2 times: [   [CPG   ] downlist left_list: 0 received]
Apr 23 09:55:09 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:09 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:09 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:09 node5 corosync[2536]:   [QUORUM] Members[6]: 1 2 3 4 5 6
Apr 23 09:55:09 node5 corosync[2536]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:55:32 node5 corosync[2536]:   [KNET  ] rx: host: 7 link: 0 is up
Apr 23 09:55:32 node5 corosync[2536]:   [KNET  ] rx: host: 7 link: 1 is up
Apr 23 09:55:32 node5 corosync[2536]:   [KNET  ] host: host: 7 (passive) best link: 0 (pri: 1)
Apr 23 09:55:32 node5 corosync[2536]:   [KNET  ] host: host: 7 (passive) best link: 0 (pri: 1)
Apr 23 09:55:32 node5 corosync[2536]:   [KNET  ] pmtud: PMTUD link change for host: 7 link: 0 from 469 to 1397
Apr 23 09:55:32 node5 corosync[2536]:   [KNET  ] pmtud: PMTUD link change for host: 7 link: 1 from 469 to 1397
Apr 23 09:55:33 node5 corosync[2536]:   [TOTEM ] A new membership (1.192) was formed. Members joined: 7
Apr 23 09:55:33 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node5 corosync[2536]:   [CPG   ] downlist left_list: 0 received
Apr 23 09:55:33 node5 corosync[2536]:   [QUORUM] Members[7]: 1 2 3 4 5 6 7
Apr 23 09:55:33 node5 corosync[2536]:   [MAIN  ] Completed service synchronization, ready to provide service.
 
Code:
Apr 22 23:32:25 node6 corosync[4057]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:51:38 node6 corosync[4057]:   [TOTEM ] A new membership (1.176) was formed. Members left: 4
Apr 23 09:51:38 node6 corosync[4057]:   [TOTEM ] A new membership (1.176) was formed. Members left: 4
Apr 23 09:51:38 node6 corosync[4057]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node6 corosync[4057]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node6 corosync[4057]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node6 corosync[4057]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node6 corosync[4057]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node6 corosync[4057]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node6 corosync[4057]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node6 corosync[4057]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node6 corosync[4057]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node6 corosync[4057]: message repeated 2 times: [   [CPG   ] downlist left_list: 1 received]
Apr 23 09:51:38 node6 corosync[4057]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node6 corosync[4057]:   [QUORUM] Members[6]: 1 2 3 5 6 7
Apr 23 09:51:38 node6 corosync[4057]:   [QUORUM] Members[6]: 1 2 3 5 6 7
Apr 23 09:51:38 node6 corosync[4057]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:51:38 node6 corosync[4057]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:51:39 node6 corosync[4057]:   [KNET  ] link: host: 4 link: 1 is down
Apr 23 09:51:39 node6 corosync[4057]:   [KNET  ] link: host: 4 link: 1 is down
Apr 23 09:51:39 node6 corosync[4057]:   [KNET  ] host: host: 4 (passive) best link: 1 (pri: 1)
Apr 23 09:51:39 node6 corosync[4057]:   [KNET  ] host: host: 4 has no active links
Apr 23 09:51:39 node6 corosync[4057]:   [KNET  ] host: host: 4 (passive) best link: 1 (pri: 1)
Apr 23 09:51:39 node6 corosync[4057]:   [KNET  ] host: host: 4 has no active links
Apr 23 09:52:41 node6 corosync[4057]:   [KNET  ] link: host: 3 link: 1 is down
Apr 23 09:52:41 node6 corosync[4057]:   [KNET  ] link: host: 3 link: 1 is down
Apr 23 09:52:41 node6 corosync[4057]:   [KNET  ] host: host: 3 (passive) best link: 1 (pri: 1)
Apr 23 09:52:41 node6 corosync[4057]:   [KNET  ] host: host: 3 has no active links
Apr 23 09:52:41 node6 corosync[4057]:   [KNET  ] host: host: 3 (passive) best link: 1 (pri: 1)
Apr 23 09:52:41 node6 corosync[4057]:   [KNET  ] host: host: 3 has no active links
Apr 23 09:52:41 node6 corosync[4057]:   [TOTEM ] Token has not been received in 154 ms 
Apr 23 09:52:41 node6 corosync[4057]:   [TOTEM ] Token has not been received in 154 ms 
Apr 23 09:52:42 node6 corosync[4057]:   [TOTEM ] A processor failed, forming new configuration.
Apr 23 09:52:42 node6 corosync[4057]:   [TOTEM ] A processor failed, forming new configuration.
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] link: host: 2 link: 1 is down
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] link: host: 2 link: 1 is down
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] link: host: 7 link: 1 is down
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] host: host: 2 has no active links
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] link: host: 7 link: 1 is down
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] host: host: 2 has no active links
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] host: host: 7 (passive) best link: 1 (pri: 1)
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] host: host: 7 has no active links
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] host: host: 7 (passive) best link: 1 (pri: 1)
Apr 23 09:52:43 node6 corosync[4057]:   [KNET  ] host: host: 7 has no active links
Apr 23 09:55:06 node6 corosync[4178]:   [MAIN  ] Corosync Cluster Engine 3.0.3 starting up


Code:
Apr 22 23:32:25 node7 corosync[1867]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 07:15:27 node7 corosync[1867]:   [TOTEM ] Retransmit List: a5c2e 
Apr 23 09:50:59 node7 corosync[1867]:   [TOTEM ] Retransmit List: dcfa2 
Apr 23 09:51:00 node7 corosync[1867]:   [TOTEM ] Retransmit List: dcfab 
Apr 23 09:51:00 node7 corosync[1867]:   [TOTEM ] Retransmit List: dcfad 
Apr 23 09:51:00 node7 corosync[1867]:   [TOTEM ] Retransmit List: dcfb1 
Apr 23 09:51:38 node7 corosync[1867]:   [TOTEM ] A new membership (1.176) was formed. Members left: 4
Apr 23 09:51:38 node7 corosync[1867]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node7 corosync[1867]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node7 corosync[1867]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node7 corosync[1867]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node7 corosync[1867]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node7 corosync[1867]:   [CPG   ] downlist left_list: 1 received
Apr 23 09:51:38 node7 corosync[1867]:   [QUORUM] Members[6]: 1 2 3 5 6 7
Apr 23 09:51:38 node7 corosync[1867]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 23 09:51:40 node7 corosync[1867]:   [KNET  ] link: host: 4 link: 1 is down
Apr 23 09:51:40 node7 corosync[1867]:   [KNET  ] host: host: 4 (passive) best link: 1 (pri: 1)
Apr 23 09:51:40 node7 corosync[1867]:   [KNET  ] host: host: 4 has no active links
Apr 23 09:52:40 node7 corosync[1867]:   [KNET  ] link: host: 3 link: 1 is down
Apr 23 09:52:40 node7 corosync[1867]:   [KNET  ] host: host: 3 (passive) best link: 1 (pri: 1)
Apr 23 09:52:40 node7 corosync[1867]:   [KNET  ] host: host: 3 has no active links
Apr 23 09:52:41 node7 corosync[1867]:   [TOTEM ] Token has not been received in 154 ms
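
To compare excerpts like the ones above side by side, the `[QUORUM] Members[...]` lines can be pulled out into a small timeline. This is only a sketch: the regular expressions are tailored to the syslog format shown in this thread, and `membership_timeline` is a hypothetical helper name, not an existing tool.

```python
import re

# Matches corosync syslog lines such as:
# "Apr 23 09:51:38 node1 corosync[4668]:   [QUORUM] Members[6]: 1 2 3 5 6 7"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<node>\S+)\s+"
    r"corosync\[\d+\]:\s+\[(?P<subsys>\w+)\s*\]\s+(?P<msg>.*)$"
)
MEMBERS_RE = re.compile(r"^Members\[(\d+)\]:\s*(.*)$")


def membership_timeline(lines):
    """Yield (timestamp, reporting_node, member_ids) per QUORUM membership line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("subsys") != "QUORUM":
            continue  # skip other subsystems and "message repeated" lines
        members = MEMBERS_RE.match(m.group("msg"))
        if members:
            ids = [int(x) for x in members.group(2).split()]
            yield m.group("ts"), m.group("node"), ids


# Three lines copied from the excerpts above.
sample = [
    "Apr 23 09:51:38 node1 corosync[4668]:   [TOTEM ] A new membership (1.176) was formed. Members left: 4",
    "Apr 23 09:51:38 node1 corosync[4668]:   [QUORUM] Members[6]: 1 2 3 5 6 7",
    "Apr 23 09:54:42 node4 corosync[4511]:   [QUORUM] Members[2]: 4 5",
]
timeline = list(membership_timeline(sample))
for ts, node, ids in timeline:
    print(ts, node, len(ids), ids)
```

Feeding each node's full log through the same function and sorting by timestamp would show, per node, the last membership it reported before the gap and the first one after it.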
 
I don't see anything in the logs except that the nodes leave the cluster.

Is the corosync network a dedicated network?
 
The nodes have four SFP+ 10Gb links, arranged into two LACP bonds plugged into two switches, and they also have a Gigabit Ethernet link each which goes to a third switch. We have two corosync networks; ring0 goes over one of the 10Gb bonds and is not dedicated, but ring1 goes over the GbE link which isn't used for anything else unless you count IPMI.

Are there any more logs of relevance which I could share?
 
Can you please send the complete syslog of node7 and node4 from the time of the reboot?
 
