Proxmox v4 red state

Egner

Renowned Member
Aug 2, 2015
Hi,

From what I can see in the logs, something very strange is going on with the cluster after sustained heavy load. It can take about 3-4 days after the load has ended before the problem starts to appear in the cluster.



I cannot find any drops on the interfaces that would indicate this is the problem.

Code:
vmbr0 Link encap:Ethernet HWaddr ec:f4:bb:e7:f2:5e
inet addr:10.10.13.102 Bcast:10.10.13.255 Mask:255.255.255.0
inet6 addr: fe80::eef4:bbff:fee7:f25e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:99047944 errors:0 dropped:0 overruns:0 frame:0
TX packets:86666544 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000


eth1 Link encap:Ethernet HWaddr ec:f4:bb:e7:f2:5e
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:107024643 errors:0 dropped:20581 overruns:0 frame:0
TX packets:93764407 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:32182262155 (29.9 GiB) TX bytes:16450322190 (15.3 GiB)
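
For what it's worth, this is roughly how I watch the counters over time (standard iproute2 tools; the one-second interval is arbitrary):

Code:
# detailed per-interface statistics, including the dropped counters
ip -s -s link show eth1
# repeat every second to see whether a counter is still climbing
watch -n 1 "ip -s link show eth1"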



On all nodes I could find this:

Code:
Mar 05 11:40:57 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 05 11:40:58 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178312) was formed. Members
Mar 05 11:40:58 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 05 11:40:58 tc01-c-h corosync[2065]: [MAIN  ] Completed service synchronization, ready to provide service.
Mar 06 03:06:20 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 06 03:06:20 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178316) was formed. Members
Mar 06 03:06:20 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 06 03:06:20 tc01-c-h corosync[2065]: [MAIN  ] Completed service synchronization, ready to provide service.
Mar 06 19:46:10 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 06 19:46:10 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178320) was formed. Members
Mar 06 19:46:10 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 06 19:46:10 tc01-c-h corosync[2065]: [MAIN  ] Completed service synchronization, ready to provide service.
Mar 07 19:56:03 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 07 19:56:03 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178324) was formed. Members
Mar 07 19:56:03 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 07 19:56:03 tc01-c-h corosync[2065]: [MAIN  ] Completed service synchronization, ready to provide service.

The cluster has 13 members in total, and each node also has a dedicated 10G interface for the internal communication between all nodes. (The same port is also used for NFS storage mounted directly into the LXC containers.)

Code:
pvecm status
Quorum information
------------------
Date: Fri Mar 8 15:29:25 2019
Quorum provider: corosync_votequorum
Nodes: 13
Node ID: 0x00000002
Ring ID: 2/178324
Quorate: Yes
Votequorum information
----------------------
Expected votes: 13
Highest expected: 13
Total votes: 13
Quorum: 7
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000002 1 10.10.13.102 (local)
0x00000003 1 10.10.13.103
0x00000005 1 10.10.13.104
0x00000007 1 10.10.13.105
0x00000006 1 10.10.13.106
0x00000008 1 10.10.13.107
0x00000009 1 10.10.13.108
0x0000000a 1 10.10.13.109
0x0000000b 1 10.10.13.110
0x0000000c 1 10.10.13.111
0x0000000d 1 10.10.13.112
0x0000000e 1 10.10.13.113
0x00000004 1 10.10.13.117


I have also checked that multicast is working OK on all nodes.

I have read that some people suggest adding more token time to the corosync config to prevent this from happening, but is that the solution?
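
For reference, the change I read about would look roughly like this in /etc/pve/corosync.conf (example values only, I have not tested this):

Code:
# /etc/pve/corosync.conf (excerpt) -- example values only, untested
totem {
  version: 2
  config_version: 15   # placeholder; must be incremented or the change will not propagate
  token: 10000         # token timeout in ms (the corosync 2.x default is 1000)
}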
 
I have also checked that multicast is working OK on all nodes.
How did you do that?

I have read that some people suggest adding more token time to the corosync config to prevent this from happening, but is that the solution?
Well, I would consider this more of a last resort. But first, is the corosync traffic running on its own physical interface? Judging from your description, it is not. What else is running on that network?
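
For reference, 'corosync-cfgtool -s' shows which address and ring corosync is bound to and whether the ring reports any faults:

Code:
# print the local node ID and the status of each configured ring
corosync-cfgtool -s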
 
I have run this command between two nodes without any loss:

node#1
Code:
root@node1:~# omping -c 10000 -i 0.001 -F -q 10.10.13.102 10.10.13.103
10.10.13.103 : joined (S,G) = (*, 232.43.211.234), pinging
10.10.13.103 : waiting for response msg
10.10.13.103 : server told us to stop
10.10.13.103 : unicast, xmt/rcv/%loss = 9052/9052/0%, min/avg/max/std-dev = 0.031/0.063/0.199/0.024
10.10.13.103 : multicast, xmt/rcv/%loss = 9052/9052/0%, min/avg/max/std-dev = 0.037/0.077/0.245/0.030

node#2
Code:
root@node2:~# omping -c 10000 -i 0.001 -F -q 10.10.13.102 10.10.13.103
10.10.13.102 : waiting for response msg
10.10.13.102 : joined (S,G) = (*, 232.43.211.234), pinging
10.10.13.102 : given amount of query messages was sent
10.10.13.102 : unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.029/0.066/0.200/0.025
10.10.13.102 : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.037/0.078/0.239/0.029

I can't say it is 100% certain, but I believe it is.

This machine is connected to one of our NFS storage servers over a 1 Gbit/s connection; the containers are attached directly through this NFS server.

All the cluster servers have 10G interfaces with Intel NICs (except for the NFS server).
 
I have run this command between two nodes without any loss:
Please run this test across all nodes of your cluster, since the scale of the traffic also matters.

This machine is connected to one of our NFS storage servers over a 1 Gbit/s connection; the containers are attached directly through this NFS server.

All the cluster servers have 10G interfaces with Intel NICs (except for the NFS server).
Do I understand correctly that your storage, backup, and migration traffic all go over the same 10GbE interface?

EDIT: also run the test for longer, e.g. 'omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...'
 
Hi, I have run the test on all nodes and these are the results:

node#1
Code:
10.10.13.103 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.047/0.155/0.239/0.035
10.10.13.103 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.053/0.148/0.239/0.037
10.10.13.104 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.059/0.150/0.219/0.029
10.10.13.104 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.064/0.158/0.233/0.028
10.10.13.105 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.068/0.140/0.214/0.031
10.10.13.105 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.053/0.137/0.230/0.034
10.10.13.106 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.049/0.125/0.204/0.033
10.10.13.106 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.049/0.140/0.210/0.030
10.10.13.107 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.043/0.143/4.149/0.167
10.10.13.107 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.049/0.142/4.157/0.167
10.10.13.108 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.040/0.137/4.131/0.166
10.10.13.108 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.045/0.149/4.135/0.165
10.10.13.109 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.056/0.136/0.217/0.030
10.10.13.109 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.056/0.138/0.225/0.030
10.10.13.110 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.073/0.126/0.205/0.027
10.10.13.110 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.075/0.138/0.212/0.027
10.10.13.111 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.067/0.125/0.219/0.029
10.10.13.111 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.054/0.126/0.227/0.030
10.10.13.112 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.050/0.127/0.205/0.028
10.10.13.112 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.059/0.136/0.226/0.027
10.10.13.113 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.072/0.145/0.225/0.021
10.10.13.113 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.069/0.155/0.233/0.021
10.10.13.117 : unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.075/0.147/0.233/0.031
10.10.13.117 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.079/0.151/0.240/0.031

You can find all nodes results here : https://pastebin.com/0d6kBJTL

And yes, the network is also used for an NFS connection to some containers, but on a separate VLAN of course.
 
You can find all nodes results here : https://pastebin.com/0d6kBJTL
AFAICS, the omping results look good.

And yes, the network is also used for an NFS connection to some containers, but on a separate VLAN of course.
The important bit for corosync is the latency. Any other traffic can interfere with corosync and may result in the above behavior. Do you see any other messages in the syslog/journal around the 'new membership' appearances?
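
You can also check which totem timings corosync is effectively running with; the output should include the token timeout:

Code:
# dump the runtime totem configuration (token timeout etc.)
corosync-cmapctl | grep -i totem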
 
AFAICS, the omping results look good.


The important bit for corosync is the latency. Any other traffic can interfere with corosync and may result in the above behavior. Do you see any other messages in the syslog/journal around the 'new membership' appearances?

1. No information in syslog.
2. The journal gives me this if I search for "new membership" (journalctl -r | grep 'new membership'):

"Mar 13 01:10:01 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178832) was formed. Members"

Could the problem be the gigabit connection to the NFS server? Like, if it starts to choke when the line gets full?

Any ideas?
 
2. The journal gives me this if I search for "new membership" (journalctl -r | grep 'new membership'):
If you grep for 'new membership' then you only get output containing that message. What else is there to find, before and after?
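
For example, something along these lines shows the surrounding context instead of only the matching line:

Code:
# 10 lines before and 5 lines after each occurrence
journalctl | grep -B 10 -A 5 'new membership'
# or inspect a whole time window around one occurrence
journalctl --since '2019-03-13 01:00' --until '2019-03-13 01:20'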

Could the problem be the gigabit connection to the NFS server? Like, if it starts to choke when the line gets full?
Your NFS server is not part of the cluster, is it? I don't think it is the main suspect for corosync.
 
If you grep for 'new membership' then you only get output containing that message. What else is there to find, before and after?


Your NFS server is not part of the cluster, is it? I don't think it is the main suspect for corosync.

And yes, the NFS server is not part of the cluster.

This is what I could find in the entire log:


Code:
journalctl -lf -u corosync
-- Logs begin at Fri 2019-03-01 11:32:14 CET. --
Mar 12 23:26:34 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 12 23:26:34 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 12 23:32:17 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 12 23:32:17 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178828) was formed. Members
Mar 12 23:32:17 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 12 23:32:17 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 01:10:01 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 01:10:01 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178832) was formed. Members
Mar 13 01:10:01 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 01:10:01 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 19:03:09 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 19:03:10 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178836) was formed. Members
Mar 13 19:03:10 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 19:03:10 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 19:38:17 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 19:38:17 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178840) was formed. Members
Mar 13 19:38:17 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 19:38:17 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 20:08:10 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 20:08:10 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178844) was formed. Members
Mar 13 20:08:10 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 20:08:10 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 20:09:45 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 20:09:45 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178848) was formed. Members
Mar 13 20:09:45 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 20:09:45 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 20:35:17 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 20:35:17 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178852) was formed. Members
Mar 13 20:35:17 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 20:35:17 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 20:40:17 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 20:40:17 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178856) was formed. Members
Mar 13 20:40:17 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 20:40:17 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 20:45:54 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 20:45:54 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178860) was formed. Members
Mar 13 20:45:54 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 20:45:54 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 20:47:13 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 20:47:13 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178864) was formed. Members
Mar 13 20:47:13 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 20:47:13 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 20:50:14 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 20:50:14 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178868) was formed. Members
Mar 13 20:50:14 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 20:50:14 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 20:55:10 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 20:55:10 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178872) was formed. Members
Mar 13 20:55:10 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 20:55:10 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 21:07:17 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 21:07:17 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178876) was formed. Members
Mar 13 21:07:17 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 21:07:17 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 21:38:09 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 21:38:09 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178880) was formed. Members
Mar 13 21:38:09 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 21:38:09 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 21:43:16 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 21:43:16 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178884) was formed. Members
Mar 13 21:43:16 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 21:43:16 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 22:04:17 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 22:04:17 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178888) was formed. Members
Mar 13 22:04:17 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 22:04:17 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 22:44:17 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 22:44:17 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178892) was formed. Members
Mar 13 22:44:17 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 22:44:17 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 22:53:10 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 22:53:10 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178896) was formed. Members
Mar 13 22:53:10 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 22:53:10 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 13 22:58:10 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 13 22:58:10 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178900) was formed. Members
Mar 13 22:58:10 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 13 22:58:10 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 14 00:11:16 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 14 00:11:16 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178904) was formed. Members
Mar 14 00:11:16 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 14 00:11:16 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 14 00:35:10 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 14 00:35:10 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178908) was formed. Members
Mar 14 00:35:10 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 14 00:35:10 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 14 00:44:10 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 14 00:44:10 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178912) was formed. Members
Mar 14 00:44:10 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 14 00:44:10 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 14 01:57:10 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 14 01:57:10 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178916) was formed. Members
Mar 14 01:57:10 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 14 01:57:10 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 14 06:55:17 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 14 06:55:18 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178920) was formed. Members
Mar 14 06:55:18 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 14 06:55:18 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 14 07:57:16 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 14 07:57:16 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178924) was formed. Members
Mar 14 07:57:16 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 14 07:57:16 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
 
Mar 14 07:57:16 tc01-c-h corosync[2065]: [TOTEM ] A processor failed, forming new configuration.
Mar 14 07:57:16 tc01-c-h corosync[2065]: [TOTEM ] A new membership (10.10.13.102:178924) was formed. Members
Mar 14 07:57:16 tc01-c-h corosync[2065]: [QUORUM] Members[13]: 2 3 5 7 6 8 9 10 11 12 13 14 4
Mar 14 07:57:16 tc01-c-h corosync[2065]: [MAIN ] Completed service synchronization, ready to provide service.
Looking at these entries, what can be seen in the logs prior to this time? Something happened before corosync formed a new membership. Check all the logs on all nodes in your cluster for any indication, as the cause could range from network interference to configuration issues.
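
A rough way to collect the same time window from every node for comparison (assuming root SSH between the nodes; the host list and times are placeholders):

Code:
# window around the Mar 14 07:57 membership change; adjust hosts and times
for node in tc01-c-h tc01-c-h2 tc01-c-h3; do
    ssh root@$node "journalctl --since '2019-03-14 07:50' --until '2019-03-14 08:05'" > "$node.log"
done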
 
