Hi,
I have an issue with my newly updated cluster (5.1): only 3 of the 10 nodes that are up are synced (pmxcfs / dcdb). The remaining 7 never catch up.
Nov 6 15:24:35 R1M1 pmxcfs[1236]: [dcdb] notice: synced members: 1/1236, 16/1222, 17/1207
Updates do seem to be sent, but always in the same batch sizes, and I have not seen any node catch up:
Nov 6 15:27:26 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (164) updates
Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:28 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:28 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates
Nov 6 15:27:28 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:28 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:29 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:29 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates
Nov 6 15:27:29 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:29 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:33 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:33 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (164) updates
Nov 6 15:27:33 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:33 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:41 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:41 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (164) updates
Nov 6 15:27:41 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:41 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
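To check that the batch sizes really do repeat, something like the following could tally them. This is only a rough sketch: it assumes the syslog lines are available as plain text in exactly the format shown above, and `tally_update_counts` is a made-up helper name, not part of any Proxmox tooling.

```python
import re
from collections import Counter

# Matches the "sent all (N) updates" notices as they appear in the log above.
SENT_RE = re.compile(r"\[dcdb\] notice: sent all \((\d+)\) updates")

def tally_update_counts(lines):
    """Count how often each batch size N appears (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = SENT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

# A few lines copied from the log above.
sample = [
    "Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates",
    "Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (164) updates",
    "Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates",
    "Nov 6 15:27:28 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates",
    "Nov 6 15:27:29 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates",
]

for size, times in sorted(tally_update_counts(sample).items()):
    print(f"batch of {size} updates seen {times} time(s)")
```

If only a handful of sizes such as 0, 69 and 164 ever show up, that would match the impression that the same updates are being resent.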
Here are the logs that seem to repeat over and over:
Nov 6 15:28:42 R1M1 pmxcfs[1236]: [dcdb] notice: dfsm_deliver_queue: queue length 1
Nov 6 15:28:42 R1M1 pmxcfs[1236]: [dcdb] notice: remove message from non-member 10/1302
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 12/1304, 16/1222, 17/1207
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: starting data syncronisation
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE0)
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: received all states
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: leader is 1/1236
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: synced members: 1/1236, 16/1222, 17/1207
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: all data is up to date
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: dfsm_deliver_queue: queue length 1
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: dfsm_deliver_queue: queue length 1
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 16/1222, 17/1207
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: starting data syncronisation
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE1)
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: received all states
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: leader is 1/1236
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: synced members: 1/1236, 16/1222, 17/1207
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: all data is up to date
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 16/1222, 17/1207
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: starting data syncronisation
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE2)
Nov 6 15:28:45 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 14/17145, 16/1222, 17/1207
Nov 6 15:28:45 R1M1 pmxcfs[1236]: [dcdb] notice: queue not emtpy - resening 1 messages
Nov 6 15:28:45 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE3)
Nov 6 15:28:45 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 16/1222, 17/1207
Nov 6 15:28:45 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE4)
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 10/1302, 16/1222, 17/1207
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: queue not emtpy - resening 3 messages
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE5)
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 16/1222, 17/1207
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE6)
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 16/1222, 17/1207
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE7)
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: received all states
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: leader is 1/1236
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: synced members: 1/1236, 16/1222, 17/1207
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: all data is up to date
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: dfsm_deliver_queue: queue length 4
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: remove message from non-member 10/1302
Nov 6 15:28:50 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 12/1304, 16/1222, 17/1207
Nov 6 15:28:50 R1M1 pmxcfs[1236]: [dcdb] notice: starting data syncronisation
Nov 6 15:28:50 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE8)
Nov 6 15:28:50 R1M1 pmxcfs[1236]: [dcdb] notice: received all states
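To make the joining and leaving easier to see, the `members:` lines could be diffed against each other. Again this is just a sketch under assumptions: it relies on the log format shown above, and `membership_changes` is a hypothetical helper name. It deliberately ignores the `synced members:` lines.

```python
import re

# Matches "notice: members: ..." but not "notice: synced members: ...".
MEMBERS_RE = re.compile(r"\[dcdb\] notice: members: (.+)$")

def membership_changes(lines):
    """Return (joined, left) pairs between consecutive member lists (hypothetical helper)."""
    previous = None
    changes = []
    for line in lines:
        match = MEMBERS_RE.search(line.rstrip())
        if not match:
            continue
        current = {entry.strip() for entry in match.group(1).split(",")}
        if previous is not None:
            joined = sorted(current - previous)
            left = sorted(previous - current)
            if joined or left:
                changes.append((joined, left))
        previous = current
    return changes

# A few lines copied from the log above.
sample = [
    "Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 12/1304, 16/1222, 17/1207",
    "Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: synced members: 1/1236, 16/1222, 17/1207",
    "Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 16/1222, 17/1207",
    "Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 16/1222, 17/1207",
]

for joined, left in membership_changes(sample):
    print("joined:", joined, "left:", left)
```

Run over the full log, this should show which node IDs keep appearing and disappearing while 1, 16 and 17 stay put.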
I'm a bit stuck and running out of ideas...