[SOLVED] dcdb not syncing?

bladux

Hi,

I have an issue with my newly updated cluster (5.1): only 3 of the 10 nodes that are up are synced (pmxcfs / dcdb). The remaining 7 never catch up...
Nov 6 15:24:35 R1M1 pmxcfs[1236]: [dcdb] notice: synced members: 1/1236, 16/1222, 17/1207


It seems the updates are sent, but always the same number of updates, and I have not seen any node catch up...
Nov 6 15:27:26 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (164) updates
Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:27 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:28 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:28 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates
Nov 6 15:27:28 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:28 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:29 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:29 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates
Nov 6 15:27:29 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:29 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:33 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:33 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (164) updates
Nov 6 15:27:33 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:33 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:41 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:41 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (164) updates
Nov 6 15:27:41 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:41 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:27:42 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
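
To check whether the same batch sizes really keep repeating, the "sent all (N) updates" lines can be tallied. This is only a rough sketch: the function name tally_update_batches is made up for illustration, and it assumes the log lines have exactly the format shown above.

```shell
# Tally how often each "sent all (N) updates" batch size appears.
# Reads pmxcfs log lines on stdin and prints "<count> <batch size>",
# largest batch size first.
# NOTE: tally_update_batches is a hypothetical helper, not a Proxmox tool.
tally_update_batches() {
    # Keep only the number inside the parentheses of matching lines.
    sed -n 's/.*\[dcdb\] notice: sent all (\([0-9][0-9]*\)) updates.*/\1/p' |
        sort -rn |
        uniq -c |
        awk '{ print $1, $2 }'
}
```

It could be fed from the journal, for example with `journalctl -u pve-cluster --since "1 hour ago" | tally_update_batches`. If the same few sizes (here 164 and 69) dominate, the leader is probably resending the same updates on every resync.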

Here are the logs, which seem to repeat over and over...
Nov 6 15:28:42 R1M1 pmxcfs[1236]: [dcdb] notice: dfsm_deliver_queue: queue length 1
Nov 6 15:28:42 R1M1 pmxcfs[1236]: [dcdb] notice: remove message from non-member 10/1302
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 12/1304, 16/1222, 17/1207
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: starting data syncronisation
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE0)
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: received all states
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: leader is 1/1236
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: synced members: 1/1236, 16/1222, 17/1207
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (69) updates
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: all data is up to date
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: dfsm_deliver_queue: queue length 1
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: dfsm_deliver_queue: queue length 1
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 16/1222, 17/1207
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: starting data syncronisation
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE1)
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: received all states
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: leader is 1/1236
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: synced members: 1/1236, 16/1222, 17/1207
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: all data is up to date
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 16/1222, 17/1207
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: starting data syncronisation
Nov 6 15:28:44 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE2)
Nov 6 15:28:45 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 14/17145, 16/1222, 17/1207
Nov 6 15:28:45 R1M1 pmxcfs[1236]: [dcdb] notice: queue not emtpy - resening 1 messages
Nov 6 15:28:45 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE3)
Nov 6 15:28:45 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 16/1222, 17/1207
Nov 6 15:28:45 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE4)
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 10/1302, 16/1222, 17/1207
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: queue not emtpy - resening 3 messages
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE5)
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 3/1269, 16/1222, 17/1207
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE6)
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 16/1222, 17/1207
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE7)
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: received all states
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: leader is 1/1236
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: synced members: 1/1236, 16/1222, 17/1207
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: start sending inode updates
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: sent all (0) updates
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: all data is up to date
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: dfsm_deliver_queue: queue length 4
Nov 6 15:28:48 R1M1 pmxcfs[1236]: [dcdb] notice: remove message from non-member 10/1302
Nov 6 15:28:50 R1M1 pmxcfs[1236]: [dcdb] notice: members: 1/1236, 12/1304, 16/1222, 17/1207
Nov 6 15:28:50 R1M1 pmxcfs[1236]: [dcdb] notice: starting data syncronisation
Nov 6 15:28:50 R1M1 pmxcfs[1236]: [dcdb] notice: received sync request (epoch 1/1236/00000EE8)
Nov 6 15:28:50 R1M1 pmxcfs[1236]: [dcdb] notice: received all states
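
To see which nodes keep joining and leaving without ever getting synced, the "members:" lines can be compared against the "synced members:" lines. A minimal sketch, assuming the log format shown above; the function name unsynced_members is invented for this example, and the IDs appear to be nodeid/pid pairs.

```shell
# List member IDs that show up in "members:" lines but never in any
# "synced members:" line. Reads pmxcfs log lines on stdin.
# NOTE: unsynced_members is a hypothetical helper, not a Proxmox tool.
unsynced_members() {
    awk '
        /\[dcdb\] notice: synced members: / {
            sub(/.*synced members: /, "")
            n = split($0, ids, /, */)
            for (i = 1; i <= n; i++) synced[ids[i]] = 1
            next
        }
        /\[dcdb\] notice: members: / {
            sub(/.*notice: members: /, "")
            n = split($0, ids, /, */)
            for (i = 1; i <= n; i++) seen[ids[i]] = 1
        }
        END {
            for (id in seen) if (!(id in synced)) print id
        }
    ' | sort -t/ -k1,1n   # order by the numeric part before the slash
}
```

Run against the excerpt above, this would list 3/1269, 10/1302, 12/1304 and 14/17145, i.e. the members that join briefly and drop out again before a sync completes.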


I'm a bit stuck and running out of ideas...
 
I have now started all my nodes (17), and only 6 are synced. Some of my nodes have been up for over 2 hours and all containers are running, but those nodes are still not synced...

Is there any way to force a sync? I'm worried something might go wrong if I leave it this way.
 
Posting how I solved it:
I manually restarted all nodes that were not in sync and made sure the SQLite file had no corruption:

service corosync stop
rm /var/lib/pve-cluster/.pmxcfs.lockfile
rm backup.db
sqlite3 /var/lib/pve-cluster/config.db
.output backup.db
.dump
.quit
sqlite3 database_fixed.db
.read backup.db
.quit
mv /var/lib/pve-cluster/config.db /var/lib/pve-cluster/config.db_hs_auto
mv database_fixed.db /var/lib/pve-cluster/config.db
pmxcfs -l
cp /etc/corosync/corosync.conf /etc/pve/corosync.conf
service corosync start
rm /var/lib/pve-cluster/.pmxcfs.lockfile
service pve-cluster start
service pvedaemon restart
service pveproxy restart

On one node I had duplicate entries, which I had to delete manually from backup.db before restoring the backup into database_fixed.db.
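
For anyone wanting to check a database before or after the dump and restore, something along these lines might help. It is a sketch under assumptions, not what I actually ran: the function names are made up, and the duplicate query assumes the database has a table named tree with parent and name columns, so check the real layout with `.schema` first.

```shell
# Run SQLite's built-in integrity check; prints "ok" if nothing is wrong.
# NOTE: check_db and find_duplicates are hypothetical helper names.
check_db() {
    sqlite3 "$1" 'PRAGMA integrity_check;'
}

# Print parent|name|count for every (parent, name) pair stored more than once.
# ASSUMPTION: a table "tree" with columns "parent" and "name" exists.
find_duplicates() {
    sqlite3 "$1" 'SELECT parent, name, COUNT(*) FROM tree GROUP BY parent, name HAVING COUNT(*) > 1;'
}
```

It is safer to run both on a copy of config.db taken while pmxcfs is stopped, for example `check_db /root/config.db.copy` (the path is just an example), and to leave the original untouched until the copy looks clean.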

Got all my nodes running smoothly again.
 
