.members not in sync on 2 of 8 nodes

khanhnguyen

Dec 4, 2019
Setup:
Proxmox Virtual Environment 5.4-11
8 nodes
2 of the 8 nodes list only 7 nodes instead of 8.


I have restarted corosync, pve-cluster, pvedaemon, pveproxy. Nothing works.

corosync.conf
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: prx01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.x.201
  }
  node {
    name: prx02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.x.202
  }
  node {
    name: prx03
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.x.203
  }
  node {
    name: prx04
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.x.204
  }
  node {
    name: prx05
    nodeid: 5
    quorum_votes: 1
    ring0_addr: 192.168.x.205
  }
  node {
    name: prx06
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.x.206
  }
  node {
    name: prx07
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 192.168.x.207
  }
  node {
    name: prx08
    nodeid: 8
    quorum_votes: 1
    ring0_addr: 192.168.x.208
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: c1
  config_version: 8
  interface {
    bindnetaddr: 192.168.x.201
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
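For reference, here is the quorum math for this nodelist. With votequorum and none of its special options, the expected votes are the sum of the nodes' quorum_votes (8 here), and a partition is quorate once it holds expected_votes / 2 + 1 votes (integer division). The sketch below reads the votes from the nodelist and computes that threshold. It is a minimal sketch: the function names and the sample string are illustrative, and the line scanner is deliberately naive, not a real corosync.conf parser.

```python
import re
from typing import Dict


def node_votes(conf_text: str) -> Dict[str, int]:
    """Map node name -> quorum_votes from a corosync.conf nodelist.

    Naive line scanner (illustrative only): it assumes every node block
    lists `name:` before `quorum_votes:`, as in the config above.
    """
    votes: Dict[str, int] = {}
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # re.match anchors at the line start, so "cluster_name:" is not matched.
        m = re.match(r"name:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"quorum_votes:\s*(\d+)", line)
        if m and current is not None:
            votes[current] = int(m.group(1))
            current = None
    return votes


def quorum_threshold(expected_votes: int) -> int:
    # votequorum with no two_node / last_man_standing / auto_tie_breaker:
    # a partition needs more than half of the expected votes.
    return expected_votes // 2 + 1


# Hypothetical sample shaped like the nodelist above (8 nodes, 1 vote each).
SAMPLE_CONF = "nodelist {\n" + "".join(
    f"  node {{\n    name: prx0{i}\n    nodeid: {i}\n    quorum_votes: 1\n  }}\n"
    for i in range(1, 9)
) + "}\n"

if __name__ == "__main__":
    votes = node_votes(SAMPLE_CONF)
    total = sum(votes.values())
    print(f"{len(votes)} nodes, {total} votes, quorum at {quorum_threshold(total)}")
```

For this cluster the threshold is 5. A 7-node view, like the one prx02 and prx03 report in the .members output, is therefore still quorate, which matches their "quorate": 1.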

.members (prx01, prx02, prx03)
Code:
{
"nodename": "prx01",
"version": 24,
"cluster": { "name": "c1", "version": 8, "nodes": 8, "quorate": 1 },
"nodelist": {
  "prx01": { "id": 1, "online": 1, "ip": "192.168.x.201"},
  "prx02": { "id": 2, "online": 1, "ip": "192.168.x.202"},
  "prx03": { "id": 3, "online": 1, "ip": "192.168.x.203"},
  "prx04": { "id": 4, "online": 1, "ip": "192.168.x.204"},
  "prx05": { "id": 5, "online": 1, "ip": "192.168.x.205"},
  "prx07": { "id": 7, "online": 1, "ip": "192.168.x.207"},
  "prx08": { "id": 8, "online": 1, "ip": "192.168.x.208"},
  "prx06": { "id": 6, "online": 1, "ip": "192.168.x.206"}
  }
}

{
"nodename": "prx02",
"version": 21,
"cluster": { "name": "c1", "version": 7, "nodes": 7, "quorate": 1 },
"nodelist": {
  "prx07": { "id": 7, "online": 1, "ip": "192.168.x.207"},
  "prx01": { "id": 1, "online": 1, "ip": "192.168.x.201"},
  "prx02": { "id": 2, "online": 1, "ip": "192.168.x.202"},
  "prx03": { "id": 3, "online": 1, "ip": "192.168.x.203"},
  "prx04": { "id": 4, "online": 1, "ip": "192.168.x.204"},
  "prx05": { "id": 5, "online": 1, "ip": "192.168.x.205"},
  "prx06": { "id": 6, "online": 1, "ip": "192.168.x.206"}
  }
}

{
"nodename": "prx03",
"version": 19,
"cluster": { "name": "c1", "version": 7, "nodes": 7, "quorate": 1 },
"nodelist": {
  "prx07": { "id": 7, "online": 1, "ip": "192.168.x.207"},
  "prx01": { "id": 1, "online": 1, "ip": "192.168.x.201"},
  "prx02": { "id": 2, "online": 1, "ip": "192.168.x.202"},
  "prx03": { "id": 3, "online": 1, "ip": "192.168.x.203"},
  "prx04": { "id": 4, "online": 1, "ip": "192.168.x.204"},
  "prx05": { "id": 5, "online": 1, "ip": "192.168.x.205"},
  "prx06": { "id": 6, "online": 1, "ip": "192.168.x.206"}
  }
}
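Comparing these views by eye is error-prone. The sketch below loads each node's .members JSON and reports that node's cluster config version and which nodes are missing from its view. It is a minimal sketch under stated assumptions. `compare_members` and `_sample` are illustrative names, and `_sample` only rebuilds the shape of the files posted above. In practice you would paste in the real contents of /etc/pve/.members from each node.

```python
import json


def compare_members(raw_by_node):
    """raw_by_node: {node name: contents of that node's /etc/pve/.members}.

    Returns {node name: (cluster config version, sorted missing node names)}.
    """
    parsed = {node: json.loads(raw) for node, raw in raw_by_node.items()}
    # Union of every node name that any node reports in its nodelist.
    everyone = set()
    for data in parsed.values():
        everyone |= set(data["nodelist"])
    return {
        node: (data["cluster"]["version"], sorted(everyone - set(data["nodelist"])))
        for node, data in parsed.items()
    }


def _sample(nodename, version, ids):
    # Hypothetical helper: rebuilds the structure of the files posted above.
    nodelist = {
        f"prx0{i}": {"id": i, "online": 1, "ip": f"192.168.x.20{i}"} for i in ids
    }
    return json.dumps({
        "nodename": nodename,
        "cluster": {"name": "c1", "version": version, "nodes": len(nodelist), "quorate": 1},
        "nodelist": nodelist,
    })


SAMPLE = {
    "prx01": _sample("prx01", 8, range(1, 9)),
    "prx02": _sample("prx02", 7, range(1, 8)),
    "prx03": _sample("prx03", 7, range(1, 8)),
}

if __name__ == "__main__":
    for node, (version, missing) in compare_members(SAMPLE).items():
        print(node, "cluster version", version, "missing:", ", ".join(missing) or "none")
```

With the posted data, prx02 and prx03 report cluster version 7 and lack prx08. prx01 reports version 8, the same number as config_version in the corosync.conf above. That may be worth checking, but it is an observation, not a diagnosis.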
 
I have restarted corosync, pve-cluster, pvedaemon, pveproxy. Nothing works.

Check the syslog: corosync often logs hints about issues there. Compare the logs of one "healthy" node with those of one unhealthy node. As a starting point, try this filtered view: journalctl -b -u corosync -u pve-cluster
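To make that healthy-versus-unhealthy comparison easier, you could save the filtered journal of each node to a file and keep only the membership and quorum lines. The sketch below assumes that workflow. The marker strings come from the corosync and pmxcfs messages quoted in this thread, and the script name and file names in the usage comment are hypothetical.

```python
import sys

# Substrings taken from the corosync / pmxcfs messages quoted in this thread.
MARKERS = ("[TOTEM ]", "[QUORUM]", "members:")


def membership_lines(lines):
    """Keep only lines that mention membership or quorum changes."""
    return [line.rstrip("\n") for line in lines if any(m in line for m in MARKERS)]


if __name__ == "__main__":
    # Hypothetical usage, one saved journal per node:
    #   journalctl -b -u corosync -u pve-cluster > prx02.log
    #   python3 membership_lines.py prx02.log prx08.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            print(f"== {path}")
            for line in membership_lines(fh):
                print(line)
```

Printing the files one after another keeps the timestamps visible, so a membership change seen on one node can be matched with what the other node logged at the same moment.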

Also, things like this normally do not happen out of the blue, so what did you do before this happened? ;)
 
I noticed the problem one week ago and tried to solve it on my own. Now every node is in sync again. But sometimes prx01 or prx08 drops out of sync. I have made no config changes on prx02 or prx03 for a long time.

Code:
Feb 10 23:54:12 prx02 pmxcfs[29930]: [status] notice: starting data syncronisation
Feb 10 23:54:12 prx02 pmxcfs[29930]: [dcdb] notice: members: 1/28585, 2/29930, 3/19910, 4/16886, 5/21391, 6/32272, 7/5570
Feb 10 23:54:12 prx02 pmxcfs[29930]: [dcdb] notice: starting data syncronisation
Feb 10 23:54:12 prx02 pmxcfs[29930]: [dcdb] notice: received sync request (epoch 1/28585/0000000E)
Feb 10 23:54:12 prx02 pmxcfs[29930]: [status] notice: received sync request (epoch 1/28585/0000000E)
Feb 10 23:54:12 prx02 corosync[29981]: notice  [TOTEM ] A new membership (192.168.31.201:4424) was formed. Members left: 8
Feb 10 23:54:12 prx02 corosync[29981]: warning [CPG   ] downlist left_list: 1 received
Feb 10 23:54:12 prx02 corosync[29981]: notice  [QUORUM] Members[7]: 1 2 3 4 5 6 7
Feb 10 23:54:12 prx02 corosync[29981]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 10 23:54:12 prx02 corosync[29981]:  [TOTEM ] A new membership (192.168.31.201:4424) was formed. Members left: 8
Feb 10 23:54:12 prx02 corosync[29981]:  [CPG   ] downlist left_list: 1 received
Feb 10 23:54:12 prx02 corosync[29981]:  [CPG   ] downlist left_list: 1 received
Feb 10 23:54:12 prx02 corosync[29981]:  [CPG   ] downlist left_list: 1 received
Feb 10 23:54:12 prx02 corosync[29981]:  [CPG   ] downlist left_list: 1 received
Feb 10 23:54:12 prx02 corosync[29981]:  [CPG   ] downlist left_list: 1 received
Feb 10 23:54:12 prx02 corosync[29981]:  [CPG   ] downlist left_list: 1 received
Feb 10 23:54:12 prx02 corosync[29981]:  [CPG   ] downlist left_list: 1 received
Feb 10 23:54:12 prx02 corosync[29981]:  [QUORUM] Members[7]: 1 2 3 4 5 6 7
Feb 10 23:54:12 prx02 corosync[29981]:  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 10 23:54:12 prx02 pmxcfs[29930]: [dcdb] notice: received all states
Feb 10 23:54:12 prx02 pmxcfs[29930]: [dcdb] notice: leader is 1/28585
Feb 10 23:54:12 prx02 pmxcfs[29930]: [dcdb] notice: synced members: 1/28585, 2/29930, 3/19910, 4/16886, 5/21391, 6/32272, 7/5570
Feb 10 23:54:12 prx02 pmxcfs[29930]: [dcdb] notice: all data is up to date
Feb 10 23:54:12 prx02 pmxcfs[29930]: [status] notice: received all states
Feb 10 23:54:12 prx02 pmxcfs[29930]: [status] notice: all data is up to date
Feb 10 23:54:14 prx02 corosync[29981]: notice  [TOTEM ] A new membership (192.168.31.201:4428) was formed. Members joined: 8
Feb 10 23:54:14 prx02 corosync[29981]:  [TOTEM ] A new membership (192.168.31.201:4428) was formed. Members joined: 8
Feb 10 23:54:14 prx02 corosync[29981]: warning [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]: warning [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]:  [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]: warning [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]:  [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]: warning [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]:  [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]: warning [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]:  [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]: warning [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]:  [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]:  [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]:  [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]:  [CPG   ] downlist left_list: 0 received
Feb 10 23:54:14 prx02 corosync[29981]: notice  [QUORUM] Members[8]: 1 2 3 4 5 6 7 8
Feb 10 23:54:14 prx02 corosync[29981]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 10 23:54:14 prx02 corosync[29981]:  [QUORUM] Members[8]: 1 2 3 4 5 6 7 8
Feb 10 23:54:14 prx02 corosync[29981]:  [MAIN  ] Completed service synchronization, ready to provide service.
Feb 10 23:54:18 prx02 pmxcfs[29930]: [dcdb] notice: members: 1/28585, 2/29930, 3/19910, 4/16886, 5/21391, 6/32272, 7/5570, 8/10201
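To see how often and for how long a node drops out, you can pair up the TOTEM "A new membership ... Members left/joined" lines. The sketch below is minimal and assumes the exact line format shown above. The regex and function names are illustrative, not part of Proxmox or corosync, and other corosync versions may word these messages differently. Syslog records each corosync message twice, once with the notice/warning prefix and once without, so the sketch deduplicates by the membership ring ID.

```python
import re
from datetime import datetime

# Written against the TOTEM lines quoted above.
TOTEM_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ corosync\[\d+\]:.*\[TOTEM \] "
    r"A new membership \((?P<ring>[^)]+)\) was formed\. Members"
    r"(?: joined: (?P<joined>[\d ]+?))?(?: left: (?P<left>[\d ]+))?$"
)

SAMPLE_LOG = [
    "Feb 10 23:54:12 prx02 corosync[29981]: notice  [TOTEM ] A new membership (192.168.31.201:4424) was formed. Members left: 8",
    "Feb 10 23:54:12 prx02 corosync[29981]:  [TOTEM ] A new membership (192.168.31.201:4424) was formed. Members left: 8",
    "Feb 10 23:54:12 prx02 corosync[29981]: notice  [QUORUM] Members[7]: 1 2 3 4 5 6 7",
    "Feb 10 23:54:14 prx02 corosync[29981]: notice  [TOTEM ] A new membership (192.168.31.201:4428) was formed. Members joined: 8",
    "Feb 10 23:54:14 prx02 corosync[29981]:  [TOTEM ] A new membership (192.168.31.201:4428) was formed. Members joined: 8",
]


def membership_events(lines, year=2000):
    """Return (timestamp, joined_ids, left_ids) for each distinct membership."""
    events, seen_rings = [], set()
    for line in lines:
        m = TOTEM_RE.match(line.strip())
        # Skip non-TOTEM lines and the duplicate copy of each message.
        if not m or m["ring"] in seen_rings:
            continue
        seen_rings.add(m["ring"])
        # Syslog timestamps carry no year; assume one (2000 is a leap year).
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        joined = [int(n) for n in (m["joined"] or "").split()]
        left = [int(n) for n in (m["left"] or "").split()]
        events.append((ts, joined, left))
    return events


def outages(events):
    """Pair each 'left' with the next 'joined' of the same node id."""
    left_at, result = {}, []
    for ts, joined, left in events:
        for nid in joined:
            if nid in left_at:
                result.append((nid, (ts - left_at.pop(nid)).total_seconds()))
        for nid in left:
            left_at[nid] = ts
    return result


if __name__ == "__main__":
    for nid, secs in outages(membership_events(SAMPLE_LOG)):
        print(f"node {nid} was out of the membership for {secs:.0f} s")
```

Run over a longer journal, the output shows whether the drop-outs of prx01 and prx08 are frequent, short flaps like this one or longer absences.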
 
I noticed the problem one week ago and tried to solve it on my own
Which problem, and how did you solve it? Please provide some details, otherwise we cannot really help.

But sometimes prx01 or prx08 drops out of sync. I have made no config changes on prx02 or prx03 for a long time.

So this is not permanent, as the initial post suggested, but happens only occasionally? How is your setup built? How many networks do you have? Does corosync run on its own network, or does it share a network with storage, VM, live-migration and/or backup traffic?
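Some background on why the network question matters. Corosync declares a token loss when the totem token does not arrive within the token timeout, and it then forms a new membership like the ones logged above. The back-of-the-envelope sketch below computes that timeout for this cluster. It assumes the corosync 2.x defaults from corosync.conf(5), token 1000 ms and token_coefficient 650 ms, because the config above sets neither. Check `man corosync.conf` on your own nodes before relying on these numbers.

```python
def effective_token_ms(nodes: int, token_ms: int = 1000, token_coefficient_ms: int = 650) -> int:
    """Effective totem token timeout as described in corosync.conf(5).

    The defaults are assumed corosync 2.x values, not read from any node.
    """
    # token_coefficient only applies when the nodelist has at least 3 nodes.
    if nodes >= 3:
        return token_ms + (nodes - 2) * token_coefficient_ms
    return token_ms


if __name__ == "__main__":
    print(effective_token_ms(8), "ms")
```

For 8 nodes this gives 1000 + 6 × 650 = 4900 ms. Under these assumptions, roughly five seconds of lost or badly delayed corosync traffic is enough for the other nodes to form a membership without the affected node. That can plausibly happen when corosync shares a saturated link with storage or backup traffic.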
 
