Nodes not joining cluster after restart

fralazar

New Member
Aug 21, 2017
Hi,
I have a cluster with 3 nodes. After a power outage, two of the nodes restarted and are not rejoining the cluster; corosync is not running on them, and when I try to restart the service the log shows the following error:

"[CMAP ] Received config version (4) is different than my config version (5)! Exiting"

I tested multicast with omping and everything seems fine; loss is 0% across all nodes.
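The test was along these lines (a sketch from memory; exact counts and intervals may differ):

Code:
# run the same command on every node at roughly the same time
omping -c 600 -i 1 -q ch1flvps05 ch1flvps06 ch1flvps07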

Any advice?
Regards.
 
"[CMAP ] Received config version (4) is different than my config version (5)! Exiting"

This means the node has a more recent corosync.conf than the rest of the cluster.
Compare the content of corosync.conf on your nodes.
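A quick way to compare them is to check the config_version line on every node, for example (a sketch, assuming root SSH access between the nodes):

Code:
# print the config_version each node currently has on disk
for host in ch1flvps05 ch1flvps06 ch1flvps07; do
    echo -n "$host: "
    ssh root@$host "grep config_version /etc/corosync/corosync.conf"
done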
 
Thanks manu,
I have compared corosync.conf across all nodes and the content is the same, both in /etc/corosync/corosync.conf and in /etc/pve/corosync.conf.
 
Can you post the /etc/corosync/corosync.conf of the node showing the error message, and the /etc/corosync/corosync.conf of one of the remaining two nodes?
 
working node (nodeid: 1)
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: ch1flvps06
    nodeid: 2
    quorum_votes: 1
    ring0_addr: ch1flvps06
  }

  node {
    name: ch1flvps07
    nodeid: 3
    quorum_votes: 1
    ring0_addr: ch1flvps07
  }

  node {
    name: ch1flvps05
    nodeid: 1
    quorum_votes: 1
    ring0_addr: ch1flvps05
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: ch1flvps
  config_version: 5
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 190.215.57.194
    ringnumber: 0
  }

}

not working node (nodeid: 2)
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: ch1flvps06
    nodeid: 2
    quorum_votes: 1
    ring0_addr: ch1flvps06
  }

  node {
    name: ch1flvps07
    nodeid: 3
    quorum_votes: 1
    ring0_addr: ch1flvps07
  }

  node {
    name: ch1flvps05
    nodeid: 1
    quorum_votes: 1
    ring0_addr: ch1flvps05
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: ch1flvps
  config_version: 5
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 190.215.57.194
    ringnumber: 0
  }

}

not working node (nodeid: 3)
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: ch1flvps06
    nodeid: 2
    quorum_votes: 1
    ring0_addr: ch1flvps06
  }

  node {
    name: ch1flvps07
    nodeid: 3
    quorum_votes: 1
    ring0_addr: ch1flvps07
  }

  node {
    name: ch1flvps05
    nodeid: 1
    quorum_votes: 1
    ring0_addr: ch1flvps05
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: ch1flvps
  config_version: 5
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 190.215.57.194
    ringnumber: 0
  }

}

I also checked the content of /etc/pve/corosync.conf and it is the same on all of them.
 
OK, can you please post the output of
journalctl -u corosync | tail -20
pvecm status

on one of your nodes?

I don't understand why it says it is receiving a config file with version 4.
Do you have some "hidden" nodes hanging on the network?
 
nodeid: 1
Code:
root@ch1flvps05:~# journalctl -u corosync | tail -20
Aug 22 06:25:16 ch1flvps05 corosync[1444]: [TOTEM ] A new membership (190.215.57.194:880) was formed. Members joined: 2
Aug 22 06:25:16 ch1flvps05 corosync[1444]: [TOTEM ] A new membership (190.215.57.194:884) was formed. Members left: 2
Aug 22 06:25:16 ch1flvps05 corosync[1444]: [QUORUM] Members[1]: 1
Aug 22 06:25:16 ch1flvps05 corosync[1444]: [MAIN  ] Completed service synchronization, ready to provide service.
Aug 22 06:25:17 ch1flvps05 corosync[1444]: [TOTEM ] A new membership (190.215.57.194:888) was formed. Members joined: 3
Aug 22 06:25:17 ch1flvps05 corosync[1444]: [TOTEM ] A new membership (190.215.57.194:892) was formed. Members left: 3
Aug 22 06:25:17 ch1flvps05 corosync[1444]: [QUORUM] Members[1]: 1
Aug 22 06:25:17 ch1flvps05 corosync[1444]: [MAIN  ] Completed service synchronization, ready to provide service.
Aug 23 06:25:14 ch1flvps05 corosync[1444]: [TOTEM ] A new membership (190.215.57.194:896) was formed. Members joined: 3 2
Aug 23 06:25:14 ch1flvps05 corosync[1444]: [TOTEM ] A new membership (190.215.57.194:900) was formed. Members left: 3 2
Aug 23 06:25:14 ch1flvps05 corosync[1444]: [QUORUM] Members[1]: 1
Aug 23 06:25:14 ch1flvps05 corosync[1444]: [MAIN  ] Completed service synchronization, ready to provide service.
Aug 24 06:25:14 ch1flvps05 corosync[1444]: [TOTEM ] A new membership (190.215.57.194:904) was formed. Members joined: 2
Aug 24 06:25:14 ch1flvps05 corosync[1444]: [TOTEM ] A new membership (190.215.57.194:908) was formed. Members left: 2
Aug 24 06:25:14 ch1flvps05 corosync[1444]: [QUORUM] Members[1]: 1
Aug 24 06:25:14 ch1flvps05 corosync[1444]: [MAIN  ] Completed service synchronization, ready to provide service.
Aug 24 06:25:15 ch1flvps05 corosync[1444]: [TOTEM ] A new membership (190.215.57.194:912) was formed. Members joined: 3
Aug 24 06:25:15 ch1flvps05 corosync[1444]: [TOTEM ] A new membership (190.215.57.194:916) was formed. Members left: 3
Aug 24 06:25:15 ch1flvps05 corosync[1444]: [QUORUM] Members[1]: 1
Aug 24 06:25:15 ch1flvps05 corosync[1444]: [MAIN  ] Completed service synchronization, ready to provide service.
root@ch1flvps05:~# pvecm status
Quorum information
------------------
Date:             Thu Aug 24 12:35:36 2017
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1/920
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 190.215.57.194 (local)

nodeid: 2
Code:
root@ch1flvps06:~# journalctl -u corosync | tail -20
Aug 24 06:25:13 ch1flvps06 corosync[30591]: [MAIN  ] Completed service synchronization, ready to provide service.
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [TOTEM ] A new membership (190.215.57.194:904) was formed. Members joined: 1
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [CMAP  ] Received config version (4) is different than my config version (5)! Exiting
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [SERV  ] Unloading all Corosync service engines.
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [QB    ] withdrawing server sockets
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [QB    ] withdrawing server sockets
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [SERV  ] Service engine unloaded: corosync configuration map access
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [QB    ] withdrawing server sockets
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [SERV  ] Service engine unloaded: corosync configuration service
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [QB    ] withdrawing server sockets
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [QB    ] withdrawing server sockets
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [SERV  ] Service engine unloaded: corosync profile loading service
Aug 24 06:25:14 ch1flvps06 corosync[30591]: [MAIN  ] Corosync Cluster Engine exiting normally
Aug 24 06:26:14 ch1flvps06 corosync[30571]: Starting Corosync Cluster Engine (corosync): [FAILED]
Aug 24 06:26:14 ch1flvps06 systemd[1]: corosync.service: control process exited, code=exited status=1
Aug 24 06:26:14 ch1flvps06 systemd[1]: Failed to start Corosync Cluster Engine.
Aug 24 06:26:14 ch1flvps06 systemd[1]: Unit corosync.service entered failed state.
root@ch1flvps06:~# pvecm status
Cannot initialize CMAP service

nodeid: 3
Code:
root@ch1flvps07:~# journalctl -u corosync | tail -20
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [MAIN  ] Completed service synchronization, ready to provide service.
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [TOTEM ] A new membership (190.215.57.194:912) was formed. Members joined: 1
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [CMAP  ] Received config version (4) is different than my config version (6)! Exiting
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [SERV  ] Unloading all Corosync service engines.
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [QB    ] withdrawing server sockets
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [QB    ] withdrawing server sockets
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [SERV  ] Service engine unloaded: corosync configuration map access
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [QB    ] withdrawing server sockets
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [SERV  ] Service engine unloaded: corosync configuration service
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [QB    ] withdrawing server sockets
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [QB    ] withdrawing server sockets
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [SERV  ] Service engine unloaded: corosync profile loading service
Aug 24 06:25:15 ch1flvps07 corosync[30666]: [MAIN  ] Corosync Cluster Engine exiting normally
Aug 24 06:26:15 ch1flvps07 corosync[30657]: Starting Corosync Cluster Engine (corosync): [FAILED]
Aug 24 06:26:15 ch1flvps07 systemd[1]: corosync.service: control process exited, code=exited status=1
Aug 24 06:26:15 ch1flvps07 systemd[1]: Failed to start Corosync Cluster Engine.
Aug 24 06:26:15 ch1flvps07 systemd[1]: Unit corosync.service entered failed state.
root@ch1flvps07:~# pvecm status
Cannot initialize CMAP service

Please note that on node IDs 2 and 3 I had to execute "pmxcfs -l", since I was not able to start VMs on those nodes.

As for "hidden" nodes: some weeks ago I tried to add a node and it failed because it was on a different network. It is not shown in corosync.conf, but it does show up in the GUI, and only on node ID 1: https://www.awesomescreenshot.com/image/2761040/69ec0c61f03748af3c7e93bc1c6b9b24
At that point I tried "pvecm delnode ch1flvps08", but it did not work.
 
I still see this as a problem of your corosync.conf not being in sync between your nodes.
You have a configuration mismatch between your nodes, and corosync cannot build a quorum (hence the cluster file system being read-only and your need to start it in local mode with pmxcfs -l).

Is the deleted node still trying to join the corosync cluster?

You can have a look at your multicast traffic with:

tcpdump -i mydev0 ether multicast

where mydev0 is the ethernet link that carries the corosync traffic.

You should see your three nodes sending packets on the multicast network here (but only those three).

If the tcpdump output looks fine, then try again to synchronize the /etc/corosync/corosync.conf of all your nodes.
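A minimal sketch of that synchronization, assuming the quorate node (ch1flvps05 here) holds the correct configuration, could look like this:

Code:
# on ch1flvps05: bump config_version in /etc/pve/corosync.conf
# (e.g. from 5 to 6) so the new file takes precedence everywhere,
# then push the resulting /etc/corosync/corosync.conf to the nodes
# that cannot start corosync and restart the service there
scp /etc/corosync/corosync.conf root@ch1flvps06:/etc/corosync/corosync.conf
scp /etc/corosync/corosync.conf root@ch1flvps07:/etc/corosync/corosync.conf
ssh root@ch1flvps06 "systemctl restart corosync"
ssh root@ch1flvps07 "systemctl restart corosync"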
 
SOLVED: restarting corosync on all nodes solved the problem.
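In case it helps someone later, a minimal sketch of that restart, assuming the corosync.conf files on all nodes already match:

Code:
# run on each node, then verify membership and quorum
systemctl restart corosync
pvecm status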

I have the same problem: 4 nodes, and after two of them were rebooted they cannot join the cluster. All nodes have the same configuration version. I don't know why, but all nodes show the same IP in the log near the wrong-version error: 100.64.254.136, which is the IP of one of the nodes. That node has the same error with this IP:
Code:
Oct 23 16:41:24 vmk corosync[11733]: notice  [TOTEM ] A new membership (100.64.254.136:3080) was formed. Members joined: 2
Oct 23 16:41:24 vmk corosync[11733]: warning [TOTEM ] JOIN or LEAVE message was thrown away during flush operation.
Oct 23 16:41:24 vmk corosync[11733]: warning [CPG   ] downlist left_list: 0 received
Oct 23 16:41:24 vmk corosync[11733]: notice  [QUORUM] Members[1]: 2
Oct 23 16:41:24 vmk corosync[11733]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
Oct 23 16:41:24 vmk corosync[11733]:  [QB    ] server name: cmap
Oct 23 16:41:24 vmk corosync[11733]:  [SERV  ] Service engine loaded: corosync configuration service [1]
Oct 23 16:41:24 vmk corosync[11733]:  [QB    ] server name: cfg
Oct 23 16:41:24 vmk corosync[11733]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Oct 23 16:41:24 vmk corosync[11733]:  [QB    ] server name: cpg
Oct 23 16:41:24 vmk corosync[11733]:  [SERV  ] Service engine loaded: corosync profile loading service [4]
Oct 23 16:41:24 vmk corosync[11733]:  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Oct 23 16:41:24 vmk corosync[11733]:  [WD    ] Watchdog /dev/watchdog exists but couldn't be opened.
Oct 23 16:41:24 vmk corosync[11733]:  [WD    ] resource load_15min missing a recovery key.
Oct 23 16:41:24 vmk corosync[11733]:  [WD    ] resource memory_used missing a recovery key.
Oct 23 16:41:24 vmk corosync[11733]:  [WD    ] no resources configured.
Oct 23 16:41:24 vmk corosync[11733]:  [SERV  ] Service engine loaded: corosync watchdog service [7]
Oct 23 16:41:24 vmk corosync[11733]:  [QUORUM] Using quorum provider corosync_votequorum
Oct 23 16:41:24 vmk corosync[11733]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Oct 23 16:41:24 vmk corosync[11733]:  [QB    ] server name: votequorum
Oct 23 16:41:24 vmk corosync[11733]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Oct 23 16:41:24 vmk corosync[11733]:  [QB    ] server name: quorum
Oct 23 16:41:24 vmk corosync[11733]:  [TOTEM ] A new membership (100.64.254.136:3080) was formed. Members joined: 2
Oct 23 16:41:24 vmk corosync[11733]:  [TOTEM ] JOIN or LEAVE message was thrown away during flush operation.
Oct 23 16:41:24 vmk corosync[11733]:  [CPG   ] downlist left_list: 0 received
Oct 23 16:41:24 vmk corosync[11733]:  [QUORUM] Members[1]: 2
Oct 23 16:41:24 vmk corosync[11733]:  [MAIN  ] Completed service synchronization, ready to provide service.
Oct 23 16:41:24 vmk corosync[11733]: notice  [TOTEM ] A new membership (100.64.254.136:3084) was formed. Members joined: 3
Oct 23 16:41:24 vmk corosync[11733]:  [TOTEM ] A new membership (100.64.254.136:3084) was formed. Members joined: 3
Oct 23 16:41:24 vmk corosync[11733]: error   [CMAP  ] Received config version (3) is different than my config version (4)! Exiting
Oct 23 16:41:24 vmk corosync[11733]:  [CMAP  ] Received config version (3) is different than my config version (4)! Exiting
Oct 23 16:41:24 vmk corosync[11733]:  [SERV  ] Unloading all Corosync service engines.
 
"[CMAP ] Received config version (4) is different than my config version (5)! Exiting"

this means the node has a more recent corosync.conf than the rest of the cluster
compare the content of corosync.conf in your nodes
I had the same problem! It happens when the cluster writes a new configuration while one node is down: that node ends up with a config_version that no longer matches the rest. Compare corosync.conf on all nodes; they should be identical except for the version on the problem node. I changed that node's version to match the other (working) nodes and restarted corosync.service.
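A minimal sketch of that fix (the version number 18 is hypothetical; use whatever the working nodes report):

Code:
# on the node that refuses to start: set config_version to the value
# the working nodes have, then restart corosync and check membership
sed -i 's/config_version: .*/config_version: 18/' /etc/corosync/corosync.conf
systemctl restart corosync.service
pvecm nodes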
Below is my scenario:
[CMAP ] Received config version (17) is different than my config version (16)! Exiting

root@proxmoxserver4:/etc/pve# pvecm status
Cluster information
-------------------
Name: PROXMOXLAB
Config Version: 16
Transport: knet
Secure auth: on

Cannot initialize CMAP service

root@proxmoxserver3:/etc/corosync# vim corosync.conf
totem {
  cluster_name: PROXMOXLAB
  config_version: 18
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
root@proxmoxserver4:/etc/corosync# more corosync.conf
totem {
  cluster_name: PROXMOXLAB
  config_version: 18
  interface {
    linknumber: 0

root@proxmoxserver4:/etc/corosync# systemctl restart corosync.service

root@proxmoxserver4:/etc/corosync# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 proxmoxserver8
         2          1 proxmoxserver2
         3          1 proxmoxserver3
         4          1 proxmoxserver4 (local)
         6          1 proxmoxserver6
         7          1 proxmoxserver7
         8          1 proxmoxserver1
         9          1 proxmoxserver9
        10          1 proxmoxserver10
Regards
 
