[SOLVED] one node "Cannot initialize CMAP service"

RobFantini

Hello

I have a 4-node cluster.

One node is a 'cold spare' that I do not usually leave online.

However, I have learned that it should be online whenever a node is added.

It was powered off when I added a new node yesterday.

After powering it on today, it is not part of the cluster.

some debugging:
Code:
# journalctl -xn
-- Logs begin at Wed 2016-12-14 17:51:13 EST, end at Wed 2016-12-14 18:42:22 EST. --
Dec 14 18:42:10 s020 pmxcfs[8087]: [status] crit: cpg_initialize failed: 2
Dec 14 18:42:16 s020 pmxcfs[8087]: [quorum] crit: quorum_initialize failed: 2
Dec 14 18:42:16 s020 pmxcfs[8087]: [confdb] crit: cmap_initialize failed: 2
Dec 14 18:42:16 s020 pmxcfs[8087]: [dcdb] crit: cpg_initialize failed: 2
Dec 14 18:42:16 s020 pmxcfs[8087]: [status] crit: cpg_initialize failed: 2
Dec 14 18:42:16 s020 pvestatd[4584]: mount error: mount.nfs: access denied by server while mounting 10.2.2.181:/bkup/bkup-long-t
Dec 14 18:42:22 s020 pmxcfs[8087]: [quorum] crit: quorum_initialize failed: 2
Dec 14 18:42:22 s020 pmxcfs[8087]: [confdb] crit: cmap_initialize failed: 2
Dec 14 18:42:22 s020 pmxcfs[8087]: [dcdb] crit: cpg_initialize failed: 2
Dec 14 18:42:22 s020 pmxcfs[8087]: [status] crit: cpg_initialize failed: 2

Code:
# pvecm nodes
Cannot initialize CMAP service

Any suggestions on how to fix this?



thanks, Rob Fantini

PS: It may be that nothing can be done short of reinstalling the node. If that is the case, it would be nice to get a warning about it before adding a new node.
Building such a check into the code is probably complicated, so a section on the wiki covering this would be good.
 
More info:

corosync.conf on the problem node:
Code:
quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster-v4
  config_version: 37
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.1.10.181
    ringnumber: 0
  }

}

and from a working node:
Code:
quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster-v4
  config_version: 38
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.1.10.181
    ringnumber: 0
  }

}
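The important difference is the config_version line: 37 here versus 38 on the working node. A quick way to compare just that value on each node (assuming the default config location) is:
Code:
# run on every node and compare the numbers
grep config_version /etc/corosync/corosync.conf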
 
I hope those are only snippets of your corosync config and not the whole thing ;)

what does "journalctl -b -u corosync" say?
 
That was just the last part, including the totem section - I wanted to show the config version.

Code:
-- Logs begin at Wed 2016-12-14 17:51:13 EST, end at Thu 2016-12-15 06:32:54 EST. --
Dec 14 17:51:26 s020 systemd[1]: Starting Corosync Cluster Engine...
Dec 14 17:51:26 s020 corosync[4577]: [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide se
Dec 14 17:51:26 s020 corosync[4577]: [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash:
Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] The network interface [10.1.10.67] is now up.
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: cmap
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync configuration service [1]
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: cfg
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync cluster closed process group servi
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: cpg
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Dec 14 17:51:26 s020 corosync[4578]: [QUORUM] Using quorum provider corosync_votequorum
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: votequorum
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: quorum
Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] A new membership (10.1.10.67:29528) was formed. Members joined: 2
Dec 14 17:51:26 s020 corosync[4578]: [QUORUM] Members[1]: 2
Dec 14 17:51:26 s020 corosync[4578]: [MAIN  ] Completed service synchronization, ready to provide service.
Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] A new membership (10.1.10.10:29532) was formed. Members joined: 1
Dec 14 17:51:26 s020 corosync[4578]: [CMAP  ] Received config version (38) is different than my config version (
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Unloading all Corosync service engines.
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync configuration map access
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync configuration service
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync cluster closed process group ser
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync profile loading service
Dec 14 17:51:26 s020 corosync[4578]: [MAIN  ] Corosync Cluster Engine exiting normally
Dec 14 17:52:27 s020 corosync[4566]: Starting Corosync Cluster Engine (corosync): [FAILED]
Dec 14 17:52:27 s020 systemd[1]: corosync.service: control process exited, code=exited status=1
Dec 14 17:52:27 s020 systemd[1]: Failed to start Corosync Cluster Engine.
Dec 14 17:52:27 s020 systemd[1]: Unit corosync.service entered failed state.
Dec 14 18:18:23 s020 systemd[1]: Starting Corosync Cluster Engine...
Dec 14 18:18:23 s020 corosync[8103]: [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide se
Dec 14 18:18:23 s020 corosync[8103]: [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash:
Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] The network interface [10.1.10.67] is now up.
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: cmap
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync configuration service [1]
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: cfg
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync cluster closed process group servi
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: cpg
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Dec 14 18:18:23 s020 corosync[8104]: [QUORUM] Using quorum provider corosync_votequorum
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: votequorum
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: quorum
Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] A new membership (10.1.10.67:29536) was formed. Members joined: 2
Dec 14 18:18:23 s020 corosync[8104]: [QUORUM] Members[1]: 2
Dec 14 18:18:23 s020 corosync[8104]: [MAIN  ] Completed service synchronization, ready to provide service.
Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] A new membership (10.1.10.10:29540) was formed. Members joined: 1
Dec 14 18:18:23 s020 corosync[8104]: [CMAP  ] Received config version (38) is different than my config version (
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Unloading all Corosync service engines.
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync configuration map access
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync configuration service
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync cluster closed process group ser
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync profile loading service
Dec 14 18:18:23 s020 corosync[8104]: [MAIN  ] Corosync Cluster Engine exiting normally
Dec 14 18:19:24 s020 corosync[8097]: Starting Corosync Cluster Engine (corosync): [FAILED]
Dec 14 18:19:24 s020 systemd[1]: corosync.service: control process exited, code=exited status=1
Dec 14 18:19:24 s020 systemd[1]: Failed to start Corosync Cluster Engine.
Dec 14 18:19:24 s020 systemd[1]: Unit corosync.service entered failed state.
 
replacing the outdated corosync config with the current one and restarting the corosync service should resync the cluster
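A minimal sketch of that procedure on the out-of-sync node (assuming the healthy node is reachable over SSH as 'healthy-node'; adjust to your setup):
Code:
# stop the cluster filesystem and corosync on the broken node
systemctl stop pve-cluster corosync

# pull the current corosync.conf from a healthy node
scp root@healthy-node:/etc/corosync/corosync.conf /etc/corosync/corosync.conf

# start the services again; /etc/pve should resync from the cluster
systemctl start corosync pve-cluster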
 
replacing the outdated corosync config with the current one and restarting the corosync service should resync the cluster

I am unable to mount /etc/pve in local mode in order to copy the files; /etc/pve is mounted read-only.

These days, how do I mount /etc/pve in local mode? The old way does not work:
Code:
/usr/bin/pmxcfs -l
[main] notice: unable to aquire pmxcfs lock - trying again

[main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
[main] notice: exit proxmox configuration filesystem (-1)
 
this is how:
Code:
systemctl stop pve-cluster
/usr/bin/pmxcfs -l
[main] notice: forcing local mode (althought corosync.conf exists)
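When you are finished, stop the local-mode instance again before bringing the regular service back up; something like:
Code:
# stop the local-mode pmxcfs and restart the normal cluster filesystem service
killall pmxcfs
systemctl start pve-cluster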
 
replacing the outdated corosync config with the current one and restarting the corosync service should resync the cluster
I tried copying corosync.conf from another healthy node to /etc/corosync/corosync.conf and it worked; it synced and fixed the problem above. Is this the recommended way to fix this kind of problem?
 
Other than avoiding the problem in the first place by not adding nodes to unhealthy clusters, yes ;)
 
In my case it was a time synchronization issue:
I had lost connection to a node. After reading this thread I first checked the node's time and found it was out of sync, because the node had no Internet connection due to a wrong gateway. So I fixed the Internet connection first, then synced the time and restarted pve-cluster.service and corosync.service. After this the node was back in the cluster.
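For reference, a rough checklist for that situation (assuming systemd-timesyncd as the NTP client; substitute chrony or ntpd if that is what the node runs):
Code:
# check the local clock and whether NTP is synchronized
timedatectl status

# fix the default gateway/DNS so the NTP servers are reachable, then resync
systemctl restart systemd-timesyncd

# once the clock is correct, bring the cluster services back
systemctl restart corosync pve-cluster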

Many thanks to everyone :)
 
