[SOLVED] one node "Cannot initialize CMAP service"

RobFantini
Hello

I have a 4-node cluster.

One is a 'cold spare' that I do not usually leave online.

However, I learned that it should be online when I add a node.

It was powered off when I added a node yesterday.

After powering it on today, it is not part of the cluster.

Some debugging:
Code:
# journalctl -xn
-- Logs begin at Wed 2016-12-14 17:51:13 EST, end at Wed 2016-12-14 18:42:22 EST. --
Dec 14 18:42:10 s020 pmxcfs[8087]: [status] crit: cpg_initialize failed: 2
Dec 14 18:42:16 s020 pmxcfs[8087]: [quorum] crit: quorum_initialize failed: 2
Dec 14 18:42:16 s020 pmxcfs[8087]: [confdb] crit: cmap_initialize failed: 2
Dec 14 18:42:16 s020 pmxcfs[8087]: [dcdb] crit: cpg_initialize failed: 2
Dec 14 18:42:16 s020 pmxcfs[8087]: [status] crit: cpg_initialize failed: 2
Dec 14 18:42:16 s020 pvestatd[4584]: mount error: mount.nfs: access denied by server while mounting 10.2.2.181:/bkup/bkup-long-t
Dec 14 18:42:22 s020 pmxcfs[8087]: [quorum] crit: quorum_initialize failed: 2
Dec 14 18:42:22 s020 pmxcfs[8087]: [confdb] crit: cmap_initialize failed: 2
Dec 14 18:42:22 s020 pmxcfs[8087]: [dcdb] crit: cpg_initialize failed: 2
Dec 14 18:42:22 s020 pmxcfs[8087]: [status] crit: cpg_initialize failed: 2

Code:
# pvecm nodes
Cannot initialize CMAP service

Any suggestions on how to fix this?



Thanks, Rob Fantini

PS: It may be that there is nothing to do but reinstall. In that case, it would be nice to see that information as a warning before adding a new node.
That probably makes for complicated code, I think, so a wiki page section for this would be good.
 
More info.

corosync.conf on the problem node:
Code:
quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster-v4
  config_version: 37
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.1.10.181
    ringnumber: 0
  }
}

and from a working node:
Code:
quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster-v4
  config_version: 38
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.1.10.181
    ringnumber: 0
  }
}
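
Note the mismatch: the node that fails is still at config_version 37 while the working node is at 38. A quick way to compare the two (the cmap query only works on a node where corosync is actually running):
Code:
# version in the file on disk
grep config_version /etc/corosync/corosync.conf

# version in use by the running cluster (run this on a working node)
corosync-cmapctl -g totem.config_version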
 
I hope those are only snippets of your corosync config and not the whole thing ;)

What does "journalctl -b -u corosync" say?
 
That was just the last part, including the totem section - I wanted to show the config version.

Code:
-- Logs begin at Wed 2016-12-14 17:51:13 EST, end at Thu 2016-12-15 06:32:54 EST. --
Dec 14 17:51:26 s020 systemd[1]: Starting Corosync Cluster Engine...
Dec 14 17:51:26 s020 corosync[4577]: [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide se
Dec 14 17:51:26 s020 corosync[4577]: [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash:
Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] The network interface [10.1.10.67] is now up.
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: cmap
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync configuration service [1]
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: cfg
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync cluster closed process group servi
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: cpg
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Dec 14 17:51:26 s020 corosync[4578]: [QUORUM] Using quorum provider corosync_votequorum
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: votequorum
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: quorum
Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] A new membership (10.1.10.67:29528) was formed. Members joined: 2
Dec 14 17:51:26 s020 corosync[4578]: [QUORUM] Members[1]: 2
Dec 14 17:51:26 s020 corosync[4578]: [MAIN  ] Completed service synchronization, ready to provide service.
Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] A new membership (10.1.10.10:29532) was formed. Members joined: 1
Dec 14 17:51:26 s020 corosync[4578]: [CMAP  ] Received config version (38) is different than my config version (
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Unloading all Corosync service engines.
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync configuration map access
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync configuration service
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync cluster closed process group ser
Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync profile loading service
Dec 14 17:51:26 s020 corosync[4578]: [MAIN  ] Corosync Cluster Engine exiting normally
Dec 14 17:52:27 s020 corosync[4566]: Starting Corosync Cluster Engine (corosync): [FAILED]
Dec 14 17:52:27 s020 systemd[1]: corosync.service: control process exited, code=exited status=1
Dec 14 17:52:27 s020 systemd[1]: Failed to start Corosync Cluster Engine.
Dec 14 17:52:27 s020 systemd[1]: Unit corosync.service entered failed state.
Dec 14 18:18:23 s020 systemd[1]: Starting Corosync Cluster Engine...
Dec 14 18:18:23 s020 corosync[8103]: [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide se
Dec 14 18:18:23 s020 corosync[8103]: [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash:
Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] The network interface [10.1.10.67] is now up.
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: cmap
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync configuration service [1]
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: cfg
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync cluster closed process group servi
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: cpg
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Dec 14 18:18:23 s020 corosync[8104]: [QUORUM] Using quorum provider corosync_votequorum
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: votequorum
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: quorum
Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] A new membership (10.1.10.67:29536) was formed. Members joined: 2
Dec 14 18:18:23 s020 corosync[8104]: [QUORUM] Members[1]: 2
Dec 14 18:18:23 s020 corosync[8104]: [MAIN  ] Completed service synchronization, ready to provide service.
Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] A new membership (10.1.10.10:29540) was formed. Members joined: 1
Dec 14 18:18:23 s020 corosync[8104]: [CMAP  ] Received config version (38) is different than my config version (
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Unloading all Corosync service engines.
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync configuration map access
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync configuration service
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync cluster closed process group ser
Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync profile loading service
Dec 14 18:18:23 s020 corosync[8104]: [MAIN  ] Corosync Cluster Engine exiting normally
Dec 14 18:19:24 s020 corosync[8097]: Starting Corosync Cluster Engine (corosync): [FAILED]
Dec 14 18:19:24 s020 systemd[1]: corosync.service: control process exited, code=exited status=1
Dec 14 18:19:24 s020 systemd[1]: Failed to start Corosync Cluster Engine.
Dec 14 18:19:24 s020 systemd[1]: Unit corosync.service entered failed state.
 
Replacing the outdated corosync config with the current one and restarting the corosync service should resync the cluster.
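
For example, a minimal sketch (assuming the working node is reachable as root at 10.1.10.10, per the membership logs above - adjust to your setup):
Code:
# on the failed node: pull the current config from a working node
scp root@10.1.10.10:/etc/corosync/corosync.conf /etc/corosync/corosync.conf

# restart the cluster stack so the node rejoins with the matching config_version
systemctl restart corosync
systemctl restart pve-cluster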

I am unable to mount /etc/pve in local mode in order to copy the files - /etc/pve is mounted read-only.

These days, how do I mount /etc/pve in local mode? The old way does not work:
Code:
/usr/bin/pmxcfs -l
[main] notice: unable to aquire pmxcfs lock - trying again

[main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
[main] notice: exit proxmox configuration filesystem (-1)
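
For reference, that lock error usually means the pmxcfs instance started by pve-cluster is still running, so the usual approach (a sketch, adapt as needed) is to stop the service first:
Code:
# stop the running pmxcfs so the lock is released
systemctl stop pve-cluster

# start pmxcfs in local mode - /etc/pve is now writable on this node
pmxcfs -l

# ...edit or copy files under /etc/pve...

# shut down the local instance and bring the normal cluster stack back up
killall pmxcfs
systemctl start pve-cluster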
 
In my case it was a time synchronization issue:
I had lost the connection to a node. After reading this thread, I first checked the node's time and found it was out of sync, because the node had no Internet connection due to a wrong gateway. So I brought the Internet connection up first, then synced the time and restarted pve-cluster.service and corosync.service. After this, the node rejoined the cluster.
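
Roughly, the steps were (a sketch assuming systemd-timesyncd as the NTP client; the gateway address is a placeholder - substitute your own daemon and addresses):
Code:
# fix the default gateway so NTP traffic can get out (10.1.10.1 is a placeholder)
ip route replace default via 10.1.10.1

# check the clock state, then force a resync
timedatectl status
systemctl restart systemd-timesyncd

# restart the cluster services so the node rejoins
systemctl restart pve-cluster
systemctl restart corosync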

Many thanks to everyone :)