[SOLVED] one node "Cannot initialize CMAP service"

Discussion in 'Proxmox VE: Installation and configuration' started by RobFantini, Dec 15, 2016.

  1. RobFantini

    RobFantini Well-Known Member
    Proxmox Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,596
    Likes Received:
    26
    Hello

    I have a 4-node cluster.

    One is a 'cold spare'; I do not usually leave it online.

    However, I learned that it should be online when I add a node.

    It was powered off when I added a node yesterday.

    After powering on today, it is not part of the cluster.

    some debugging:
    Code:
    # journalctl -xn
    -- Logs begin at Wed 2016-12-14 17:51:13 EST, end at Wed 2016-12-14 18:42:22 EST. --
    Dec 14 18:42:10 s020 pmxcfs[8087]: [status] crit: cpg_initialize failed: 2
    Dec 14 18:42:16 s020 pmxcfs[8087]: [quorum] crit: quorum_initialize failed: 2
    Dec 14 18:42:16 s020 pmxcfs[8087]: [confdb] crit: cmap_initialize failed: 2
    Dec 14 18:42:16 s020 pmxcfs[8087]: [dcdb] crit: cpg_initialize failed: 2
    Dec 14 18:42:16 s020 pmxcfs[8087]: [status] crit: cpg_initialize failed: 2
    Dec 14 18:42:16 s020 pvestatd[4584]: mount error: mount.nfs: access denied by server while mounting 10.2.2.181:/bkup/bkup-long-t
    Dec 14 18:42:22 s020 pmxcfs[8087]: [quorum] crit: quorum_initialize failed: 2
    Dec 14 18:42:22 s020 pmxcfs[8087]: [confdb] crit: cmap_initialize failed: 2
    Dec 14 18:42:22 s020 pmxcfs[8087]: [dcdb] crit: cpg_initialize failed: 2
    Dec 14 18:42:22 s020 pmxcfs[8087]: [status] crit: cpg_initialize failed: 2
    
    Code:
    # pvecm nodes
    Cannot initialize CMAP service
    
    Any suggestions on how to fix this?



    thanks, Rob Fantini

    PS: It may be that it is impossible to do anything but reinstall. In that case, it would be nice to see this as a warning before adding a new node.
    That probably makes for complicated code, I think, so a wiki page section on this would be good.
     
    #1 RobFantini, Dec 15, 2016
    Last edited: Dec 15, 2016
  2. RobFantini

    RobFantini Well-Known Member
    Proxmox Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,596
    Likes Received:
    26
    More info:

    corosync.conf on the problem node:
    Code:
    quorum {
      provider: corosync_votequorum
    }
    
    totem {
      cluster_name: cluster-v4
      config_version: 37
      ip_version: ipv4
      secauth: on
      version: 2
      interface {
        bindnetaddr: 10.1.10.181
        ringnumber: 0
      }
    
    }
    
    and from a working node:
    Code:
    quorum {
      provider: corosync_votequorum
    }
    
    totem {
      cluster_name: cluster-v4
      config_version: 38
      ip_version: ipv4
      secauth: on
      version: 2
      interface {
        bindnetaddr: 10.1.10.181
        ringnumber: 0
      }
    
    }
    
     
  3. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    I hope those are only snippets of your corosync config and not the whole thing ;)

    What does "journalctl -b -u corosync" say?
     
  4. RobFantini

    RobFantini Well-Known Member
    Proxmox Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,596
    Likes Received:
    26
    That was just the last part, including the totem section - I wanted to show the config version.

    Code:
    -- Logs begin at Wed 2016-12-14 17:51:13 EST, end at Thu 2016-12-15 06:32:54 EST. --
    Dec 14 17:51:26 s020 systemd[1]: Starting Corosync Cluster Engine...
    Dec 14 17:51:26 s020 corosync[4577]: [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide se
    Dec 14 17:51:26 s020 corosync[4577]: [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
    Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] Initializing transport (UDP/IP Multicast).
    Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash:
    Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] The network interface [10.1.10.67] is now up.
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync configuration map access [0]
    Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: cmap
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync configuration service [1]
    Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: cfg
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync cluster closed process group servi
    Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: cpg
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync profile loading service [4]
    Dec 14 17:51:26 s020 corosync[4578]: [QUORUM] Using quorum provider corosync_votequorum
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
    Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: votequorum
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
    Dec 14 17:51:26 s020 corosync[4578]: [QB  ] server name: quorum
    Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] A new membership (10.1.10.67:29528) was formed. Members joined: 2
    Dec 14 17:51:26 s020 corosync[4578]: [QUORUM] Members[1]: 2
    Dec 14 17:51:26 s020 corosync[4578]: [MAIN  ] Completed service synchronization, ready to provide service.
    Dec 14 17:51:26 s020 corosync[4578]: [TOTEM ] A new membership (10.1.10.10:29532) was formed. Members joined: 1
    Dec 14 17:51:26 s020 corosync[4578]: [CMAP  ] Received config version (38) is different than my config version (
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Unloading all Corosync service engines.
    Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
    Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync configuration map access
    Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync configuration service
    Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync cluster closed process group ser
    Dec 14 17:51:26 s020 corosync[4578]: [QB  ] withdrawing server sockets
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
    Dec 14 17:51:26 s020 corosync[4578]: [SERV  ] Service engine unloaded: corosync profile loading service
    Dec 14 17:51:26 s020 corosync[4578]: [MAIN  ] Corosync Cluster Engine exiting normally
    Dec 14 17:52:27 s020 corosync[4566]: Starting Corosync Cluster Engine (corosync): [FAILED]
    Dec 14 17:52:27 s020 systemd[1]: corosync.service: control process exited, code=exited status=1
    Dec 14 17:52:27 s020 systemd[1]: Failed to start Corosync Cluster Engine.
    Dec 14 17:52:27 s020 systemd[1]: Unit corosync.service entered failed state.
    Dec 14 18:18:23 s020 systemd[1]: Starting Corosync Cluster Engine...
    Dec 14 18:18:23 s020 corosync[8103]: [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide se
    Dec 14 18:18:23 s020 corosync[8103]: [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
    Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] Initializing transport (UDP/IP Multicast).
    Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash:
    Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] The network interface [10.1.10.67] is now up.
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync configuration map access [0]
    Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: cmap
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync configuration service [1]
    Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: cfg
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync cluster closed process group servi
    Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: cpg
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync profile loading service [4]
    Dec 14 18:18:23 s020 corosync[8104]: [QUORUM] Using quorum provider corosync_votequorum
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
    Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: votequorum
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
    Dec 14 18:18:23 s020 corosync[8104]: [QB  ] server name: quorum
    Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] A new membership (10.1.10.67:29536) was formed. Members joined: 2
    Dec 14 18:18:23 s020 corosync[8104]: [QUORUM] Members[1]: 2
    Dec 14 18:18:23 s020 corosync[8104]: [MAIN  ] Completed service synchronization, ready to provide service.
    Dec 14 18:18:23 s020 corosync[8104]: [TOTEM ] A new membership (10.1.10.10:29540) was formed. Members joined: 1
    Dec 14 18:18:23 s020 corosync[8104]: [CMAP  ] Received config version (38) is different than my config version (
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Unloading all Corosync service engines.
    Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync vote quorum service v1.0
    Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync configuration map access
    Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync configuration service
    Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync cluster closed process group ser
    Dec 14 18:18:23 s020 corosync[8104]: [QB  ] withdrawing server sockets
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
    Dec 14 18:18:23 s020 corosync[8104]: [SERV  ] Service engine unloaded: corosync profile loading service
    Dec 14 18:18:23 s020 corosync[8104]: [MAIN  ] Corosync Cluster Engine exiting normally
    Dec 14 18:19:24 s020 corosync[8097]: Starting Corosync Cluster Engine (corosync): [FAILED]
    Dec 14 18:19:24 s020 systemd[1]: corosync.service: control process exited, code=exited status=1
    Dec 14 18:19:24 s020 systemd[1]: Failed to start Corosync Cluster Engine.
    Dec 14 18:19:24 s020 systemd[1]: Unit corosync.service entered failed state.
    
     
  5. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    Replacing the outdated corosync config with the current one and restarting the corosync service should resync the cluster.
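
    Roughly something like this (just a sketch - "<healthy-node>" is a placeholder, adjust names and paths to your setup):
    Code:
    # <healthy-node> = one of the nodes that is still in sync
    scp root@<healthy-node>:/etc/pve/corosync.conf /etc/corosync/corosync.conf
    systemctl restart corosync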
     
  6. RobFantini

    RobFantini Well-Known Member
    Proxmox Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,596
    Likes Received:
    26
    I am unable to mount /etc/pve in local mode in order to copy the files; /etc/pve is mounted read-only.

    These days, how do I mount /etc/pve in local mode? The old way does not work:
    Code:
    /usr/bin/pmxcfs -l
    [main] notice: unable to aquire pmxcfs lock - trying again
    
    [main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
    [main] notice: exit proxmox configuration filesystem (-1)
    
     
  7. RobFantini

    RobFantini Well-Known Member
    Proxmox Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,596
    Likes Received:
    26
    this is how:
    Code:
    systemctl stop pve-cluster
    /usr/bin/pmxcfs -l
    [main] notice: forcing local mode (althought corosync.conf exists)
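
    Putting it together with the config copy fabian suggested, the recovery looks roughly like this (a sketch only - "<healthy-node>" is a placeholder, double-check paths before running it):
    Code:
    # stop the cluster services, then mount /etc/pve writable in local mode
    systemctl stop pve-cluster corosync
    /usr/bin/pmxcfs -l
    # pull the current corosync.conf from a healthy node into both locations
    scp root@<healthy-node>:/etc/pve/corosync.conf /etc/corosync/corosync.conf
    cp /etc/corosync/corosync.conf /etc/pve/corosync.conf
    # leave local mode and bring the services back up
    killall pmxcfs
    systemctl start corosync pve-cluster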
    
     
  8. myzamri

    myzamri New Member

    Joined:
    May 11, 2017
    Messages:
    11
    Likes Received:
    1
    I tried copying corosync.conf from another healthy node to /etc/corosync/corosync.conf and it worked; it synced and fixed the problem above. Is this the recommended way to fix this kind of problem?
     
  9. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    Other than avoiding the problem in the first place by not adding nodes to unhealthy clusters, yes ;)
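
    For anyone finding this later, a quick sanity check on an existing node before running "pvecm add" (not a guarantee, but it would have caught the situation above):
    Code:
    pvecm status   # should report "Quorate: Yes"
    pvecm nodes    # every existing cluster node should be listed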
     