Problem adding node to existing cluster

May 31, 2018
I have a cluster currently running with 3 nodes

pve-01 (10.11.0.101), pve-02 (10.11.0.102), pve-03 (10.11.0.103)

I have set up a fourth node, pve-04 (10.11.0.104), and am trying to join it to the cluster.

From pve-04 I can SSH into pve-01, pve-02, and pve-03 with no password (SSH key auth), and I can SSH into pve-04 from any of the other nodes the same way.

If I run pvecm add 10.11.0.101 from pve-04 to join it to the cluster, I get the following:

root@pve-04:~ # pvecm add 10.11.0.101
copy corosync auth key
stopping pve-cluster service
backup old database
Job for corosync.service failed because the control process exited with error code.
See "systemctl status corosync.service" and "journalctl -xe" for details.
waiting for quorum...


If I run systemctl status corosync.service, I get this:
root@pve-04:~ # systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2018-05-31 16:17:41 AEST; 9min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Process: 2994 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=20)
Main PID: 2994 (code=exited, status=20)
CPU: 108ms

May 31 16:17:41 pve-04 corosync[2994]: info [WD ] no resources configured.
May 31 16:17:41 pve-04 corosync[2994]: notice [SERV ] Service engine loaded: corosync watchdog service [7]
May 31 16:17:41 pve-04 corosync[2994]: notice [QUORUM] Using quorum provider corosync_votequorum
May 31 16:17:41 pve-04 corosync[2994]: crit [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
May 31 16:17:41 pve-04 corosync[2994]: error [SERV ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
May 31 16:17:41 pve-04 corosync[2994]: error [MAIN ] Corosync Cluster Engine exiting with status 20 at service.c:356.
May 31 16:17:41 pve-04 systemd[1]: corosync.service: Main process exited, code=exited, status=20/n/a
May 31 16:17:41 pve-04 systemd[1]: Failed to start Corosync Cluster Engine.
May 31 16:17:41 pve-04 systemd[1]: corosync.service: Unit entered failed state.
May 31 16:17:41 pve-04 systemd[1]: corosync.service: Failed with result 'exit-code'.
root@pve-04:~ #
 
Hi,

the error message says your config is not OK.
Please post the config:

Code:
cat /etc/pve/corosync.conf
 
This is my corosync.conf before adding pve-04:

Code:
root@pve-01:~ # vim /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}
nodelist {
  node {
    name: pve-01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: corosync-01
  }
  node {
    name: pve-02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: corosync-02
  }
  node {
    name: pve-03
    nodeid: 3
    quorum_votes: 1
    ring0_addr: corosync-03
  }
}
quorum {
  provider: corosync_votequorum
}
totem {
  cluster_name: production
  config_version: 11
  interface {
    bindnetaddr: 10.11.0.101
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

And this is my hosts file for reference:

Code:
127.0.0.1 localhost.localdomain localhost
10.13.0.101 pve-01.alphaitcentre.com.au pve-01 pvelocalhost
10.13.0.102 pve-02.alphaitcentre.com.au pve-02
10.13.0.103 pve-03.alphaitcentre.com.au pve-03
10.13.0.104 pve-04.alphaitcentre.com.au pve-04
10.11.0.101 corosync-01
10.11.0.102 corosync-02
10.11.0.103 corosync-03
10.11.0.104 corosync-04
 
AND... you just helped me solve the issue. It has been a long time since I added a node to the cluster; I forgot I needed to supply the ring0_addr (the nodes use a separate 10.11.0.x network for the corosync ring, so the address can't be guessed from the management IP).

I just ran the following on pve-04 and it successfully added the node.
Code:
pvecm add 10.13.0.101 -ring0_addr 10.11.0.104
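For anyone who finds this later: after the join succeeds, the nodelist in /etc/pve/corosync.conf should gain a fourth node entry following the same pattern as the three existing ones. This is a sketch based on the nodelist posted above, not a copy of my actual file — note that because I passed -ring0_addr as an IP, the entry gets the IP rather than the corosync-04 hostname:

Code:
node {
  name: pve-04
  nodeid: 4
  quorum_votes: 1
  ring0_addr: 10.11.0.104
}

You can then confirm the node joined and the cluster is quorate with pvecm status (or pvecm nodes) on any cluster member.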