Unable to join cluster - corosync.service failed

Jan 15, 2018
Hi,

I have a 13-node cluster running PVE 5.1. I have just rebuilt a server to add to the cluster, but joining it fails every time:

Code:
[11:45 root@pve14:~]# pvecm add pve1 -f               
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
copy corosync auth key
stopping pve-cluster service
backup old database
delete old backup '/var/lib/pve-cluster/backup/config-1516010417.sql.gz'
Job for corosync.service failed because the control process exited with error code.
See "systemctl status corosync.service" and "journalctl -xe" for details.
waiting for quorum...

This then hangs forever. Corosync refuses to start every time:

Code:
[11:50 root@pve14:~]# systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2018-01-15 11:48:41 GMT; 1min 27s ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
  Process: 6135 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=20)
 Main PID: 6135 (code=exited, status=20)
      CPU: 72ms

Jan 15 11:48:41 pve14 corosync[6135]: info    [WD    ] no resources configured.
Jan 15 11:48:41 pve14 corosync[6135]: notice  [SERV  ] Service engine loaded: corosync watchdog service [7]
Jan 15 11:48:41 pve14 corosync[6135]: notice  [QUORUM] Using quorum provider corosync_votequorum
Jan 15 11:48:41 pve14 corosync[6135]: crit    [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Jan 15 11:48:41 pve14 corosync[6135]: error   [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Jan 15 11:48:41 pve14 corosync[6135]: error   [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.
Jan 15 11:48:41 pve14 systemd[1]: corosync.service: Main process exited, code=exited, status=20/n/a
Jan 15 11:48:41 pve14 systemd[1]: Failed to start Corosync Cluster Engine.
Jan 15 11:48:41 pve14 systemd[1]: corosync.service: Unit entered failed state.
Jan 15 11:48:41 pve14 systemd[1]: corosync.service: Failed with result 'exit-code'.
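The key line is the QUORUM error: corosync exits with status 20 when the config it actually reads (on a PVE node that is /etc/corosync/corosync.conf, a local copy of /etc/pve/corosync.conf) contains neither a nodelist nor quorum.expected_votes. A minimal sketch of the condition it is checking, run here against a throwaway file rather than the real one:

```shell
# Illustrative sketch of the check corosync is failing on, using a
# throwaway file (on a real node you would inspect
# /etc/corosync/corosync.conf). A votequorum section with no nodelist
# and no expected_votes is fatal.
conf=$(mktemp)
cat > "$conf" <<'EOF'
quorum {
  provider: corosync_votequorum
}
EOF
if grep -Eq 'ring0_addr|expected_votes' "$conf"; then
    echo "quorum membership info present"
else
    echo "no nodelist or expected_votes - corosync exits with status 20"
fi
```

In other words, the config pve14 ended up with has no usable membership information, which points at the nodelist on the existing cluster.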

Cluster status:

Code:
[11:50 root@pve1:~]# pvecm status                   
Quorum information
------------------
Date:             Mon Jan 15 11:50:56 2018
Quorum provider:  corosync_votequorum
Nodes:            13
Node ID:          0x00000001
Ring ID:          1/2176
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   13
Highest expected: 13
Total votes:      13
Quorum:           7 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.168.3.1 (local)
0x00000004          1 10.168.3.2
0x00000005          1 10.168.3.3
0x00000006          1 10.168.3.4
0x00000007          1 10.168.3.5
0x00000009          1 10.168.3.6
0x00000008          1 10.168.3.7
0x0000000d          1 10.168.3.8
0x0000000a          1 10.168.3.9
0x00000002          1 10.168.3.10
0x00000003          1 10.168.3.11
0x0000000b          1 10.168.3.12
0x0000000c          1 10.168.3.13


Any ideas how I can get this box to join the cluster?
 
Jan 15, 2018
corosync.conf from existing node:

Code:
[13:21 root@pve1:~]# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    nodeid: 7
    quorum_votes: 1
    ring0_addr: pve5
  }

  node {
    nodeid: 6
    quorum_votes: 1
    ring0_addr: pve4
  }

  node {
    nodeid: 4
    quorum_votes: 1
    ring0_addr: pve2
  }

  node {
    nodeid: 13
    quorum_votes: 1
    ring0_addr: pve8
  }

  node {
    nodeid: 8
    quorum_votes: 1
    ring0_addr: pve7
  }

  node {
    nodeid: 5
    quorum_votes: 1
    ring0_addr: pve3
  }

  node {
    nodeid: 9
    quorum_votes: 1
    ring0_addr: pve6
  }

  node {
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve10
  }

  node {
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve1
  }

  node {
    nodeid: 10
    quorum_votes: 1
    ring0_addr: pve9
  }

  node {
    nodeid: 3
    quorum_votes: 1
    ring0_addr: pve11
  }

  node {
    nodeid: 11
    quorum_votes: 1
    ring0_addr: pve12
  }

  node {
    nodeid: 12
    quorum_votes: 1
    ring0_addr: pve13
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox
  config_version: 14
  ip_version: ipv4
  secauth: on
  transport: udpu
  version: 2
  interface {
    bindnetaddr: 10.168.3.0
    ringnumber: 0
    member {
      memberaddr: 10.168.3.1
    }

    member {
      memberaddr: 10.168.3.2
    }

    member {
      memberaddr: 10.168.3.3
    }

    member {
      memberaddr: 10.168.3.4
    }

    member {
      memberaddr: 10.168.3.5
    }

    member {
      memberaddr: 10.168.3.6
    }

    member {
      memberaddr: 10.168.3.7
    }

    member {
      memberaddr: 10.168.3.8
    }

    member {
      memberaddr: 10.168.3.9
    }

    member {
      memberaddr: 10.168.3.10
    }

    member {
      memberaddr: 10.168.3.11
    }

  }

}
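Worth noting: the nodelist above has thirteen node entries and none for pve14, so the config handed to the joining node cannot satisfy the quorum provider. A quick sanity check along these lines would surface that (shown here against a tiny two-node sample file; on a node you would point it at /etc/pve/corosync.conf):

```shell
# Sketch: count nodelist entries and confirm the joining node is present.
# Uses a throwaway sample file; substitute /etc/pve/corosync.conf on a
# real node, and your own hostname for pve14.
conf=$(mktemp)
cat > "$conf" <<'EOF'
node { ring0_addr: pve1 }
node { ring0_addr: pve2 }
EOF
echo "nodes listed: $(grep -c 'ring0_addr' "$conf")"
grep -q 'ring0_addr: pve14' "$conf" || echo "pve14 not in nodelist"
```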
 
Jan 15, 2018
After some experimentation, I seem to have fixed this by manually adding the new node to /etc/pve/corosync.conf on an existing cluster node:

Code:
  node {
    nodeid: 14
    quorum_votes: 1
    ring0_addr: pve14
  }
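One caveat with a manual edit like this: the totem config_version also has to be incremented, or the other nodes will ignore the change. A sketch of just the version bump, done on a temp file (the sed pattern assumes the version shown earlier in this thread; on a real node you would copy /etc/pve/corosync.conf out, edit it, verify, then move it back so pmxcfs distributes it):

```shell
# Sketch: bump totem.config_version as part of a manual corosync.conf
# edit. Done here on a throwaway file; the "14" matches the config_version
# shown earlier in this thread - adjust to your own current value.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
totem {
  config_version: 14
}
EOF
sed -i 's/config_version: 14/config_version: 15/' "$tmp"
grep 'config_version' "$tmp"
```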

Then re-running pvecm add:

Code:
[11:01 root@pve14:~]# pvecm add pve1 -f               
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
copy corosync auth key
stopping pve-cluster service
backup old database
waiting for quorum...OK
generating node certificates
merge known_hosts file
restart services
successfully added node 'pve14' to cluster.

I'm not entirely sure this is the correct way to do it, but it seems to have worked. Any feedback would be appreciated.
 

markmarkmia

New Member
Feb 5, 2018
I'm pretty much finding the same thing. I had set up Proxmox a couple of versions ago and had no issues with Corosync (other than forgetting to add the cluster members to all the /etc/hosts files). But the only way I've gotten it to work properly on the latest version is to manually hack everything. I just tried to add another node today and hit the same issue.
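The /etc/hosts point matters here: the ring0_addr entries in the config above are plain hostnames, so every node must be able to resolve every other node's name. A hedged check along these lines (the two names are examples from this thread; substitute your own nodelist entries):

```shell
# Sketch: verify that cluster hostnames resolve on this node before
# joining. The names below are examples from this thread.
for h in pve1 pve14; do
    getent hosts "$h" >/dev/null || echo "$h does not resolve - add it to /etc/hosts"
done
```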
 
