I have been pulling my hair out today: I have reinstalled Proxmox on 3 different servers about 4 times each so far, because not only does clustering not work, I can't seem to revert a failed join either.
I have 3 servers (nodes) running Proxmox 4.2-23
All 3 servers are on an RPN network as well as having public IP addresses.
On server 1 I type:
Code:
# pvecm create tpc1
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/urandom.
Writing corosync key to /etc/corosync/authkey.
#
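For completeness, the state of the new cluster on server 1 can be checked at any point with:
Code:
# pvecm status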
On servers 2 & 3 I type:
Code:
# pvecm add 10.91.150.134
The authenticity of host '10.91.150.134 (10.91.150.134)' can't be established.
ECDSA key fingerprint is ########################################.
Are you sure you want to continue connecting (yes/no)? yes
root@10.91.150.134's password:
copy corosync auth key
stopping pve-cluster service
backup old database
Job for corosync.service failed. See 'systemctl status corosync.service' and 'journalctl -xn' for details.
waiting for quorum...
10.91.150.134 is the RPN IP of the first server where I created the cluster.
And there the system hangs. I can no longer access the web interface for servers 2 and 3 and have to reinstall Proxmox.
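Since each node has both a public interface and the RPN on eth1, I have started to wonder whether corosync is binding to the wrong network and whether I need to pin the join to the RPN explicitly. If I'm reading the docs right (and if the version I'm on already supports it), pvecm add accepts a -ring0_addr option, so my next attempt would be something like this, with the placeholder being the RPN IP of the node being added (untested, so treat it as a guess):
Code:
# pvecm add 10.91.150.134 -ring0_addr <this node's RPN IP>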
Multicast is enabled on the RPN:
Code:
# ifconfig eth1 | grep MTU
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
#
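I realise the MULTICAST flag above only shows that the interface supports multicast, not that multicast traffic actually gets through between the nodes. A proper test would be something like omping across the three RPN addresses (assuming omping is installed on each node; the placeholders are the other two RPN IPs), which I can run and post if it helps:
Code:
# omping -c 600 -i 1 -q 10.91.150.134 <server 2 RPN IP> <server 3 RPN IP>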
Output from systemctl status corosync.service:
Code:
# systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
Active: failed (Result: exit-code) since Thu 2016-09-22 19:38:07 CEST; 19min ago
Process: 11271 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAILURE)
Sep 22 19:37:06 pmn2 corosync[11280]: [QB ] server name: cmap
Sep 22 19:37:06 pmn2 corosync[11280]: [SERV ] Service engine loaded: corosync configuration service [1]
Sep 22 19:37:06 pmn2 corosync[11280]: [QB ] server name: cfg
Sep 22 19:37:06 pmn2 corosync[11280]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 22 19:37:06 pmn2 corosync[11280]: [QB ] server name: cpg
Sep 22 19:37:06 pmn2 corosync[11280]: [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 22 19:37:06 pmn2 corosync[11280]: [QUORUM] Using quorum provider corosync_votequorum
Sep 22 19:37:06 pmn2 corosync[11280]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Sep 22 19:37:06 pmn2 corosync[11280]: [SERV ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Sep 22 19:38:07 pmn2 corosync[11271]: Starting Corosync Cluster Engine (corosync): [FAILED]
Sep 22 19:38:07 pmn2 systemd[1]: corosync.service: control process exited, code=exited status=1
Sep 22 19:38:07 pmn2 systemd[1]: Failed to start Corosync Cluster Engine.
Sep 22 19:38:07 pmn2 systemd[1]: Unit corosync.service entered failed state.
#
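That 'nodelist or quorum.expected_votes must be configured' line makes me suspect the corosync.conf that pvecm add is supposed to write never made it onto this node, or is incomplete. My understanding is that it should contain a nodelist and quorum section roughly like the sketch below; only pmn2 is this node's real hostname, and the rest (node IDs, the first node's entry, the totem section I have left out) is my guess at what pvecm would normally generate:
Code:
# /etc/corosync/corosync.conf -- roughly what I assume it should contain
nodelist {
  node {
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.91.150.134
  }
  node {
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pmn2
  }
}
quorum {
  provider: corosync_votequorum
}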
Output from journalctl -xn:
Code:
# journalctl -xn
-- Logs begin at Thu 2016-09-22 15:57:08 CEST, end at Thu 2016-09-22 19:59:29 CEST. --
Sep 22 19:59:17 pmn2 pmxcfs[11262]: [dcdb] crit: cpg_initialize failed: 2
Sep 22 19:59:17 pmn2 pmxcfs[11262]: [status] crit: cpg_initialize failed: 2
Sep 22 19:59:23 pmn2 pmxcfs[11262]: [quorum] crit: quorum_initialize failed: 2
Sep 22 19:59:23 pmn2 pmxcfs[11262]: [confdb] crit: cmap_initialize failed: 2
Sep 22 19:59:23 pmn2 pmxcfs[11262]: [dcdb] crit: cpg_initialize failed: 2
Sep 22 19:59:23 pmn2 pmxcfs[11262]: [status] crit: cpg_initialize failed: 2
Sep 22 19:59:29 pmn2 pmxcfs[11262]: [quorum] crit: quorum_initialize failed: 2
Sep 22 19:59:29 pmn2 pmxcfs[11262]: [confdb] crit: cmap_initialize failed: 2
Sep 22 19:59:29 pmn2 pmxcfs[11262]: [dcdb] crit: cpg_initialize failed: 2
Sep 22 19:59:29 pmn2 pmxcfs[11262]: [status] crit: cpg_initialize failed: 2
#
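As far as I can tell, the pmxcfs errors above are just a knock-on effect of corosync not running: pmxcfs cannot connect to the cpg/cmap/quorum services because there is nothing to connect to. One thing I have not captured before reinstalling is whether the corosync config files actually landed on node 2, so next time I will check something like:
Code:
# ls -l /etc/corosync/ /etc/pve/corosync.conf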
There are no containers or virtual machines running on any of the nodes; they are fresh installations, and the only changes I have made are installing sudo, adding my user to the sudo group, and configuring eth1 for the RPN on each server.
It is driving me insane and costing me a huge amount of time. Does anyone know how I can fix this, or at least how to get the second and third nodes working again once pvecm add fails, so I don't have to waste so much time reinstalling every single time?
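For what it's worth, my guess at a rollback on a failed node, pieced together from what pvecm add prints ("backup old database") and from pmxcfs's local mode, is something along these lines, but I have not actually verified it works, hence all the reinstalling:
Code:
# systemctl stop pve-cluster corosync
# pmxcfs -l                      # start pmxcfs in local mode so /etc/pve is writable again
# rm -f /etc/pve/corosync.conf
# rm -f /etc/corosync/corosync.conf
# killall pmxcfs
# systemctl start pve-cluster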
Thanks in advance.