I have a cluster of 3 nodes. They were all running happily with the package versions shown below.
root@pve2:/etc/pve# pveversion -v
proxmox-ve: 5.1-30 (running kernel: 4.13.8-3-pve)
pve-manager: 5.1-38 (running version: 5.1-38/1e9bc777)
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.13.8-3-pve: 4.13.8-30
libpve-http-server-perl: 2.0-7
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-22
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-3
pve-container: 2.0-17
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
I upgraded one host (pve1) with apt dist-upgrade.
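For reference, the upgrade was nothing special, roughly the standard sequence (from memory, so treat the exact commands as approximate):
apt update
apt dist-upgrade
# rebooted afterwards to pick up the new kernel
The host now looks like this.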
root@pve1:/etc/apt# pveversion -v
proxmox-ve: 5.1-42 (running kernel: 4.13.16-2-pve)
pve-manager: 5.1-51 (running version: 5.1-51/96be5354)
pve-kernel-4.13: 5.1-44
pve-kernel-4.13.16-2-pve: 4.13.16-47
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.10.17-2-pve: 4.10.17-20
corosync: 2.4.2-pve4
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-30
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-18
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-2
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-15
pve-cluster: 5.0-25
pve-container: 2.0-21
pve-docs: 5.1-17
pve-firewall: 3.0-8
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-2
pve-zsync: 1.6-15
qemu-server: 5.0-25
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.7-pve1~bpo9
The upgrade itself was error-free, but after a reboot pve1 will no longer join the cluster because corosync won't start.
Here is the output from journalctl -xe:
-- Subject: Unit corosync.service has begun start-up
-- Defined-By: systemd
-- Support: yadayada
--
-- Unit corosync.service has begun starting up.
Apr 16 20:05:23 pve1 corosync[60658]: [MAIN ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
Apr 16 20:05:23 pve1 corosync[60658]: notice [MAIN ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
Apr 16 20:05:23 pve1 corosync[60658]: info [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd s
Apr 16 20:05:23 pve1 corosync[60658]: [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie
Apr 16 20:05:23 pve1 corosync[60658]: error [MAIN ] parse error in config: This totem parser can only parse version 2 configurations.
Apr 16 20:05:23 pve1 corosync[60658]: error [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1308.
Apr 16 20:05:23 pve1 corosync[60658]: [MAIN ] parse error in config: This totem parser can only parse version 2 configurations.
Apr 16 20:05:23 pve1 corosync[60658]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1308.
Apr 16 20:05:23 pve1 systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a
Apr 16 20:05:23 pve1 systemd[1]: Failed to start Corosync Cluster Engine.
-- Subject: Unit corosync.service has failed
-- Defined-By: systemd
-- Support: yadayada
--
-- Unit corosync.service has failed.
--
-- The result is failed.
Apr 16 20:05:23 pve1 systemd[1]: corosync.service: Unit entered failed state.
Apr 16 20:05:23 pve1 systemd[1]: corosync.service: Failed with result 'exit-code'.
Apr 16 20:05:29 pve1 pmxcfs[2393]: [quorum] crit: quorum_initialize failed: 2
Apr 16 20:05:29 pve1 pmxcfs[2393]: [confdb] crit: cmap_initialize failed: 2
Apr 16 20:05:29 pve1 pmxcfs[2393]: [dcdb] crit: cpg_initialize failed: 2
Apr 16 20:05:29 pve1 pmxcfs[2393]: [status] crit: cpg_initialize failed: 2
I am lost. I don't dare upgrade the other hosts on the mere hope that this is just a version mismatch; I have enough capacity to run on 2 hosts, but not on 1.
Can anyone point me in the right direction, please?
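In case it is relevant: my (possibly wrong) understanding is that the parse error is about the version setting inside the totem block of corosync.conf, and that Proxmox keeps the master copy in /etc/pve/corosync.conf and syncs it to /etc/corosync/corosync.conf. A corosync 2.x totem block is supposed to start roughly like this (placeholder values, not my actual file):
totem {
  version: 2          # the setting the parse error seems to be about
  cluster_name: mycluster
  config_version: 3
  ip_version: ipv4
  secauth: on
  interface {
    bindnetaddr: 192.168.1.0
    ringnumber: 0
  }
}
I can post my real corosync.conf from both locations if that would help.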