[SOLVED] problems with corosync and quorum

r4a5a88

Jun 15, 2016
hi

I am in the process of upgrading my Proxmox cluster to Jessie
and cannot add a node.
corosync keeps failing to start, and I get this:

Jun 21 11:32:29 pro-06-dmed pmxcfs[17721]: [status] crit: cpg_initialize failed: 2
Jun 21 11:32:29 pro-06-dmed pmxcfs[17721]: [dcdb] crit: cpg_initialize failed: 2
Jun 21 11:32:29 pro-06-dmed pmxcfs[17721]: [confdb] crit: cmap_initialize failed: 2
Jun 21 11:32:29 pro-06-dmed pmxcfs[17721]: [quorum] crit: quorum_initialize failed: 2
Jun 21 11:32:23 pro-06-dmed pmxcfs[17721]: [status] crit: cpg_initialize failed: 2
Jun 21 11:32:23 pro-06-dmed pmxcfs[17721]: [dcdb] crit: cpg_initialize failed: 2
Jun 21 11:32:23 pro-06-dmed pmxcfs[17721]: [confdb] crit: cmap_initialize failed: 2
Jun 21 11:32:23 pro-06-dmed pmxcfs[17721]: [quorum] crit: quorum_initialize failed: 2
Jun 21 11:32:17 pro-06-dmed pmxcfs[17721]: [status] crit: cpg_initialize failed: 2
Jun 21 11:32:17 pro-06-dmed pmxcfs[17721]: [dcdb] crit: cpg_initialize failed: 2
Jun 21 11:32:17 pro-06-dmed pmxcfs[17721]: [confdb] crit: cmap_initialize failed: 2
Jun 21 11:32:17 pro-06-dmed pmxcfs[17721]: [quorum] crit: quorum_initialize failed: 2
Jun 21 11:32:11 pro-06-dmed pmxcfs[17721]: [status] crit: cpg_initialize failed: 2
Jun 21 11:32:11 pro-06-dmed pmxcfs[17721]: [dcdb] crit: cpg_initialize failed: 2
Jun 21 11:32:11 pro-06-dmed pmxcfs[17721]: [confdb] crit: cmap_initialize failed: 2
Jun 21 11:32:11 pro-06-dmed pmxcfs[17721]: [quorum] crit: quorum_initialize failed: 2
Jun 21 11:32:06 pro-06-dmed systemd[1]: Unit corosync.service entered failed state.
Jun 21 11:32:06 pro-06-dmed systemd[1]: Failed to start Corosync Cluster Engine.
Jun 21 11:32:06 pro-06-dmed systemd[1]: corosync.service: control process exited, code=exited status=1
Jun 21 11:32:06 pro-06-dmed corosync[17730]: Starting Corosync Cluster Engine (corosync): [FAILED]

in my logs.
I tried reinstalling this server twice,
now with a new IP.
I ran pvecm e 1 a bunch of times and restarted the cluster.

what should I do??
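
For anyone reading a log like the one above: the four pmxcfs errors repeat every few seconds and are probably just a consequence of corosync itself not starting. A minimal sketch, assuming the log is pasted in as plain text, that groups the pmxcfs "crit" lines by subsystem so the pattern is easier to see. The names LOG, PATTERN and summarize are made up for this example and are not part of any Proxmox tool.

```python
import re
from collections import Counter

# A few sample lines copied from the log above.
LOG = """\
Jun 21 11:32:29 pro-06-dmed pmxcfs[17721]: [status] crit: cpg_initialize failed: 2
Jun 21 11:32:29 pro-06-dmed pmxcfs[17721]: [dcdb] crit: cpg_initialize failed: 2
Jun 21 11:32:29 pro-06-dmed pmxcfs[17721]: [confdb] crit: cmap_initialize failed: 2
Jun 21 11:32:29 pro-06-dmed pmxcfs[17721]: [quorum] crit: quorum_initialize failed: 2
Jun 21 11:32:06 pro-06-dmed systemd[1]: Failed to start Corosync Cluster Engine.
"""

# Matches lines such as: pmxcfs[17721]: [status] crit: cpg_initialize failed: 2
PATTERN = re.compile(r"pmxcfs\[\d+\]: \[(\w+)\] crit: (\w+) failed: (\d+)")


def summarize(log_text):
    """Count pmxcfs 'crit' lines per (subsystem, call, error code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


if __name__ == "__main__":
    for (subsystem, call, code), n in sorted(summarize(LOG).items()):
        print(f"{subsystem:8} {call:20} code={code} x{n}")
```

If every subsystem reports the same code at the same timestamps, the corosync start failure is the thing to look at first.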
 
it seems corosync can't find the private key under

/etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1626.

I added the node with pvecm add xxx.xxx.xxx.xxx --force
but it does not seem to create the nodes directory.
Can you create it all manually?
 
thx, yes I did
I think 4.x clusters don't have a cluster.conf any more
I couldn't find one on any other node

the other thing that happened was:
when I installed the node, all folders were there (in the /etc/pve/local dir),
but while I was adding the node to the cluster, the folders and files were deleted

-r--r----- 1 root www-data 159 Jan 1 1970 .clusterlog
-r--r----- 1 root www-data 573 Jun 22 08:19 corosync.conf
-rw-r----- 1 root www-data 2 Jan 1 1970 .debug
lr-xr-xr-x 1 root www-data 0 Jan 1 1970 local -> nodes/pro-06-dmed
lr-xr-xr-x 1 root www-data 0 Jan 1 1970 lxc -> nodes/pro-06-dmed/lxc
-r--r----- 1 root www-data 44 Jan 1 1970 .members
lr-xr-xr-x 1 root www-data 0 Jan 1 1970 openvz -> nodes/pro-06-dmed/openvz
lr-xr-xr-x 1 root www-data 0 Jan 1 1970 qemu-server -> nodes/pro-06-dmed/qemu-server
-r--r----- 1 root www-data 216 Jan 1 1970 .rrd
-r--r----- 1 root www-data 383 Jan 1 1970 .version
-r--r----- 1 root www-data 18 Jan 1 1970 .vmlist

this is what the directory looks like
the nodes folder is missing
I cannot recreate it with mkdir as root
 
this is my main cluster node
pveversion -v
proxmox-ve: 4.2-54 (running kernel: 4.4.6-1-pve)
pve-manager: 4.2-15 (running version: 4.2-15/6669ad2c)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.10-1-pve: 4.4.10-54
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-42
qemu-server: 4.0-81
pve-firmware: 1.1-8
libpve-common-perl: 4.0-68
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-55
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-68
pve-firewall: 2.0-29
pve-ha-manager: 1.0-32
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie

this is the node I am trying to add

proxmox-ve: 4.2-54 (running kernel: 4.4.6-1-pve)
pve-manager: 4.2-15 (running version: 4.2-15/6669ad2c)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.10-1-pve: 4.4.10-54
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-42
qemu-server: 4.0-81
pve-firmware: 1.1-8
libpve-common-perl: 4.0-68
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-55
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-68
pve-firewall: 2.0-29
pve-ha-manager: 1.0-32
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
 
I solved it
the servers were not in the same subnet
the network was split into 3 subnets
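
In case it helps someone with the same symptom: a quick way to check up front whether all cluster nodes sit in one subnet. This is only a sketch using Python's standard ipaddress module. The addresses and the /24 prefix below are made-up placeholders, not the real ones from this cluster, and same_subnet is a hypothetical helper name.

```python
import ipaddress


def same_subnet(addresses, prefix_len):
    """Return (all_in_one_subnet, sorted list of distinct networks)."""
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix_len}").network
        for addr in addresses
    }
    return len(networks) == 1, sorted(networks)


if __name__ == "__main__":
    # Hypothetical node addresses; replace with the IPs of your own nodes.
    nodes = ["10.0.1.10", "10.0.1.11", "10.0.2.12"]
    ok, networks = same_subnet(nodes, 24)
    if ok:
        print("all nodes are in", networks[0])
    else:
        print("nodes are spread over several subnets:")
        for net in networks:
            print("  ", net)
```

If the nodes end up in different networks, as they did here, cluster traffic between them may not get through without extra network configuration.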