[SOLVED] proxmox 4 cluster setup issue

pfoo

Renowned Member
Jan 21, 2012
Hi, I'm having an issue setting up a cluster with the latest Proxmox 4 (clean install, no upgrade from Proxmox 3).
I'm currently trying to establish a cluster between two nodes: node1 and node2.
Multicast is working, confirmed by omping.
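(For reference, the check was roughly something like the following, run on both nodes at the same time; the count/interval values are just the usual suggestion, nothing specific to my setup:

$ omping -c 600 -i 1 -q node1 node2

and the multicast loss reported on both sides was what I used to conclude multicast is fine.)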

This is /etc/hosts from node1:
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

10.10.10.1 node1.domain.tld node1 pvelocalhost
10.10.10.2 node2.domain.tld node2
10.10.10.3 node3.domain.tld node3

This is /etc/hosts from node2:
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

10.10.10.1 node1.domain.tld node1
10.10.10.2 node2.domain.tld node2 pvelocalhost
10.10.10.3 node3.domain.tld node3


This is what I ran on node1:
$ pvecm create SHIELD


This is what I ran on node2:
$ pvecm add 10.10.10.1
copy corosync auth key
stopping pve-cluster service
backup old database
Job for corosync.service failed. See 'systemctl status corosync.service' and 'journalctl -xn' for details.
waiting for quorum...
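I then looked at the failed unit the way the error message suggests:

$ systemctl status corosync.service
$ journalctl -xn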


I got a strange feeling when I saw this line in the log file on node2:
corosync[14980]: [TOTEM ] The network interface is down.

(the network interface is, of course, up)

And this is /etc/corosync/corosync.conf on node2:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    nodeid: 1
    quorum_votes: 1
    ring0_addr: node1
  }

  node {
    nodeid: 2
    quorum_votes: 1
    ring0_addr: node2
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: SHIELD
  config_version: 2
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }
}



IMHO, bindnetaddr appears to be wrong, as it should be 10.10.10.2 on this node (node2).

If I manually edit bindnetaddr in /etc/corosync/corosync.conf and then manually start corosync, the cluster establishes successfully, but /etc/corosync/corosync.conf is reverted back to the wrong IP, so the change won't survive a reboot.
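(For completeness, the temporary workaround was roughly:

$ nano /etc/corosync/corosync.conf   # change bindnetaddr to 10.10.10.2 on node2
$ systemctl start corosync
$ pvecm status

but, as said, the file gets rewritten with the original bindnetaddr again, so this is not a real fix.)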
Have I done something wrong during cluster setup, or did I hit a glitch?
 
pveversion on both nodes:
proxmox-ve: 4.0-16 (running kernel: 4.2.2-1-pve)
pve-manager: 4.0-48 (running version: 4.0-48/0d8559d0)
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-22
qemu-server: 4.0-30
pve-firmware: 1.1-7
libpve-common-perl: 4.0-29
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-25
pve-libspice-server1: 0.12.5-1
vncterm: 1.2-1
pve-qemu-kvm: 2.4-9
pve-container: 1.0-6
pve-firewall: 2.0-12
pve-ha-manager: 1.0-9
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.3-1
lxcfs: 0.9-pve2
cgmanager: 0.37-pve2
criu: 1.6.0-1
openvswitch-switch: 2.3.2-1

 
OK, found it.
The corosync/cluster interface must have 255.255.255.0 as its netmask; with that, bindnetaddr becomes valid and corosync starts.
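In other words, the cluster NIC/bridge on each node has to be configured as a /24, e.g. something like this in /etc/network/interfaces on node2 (the interface name here is just a placeholder, yours may differ):

auto eth1
iface eth1 inet static
    address 10.10.10.2
    netmask 255.255.255.0

As far as I understand it, corosync masks bindnetaddr with the interface's netmask to pick the interface to bind to, so with a /24 the generated bindnetaddr 10.10.10.1 maps to the 10.10.10.0/24 network on every node and the same corosync.conf works everywhere; with a different netmask on node2 that lookup fails, which is presumably why corosync reported the network interface as down.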