New V4 install cluster problems.

Animatrix

Member
Mar 5, 2015
Hi guys, first post here, so I will explain the background to my issue. I have run a v3 Proxmox environment for a few years with no problems: a 5-node cluster (3 HP servers and 2 Dell servers) with a shared NFS server and an Isi NAS for storage. I decided to upgrade to v4, so I backed up all VMs and reinstalled using the latest ISO. After reboot I created a new cluster and joined all the nodes to the first node; all seemed OK.

I then rebooted a machine, and when it came back up it wouldn't join the cluster. It reported running in single-node mode with no quorum. Hmm. I rebooted it again, same issue. So I ran pvecm status and noticed no multicast address in there, then tried to ping the other nodes by hostname: no response, host not found? I looked in the hosts file and noticed only the node's own hostname (the servers all use a DNS server that has the correct DNS records for each host). So I added the hostnames and IP addresses into the node's hosts file, and bam, I have quorum again. I can restart nodes and the cluster remains OK. My concern is that something isn't right, as the command corosync-cmapctl -g totem.interface.0.mcastaddr gives the result:

can't get key totem.interface.0.mcastaddr error CS_ERR_NOT_EXIST.

All "seems" OK, but I am worried that it will fail. I never had to add the hostnames to the /etc/hosts file on the old version 3. Please note, as I said, this is not an upgrade but a brand-new install, and I have also blacklisted hpwdt on the HP nodes. Am I panicking for nothing, or is something wrong here? I am using a Netgear layer 3 switch, and snooping etc. is set correctly as per the wiki. I wonder, does v4 now use hostnames for multicast, or...? I am going to set up a new NIC connection for the nodes' multicast on a separate switch, as per the wiki, so I can use HA on a few critical VMs, but I need to understand this issue first.
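[Editor's note] The workaround described above (adding every node to each node's hosts file) can be sketched as a small script. This is only an illustration, not an official Proxmox procedure; the demo writes to a local file (`./hosts.demo`) so nothing on the system is touched, and the names/IPs are the ones listed later in this thread. On a real node you would point `HOSTS_FILE` at `/etc/hosts`:

```shell
# Sketch: make every cluster node resolvable from the local hosts file,
# so corosync name resolution does not depend on external DNS at boot.
# Demo target is ./hosts.demo; set HOSTS_FILE=/etc/hosts on a real node.
HOSTS_FILE="${HOSTS_FILE:-./hosts.demo}"
touch "$HOSTS_FILE"

add_node() {
    ip="$1"; name="$2"
    # append only if the hostname is not already present
    grep -qw "$name" "$HOSTS_FILE" || echo "$ip $name" >> "$HOSTS_FILE"
}

add_node 192.168.254.1 pve1
add_node 192.168.254.2 pve2
add_node 192.168.254.3 pve3
add_node 192.168.254.8 pve4
add_node 192.168.254.4 pve-master
```

Re-running the script is safe: the grep guard keeps it idempotent, so entries are never duplicated.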

Thanks in advance

Kurt
 
Sure, no problem, please see below. As a note, I just needed to reboot a node and it failed to join the cluster on the first boot; I rebooted it and it worked the second time, so it seems something is still not right.

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve2
  }
  node {
    nodeid: 5
    quorum_votes: 1
    ring0_addr: pve4
  }
  node {
    nodeid: 4
    quorum_votes: 1
    ring0_addr: pve-master
  }
  node {
    nodeid: 3
    quorum_votes: 1
    ring0_addr: pve3
  }
  node {
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve1
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: dmcluster
  config_version: 5
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.254.1
    ringnumber: 0
  }
}
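[Editor's note] The config above has no explicit mcastaddr, which is why the corosync-cmapctl query returns CS_ERR_NOT_EXIST: the key was simply never set. corosync.conf(5) does allow pinning the multicast address and port explicitly in the interface block if you want it visible and fixed. A sketch only; the 239.192.x.x address below is a hypothetical pick from the organization-local multicast range, not a value from this thread:

```
totem {
  interface {
    ringnumber: 0
    bindnetaddr: 192.168.254.0
    mcastaddr: 239.192.100.1   # hypothetical; any org-local scope address
    mcastport: 5405
  }
}
```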
 
What is the network configuration of your nodes, especially the IP addresses corosync runs on? What did you do for the cluster creation? Did you set a manual bindnetaddr?
 
Hi thanks for the reply, my setup is pretty basic at present, I have 5 separate nodes as listed below

pve1 192.168.254.1
pve2 192.168.254.2
pve3 192.168.254.3
pve4 192.168.254.8
pve-master 192.168.254.4

corosync runs on the above network as well. I do have NFS shared storage, which I know should be on a separate network; it will be, but isn't at the moment. To create the cluster, on pve1 I ran pvecm create dmcluster, then on all the other nodes I ran pvecm add 192.168.254.1. I did not set the bind address manually; it must do that on cluster creation. All nodes can ping each other by IP, by full hostname, and by the short name (pve1 etc.), as I added all of this into the hosts file on each node. I have also run omping on all nodes and tested that every node can omping every other node, and they can. On my network switch, snooping is on and I have a querier set up, though as I'm not a network man I'm not sure it's set right, as there are no entries in the table; but as they can all omping each other, I'm assuming it's not the issue.
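[Editor's note] A quick sanity check that goes with the hosts-file approach described above: verify that every expected cluster node actually has an entry. This sketch builds a local sample file so it is self-contained; on a real node you would set HOSTS=/etc/hosts and drop the heredoc:

```shell
# Sketch: check that a hosts file has an entry for every cluster node.
# Demo builds ./hosts.sample with the names/IPs listed in this thread.
HOSTS="${HOSTS:-./hosts.sample}"
cat > "$HOSTS" <<'EOF'
127.0.0.1 localhost
192.168.254.1 pve1
192.168.254.2 pve2
192.168.254.3 pve3
192.168.254.8 pve4
192.168.254.4 pve-master
EOF

missing=0
for n in pve1 pve2 pve3 pve4 pve-master; do
    grep -qw "$n" "$HOSTS" || { echo "missing entry: $n"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all cluster nodes present in $HOSTS"
```

Running the same check on every node catches the asymmetric case described earlier, where one node's hosts file was populated and the others' were not.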

I think that's everything I have set up; it's a pretty simple setup. I was thinking about either setting up another network for storage or corosync, or maybe just bonding the NICs on each node. My switch is a Netgear ProSafe one, so it's a pretty good spec.

thanks
kurt
 
OK, it looks good as it is, but yes, NFS/storage should be moved to a separate network.

You could change the bindnetaddr in /etc/pve/corosync.conf to 192.168.254.0, but AFAIK it should be OK as it is.
For a separate cluster network you may find https://pve.proxmox.com/wiki/Separate_Cluster_Network interesting.
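[Editor's note] If you do change bindnetaddr as suggested above, remember that any edit to /etc/pve/corosync.conf must also increment config_version, or corosync will not accept the new configuration cluster-wide. The sketch below demonstrates the edit on a local demo copy so it can be run safely anywhere; on a real node you would back up and edit /etc/pve/corosync.conf itself, and the sed patterns assume the exact values shown earlier in this thread:

```shell
# Sketch: widen bindnetaddr from the host address to the network address
# and bump config_version. Demo operates on ./corosync.conf.demo.
CONF="${CONF:-./corosync.conf.demo}"
cat > "$CONF" <<'EOF'
totem {
  cluster_name: dmcluster
  config_version: 5
  interface {
    bindnetaddr: 192.168.254.1
    ringnumber: 0
  }
}
EOF

# change the host address to the network address
sed -i 's/bindnetaddr: 192.168.254.1/bindnetaddr: 192.168.254.0/' "$CONF"
# every edit must increment config_version for the cluster to accept it
sed -i 's/config_version: 5/config_version: 6/' "$CONF"

grep -E 'bindnetaddr|config_version' "$CONF"
```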