New V4 install cluster problems.

Animatrix

Member
Mar 5, 2015
Hi guys, first post here, so I will explain the background to my issues. I have run a v3 Proxmox environment for a few years with no problems: a 5-node cluster (3 HP servers and 2 Dell servers) with a shared NFS server and an Isi NAS for storage. I decided to upgrade to v4, so I backed up all VMs and reinstalled using the latest ISO. After rebooting I created a new cluster and joined all the other nodes to the first node, and all seemed OK.

I then rebooted a machine, and when it came back up it wouldn't join the cluster; it said it was running in single-node mode with no quorum. Hmm. I rebooted it again, same issue, so I ran pvecm status and noticed there was no multicast address in the output. I then tried to ping another node's hostname: no response, host not found? I looked in the hosts file and noticed it only contained its own hostname. The servers all use a DNS server that has the correct DNS records for each host, but I added the host names and IP addresses into each node's hosts file anyway, and bam, I have quorum again. I can now restart nodes and the cluster remains OK.

My concern is that something still isn't right, as running the command corosync-cmapctl -g totem.interface.0.mcastaddr gives the result:
can't get key totem.interface.0.mcastaddr error CS_ERR_NOT_EXIST.

All "seems" OK, but I am worried that it will fail. I never had to add the host names to /etc/hosts on the old version 3. Please note, as I said, this is not an upgrade, it's a brand-new install, and I have also blacklisted hpwdt on the HP nodes. Am I panicking over nothing, or is something wrong here? I am using a Netgear layer 3 switch, and snooping etc. is set correctly as per the wiki. Does v4 now use host names for multicast, or...? I am going to set up a new NIC connection for the nodes' multicast on a separate switch, as per the wiki, so I can use HA on a few critical VMs, but I need to understand this issue first.
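As a hedged aside, corosync 2.x (which Proxmox VE 4 uses) ships a few tools that show what the cluster is actually using at runtime; output will of course differ per cluster:

```
# Inspect corosync's runtime configuration and quorum state
# (corosync 2.x / Proxmox VE 4 tooling; run on any node).
corosync-cmapctl | grep -i totem    # dump all runtime totem.* keys
corosync-quorumtool -s              # quorum state as corosync sees it
pvecm status                        # Proxmox view of membership and quorum
```

Comparing the totem.* keys across nodes can show whether they all resolved the same addresses.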

Thanks in advance

Kurt
 
Sure, no problem, please see below. As a note, I just needed to reboot a node and it failed to join the cluster on the first boot; I rebooted it and it worked the second time, so it seems it's still not correct.

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve2
  }
  node {
    nodeid: 5
    quorum_votes: 1
    ring0_addr: pve4
  }
  node {
    nodeid: 4
    quorum_votes: 1
    ring0_addr: pve-master
  }
  node {
    nodeid: 3
    quorum_votes: 1
    ring0_addr: pve3
  }
  node {
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve1
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: dmcluster
  config_version: 5
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.254.1
    ringnumber: 0
  }
}
 
What is the network configuration of your nodes, especially the IP addresses corosync runs on? What did you do for the cluster creation? Did you set a manual bindnetaddr?
 
Hi thanks for the reply, my setup is pretty basic at present, I have 5 separate nodes as listed below

pve1 192.168.254.1
pve2 192.168.254.2
pve3 192.168.254.3
pve4 192.168.254.8
pve-master 192.168.254.4

Corosync runs on the above network as well. I do have NFS shared storage, which I know should be on a separate network; it will be, but isn't at the moment. To create the cluster, on pve1 I ran pvecm create dmcluster, then on all the other nodes I ran pvecm add 192.168.254.1. I did not set the bind address manually; it must do that on cluster creation. All nodes can ping each other by IP, by full hostname, and by the short name (pve1 etc.), as I added all of this into the hosts file on each node. I have also run omping on all nodes and verified that every node can omping every other node. On my network switch, snooping is on and I have a querier set up, though as I'm not a network man I'm not sure it's set right, since there are no entries in the table; but as they can all omping, I'm assuming it's not the issue.
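The /etc/hosts fix described earlier in the thread can be sketched as a small script using the node names and addresses listed above. This is a hedged sketch, not the poster's actual commands: the example.local domain is an assumption, and you would substitute the domain your DNS server really serves, then append the output to /etc/hosts on each node.

```shell
# Hypothetical helper: print /etc/hosts entries for the five nodes named
# in this thread. "example.local" is an assumed domain - replace it with
# your real one. Usage on each node: hosts_entries >> /etc/hosts
hosts_entries() {
  for entry in \
    "192.168.254.1 pve1" \
    "192.168.254.2 pve2" \
    "192.168.254.3 pve3" \
    "192.168.254.8 pve4" \
    "192.168.254.4 pve-master"
  do
    ip=${entry%% *}     # everything before the first space
    name=${entry##* }   # everything after the last space
    printf '%s %s.example.local %s\n' "$ip" "$name" "$name"
  done
}

hosts_entries
```

Each printed line carries the IP, the fully qualified name, and the short name, matching the three ways the nodes are pinged above.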

I think that's everything I have set up, a pretty simple setup. I was thinking about either setting up another network for storage or corosync, or maybe just bonding the NICs on each node; my switch is a Netgear ProSafe one, so a pretty good spec.

thanks
kurt
 
OK, it looks good as it is, but yes, NFS/storage on a separate network should be done.

You could change the bindnetaddr in /etc/pve/corosync.conf to 192.168.254.0, but AFAIK it should be OK as it is.
For a separate cluster network you may find https://pve.proxmox.com/wiki/Separate_Cluster_Network interesting.
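A hedged sketch of that change, reusing the totem section from the config posted earlier in the thread: bindnetaddr is set to the network address instead of a host address, and config_version is bumped so corosync accepts the updated file.

```
totem {
  cluster_name: dmcluster
  config_version: 6        # incremented from 5 so the change is applied
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.254.0   # network address, not a node's address
    ringnumber: 0
  }
}
```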
 