Help with cluster configuration

Valerio Pachera

Hi, I'm configuring my first cluster following the documentation.

I have a dedicated NIC for corosync on the 192.168.9.0/24 network.
On the first node (192.168.9.106) I ran 'pvecm create pve-cluster-01' and no errors were reported.
On the second node I ran
Code:
pvecm add 192.168.9.106 --ring0_addr 192.168.9.78
...
Request addition of this node
500 cluster not ready - no quorum?

Then

Code:
pvecm status
Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
Cannot initialize CMAP service

pvecm nodes
Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
Cannot initialize CMAP service

On the first node

Code:
pvecm status
Cannot initialize CMAP service

I guess the first step went wrong, but I got no errors.

PS: the two dedicated NICs are connected to a TP-Link TL-SG105E switch.
I enabled IGMP snooping on it.
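In case multicast is the problem, I suppose I could test it between the two nodes with omping (command pattern from the PVE docs, addresses are mine):

Code:
# run on both nodes at the same time; low packet loss means multicast works
omping -c 600 -i 1 -q 192.168.9.106 192.168.9.78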

Any suggestions?
Any log I can look at?

Looking at the man page, I think I should have created the cluster on the first node with the ring0_addr option in the first place.
Otherwise it binds to another address.

Code:
      pvecm create <clustername> [OPTIONS]
      ....
       --ring0_addr <string> (default = Hostname of the node)
           Hostname (or IP) of the corosync ring0 address of this node.
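So the create command should probably have been something like this (my guess at this point, untested):

Code:
# bind corosync to the dedicated NIC from the start
pvecm create pve-cluster-01 --ring0_addr 192.168.9.106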
 
Hi,
I assume your hosts entry on the first node doesn't point to 192.168.9.106?!
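If corosync should run on the dedicated NIC, the entry would need to look something like this (hostname assumed):

Code:
# /etc/hosts on the first node
192.168.9.106   pve.yourdomain.local pve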

Take a look at /etc/corosync/corosync.conf on the first node.

Udo
 
@udo You are right!
Indeed, I found out that the IP address set in /etc/hosts did not match the current server IP.
Note: the first server was installed and configured by a third person and is already running some guests.

1) So, because I want to use a dedicated NIC, I have to specify its address already when I create the cluster.
Am I right?
From the man page, I see two options:

Code:
       --bindnet0_addr <string>
           This specifies the network address the corosync ring 0 executive should bind to and defaults to the local IP address of the node.

       --ring0_addr <string> (default = Hostname of the node)
           Hostname (or IP) of the corosync ring0 address of this node.

2) Honestly, I don't understand 'bindnet0_addr' and how it differs from 'ring0_addr'.
Could you explain it, please?
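My guess at how the two options map into corosync.conf (not verified):

Code:
totem {
  interface {
    # bindnet0_addr -> the *network* address the ring binds to
    bindnetaddr: 192.168.9.0
    ringnumber: 0
  }
}
nodelist {
  node {
    # ring0_addr -> this node's own address (or hostname) on that ring
    ring0_addr: 192.168.9.106
  }
}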

3) I guess it's safe to run 'pvecm create' a second time, right?
As of now, there are no other nodes.

Thank you.
 
Hi udo, I changed /etc/corosync/corosync.conf, but the file /etc/pve/corosync.conf has 440 permissions, so I can't edit it (and it still contains the wrong IP).
Am I allowed to add write permission?
Are you sure I can change corosync settings without rebooting the server?
Shall I increase 'config_version' as described in https://pve.proxmox.com/wiki/Separate_Cluster_Network#Configure_corosync ?
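If I read that wiki page correctly, the intended procedure is roughly this (my reading, untested here):

Code:
# edit a copy, never the live file
cp /etc/pve/corosync.conf /root/corosync.conf.new
# fix the IP in /root/corosync.conf.new and increase config_version by 1
mv /root/corosync.conf.new /etc/pve/corosync.conf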
Note: as of now, if I try to start a guest or do any action in the GUI, I get the error "cluster not ready - no quorum? (500)".
This is the corosync log on the first node:

Code:
May  2 08:49:39 pve corosync[14854]:  [MAIN  ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
May  2 08:49:39 pve corosync[14854]: notice  [MAIN  ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
May  2 08:49:39 pve corosync[14854]: info    [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] Initializing transport (UDP/IP Multicast).
May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
May  2 08:49:39 pve corosync[14854]:  [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
May  2 08:49:39 pve corosync[14854]:  [TOTEM ] Initializing transport (UDP/IP Multicast).
May  2 08:49:39 pve corosync[14854]:  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] The network interface [192.168.9.106] is now up.
May  2 08:49:39 pve corosync[14854]:  [TOTEM ] The network interface [192.168.9.106] is now up.
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: cmap
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync configuration service [1]
May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: cfg
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: cpg
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
May  2 08:49:39 pve corosync[14854]: warning [WD    ] Watchdog /dev/watchdog exists but couldn't be opened.
May  2 08:49:39 pve corosync[14854]: warning [WD    ] resource load_15min missing a recovery key.
May  2 08:49:39 pve corosync[14854]: warning [WD    ] resource memory_used missing a recovery key.
May  2 08:49:39 pve corosync[14854]: info    [WD    ] no resources configured.
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync watchdog service [7]
May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] Using quorum provider corosync_votequorum
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync configuration map access [0]
May  2 08:49:39 pve systemd[1]: Started Corosync Cluster Engine.
May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] This node is within the primary component and will provide service.
May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] Members[0]:
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: votequorum
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: quorum
May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] A new membership (192.168.9.106:4) was formed. Members joined: 1
May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] Members[1]: 1
May  2 08:49:39 pve corosync[14854]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: cmap
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync configuration service [1]
May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: cfg
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: cpg
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync profile loading service [4]
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
May  2 08:49:39 pve corosync[14854]:  [WD    ] Watchdog /dev/watchdog exists but couldn't be opened.
May  2 08:49:39 pve corosync[14854]:  [WD    ] resource load_15min missing a recovery key.
May  2 08:49:39 pve corosync[14854]:  [WD    ] resource memory_used missing a recovery key.
May  2 08:49:39 pve corosync[14854]:  [WD    ] no resources configured.
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync watchdog service [7]
May  2 08:49:39 pve corosync[14854]:  [QUORUM] Using quorum provider corosync_votequorum
May  2 08:49:39 pve corosync[14854]:  [QUORUM] This node is within the primary component and will provide service.
May  2 08:49:39 pve corosync[14854]:  [QUORUM] Members[0]:
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: votequorum
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: quorum
May  2 08:49:39 pve corosync[14854]:  [TOTEM ] A new membership (192.168.9.106:4) was formed. Members joined: 1
May  2 08:49:39 pve corosync[14854]:  [QUORUM] Members[1]: 1
May  2 08:49:39 pve corosync[14854]:  [MAIN  ] Completed service synchronization, ready to provide service.

Thank you.
 
I think the only way is to follow the steps in the chapter 'Separate A Node Without Reinstalling'.

Code:
systemctl stop pve-cluster
systemctl stop corosync
pmxcfs -l                  # restart the cluster filesystem in local mode
rm /etc/pve/corosync.conf  # remove the cluster configuration
rm /etc/corosync/*
killall pmxcfs
rm /var/lib/corosync/*     # remove the corosync state

And create the cluster from scratch with the right options:

Code:
systemctl start pve-cluster
pvecm create testx -bindnet0_addr 192.168.9.106 -ring0_addr 192.168.9.106

I tried that procedure on a test installation where I reproduced the situation.
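A quick check after recreating (my suggestion, not part of the documented steps):

Code:
pvecm status    # membership info should now show 192.168.9.106
grep -E 'bindnetaddr|ring0_addr' /etc/corosync/corosync.conf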

Note from man pvecm

In case of network partitioning, state changes require that a majority of nodes are online. The cluster switches to read-only mode if it loses quorum.

Let me know if you have better solutions.
 
