Help with cluster configuration

Valerio Pachera

Hi, I'm configuring my first cluster following the documentation.

I have a dedicated NIC for corosync on the 192.168.9.0/24 network.
On the first node (192.168.9.106) I ran 'pvecm create pve-cluster-01' and no errors were reported.
On the second node I ran:
Code:
pvecm add 192.168.9.106 --ring0_addr 192.168.9.78
...
Request addition of this node
500 cluster not ready - no quorum?

Then

Code:
pvecm status
Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
Cannot initialize CMAP service

pvecm nodes
Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?
Cannot initialize CMAP service

On the first node

Code:
pvecm status
Cannot initialize CMAP service

I guess the first step went wrong, but I got no errors.

PS: the two dedicated NICs are connected to a TP-Link TL-SG105E switch.
I enabled IGMP snooping on it.
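In case it is a multicast problem, I can also test multicast over the dedicated NICs; a rough sketch, assuming omping is installed on both nodes and started on both at roughly the same time:

Code:
omping -c 600 -i 1 -q 192.168.9.106 192.168.9.78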

Any suggestion?
Any log I can look at?

Looking at the man page, I think I should have created the cluster on the first node with the ring0_addr option in the first place.
Otherwise it binds to another address.

Code:
      pvecm create <clustername> [OPTIONS]
      ....
       --ring0_addr <string> (default = Hostname of the node)
           Hostname (or IP) of the corosync ring0 address of this node.
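So, if I recreate the cluster, something like this should bind it to the dedicated NIC from the start (a sketch based on the man page excerpt above, not yet verified by me):

Code:
pvecm create pve-cluster-01 --ring0_addr 192.168.9.106 --bindnet0_addr 192.168.9.106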
 
Hi,
I assume your hosts entry on the first node doesn't point to 192.168.9.106?!

Take a look at /etc/corosync/corosync.conf on the first node.
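For example (assuming the node's hostname is 'pve'):

Code:
grep -E 'ring0_addr|bindnetaddr' /etc/corosync/corosync.conf
grep pve /etc/hosts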

Udo
 
@udo You are right!
Indeed, I found out that the IP address set in /etc/hosts does not match the current server IP.
Note: the first server was installed and configured by a third person and is already running some guests.

1) So, because I want to use a dedicated NIC, I have to specify its address already when I create the cluster.
Am I right?
In the man page I see two options:

Code:
       --bindnet0_addr <string>
           This specifies the network address the corosync ring 0 executive should bind to and defaults to the local IP address of the node.

       --ring0_addr <string> (default = Hostname of the node)
           Hostname (or IP) of the corosync ring0 address of this node.

2) Honestly, I don't understand 'bindnet0_addr' and how it differs from 'ring0_addr'.
Could you explain it, please? My tentative reading is sketched below.

3) I guess it's safe to run 'pvecm create' a second time, right?
As of now, there are no other nodes.
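My tentative reading of the two options, sketched as where they end up in corosync.conf (illustrative values, not taken from my real config):

Code:
totem {
  interface {
    ringnumber: 0
    # filled from --bindnet0_addr: the network/address corosync binds to for ring 0
    bindnetaddr: 192.168.9.0
  }
}
nodelist {
  node {
    # filled from --ring0_addr: this node's own address (or name) on ring 0
    ring0_addr: 192.168.9.106
  }
}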

Thank you.
 
Hi udo, I changed /etc/corosync/corosync.conf, but /etc/pve/corosync.conf has 440 permissions, so I can't edit it (and it still contains the wrong IP).
Am I allowed to add write permission?
Are you sure I can change corosync settings without rebooting the server?
Shall I increase 'config_version' as described in https://pve.proxmox.com/wiki/Separate_Cluster_Network#Configure_corosync ?
Note: as of now, if I try to start a guest or do any action in the GUI I get the error "cluster not ready - no quorum? (500)".
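My guess for a way forward, untested: 'pvecm expected 1' should lower the expected votes so this single node regains quorum and /etc/pve becomes writable again, then the file can be replaced via a copy:

Code:
# let this single node reach quorum again (expected votes = 1)
pvecm expected 1
# edit a copy, bump config_version inside it, then copy it back
cp /etc/pve/corosync.conf /root/corosync.conf.new
nano /root/corosync.conf.new        # or any editor
cp /root/corosync.conf.new /etc/pve/corosync.conf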

Code:
May  2 08:49:39 pve corosync[14854]:  [MAIN  ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
May  2 08:49:39 pve corosync[14854]: notice  [MAIN  ] Corosync Cluster Engine ('2.4.2-dirty'): started and ready to provide service.
May  2 08:49:39 pve corosync[14854]: info    [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] Initializing transport (UDP/IP Multicast).
May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
May  2 08:49:39 pve corosync[14854]:  [MAIN  ] Corosync built-in features: dbus rdma monitoring watchdog augeas systemd upstart xmlconf qdevices qnetd snmp pie relro bindnow
May  2 08:49:39 pve corosync[14854]:  [TOTEM ] Initializing transport (UDP/IP Multicast).
May  2 08:49:39 pve corosync[14854]:  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] The network interface [192.168.9.106] is now up.
May  2 08:49:39 pve corosync[14854]:  [TOTEM ] The network interface [192.168.9.106] is now up.
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: cmap
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync configuration service [1]
May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: cfg
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: cpg
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
May  2 08:49:39 pve corosync[14854]: warning [WD    ] Watchdog /dev/watchdog exists but couldn't be opened.
May  2 08:49:39 pve corosync[14854]: warning [WD    ] resource load_15min missing a recovery key.
May  2 08:49:39 pve corosync[14854]: warning [WD    ] resource memory_used missing a recovery key.
May  2 08:49:39 pve corosync[14854]: info    [WD    ] no resources configured.
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync watchdog service [7]
May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] Using quorum provider corosync_votequorum
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync configuration map access [0]
May  2 08:49:39 pve systemd[1]: Started Corosync Cluster Engine.
May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] This node is within the primary component and will provide service.
May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] Members[0]:
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: votequorum
May  2 08:49:39 pve corosync[14854]: notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
May  2 08:49:39 pve corosync[14854]: info    [QB    ] server name: quorum
May  2 08:49:39 pve corosync[14854]: notice  [TOTEM ] A new membership (192.168.9.106:4) was formed. Members joined: 1
May  2 08:49:39 pve corosync[14854]: notice  [QUORUM] Members[1]: 1
May  2 08:49:39 pve corosync[14854]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: cmap
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync configuration service [1]
May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: cfg
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: cpg
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync profile loading service [4]
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync resource monitoring service [6]
May  2 08:49:39 pve corosync[14854]:  [WD    ] Watchdog /dev/watchdog exists but couldn't be opened.
May  2 08:49:39 pve corosync[14854]:  [WD    ] resource load_15min missing a recovery key.
May  2 08:49:39 pve corosync[14854]:  [WD    ] resource memory_used missing a recovery key.
May  2 08:49:39 pve corosync[14854]:  [WD    ] no resources configured.
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync watchdog service [7]
May  2 08:49:39 pve corosync[14854]:  [QUORUM] Using quorum provider corosync_votequorum
May  2 08:49:39 pve corosync[14854]:  [QUORUM] This node is within the primary component and will provide service.
May  2 08:49:39 pve corosync[14854]:  [QUORUM] Members[0]:
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: votequorum
May  2 08:49:39 pve corosync[14854]:  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
May  2 08:49:39 pve corosync[14854]:  [QB    ] server name: quorum
May  2 08:49:39 pve corosync[14854]:  [TOTEM ] A new membership (192.168.9.106:4) was formed. Members joined: 1
May  2 08:49:39 pve corosync[14854]:  [QUORUM] Members[1]: 1
May  2 08:49:39 pve corosync[14854]:  [MAIN  ] Completed service synchronization, ready to provide service.

Thank you.
 
I think the only way is to follow the steps in the chapter 'Separate A Node Without Reinstalling'.

Code:
# stop the cluster filesystem and corosync
systemctl stop pve-cluster
systemctl stop corosync
# start pmxcfs in local mode so /etc/pve can be cleaned up without a cluster
pmxcfs -l
# remove the cluster configuration
rm /etc/pve/corosync.conf
rm /etc/corosync/*
# stop the local pmxcfs and clear the corosync state
killall pmxcfs
rm /var/lib/corosync/*

And create the cluster from scratch with the right options:

Code:
systemctl start pve-cluster
pvecm create testx -bindnet0_addr 192.168.9.106 -ring0_addr 192.168.9.106

I tried that procedure on a test installation where I reproduced the situation.
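A quick way to check the result afterwards (nothing more than the status plus a grep on the generated config):

Code:
pvecm status
grep -E 'bindnetaddr|ring0_addr' /etc/corosync/corosync.conf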

Note from 'man pvecm':

In case of network partitioning, state changes require that a majority of nodes are online. The cluster switches to read-only mode if it loses quorum.

Let me know if you have better solutions.
 