cluster situation *solved*

RobFantini

I set up a separate cluster network, following https://pve.proxmox.com/wiki/Separate_Cluster_Network#Setup_on_a_Running_Cluster .

At one point I had to start over due to a typo in the hosts file, and some nodes got reinstalled. So our issue is probably not a bug; it may be a configuration-file hangover from the hosts problem.

I have 3 nodes: sys5, dell1, and 10.2.8.42.

When sys5 is offline, dell1 shows:
Code:
dell1  ~ # pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 10.2.8.42
         3          1 dell1-corosync (local)

When sys5 is started, dell1 is alone:
Code:
dell1  ~ # pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         3          1 dell1-corosync (local)

At that point sys5 shows:
Code:
sys5  ~ # pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         4          1 sys5-corosync (local)
         1          1 10.2.8.42

This has been going on for a week. Eventually all 3 nodes show up after I restart some pve services.
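For reference, here is what I mean by restarting the pve services. A sketch, assuming a standard Proxmox VE 4.x install with systemd (these are the stock unit names; run on the node that dropped out):

```shell
# restart the cluster stack on the stuck node
systemctl restart corosync
systemctl restart pve-cluster

# then re-check membership and quorum
pvecm status
pvecm nodes
```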

The corosync IP addresses can always be pinged from all nodes.


Update: after roughly 10 minutes the cluster worked. I had not restarted any services.
Code:
dell1  ~ # pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         4          1 sys5-corosync
         1          1 10.2.8.42
         3          1 dell1-corosync (local)

Any clues on how to fix this?
 
Re: cluster situation

corosync.conf

Code:
dell1  /etc/pve # cat corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: dell1
    nodeid: 3
    quorum_votes: 1
    ring0_addr: dell1-corosync
  }

  node {
    name: sys3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.2.8.42
  }

  node {
    name: sys4
    nodeid: 2
    quorum_votes: 1
    ring0_addr: sys4-corosync
  }

  node {
    name: sys5
    nodeid: 4
    quorum_votes: 1
    ring0_addr: sys5-corosync
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster-v4
  config_version: 12
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.2.8.181
    ringnumber: 0
  }

}

Note: sys4 is offline.

sys3 was added after the cluster was set up.
 
Re: cluster situation

part of /etc/hosts
Code:
# corosync network hosts
10.2.8.42  sys3-corosync.fantinibakery.com  sys3-corosync
10.2.8.41  sys4-corosync.fantinibakery.com  sys4-corosync
10.2.8.19  sys5-corosync.fantinibakery.com  sys5-corosync
10.2.8.181 dell1-corosync.fantinibakery.com dell1-corosync
 
Re: cluster situation

Sounds like a multicast membership problem? To test, you can try to disable multicast snooping on the bridge.
 
Re: cluster situation

Sounds like a multicast membership problem? To test, you can try to disable multicast snooping on the bridge.

Yes I think it is a multicast issue. Thank you.

For the corosync network I'm using a dumb switch. AFAIK there is no way to enable or disable multicast settings on a dumb switch.
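Since ping only exercises unicast, multicast delivery itself can be tested with omping (packaged for PVE), run at the same time on every node. A sketch using the corosync addresses from this thread; each node gets the full address list, and omping skips its own address automatically:

```shell
# run concurrently on all three nodes for ~10 minutes
omping -c 600 -i 1 -q 10.2.8.42 10.2.8.19 10.2.8.181
```

If the multicast loss figures climb after a few minutes while unicast stays clean, that points at IGMP snooping or querier problems on the network path.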

The corosync network uses lines like this in /etc/network/interfaces:
Code:
auto eth2
iface eth2 inet static
        address 10.2.8.181
        netmask 255.255.255.0

Can multicast snooping be set on an interface like the above, or does that require a vmbr (bridge) type setup?
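As far as I understand it, multicast snooping is a property of a Linux bridge (or a managed switch), not of a plain NIC, so the eth2 stanza above has nothing to toggle; a dumb switch normally floods multicast to all ports anyway. If the corosync link were moved onto a bridge, snooping could be disabled via sysfs from /etc/network/interfaces. A sketch, where vmbr2 is a hypothetical bridge name:

```shell
auto vmbr2
iface vmbr2 inet static
        address 10.2.8.181
        netmask 255.255.255.0
        bridge_ports eth2
        bridge_stp off
        bridge_fd 0
        # disable IGMP snooping so corosync multicast is flooded to all ports
        post-up echo 0 > /sys/class/net/vmbr2/bridge/multicast_snooping
```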
 
