cluster situation *solved*

RobFantini

Famous Member
May 24, 2012
Boston,Mass
I set up a separate cluster network, following https://pve.proxmox.com/wiki/Separate_Cluster_Network#Setup_on_a_Running_Cluster .

At one point I had to start over due to a typo in the hosts file, and some nodes got reinstalled. So our issue is probably not a bug; maybe it's a leftover config problem from the hosts issue.

I've 3 nodes: sys5, dell1 and 10.2.8.42

when sys5 is off line, dell1 shows:
Code:
dell1  ~ # pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 10.2.8.42
         3          1 dell1-corosync (local)

when sys5 is started , dell1 is alone:
Code:
dell1  ~ # pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         3          1 dell1-corosync (local)

at that point sys5 shows:
Code:
sys5  ~ # pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         4          1 sys5-corosync (local)
         1          1 10.2.8.42

this has been going on for a week. eventually all 3 nodes show up after I restart some pve services.

the corosync IP addresses can always be pinged from all nodes.
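Worth noting: ping only proves unicast reachability, while corosync on the default transport also depends on working multicast between the nodes. A hedged test sketch, assuming omping is installed on all three nodes and using the corosync addresses from this thread (run the same command on every node at roughly the same time):

```shell
# omping exchanges both unicast and multicast probes between the listed hosts.
# Healthy output shows multicast loss close to unicast loss; unicast working
# while multicast shows ~100% loss points at IGMP/snooping problems.
omping -c 600 -i 1 -q 10.2.8.42 10.2.8.19 10.2.8.181
```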


Update: after 10 minutes or so the cluster worked. I had not restarted any services.
Code:
dell1  ~ # pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         4          1 sys5-corosync
         1          1 10.2.8.42
         3          1 dell1-corosync (local)

Any clues on how to fix this?
 
Re: cluster situation

corosync.conf

Code:
dell1  /etc/pve # cat corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: dell1
    nodeid: 3
    quorum_votes: 1
    ring0_addr: dell1-corosync
  }

  node {
    name: sys3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.2.8.42
  }

  node {
    name: sys4
    nodeid: 2
    quorum_votes: 1
    ring0_addr: sys4-corosync
  }

  node {
    name: sys5
    nodeid: 4
    quorum_votes: 1
    ring0_addr: sys5-corosync
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster-v4
  config_version: 12
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.2.8.181
    ringnumber: 0
  }

}

note sys4 is off line.

sys3 was added after the cluster was set up.
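For diagnosing a split like the one above, the corosync CLI tools show what each node thinks of ring0 and quorum. A sketch, to be run on each node while the cluster is split (these are standard corosync tools, not specific to this setup):

```shell
# Local ring status: prints "ring 0 active with no faults" when healthy.
corosync-cfgtool -s
# Quorum summary: expected votes, total votes, and whether the node is quorate.
corosync-quorumtool -s
```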
 
Re: cluster situation

part of /etc/hosts
Code:
# corosync network hosts
10.2.8.42  sys3-corosync.fantinibakery.com  sys3-corosync
10.2.8.41  sys4-corosync.fantinibakery.com  sys4-corosync
10.2.8.19  sys5-corosync.fantinibakery.com  sys5-corosync
10.2.8.181 dell1-corosync.fantinibakery.com dell1-corosync
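Since ring0_addr mixes hostnames and one raw IP, it may be worth confirming that every node resolves the corosync names to exactly the addresses listed above. A small check, assuming the names from this hosts file (run on each node and compare):

```shell
# Each name should resolve to the same single address on every node.
for h in sys3-corosync sys4-corosync sys5-corosync dell1-corosync; do
    getent hosts "$h"
done
```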
 
Re: cluster situation

Sounds like a multicast membership problem? To test, you can try to disable multicast snooping on the bridge.
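For reference, on a Linux bridge the snooping knob is exposed via sysfs; a sketch, assuming the bridge is called vmbr0 (adjust the name to your setup, needs root):

```shell
# Disable IGMP/multicast snooping on the bridge.
# Takes effect immediately but does not persist across reboots.
echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping
# Verify the setting (0 = snooping off):
cat /sys/class/net/vmbr0/bridge/multicast_snooping
```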
 
Re: cluster situation

Sounds like a multicast membership problem? To test, you can try to disable multicast snooping on the bridge.

Yes I think it is a multicast issue. Thank you.

For the corosync network I'm using a dumb switch. AFAIK there is no way to enable or disable multicast settings on a dumb switch.

The corosync network uses a stanza like this in /etc/network/interfaces:
Code:
auto eth2
iface eth2 inet static
        address 10.2.8.181
        netmask 255.255.255.0

Can multicast snooping be set on an interface like the above? Or should that use a vmbr [ bridge? ] type setup?
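multicast_snooping is a bridge attribute, so a plain eth stanza like the one above has nowhere to set it. If the corosync NIC were moved onto a bridge, a persistent setup might look like the following sketch (the vmbr2 name and the post-up line are assumptions, not a tested config):

```shell
# /etc/network/interfaces sketch: enslave eth2 to a bridge and turn
# snooping off every time the bridge comes up.
auto vmbr2
iface vmbr2 inet static
        address 10.2.8.181
        netmask 255.255.255.0
        bridge_ports eth2
        bridge_stp off
        bridge_fd 0
        post-up echo 0 > /sys/class/net/vmbr2/bridge/multicast_snooping
```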