[SOLVED] one node cannot see other nodes in a cluster

helj15798
New Member
May 13, 2021
Hi, I have a problem.
I have set up a five-node cluster with the latest version (7.0-2) on each of the five nodes.
At first, the nodes can all see each other.

However, after about one hour, one node cannot see the other four nodes.


Meanwhile, the other four nodes become very slow and cannot see pub1.

If I shut down the node pub1, the other four nodes can see each other again.


I have tried re-installing all nodes and re-creating the cluster, but the same thing happens again.
 
Please share your network configuration and the Corosync config as well.

Bash:
cat /etc/network/interfaces
cat /etc/pve/corosync.conf
cat /etc/hosts

And the output of the pvecm nodes command.
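
While you gather that, the Corosync link status and logs might already hint at what is going wrong; both commands below are standard tools and safe to run on any node:

Bash:
corosync-cfgtool -s          # per-link status as Corosync sees it
journalctl -u corosync -b    # Corosync log messages since boot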
 

Thank you for your reply.

I have re-installed three nodes and re-created the cluster.
After re-installation, the three-node cluster works fine and the three nodes can see each other.
As before, after about one hour, the same issue appears.
Currently, in this three-node cluster, pub1 cannot see yard3 and yard6, while yard3 and yard6 can see each other.

Info as requested.

On pub1:
Bash:
root@pub1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eno113 inet manual

auto vmbr0
iface vmbr0 inet static
        address 10.20.xx.xx/17
        gateway 10.20.xxx.xxx
        bridge-ports eno113
        bridge-stp off
        bridge-fd 0

iface eno114 inet manual

iface eno115 inet manual

iface eno116 inet manual

root@pub1:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pub1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.20.xx.xxx
  }
  node {
    name: yard3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.16.xx.x
  }
  node {
    name: yard6
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.16.xx.xx
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: HxxxLab
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

root@pub1:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.20.xx.xxx pub1.xxx.edu.cn pub1

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

root@pub1:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         2          1 pub1 (local)

On yard3:
Bash:
root@yard3:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp68s0f0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 10.16.xx.x/17
        gateway 10.16.xxx.xxx
        bridge-ports enp68s0f0
        bridge-stp off
        bridge-fd 0

iface enp193s0f0 inet manual

iface enp68s0f1 inet manual

iface enp193s0f1 inet manual

root@yard3:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pub1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.20.xx.xxx
  }
  node {
    name: yard3
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.16.xx.x
  }
  node {
    name: yard6
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.16.xx.xx
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: HxxxLab
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

root@yard3:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
10.16.xx.x yard3.xxx.edu.cn yard3

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

root@yard3:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 yard3 (local)
         3          1 yard6
 

Can you edit /etc/hosts so that every node has entries for the other nodes? For example:

Code:
127.0.0.1 localhost.localdomain localhost
10.20.xx.xxx pub1.xxx.edu.cn pub1
10.16.xx.x yard3.xxx.edu.cn yard3
10.16.xx.xx yard6.xxx.edu.cn yard6
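
The Proxmox stack resolves the other nodes' names (for example when the GUI on one node proxies to another), so each node needs working name resolution for all of its peers. A quick way to verify on each node, using the hostnames from your config:

Bash:
# Should print one line with the expected IP for every peer
for n in pub1 yard3 yard6; do
    getent hosts "$n"
done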

pub1 is also on a different subnet; typically you would have all hosts on the same subnet, especially for Corosync. If the link is routed, its latency can climb until the cluster breaks or nodes reboot.

  • use a separate interface if possible (don't run GUI/VM traffic and Corosync on the same link)
  • no routing for Corosync if possible (no gateway is needed if the interfaces are on the same switch)
    • maybe use another interface as a Corosync fallback (see the sketch after this list)
  • put the IP addresses in the same subnet
  • use /etc/hosts for local DNS
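
For the fallback point: with Corosync 3 you can add a second knet link by giving every node a ring1_addr and adding a second interface block. A minimal sketch, assuming a spare NIC with placeholder 192.168.1.x addresses (not taken from your posted config); config_version must be increased on every edit:

Code:
nodelist {
  node {
    name: pub1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.20.xx.xxx
    ring1_addr: 192.168.1.2    # placeholder: address on a dedicated NIC
  }
  # ... add a ring1_addr to yard3 and yard6 the same way ...
}

totem {
  # ... existing settings unchanged ...
  config_version: 4            # must be bumped on every change
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1              # second knet link
  }
}

Your existing link_mode: passive already means the second link is only used when link 0 fails.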
 

Editing /etc/hosts for each node works!
Thank you very much!
 

No problem. It would be cool if you could mark this thread as "solved".
 
