Cluster Communication

Ryan Anderson

Hello All,

Sorry, I know this has been asked before in multiple threads, but my setup is slightly different than normal and I was wondering where to start looking. I currently have only two nodes, PVE01 and PVE02; a third one is in the works. I had to swap out the NIC on PVE01 because I was having trouble with one of its ports. After installing the new NIC and booting PVE01, I had to change the /etc/network/interfaces file to reflect the new names of the NIC ports. From each host I can ping the other on both the host network and the corosync network, and I can SSH between them via both IP addresses too. Both hosts are up and running, but each shows the other as offline. I have tried restarting all of these services on each host: corosync, pvestatd, pve-cluster, pveproxy.


Any ideas on troubleshooting this?
  • PVE01
    • Host - 192.168.3.99
    • Corosync - 192.168.25.99
    • Quorum information
      ------------------
      Date: Sat May 18 17:21:57 2019
      Quorum provider: corosync_votequorum
      Nodes: 1
      Node ID: 0x00000001
      Ring ID: 1/124
      Quorate: No

      Votequorum information
      ----------------------
      Expected votes: 2
      Highest expected: 2
      Total votes: 1
      Quorum: 2 Activity blocked
      Flags:

      Membership information
      ----------------------
      Nodeid Votes Name
      0x00000001 1 192.168.25.99 (local)

    • Corosync.conf:
    • logging {
      debug: off
      to_syslog: yes
      }

      nodelist {
      node {
      name: pve01
      nodeid: 1
      quorum_votes: 1
      ring0_addr: pve01-corosync
      }
      node {
      name: pve02
      nodeid: 2
      quorum_votes: 1
      ring0_addr: pve02-corosync
      }
      }

      quorum {
      provider: corosync_votequorum
      }

      totem {
      cluster_name: Cluster01
      config_version: 2
      interface {
      bindnetaddr: 192.168.25.99
      ringnumber: 0
      }
      ip_version: ipv4
      secauth: on
      version: 2
      }
  • PVE02
    • Host - 192.168.3.100
    • Corosync - 192.168.25.100
    • Quorum information
      ------------------
      Date: Sat May 18 17:22:57 2019
      Quorum provider: corosync_votequorum
      Nodes: 1
      Node ID: 0x00000002
      Ring ID: 2/140
      Quorate: No

      Votequorum information
      ----------------------
      Expected votes: 2
      Highest expected: 2
      Total votes: 1
      Quorum: 2 Activity blocked
      Flags:

      Membership information
      ----------------------
      Nodeid Votes Name
      0x00000002 1 192.168.25.100 (local)


    • Corosync.conf:
    • logging {
      debug: off
      to_syslog: yes
      }

      nodelist {
      node {
      name: pve01
      nodeid: 1
      quorum_votes: 1
      ring0_addr: pve01-corosync
      }
      node {
      name: pve02
      nodeid: 2
      quorum_votes: 1
      ring0_addr: pve02-corosync
      }
      }

      quorum {
      provider: corosync_votequorum
      }

      totem {
      cluster_name: Cluster01
      config_version: 2
      interface {
      bindnetaddr: 192.168.25.99
      ringnumber: 0
      }
      ip_version: ipv4
      secauth: on
      version: 2
      }
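
For anyone reading along, the `Quorum: 2 Activity blocked` lines in both status outputs follow from simple majority arithmetic: with 2 expected votes the threshold is 2, so a node that only sees its own vote cannot be quorate. Here is a minimal sketch of that rule, assuming the plain majority calculation with no special votequorum options (the `Flags:` lines above are empty). The function names are made up for illustration and are not part of any Proxmox or corosync tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes currently visible reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


# Values taken from the status output of each node above:
# Expected votes: 2, Total votes: 1
print(quorum_threshold(2))  # 2
print(is_quorate(1, 2))     # False
```

So the "offline" display is a symptom: each node is running, but neither sees the other's vote.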

P.S. - It was working fine before I switched out the hardware NIC.

Thanks,
Ryan
 
Thanks for the reply dietmar!

I think multicast is working correctly:
  • PVE01
    • root@pve01:~# omping -m 239.192.177.19 pve01 pve02
      pve02 : waiting for response msg
      pve02 : waiting for response msg
      pve02 : waiting for response msg
      pve02 : waiting for response msg
      pve02 : joined (S,G) = (*, 239.192.177.19), pinging
      pve02 : unicast, seq=1, size=69 bytes, dist=0, time=0.175ms
      pve02 : multicast, seq=1, size=69 bytes, dist=0, time=0.178ms
      pve02 : multicast, seq=1 (dup), size=69 bytes, dist=0, time=0.201ms
      pve02 : unicast, seq=2, size=69 bytes, dist=0, time=0.217ms
      pve02 : multicast, seq=2, size=69 bytes, dist=0, time=0.223ms
      pve02 : multicast, seq=2 (dup), size=69 bytes, dist=0, time=0.289ms
      pve02 : unicast, seq=3, size=69 bytes, dist=0, time=0.296ms
      pve02 : multicast, seq=3, size=69 bytes, dist=0, time=0.303ms
      pve02 : multicast, seq=3 (dup), size=69 bytes, dist=0, time=0.320ms
      pve02 : unicast, seq=4, size=69 bytes, dist=0, time=0.324ms
      pve02 : multicast, seq=4, size=69 bytes, dist=0, time=0.328ms
      pve02 : multicast, seq=4 (dup), size=69 bytes, dist=0, time=0.331ms
      pve02 : unicast, seq=5, size=69 bytes, dist=0, time=0.248ms
      pve02 : multicast, seq=5, size=69 bytes, dist=0, time=0.254ms
      pve02 : multicast, seq=5 (dup), size=69 bytes, dist=0, time=0.325ms
      pve02 : unicast, seq=6, size=69 bytes, dist=0, time=0.259ms
      pve02 : multicast, seq=6, size=69 bytes, dist=0, time=0.266ms
      pve02 : multicast, seq=6 (dup), size=69 bytes, dist=0, time=0.282ms
      pve02 : unicast, seq=7, size=69 bytes, dist=0, time=0.246ms
      pve02 : multicast, seq=7, size=69 bytes, dist=0, time=0.252ms
      pve02 : multicast, seq=7 (dup), size=69 bytes, dist=0, time=0.324ms
      pve02 : unicast, seq=8, size=69 bytes, dist=0, time=0.296ms
      pve02 : multicast, seq=8, size=69 bytes, dist=0, time=0.302ms
      pve02 : multicast, seq=8 (dup), size=69 bytes, dist=0, time=0.319ms
      pve02 : unicast, seq=9, size=69 bytes, dist=0, time=0.239ms
      pve02 : multicast, seq=9, size=69 bytes, dist=0, time=0.246ms
      pve02 : multicast, seq=9 (dup), size=69 bytes, dist=0, time=0.263ms
      pve02 : unicast, seq=10, size=69 bytes, dist=0, time=0.293ms
      pve02 : multicast, seq=10, size=69 bytes, dist=0, time=0.299ms
      pve02 : multicast, seq=10 (dup), size=69 bytes, dist=0, time=0.316ms
      pve02 : unicast, seq=11, size=69 bytes, dist=0, time=0.289ms
      pve02 : multicast, seq=11, size=69 bytes, dist=0, time=0.295ms
      pve02 : multicast, seq=11 (dup), size=69 bytes, dist=0, time=0.312ms
      pve02 : unicast, seq=12, size=69 bytes, dist=0, time=0.267ms
      pve02 : multicast, seq=12, size=69 bytes, dist=0, time=0.274ms
      pve02 : multicast, seq=12 (dup), size=69 bytes, dist=0, time=0.340ms
      pve02 : unicast, seq=13, size=69 bytes, dist=0, time=0.243ms
      pve02 : multicast, seq=13, size=69 bytes, dist=0, time=0.250ms
      pve02 : multicast, seq=13 (dup), size=69 bytes, dist=0, time=0.321ms
      pve02 : unicast, seq=14, size=69 bytes, dist=0, time=0.245ms
      pve02 : multicast, seq=14, size=69 bytes, dist=0, time=0.251ms
      pve02 : multicast, seq=14 (dup), size=69 bytes, dist=0, time=0.268ms
      pve02 : unicast, seq=15, size=69 bytes, dist=0, time=0.244ms
      pve02 : multicast, seq=15, size=69 bytes, dist=0, time=0.251ms
      pve02 : multicast, seq=15 (dup), size=69 bytes, dist=0, time=0.317ms
      pve02 : unicast, seq=16, size=69 bytes, dist=0, time=0.285ms
      pve02 : multicast, seq=16, size=69 bytes, dist=0, time=0.292ms
      pve02 : multicast, seq=16 (dup), size=69 bytes, dist=0, time=0.307ms
      pve02 : unicast, seq=17, size=69 bytes, dist=0, time=0.264ms
      pve02 : multicast, seq=17, size=69 bytes, dist=0, time=0.271ms
      pve02 : multicast, seq=17 (dup), size=69 bytes, dist=0, time=0.336ms
      pve02 : unicast, seq=18, size=69 bytes, dist=0, time=0.231ms
      pve02 : multicast, seq=18, size=69 bytes, dist=0, time=0.238ms
      pve02 : multicast, seq=18 (dup), size=69 bytes, dist=0, time=0.304ms
      pve02 : unicast, seq=19, size=69 bytes, dist=0, time=0.234ms
      pve02 : multicast, seq=19, size=69 bytes, dist=0, time=0.241ms
      pve02 : multicast, seq=19 (dup), size=69 bytes, dist=0, time=0.257ms
      pve02 : unicast, seq=20, size=69 bytes, dist=0, time=0.255ms
      pve02 : multicast, seq=20, size=69 bytes, dist=0, time=0.262ms
      pve02 : multicast, seq=20 (dup), size=69 bytes, dist=0, time=0.336ms
      pve02 : unicast, seq=21, size=69 bytes, dist=0, time=0.331ms
      pve02 : multicast, seq=21, size=69 bytes, dist=0, time=0.337ms
      pve02 : multicast, seq=21 (dup), size=69 bytes, dist=0, time=0.354ms
      pve02 : unicast, seq=22, size=69 bytes, dist=0, time=0.279ms
      pve02 : multicast, seq=22, size=69 bytes, dist=0, time=0.285ms
      pve02 : multicast, seq=22 (dup), size=69 bytes, dist=0, time=0.302ms
      pve02 : unicast, seq=23, size=69 bytes, dist=0, time=0.291ms
      pve02 : multicast, seq=23, size=69 bytes, dist=0, time=0.297ms
      pve02 : multicast, seq=23 (dup), size=69 bytes, dist=0, time=0.313ms
      pve02 : waiting for response msg
      pve02 : server told us to stop

      pve02 : unicast, xmt/rcv/%loss = 23/23/0%, min/avg/max/std-dev = 0.175/0.263/0.331/0.035
      pve02 : multicast, xmt/rcv/%loss = 23/23+23/0%, min/avg/max/std-dev = 0.178/0.269/0.337/0.036
  • PVE02
    • root@pve02:~# omping -m 239.192.177.19 pve01 pve02
      pve01 : waiting for response msg
      pve01 : joined (S,G) = (*, 239.192.177.19), pinging
      pve01 : unicast, seq=1, size=69 bytes, dist=0, time=0.163ms
      pve01 : unicast, seq=2, size=69 bytes, dist=0, time=0.309ms
      pve01 : multicast, seq=2, size=69 bytes, dist=0, time=0.323ms
      pve01 : unicast, seq=3, size=69 bytes, dist=0, time=0.325ms
      pve01 : multicast, seq=3, size=69 bytes, dist=0, time=0.339ms
      pve01 : unicast, seq=4, size=69 bytes, dist=0, time=0.259ms
      pve01 : multicast, seq=4, size=69 bytes, dist=0, time=0.271ms
      pve01 : unicast, seq=5, size=69 bytes, dist=0, time=0.363ms
      pve01 : multicast, seq=5, size=69 bytes, dist=0, time=0.377ms
      pve01 : unicast, seq=6, size=69 bytes, dist=0, time=0.331ms
      pve01 : multicast, seq=6, size=69 bytes, dist=0, time=0.344ms
      pve01 : unicast, seq=7, size=69 bytes, dist=0, time=0.328ms
      pve01 : multicast, seq=7, size=69 bytes, dist=0, time=0.341ms
      pve01 : unicast, seq=8, size=69 bytes, dist=0, time=0.320ms
      pve01 : multicast, seq=8, size=69 bytes, dist=0, time=0.334ms
      pve01 : unicast, seq=9, size=69 bytes, dist=0, time=0.272ms
      pve01 : multicast, seq=9, size=69 bytes, dist=0, time=0.285ms
      pve01 : unicast, seq=10, size=69 bytes, dist=0, time=0.299ms
      pve01 : multicast, seq=10, size=69 bytes, dist=0, time=0.305ms
      pve01 : unicast, seq=11, size=69 bytes, dist=0, time=0.348ms
      pve01 : multicast, seq=11, size=69 bytes, dist=0, time=0.359ms
      pve01 : unicast, seq=12, size=69 bytes, dist=0, time=0.280ms
      pve01 : multicast, seq=12, size=69 bytes, dist=0, time=0.291ms
      pve01 : unicast, seq=13, size=69 bytes, dist=0, time=0.283ms
      pve01 : multicast, seq=13, size=69 bytes, dist=0, time=0.297ms
      pve01 : unicast, seq=14, size=69 bytes, dist=0, time=0.374ms
      pve01 : multicast, seq=14, size=69 bytes, dist=0, time=0.387ms
      pve01 : unicast, seq=15, size=69 bytes, dist=0, time=0.332ms
      pve01 : multicast, seq=15, size=69 bytes, dist=0, time=0.347ms
      pve01 : unicast, seq=16, size=69 bytes, dist=0, time=0.332ms
      pve01 : multicast, seq=16, size=69 bytes, dist=0, time=0.346ms
      pve01 : unicast, seq=17, size=69 bytes, dist=0, time=0.363ms
      pve01 : multicast, seq=17, size=69 bytes, dist=0, time=0.380ms
      pve01 : unicast, seq=18, size=69 bytes, dist=0, time=0.293ms
      pve01 : multicast, seq=18, size=69 bytes, dist=0, time=0.307ms
      pve01 : unicast, seq=19, size=69 bytes, dist=0, time=0.329ms
      pve01 : multicast, seq=19, size=69 bytes, dist=0, time=0.342ms
      pve01 : unicast, seq=20, size=69 bytes, dist=0, time=0.291ms
      pve01 : multicast, seq=20, size=69 bytes, dist=0, time=0.304ms
      pve01 : unicast, seq=21, size=69 bytes, dist=0, time=0.291ms
      pve01 : multicast, seq=21, size=69 bytes, dist=0, time=0.302ms
      pve01 : unicast, seq=22, size=69 bytes, dist=0, time=0.313ms
      pve01 : multicast, seq=22, size=69 bytes, dist=0, time=0.330ms
      pve01 : unicast, seq=23, size=69 bytes, dist=0, time=0.318ms
      pve01 : multicast, seq=23, size=69 bytes, dist=0, time=0.333ms
      pve01 : unicast, seq=24, size=69 bytes, dist=0, time=0.334ms
      pve01 : multicast, seq=24, size=69 bytes, dist=0, time=0.350ms
      pve01 : unicast, seq=25, size=69 bytes, dist=0, time=0.338ms
      pve01 : multicast, seq=25, size=69 bytes, dist=0, time=0.351ms
      ^C
      pve01 : unicast, xmt/rcv/%loss = 25/25/0%, min/avg/max/std-dev = 0.163/0.312/0.374/0.043
      pve01 : multicast, xmt/rcv/%loss = 25/24/4% (seq>=2 0%), min/avg/max/std-dev = 0.271/0.331/0.387/0.031
 
I tried on pve01-corosync and pve02-corosync and got different responses:

  • PVE01
    • root@pve01:~# omping -m 239.192.177.19 pve01-corosync pve02-corosync
      pve02-corosync : waiting for response msg
      pve02-corosync : waiting for response msg
      pve02-corosync : waiting for response msg
      pve02-corosync : waiting for response msg
      pve02-corosync : joined (S,G) = (*, 239.192.177.19), pinging
      pve02-corosync : unicast, seq=1, size=69 bytes, dist=0, time=0.176ms
      pve02-corosync : unicast, seq=2, size=69 bytes, dist=0, time=0.238ms
      pve02-corosync : unicast, seq=3, size=69 bytes, dist=0, time=0.260ms
      pve02-corosync : unicast, seq=4, size=69 bytes, dist=0, time=0.253ms
      pve02-corosync : unicast, seq=5, size=69 bytes, dist=0, time=0.271ms
      pve02-corosync : unicast, seq=6, size=69 bytes, dist=0, time=0.240ms
      pve02-corosync : unicast, seq=7, size=69 bytes, dist=0, time=0.231ms
      pve02-corosync : unicast, seq=8, size=69 bytes, dist=0, time=0.232ms
      pve02-corosync : unicast, seq=9, size=69 bytes, dist=0, time=0.231ms
      pve02-corosync : unicast, seq=10, size=69 bytes, dist=0, time=0.278ms
      pve02-corosync : unicast, seq=11, size=69 bytes, dist=0, time=0.279ms
      pve02-corosync : waiting for response msg
      pve02-corosync : server told us to stop

      pve02-corosync : unicast, xmt/rcv/%loss = 11/11/0%, min/avg/max/std-dev = 0.176/0.244/0.279/0.029
      pve02-corosync : multicast, xmt/rcv/%loss = 11/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
  • PVE02
    • root@pve02:~# omping -m 239.192.177.19 pve01-corosync pve02-corosync
      pve01-corosync : waiting for response msg
      pve01-corosync : joined (S,G) = (*, 239.192.177.19), pinging
      pve01-corosync : unicast, seq=1, size=69 bytes, dist=0, time=0.183ms
      pve01-corosync : multicast, seq=1, size=69 bytes, dist=0, time=0.214ms
      pve01-corosync : unicast, seq=2, size=69 bytes, dist=0, time=0.320ms
      pve01-corosync : multicast, seq=2, size=69 bytes, dist=0, time=0.351ms
      pve01-corosync : multicast, seq=3, size=69 bytes, dist=0, time=0.308ms
      pve01-corosync : unicast, seq=3, size=69 bytes, dist=0, time=0.330ms
      pve01-corosync : unicast, seq=4, size=69 bytes, dist=0, time=0.350ms
      pve01-corosync : multicast, seq=4, size=69 bytes, dist=0, time=0.374ms
      pve01-corosync : unicast, seq=5, size=69 bytes, dist=0, time=0.338ms
      pve01-corosync : multicast, seq=5, size=69 bytes, dist=0, time=0.382ms
      pve01-corosync : unicast, seq=6, size=69 bytes, dist=0, time=0.258ms
      pve01-corosync : multicast, seq=6, size=69 bytes, dist=0, time=0.250ms
      pve01-corosync : unicast, seq=7, size=69 bytes, dist=0, time=0.319ms
      pve01-corosync : multicast, seq=7, size=69 bytes, dist=0, time=0.318ms
      pve01-corosync : unicast, seq=8, size=69 bytes, dist=0, time=0.281ms
      pve01-corosync : multicast, seq=8, size=69 bytes, dist=0, time=0.281ms
      pve01-corosync : unicast, seq=9, size=69 bytes, dist=0, time=0.346ms
      pve01-corosync : multicast, seq=9, size=69 bytes, dist=0, time=0.388ms
      pve01-corosync : unicast, seq=10, size=69 bytes, dist=0, time=0.283ms
      pve01-corosync : multicast, seq=10, size=69 bytes, dist=0, time=0.283ms
      pve01-corosync : unicast, seq=11, size=69 bytes, dist=0, time=0.296ms
      pve01-corosync : multicast, seq=11, size=69 bytes, dist=0, time=0.295ms
      pve01-corosync : unicast, seq=12, size=69 bytes, dist=0, time=0.279ms
      pve01-corosync : multicast, seq=12, size=69 bytes, dist=0, time=0.281ms
      pve01-corosync : unicast, seq=13, size=69 bytes, dist=0, time=0.345ms
      pve01-corosync : multicast, seq=13, size=69 bytes, dist=0, time=0.345ms
      ^C
      pve01-corosync : unicast, xmt/rcv/%loss = 13/13/0%, min/avg/max/std-dev = 0.183/0.302/0.350/0.047
      pve01-corosync : multicast, xmt/rcv/%loss = 13/13/0%, min/avg/max/std-dev = 0.214/0.313/0.388/0.053
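
The part of these runs that matters is the two summary lines at the end: on PVE01 the multicast line for pve02-corosync shows `11/0/100%`, while the reverse direction on PVE02 shows no loss. When comparing several runs, it can help to pull those summary lines out automatically. The following is a rough sketch that only assumes the summary format seen in the output above; the function names are hypothetical.

```python
import re

# Matches omping summary lines such as:
#   pve02-corosync : multicast, xmt/rcv/%loss = 11/0/100%, min/avg/...
# The optional "+N" after rcv covers lines like "23/23+23/0%".
SUMMARY = re.compile(
    r"^\s*(?P<host>\S+)\s*:\s*(?P<kind>unicast|multicast),\s*"
    r"xmt/rcv/%loss\s*=\s*(?P<xmt>\d+)/(?P<rcv>\d+)(?:\+\d+)?/(?P<loss>\d+)%"
)


def parse_summaries(output: str) -> list[dict]:
    """Return one dict per summary line found in omping output."""
    rows = []
    for line in output.splitlines():
        m = SUMMARY.match(line)
        if m:
            rows.append({
                "host": m["host"],
                "kind": m["kind"],
                "xmt": int(m["xmt"]),
                "rcv": int(m["rcv"]),
                "loss": int(m["loss"]),
            })
    return rows


def multicast_failures(output: str) -> list[str]:
    """Hosts whose multicast replies were all lost."""
    return [r["host"] for r in parse_summaries(output)
            if r["kind"] == "multicast" and r["loss"] == 100]


sample = """\
pve02-corosync : unicast, xmt/rcv/%loss = 11/11/0%, min/avg/max/std-dev = 0.176/0.244/0.279/0.029
pve02-corosync : multicast, xmt/rcv/%loss = 11/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
"""
print(multicast_failures(sample))  # ['pve02-corosync']
```

Unicast working while multicast is lost in one direction only is exactly the pattern shown in the PVE01 run above.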

That leads me to think something is wrong with either the switch or the port, but I swapped the ports and got the same results on the same nodes. Could this be a setting on the NIC?
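
One NIC-side setting that is cheap to rule out is whether the kernel has the multicast flag set on the corosync interface at all. On Linux the interface flags are exposed as a hex string in /sys/class/net/<iface>/flags. The sketch below decodes two of the bits; the helper names are made up, and a set multicast flag does not rule out driver, offload, or switch-side (for example IGMP snooping) problems.

```python
IFF_UP = 0x1            # interface is administratively up
IFF_MULTICAST = 0x1000  # interface supports multicast


def decode_flags(raw: str) -> dict:
    """Decode the hex string found in /sys/class/net/<iface>/flags."""
    value = int(raw.strip(), 16)
    return {
        "up": bool(value & IFF_UP),
        "multicast": bool(value & IFF_MULTICAST),
    }


def read_flags(iface: str) -> dict:
    """Read and decode the flags of one interface (Linux only)."""
    with open(f"/sys/class/net/{iface}/flags") as fh:
        return decode_flags(fh.read())


# 0x1003 = up + broadcast + multicast
print(decode_flags("0x1003"))  # {'up': True, 'multicast': True}
```

On PVE01 this would be called as `read_flags("enp6s0f1")`, since that is the corosync interface there.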

Thanks,
Ryan
 
Here are my network configs (I have also attached them in case that is easier to read):

  • PVE01
    • # network interface settings; autogenerated
      # Please do NOT modify this file directly, unless you know what
      # you're doing.
      #
      # If you want to manage parts of the network configuration manually,
      # please utilize the 'source' or 'source-directory' directives to do
      # so.
      # PVE will preserve these directives, but will NOT read its network
      # configuration from sourced files, so do not attempt to move any of
      # the PVE managed interfaces into external files!

      auto lo
      iface lo inet loopback

      iface enp6s0f0 inet manual

      auto enp6s0f1
      iface enp6s0f1 inet static
      address 192.168.25.99
      netmask 255.255.255.0
      #corosync

      auto enp7s0f0
      iface enp7s0f0 inet static
      address 192.168.35.99
      netmask 255.255.255.0
      #ceph

      allow-vmbr2 enp7s0f1
      iface enp7s0f1 inet manual
      ovs_type OVSPort
      ovs_bridge vmbr2

      allow-vmbr1 enp12s0
      iface enp12s0 inet manual
      ovs_type OVSPort
      ovs_bridge vmbr1

      auto vmbr0
      iface vmbr0 inet static
      address 192.168.3.99
      netmask 255.255.255.0
      gateway 192.168.3.33
      bridge-ports enp6s0f0
      bridge-stp off
      bridge-fd 0

      auto vmbr1
      iface vmbr1 inet manual
      ovs_type OVSBridge
      ovs_ports enp12s0
      #***Bridge to Router***

      auto vmbr2
      iface vmbr2 inet manual
      ovs_type OVSBridge
      ovs_ports enp7s0f1
      #VM Traffic

  • PVE02
    • # network interface settings; autogenerated
      # Please do NOT modify this file directly, unless you know what
      # you're doing.
      #
      # If you want to manage parts of the network configuration manually,
      # please utilize the 'source' or 'source-directory' directives to do
      # so.
      # PVE will preserve these directives, but will NOT read its network
      # configuration from sourced files, so do not attempt to move any of
      # the PVE managed interfaces into external files!

      auto lo
      iface lo inet loopback

      iface enp18s0f0 inet manual

      allow-vmbr1 enp37s0
      iface enp37s0 inet manual
      ovs_type OVSPort
      ovs_bridge vmbr1

      auto enp18s0f1
      iface enp18s0f1 inet static
      address 192.168.25.100
      netmask 255.255.255.0
      #corosync

      auto enp19s0f0
      iface enp19s0f0 inet static
      address 192.168.35.100
      netmask 255.255.255.0
      #ceph

      allow-vmbr2 enp19s0f1
      iface enp19s0f1 inet manual
      ovs_type OVSPort
      ovs_bridge vmbr2

      auto vmbr0
      iface vmbr0 inet static
      address 192.168.3.100
      netmask 255.255.255.0
      gateway 192.168.3.33
      bridge-ports enp18s0f0
      bridge-stp off
      bridge-fd 0

      auto vmbr1
      iface vmbr1 inet manual
      ovs_type OVSBridge
      ovs_ports enp37s0
      #***Bridge to Router***

      auto vmbr2
      iface vmbr2 inet manual
      ovs_type OVSBridge
      ovs_ports enp19s0f1
      #VM Traffic
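
A quick sanity check on the addressing above: both corosync interfaces should sit in the same subnet, and the `bindnetaddr` from corosync.conf (192.168.25.99) should fall in that subnet on each node. This is a small sketch using Python's standard `ipaddress` module; it assumes corosync picks its interface by matching `bindnetaddr` against the network of a local address, and the function names are only illustrative.

```python
import ipaddress


def same_network(addr_a: str, addr_b: str, netmask: str) -> bool:
    """True when both addresses fall in the same network for this netmask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b


def covers(bindnetaddr: str, node_addr: str, netmask: str) -> bool:
    """True when node_addr lies in the network that bindnetaddr belongs to."""
    net = ipaddress.ip_interface(f"{bindnetaddr}/{netmask}").network
    return ipaddress.ip_address(node_addr) in net


# Corosync addresses of PVE01 and PVE02 from the interfaces files above.
print(same_network("192.168.25.99", "192.168.25.100", "255.255.255.0"))  # True
print(covers("192.168.25.99", "192.168.25.100", "255.255.255.0"))        # True
```

Both checks pass for the values posted, so the addressing itself looks consistent after the interface rename.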
 

Also, here is my corosync.conf for each machine:

  • PVE01
    • logging {
      debug: off
      to_syslog: yes
      }

      nodelist {
      node {
      name: pve01
      nodeid: 1
      quorum_votes: 1
      ring0_addr: pve01-corosync
      }
      node {
      name: pve02
      nodeid: 2
      quorum_votes: 1
      ring0_addr: pve02-corosync
      }
      }

      quorum {
      provider: corosync_votequorum
      }

      totem {
      cluster_name: Cluster01
      config_version: 2
      interface {
      bindnetaddr: 192.168.25.99
      ringnumber: 0
      }
      ip_version: ipv4
      secauth: on
      version: 2
      }

  • PVE02
    • logging {
      debug: off
      to_syslog: yes
      }

      nodelist {
      node {
      name: pve01
      nodeid: 1
      quorum_votes: 1
      ring0_addr: pve01-corosync
      }
      node {
      name: pve02
      nodeid: 2
      quorum_votes: 1
      ring0_addr: pve02-corosync
      }
      }

      quorum {
      provider: corosync_votequorum
      }

      totem {
      cluster_name: Cluster01
      config_version: 2
      interface {
      bindnetaddr: 192.168.25.99
      ringnumber: 0
      }
      ip_version: ipv4
      secauth: on
      version: 2
      }
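
Since `ring0_addr` uses hostnames (pve01-corosync, pve02-corosync) rather than IPs, it is also worth confirming that each node resolves those names to the 192.168.25.x addresses. The sketch below extracts the names from corosync.conf text and resolves them with the local resolver. It assumes the flat `node { ... }` layout shown above; the function names are hypothetical.

```python
import re
import socket


def ring0_addrs(conf_text: str) -> dict:
    """Map each node name to its ring0_addr from corosync.conf text."""
    nodes = {}
    # Each node { ... } block in the nodelist has no nested braces.
    for block in re.findall(r"\bnode\s*\{([^{}]*)\}", conf_text):
        fields = dict(re.findall(r"^\s*(\w+)\s*:\s*(\S+)\s*$", block, re.M))
        if "name" in fields and "ring0_addr" in fields:
            nodes[fields["name"]] = fields["ring0_addr"]
    return nodes


def resolve(host: str):
    """Return the IPv4 address the local resolver gives for host, or None."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None


sample = """
nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: pve01-corosync
  }
  node {
    name: pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: pve02-corosync
  }
}
"""
print(ring0_addrs(sample))
# {'pve01': 'pve01-corosync', 'pve02': 'pve02-corosync'}
```

Running `resolve()` on each returned name on both nodes shows whether /etc/hosts or DNS still points at the expected corosync addresses.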