Adding node to cluster failed (Broken pipe (596))

leifpa

New Member
Aug 27, 2016
Hey,

I'm currently trying to add a Proxmox node to my Proxmox cluster.
I've tried everything, but it doesn't seem to work, so I want to ask you guys for help :)

I installed 2 nodes with the newest Proxmox ISO.
I edited both /etc/hosts files, adding node002 to the node001 hosts file and node001 to the node002 hosts file.
I checked whether multicast is working with iperf; it seems to work fine when I use the multicast address (roughly the test sketched below).
The 2 nodes are in the same IP network, 31.172.9x.xxx/28.
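This is roughly the iperf test I ran (from memory, so the exact flags and the group address 239.255.1.1 are just an example of a UDP multicast test with iperf2, not the literal commands):
Code:
# on node002: listen for UDP and join the multicast group via -B
iperf -s -u -B 239.255.1.1 -i 1
# on node001: send UDP traffic to that group, with a raised multicast TTL
iperf -c 239.255.1.1 -u -T 32 -t 10 -i 1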

I created a cluster on the 1st node with this command:
pvecm create test
Then I tried to join the cluster from my 2nd node with this command:
pvecm add 31.172.9x.x21 --force (the IP of the 1st node)

It all seemed to work fine; there weren't any errors after I entered the password...

But the problem is: node002 shows up in the web interface, but I can't manage it: Broken pipe (596)

service corosync status on the 1st node (where I created the cluster)

Code:
root@node001:~# service corosync status
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
   Active: active (running) since Sat 2016-08-27 18:14:25 CEST; 30min ago
  Process: 13572 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
  Process: 13584 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
Main PID: 13593 (corosync)
   CGroup: /system.slice/corosync.service
           └─13593 corosync

Aug 27 18:14:25 node001 corosync[13584]: Starting Corosync Cluster Engine (corosync): [  OK  ]
Aug 27 18:14:51 node001 corosync[13593]: [QUORUM] This node is within the primary component and will provide service.
Aug 27 18:14:51 node001 corosync[13593]: [QUORUM] Members[1]: 1
Aug 27 18:14:56 node001 corosync[13593]: [CFG   ] Config reload requested by node 1
Aug 27 18:15:16 node001 corosync[13593]: [CFG   ] Config reload requested by node 1
Aug 27 18:15:16 node001 corosync[13593]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 27 18:15:16 node001 corosync[13593]: [QUORUM] Members[1]: 1
Aug 27 18:18:19 node001 corosync[13593]: [TOTEM ] A new membership (31.172.9x.x21:116) was formed. Members
Aug 27 18:18:19 node001 corosync[13593]: [QUORUM] Members[1]: 1
Aug 27 18:18:19 node001 corosync[13593]: [MAIN  ] Completed service synchronization, ready to provide service.

service corosync status on the 2nd node

Code:
root@node002:~# service corosync status
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
   Active: active (running) since Sat 2016-08-27 18:18:19 CEST; 28min ago
  Process: 1383 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
Main PID: 1390 (corosync)
   CGroup: /system.slice/corosync.service
           └─1390 corosync

Aug 27 18:18:18 node002 corosync[1390]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Aug 27 18:18:18 node002 corosync[1390]: [QUORUM] Using quorum provider corosync_votequorum
Aug 27 18:18:18 node002 corosync[1390]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Aug 27 18:18:18 node002 corosync[1390]: [QB    ] server name: votequorum
Aug 27 18:18:18 node002 corosync[1390]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Aug 27 18:18:18 node002 corosync[1390]: [QB    ] server name: quorum
Aug 27 18:18:18 node002 corosync[1390]: [TOTEM ] A new membership (31.172.9x.217:16) was formed. Members joined: 2
Aug 27 18:18:18 node002 corosync[1390]: [QUORUM] Members[1]: 2
Aug 27 18:18:18 node002 corosync[1390]: [MAIN  ] Completed service synchronization, ready to provide service.
Aug 27 18:18:19 node002 corosync[1383]: Starting Corosync Cluster Engine (corosync): [  OK  ]

pvecm status on the 1st node

Code:
root@node001:~# pvecm status
Quorum information
------------------
Date:             Sat Aug 27 18:47:14 2016
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          120
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 31.172.9x.x21 (local)

pvecm status on the 2nd node

Code:
root@node002:~# pvecm status
Quorum information
------------------
Date:             Sat Aug 27 18:47:54 2016
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          16
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 31.172.9x.x17 (local)

/etc/corosync/corosync.conf on the 1st node

Code:
root@node001:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: node001
    nodeid: 1
    quorum_votes: 1
    ring0_addr: node001
  }

  node {
    name: node002
    nodeid: 2
    quorum_votes: 1
    ring0_addr: node002
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: avoro
  config_version: 8
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 31.172.9x.x21
    ringnumber: 0
  }

}

and on the 2nd node

Code:
root@node002:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: node001
    nodeid: 1
    quorum_votes: 1
    ring0_addr: node001
  }

  node {
    name: node002
    nodeid: 2
    quorum_votes: 1
    ring0_addr: node002
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: avoro
  config_version: 8
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 31.172.9x.x21
    ringnumber: 0
  }

}

the hosts file on the 1st node

Code:
127.0.0.1 localhost.localdomain localhost
31.172.9x.x21 node001.avoro.eu node001 pvelocalhost
31.172.9x.x17 node002.avoro.eu node002

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

and the 2nd node

Code:
127.0.0.1 localhost.localdomain localhost
31.172.9x.x17 node002.avoro.eu node002 pvelocalhost
31.172.9x.x21 node001.avoro.eu node001

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

31.172.9x.x21 is the IP of the 1st host (where I created the cluster), and 31.172.9x.x17 is the IP of the 2nd (where I tried to join the cluster).

It would be pretty nice if someone could help me :)

Thanks!
 
Hey,
thanks for your reply!
This is the result of my test:
Code:
root@node002:~# omping -c 10000 -i 0.001 -F -q node001 node002
node001 : waiting for response msg
node001 : joined (S,G) = (*, 232.43.211.234), pinging
node001 : given amount of query messages was sent

node001 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.088/0.153/1.611/0.037
node001 : multicast, xmt/rcv/%loss = 10000/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
root@node002:~#
Code:
root@node001:~# omping -c 10000 -i 0.001 -F -q node001 node002
node002 : waiting for response msg
node002 : joined (S,G) = (*, 232.43.211.234), pinging
node002 : waiting for response msg
node002 : server told us to stop
node002 :   unicast, xmt/rcv/%loss = 9842/9842/0%, min/avg/max/std-dev = 0.094/0.148/1.254/0.028
node002 : multicast, xmt/rcv/%loss = 9842/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000

Best regards
Leif
 
You have 100% loss of the multicast packets, so this will not work unless you get multicast working.
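If the hosting network cannot be made to pass multicast at all (for example, no IGMP querier available from the provider), a commonly documented fallback in Proxmox VE 4.x is switching corosync to unicast (udpu) transport. This is only a sketch based on the totem section you posted, not something tested on your setup; the edit goes into /etc/pve/corosync.conf, config_version must be increased, and corosync needs a restart on both nodes:
Code:
totem {
  cluster_name: avoro
  # bump this on every edit so the change gets picked up
  config_version: 9
  ip_version: ipv4
  secauth: on
  version: 2
  # use UDP unicast instead of multicast
  transport: udpu
  interface {
    bindnetaddr: 31.172.9x.x21
    ringnumber: 0
  }
}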
 
You have 100% loss of the multicast packets, so this will not work unless you get multicast working.
I already tried to activate it on the node side with
Code:
ifconfig eth0 multicast
and my network config looks like this on both nodes (same subnet):
Code:
auto lo
iface lo inet loopback

auto vmbr0
iface vmbr0 inet static
        address 31.172.9x.x21
        netmask 255.255.255.240
        gateway 31.172.9x.x09
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )
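If it helps, I think whether the querier is actually active on the bridge can be checked roughly like this (assuming the bridge is vmbr0 as above; these are the standard Linux bridge sysfs paths):
Code:
# should print 1 if the post-up line enabled the querier
cat /sys/class/net/vmbr0/bridge/multicast_querier
# snooping state of the bridge (1 = IGMP snooping enabled)
cat /sys/class/net/vmbr0/bridge/multicast_snooping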
Best regards
Leif