Adding node to cluster failed (Broken pipe (596))

leifpa

New Member
Aug 27, 2016
Hey,

I'm currently trying to add a Proxmox node to my Proxmox cluster.
I've tried everything, but it doesn't seem to work, so I want to ask you guys for help :)

I installed 2 nodes with the newest Proxmox ISO.
I edited both /etc/hosts files, adding node002 to the node001 hosts file and node001 to the node002 hosts file.
I checked whether multicast is working with iperf; it seems to work fine when I use the multicast address (roughly the test sketched below).
The 2 nodes are in the same IP network, 31.172.9x.xxx/28.
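This is roughly the iperf test I ran (from memory, so the exact flags and the group address 239.255.1.1 are just an example of a UDP multicast test with iperf2, not the literal commands):
Code:
# on node002: listen for UDP and join the multicast group via -B
iperf -s -u -B 239.255.1.1 -i 1
# on node001: send UDP traffic to that group, with a raised multicast TTL
iperf -c 239.255.1.1 -u -T 32 -t 10 -i 1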

I created a cluster on the 1st node with this command:
pvecm create test
Then I tried to join the cluster from my 2nd node with this command:
pvecm add 31.172.9x.x21 --force (the IP of the 1st node)

It all seemed to work fine; there weren't any errors after I entered the password...

But the problem is: node002 shows up in the web interface, but I can't manage it: Broken pipe (596)

service corosync status on the 1st node (where I created the cluster)

Code:
root@node001:~# service corosync status
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
   Active: active (running) since Sat 2016-08-27 18:14:25 CEST; 30min ago
  Process: 13572 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
  Process: 13584 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
Main PID: 13593 (corosync)
   CGroup: /system.slice/corosync.service
           └─13593 corosync

Aug 27 18:14:25 node001 corosync[13584]: Starting Corosync Cluster Engine (corosync): [  OK  ]
Aug 27 18:14:51 node001 corosync[13593]: [QUORUM] This node is within the primary component and will provide service.
Aug 27 18:14:51 node001 corosync[13593]: [QUORUM] Members[1]: 1
Aug 27 18:14:56 node001 corosync[13593]: [CFG   ] Config reload requested by node 1
Aug 27 18:15:16 node001 corosync[13593]: [CFG   ] Config reload requested by node 1
Aug 27 18:15:16 node001 corosync[13593]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 27 18:15:16 node001 corosync[13593]: [QUORUM] Members[1]: 1
Aug 27 18:18:19 node001 corosync[13593]: [TOTEM ] A new membership (31.172.9x.x21:116) was formed. Members
Aug 27 18:18:19 node001 corosync[13593]: [QUORUM] Members[1]: 1
Aug 27 18:18:19 node001 corosync[13593]: [MAIN  ] Completed service synchronization, ready to provide service.

service corosync status on the 2nd node

Code:
root@node002:~# service corosync status
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
   Active: active (running) since Sat 2016-08-27 18:18:19 CEST; 28min ago
  Process: 1383 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
Main PID: 1390 (corosync)
   CGroup: /system.slice/corosync.service
           └─1390 corosync

Aug 27 18:18:18 node002 corosync[1390]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Aug 27 18:18:18 node002 corosync[1390]: [QUORUM] Using quorum provider corosync_votequorum
Aug 27 18:18:18 node002 corosync[1390]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Aug 27 18:18:18 node002 corosync[1390]: [QB    ] server name: votequorum
Aug 27 18:18:18 node002 corosync[1390]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Aug 27 18:18:18 node002 corosync[1390]: [QB    ] server name: quorum
Aug 27 18:18:18 node002 corosync[1390]: [TOTEM ] A new membership (31.172.9x.217:16) was formed. Members joined: 2
Aug 27 18:18:18 node002 corosync[1390]: [QUORUM] Members[1]: 2
Aug 27 18:18:18 node002 corosync[1390]: [MAIN  ] Completed service synchronization, ready to provide service.
Aug 27 18:18:19 node002 corosync[1383]: Starting Corosync Cluster Engine (corosync): [  OK  ]

pvecm status on the 1st node

Code:
root@node001:~# pvecm status
Quorum information
------------------
Date:             Sat Aug 27 18:47:14 2016
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          120
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 31.172.9x.x21 (local)

pvecm status on the 2nd node

Code:
root@node002:~# pvecm status
Quorum information
------------------
Date:             Sat Aug 27 18:47:54 2016
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          16
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 31.172.9x.x17 (local)

/etc/corosync/corosync.conf on the 1st node

Code:
root@node001:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: node001
    nodeid: 1
    quorum_votes: 1
    ring0_addr: node001
  }

  node {
    name: node002
    nodeid: 2
    quorum_votes: 1
    ring0_addr: node002
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: avoro
  config_version: 8
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 31.172.9x.x21
    ringnumber: 0
  }

}

and on the 2nd node

Code:
root@node002:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: node001
    nodeid: 1
    quorum_votes: 1
    ring0_addr: node001
  }

  node {
    name: node002
    nodeid: 2
    quorum_votes: 1
    ring0_addr: node002
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: avoro
  config_version: 8
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 31.172.9x.x21
    ringnumber: 0
  }

}

the hosts file on the 1st node

Code:
127.0.0.1 localhost.localdomain localhost
31.172.9x.x21 node001.avoro.eu node001 pvelocalhost
31.172.9x.x17 node002.avoro.eu node002

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

and the 2nd node

Code:
127.0.0.1 localhost.localdomain localhost
31.172.9x.x17 node002.avoro.eu node002 pvelocalhost
31.172.9x.x21 node001.avoro.eu node001

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

31.172.9x.x21 is the IP of the 1st host (where I created the cluster), and 31.172.9x.x17 is the IP of the 2nd (where I tried to join the cluster).

It would be pretty nice if someone could help me :)

Thanks!
 
Hey,
thanks for your reply!
This is the result of my test:
Code:
root@node002:~# omping -c 10000 -i 0.001 -F -q node001 node002
node001 : waiting for response msg
node001 : joined (S,G) = (*, 232.43.211.234), pinging
node001 : given amount of query messages was sent

node001 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.088/0.153/1.611/0.037
node001 : multicast, xmt/rcv/%loss = 10000/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
root@node002:~#
Code:
root@node001:~# omping -c 10000 -i 0.001 -F -q node001 node002
node002 : waiting for response msg
node002 : joined (S,G) = (*, 232.43.211.234), pinging
node002 : waiting for response msg
node002 : server told us to stop
node002 :   unicast, xmt/rcv/%loss = 9842/9842/0%, min/avg/max/std-dev = 0.094/0.148/1.254/0.028
node002 : multicast, xmt/rcv/%loss = 9842/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000

Best regards
Leif
 
You have 100% loss of the multicast packets, so this will not work unless you get multicast working.
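If the hosting network cannot be made to pass multicast at all (for example, no IGMP querier available from the provider), a commonly documented fallback in Proxmox VE 4.x is switching corosync to unicast (udpu) transport. This is only a sketch based on the totem section you posted, not something tested on your setup; the edit goes into /etc/pve/corosync.conf, config_version must be increased, and corosync needs a restart on both nodes:
Code:
totem {
  cluster_name: avoro
  # bump this on every edit so the change gets picked up
  config_version: 9
  ip_version: ipv4
  secauth: on
  version: 2
  # use UDP unicast instead of multicast
  transport: udpu
  interface {
    bindnetaddr: 31.172.9x.x21
    ringnumber: 0
  }
}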
 
You have 100% loss of the multicast packets, so this will not work unless you get multicast working.
I already tried to activate it on the node side with
Code:
ifconfig eth0 multicast
and my network config looks like this on both nodes (same subnet):
Code:
auto lo
iface lo inet loopback

auto vmbr0
iface vmbr0 inet static
        address 31.172.9x.x21
        netmask 255.255.255.240
        gateway 31.172.9x.x09
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )
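If it helps, I think whether the querier is actually active on the bridge can be checked roughly like this (assuming the bridge is vmbr0 as above; these are the standard Linux bridge sysfs paths):
Code:
# should print 1 if the post-up line enabled the querier
cat /sys/class/net/vmbr0/bridge/multicast_querier
# snooping state of the bridge (1 = IGMP snooping enabled)
cat /sys/class/net/vmbr0/bridge/multicast_snooping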
Best regards
Leif