pvecm add fails when cluster uses unicast [transport="udpu"]

seba

New Member
Oct 29, 2012
Hello, after several problems with my switch, I moved my PVE 2.3 cluster with 3 nodes to unicast.
For over four months I have not had any loss of connections.
I wanted to add a new node, but I get the error "Waiting for quorum... Timed-out waiting for cluster [FAILED]" when I run the command pvecm add 192.168.56.102.
I retried with PVE 3.1 and only 2 nodes, and I got the same error.


Can anyone tell me how to add a node to a cluster in unicast mode?
Thank you very much,
seba



Install node n1
Code:
apt-get update
apt-get upgrade
root@n1:~# pveversion
pve-manager/3.1-3/dc0e9b0e (running kernel: 2.6.32-23-pve)


root@n1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback


auto vmbr0
iface vmbr0 inet static
        address 192.168.56.102
        netmask 255.255.255.0
        gateway 192.168.100.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0


root@n1:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.56.102 n1.seb.fr n1 pvelocalhost
192.168.56.103 n2.seb.fr n2


# The following lines are desirable for IPv6 capable hosts


::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts


root@n1:~# pvecm create pm3
Restarting pve cluster filesystem: pve-cluster[dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
.
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Tuning DLM kernel config... [  OK  ]
   Unfencing self... [  OK  ]
root@n1:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2013-10-18 15:48:22  n1
root@n1:~# reboot
change config to unicast
Code:
root@n1:~# cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new
root@n1:~# nano /etc/pve/cluster.conf.new
# add transport="udpu" to the <cman> element
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu" >
  </cman>


root@n1:~# ccs_config_validate -v -f /etc/pve/cluster.conf.new
Creating temporary file: /tmp/tmp.9ZQp0CES0s
Config interface set to:
Configuration stored in temporary file
Updating relaxng schema
Validating..
Configuration validates
Validation completed


validate the modification via the GUI (HA / Activate)
root@n1:~# reboot
root@n1:~# pvecm status
Version: 6.2.0
Config Version: 1
Cluster Name: pm3
Cluster Id: 717
Cluster Member: Yes
Cluster Generation: 12
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: n1
Node ID: 1
Multicast addresses: 255.255.255.255
Node addresses: 192.168.56.102


root@n1:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M     12   2013-10-18 16:01:38  n1
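
At this point /etc/pve/cluster.conf on n1 should look roughly like the following. This is only a sketch based on the standard cman schema used by PVE 3.x: the nodeid/votes values are the usual defaults and may differ, only the transport attribute is new compared to the generated file, and config_version always has to be raised above the currently active version whenever the file is edited.
Code:
<?xml version="1.0"?>
<cluster name="pm3" config_version="2">
  <!-- transport="udpu" switches corosync from multicast to unicast UDP -->
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>
  <clusternodes>
    <clusternode name="n1" votes="1" nodeid="1"/>
  </clusternodes>
</cluster>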

Install node n2
Code:
apt-get update
apt-get upgrade
root@n2:~# pveversion
pve-manager/3.1-3/dc0e9b0e (running kernel: 2.6.32-23-pve)


root@n2:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback


auto vmbr0
iface vmbr0 inet static
        address 192.168.56.103
        netmask 255.255.255.0
        gateway 192.168.100.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0


root@n2:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.56.103 n2.seb.fr n2 pvelocalhost
192.168.56.102 n1.seb.fr n1


# The following lines are desirable for IPv6 capable hosts


::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts


root@n2:~# ping n1
PING n1.seb.fr (192.168.56.102) 56(84) bytes of data.
64 bytes from n1.seb.fr (192.168.56.102): icmp_req=1 ttl=64 time=2.31 ms
64 bytes from n1.seb.fr (192.168.56.102): icmp_req=2 ttl=64 time=6.87 ms
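
Before attempting the join it is also worth checking that corosync on n1 is actually bound to its unicast UDP ports and that nothing filters that traffic between the two hosts. A hedged check, assuming the default corosync ports 5404/5405 (adjust if a non-default port is configured):
Code:
# on n1: confirm corosync is listening on its UDP ports (default 5404/5405)
netstat -ulpn | grep corosync
# on both nodes: make sure no firewall rule drops UDP between 192.168.56.102 and 192.168.56.103
iptables -L -n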

add n2 to the cluster
Code:
root@n2:~# pvecm nodes
cman_tool: Cannot open connection to cman, is it running ?
root@n2:~#
root@n2:~# pvecm add 192.168.56.102
The authenticity of host '192.168.56.102 (192.168.56.102)' can't be established.
ECDSA key fingerprint is 51:dd:eb:63:4d:82:2d:c3:07:b0:64:d2:6a:cb:6f:7f.
Are you sure you want to continue connecting (yes/no)? yes
root@192.168.56.102's password:
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...^C
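
One quick check at this point is whether the corosync instance already running on n1 ever learned about the new udpu member. The "adding new UDPU member" lines (visible in n2's log below) normally appear when corosync reads its member list, so on n1:
Code:
# on n1: list the udpu members the running corosync was started with
grep "UDPU member" /var/log/syslog
# if 192.168.56.103 never shows up, n1's corosync is still using the old
# member list and will ignore the packets coming from n2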

Logs from n1 and n2
Code:
Oct 19 20:38:48 n2 pmxcfs[1940]: [main] notice: teardown filesystem
Oct 19 20:38:49 n2 pmxcfs[2561]: [quorum] crit: quorum_initialize failed: 6
Oct 19 20:38:49 n2 pmxcfs[2561]: [quorum] crit: can't initialize service
Oct 19 20:38:49 n2 pmxcfs[2561]: [confdb] crit: confdb_initialize failed: 6
Oct 19 20:38:49 n2 pmxcfs[2561]: [quorum] crit: can't initialize service
Oct 19 20:38:49 n2 pmxcfs[2561]: [dcdb] crit: cpg_initialize failed: 6
Oct 19 20:38:49 n2 pmxcfs[2561]: [quorum] crit: can't initialize service
Oct 19 20:38:49 n2 pmxcfs[2561]: [dcdb] crit: cpg_initialize failed: 6
Oct 19 20:38:49 n2 pmxcfs[2561]: [quorum] crit: can't initialize service
Oct 19 20:38:49 n2 pvestatd[2118]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Oct 19 20:38:50 n2 kernel: DLM (built Aug  6 2013 06:53:05) installed
Oct 19 20:38:50 n2 pmxcfs[2561]: [status] crit: cpg_send_message failed: 9
Oct 19 20:38:50 n2 pmxcfs[2561]: [status] crit: cpg_send_message failed: 9
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Corosync Cluster Engine ('1.4.5'): started and ready to provide service.
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Corosync built-in features: nss
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Successfully parsed cman config
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Successfully configured openais services to load
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] Initializing transport (UDP/IP Unicast).
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] The network interface [192.168.56.103] is now up.
Oct 19 20:38:52 n2 corosync[2655]:   [QUORUM] Using quorum provider quorum_cman
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Oct 19 20:38:52 n2 corosync[2655]:   [CMAN  ] CMAN 1364188437 (built Mar 25 2013 06:14:01) started
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync CMAN membership service 2.90
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais cluster membership service B.01.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais event service B.01.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais checkpoint service B.01.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais message service B.03.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais distributed locking service B.03.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais timer service A.01.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync configuration service
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync profile loading service
Oct 19 20:38:52 n2 corosync[2655]:   [QUORUM] Using quorum provider quorum_cman
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] adding new UDPU member {192.168.56.102}
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] adding new UDPU member {192.168.56.103}
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] New Configuration:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] Members Left:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] Members Joined:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] New Configuration:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] #011r(0) ip(192.168.56.103)
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] Members Left:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] Members Joined:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] #011r(0) ip(192.168.56.103)
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 19 20:38:52 n2 corosync[2655]:   [QUORUM] Members[1]: 2
Oct 19 20:38:52 n2 corosync[2655]:   [QUORUM] Members[1]: 2
Oct 19 20:38:52 n2 corosync[2655]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.56.103) ; members(old:0 left:0)
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 19 20:38:55 n2 pmxcfs[2561]: [status] notice: update cluster info (cluster name  pm3, version = 2)
Oct 19 20:38:55 n2 pmxcfs[2561]: [dcdb] notice: members: 2/2561
Oct 19 20:38:55 n2 pmxcfs[2561]: [dcdb] notice: all data is up to date
Oct 19 20:38:55 n2 pmxcfs[2561]: [dcdb] notice: members: 2/2561
Oct 19 20:38:55 n2 pmxcfs[2561]: [dcdb] notice: all data is up to date


== On n1 there is no trace of n2 in /var/log/syslog ==


Oct 19 20:38:46 n1 pmxcfs[1939]: [dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
Oct 19 20:38:47 n1 corosync[2134]:   [QUORUM] Members[1]: 1
Oct 19 20:38:47 n1 pmxcfs[1939]: [status] notice: update cluster info (cluster name  pm3, version = 2)
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] New Configuration:
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] #011r(0) ip(192.168.56.102)
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] Members Left:
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] Members Joined:
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] New Configuration:
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] #011r(0) ip(192.168.56.102)
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] Members Left:
 
I'm having the same issue, did you find any solution?

EDIT - it seems that this is a corosync limitation: you need to restart corosync on all existing nodes when adding a new node to a cluster using udpu.
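
A possible sequence for that workaround on this two-node example, assuming a stock PVE 3.x node where corosync is managed by the cman init script; the restart briefly takes the node out of the cluster, so do it on one existing node at a time, and the exact service names/order may need adapting:
Code:
# on n1 (and every other pre-existing node), once n2's entry is present in
# /etc/pve/cluster.conf, restart the cluster stack so corosync re-reads the
# udpu member list
service cman restart
service pve-cluster restart
# then verify that both nodes are listed and the cluster is quorate
pvecm nodes
pvecm status

After that, restarting cman on n2 as well (or re-running pvecm add there) should let it reach quorum.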
 
