pvecm add fails when the cluster uses unicast [ transport="udpu" ]

seba

Hello, after several problems with my switch, I moved my PVE 2.3 cluster of 3 nodes to unicast.
For over four months I have not had any loss of connection.
I wanted to add a new node, but I get the error "Waiting for quorum... Timed-out waiting for cluster [FAILED]" when I run the command pvecm add 192.168.56.102.
I retried with PVE 3.1 and only 2 nodes, and I got the same error.


Can anyone tell me how to add a node to a cluster in unicast mode?
Thank you very much
seba



install node n1
Code:
apt-get update
apt-get upgrade
root@n1:~# pveversion
pve-manager/3.1-3/dc0e9b0e (running kernel: 2.6.32-23-pve)


root@n1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback


auto vmbr0
iface vmbr0 inet static
        address 192.168.56.102
        netmask 255.255.255.0
        gateway 192.168.100.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0


root@n1:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.56.102 n1.seb.fr n1 pvelocalhost
192.168.56.103 n2.seb.fr n2


# The following lines are desirable for IPv6 capable hosts


::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts


root@n1:~# pvecm create pm3
Restarting pve cluster filesystem: pve-cluster[dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
.
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Tuning DLM kernel config... [  OK  ]
   Unfencing self... [  OK  ]
root@n1:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2013-10-18 15:48:22  n1
root@n1:~# reboot
change config to unicast
Code:
root@n1:~# cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new
root@n1:~# nano /etc/pve/cluster.conf.new
# add transport="udpu" to the <cman> line (a sketch of the full resulting file is shown below, just before the n2 install)
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu" >
  </cman>


root@n1:~# ccs_config_validate -v -f /etc/pve/cluster.conf.new
Creating temporary file: /tmp/tmp.9ZQp0CES0s
Config interface set to:
Configuration stored in temporary file
Updating relaxng schema
Validating..
Configuration validates
Validation completed


validate the modification via the GUI (Datacenter -> HA -> Activate)
root@n1:~# reboot
root@n1:~# pvecm status
Version: 6.2.0
Config Version: 1
Cluster Name: pm3
Cluster Id: 717
Cluster Member: Yes
Cluster Generation: 12
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1
Active subsystems: 5
Flags:
Ports Bound: 0
Node name: n1
Node ID: 1
Multicast addresses: 255.255.255.255
Node addresses: 192.168.56.102


root@n1:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M     12   2013-10-18 16:01:38  n1

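For reference, the full /etc/pve/cluster.conf on n1 at this point should look roughly like the sketch below. The clusternode entry and config_version are taken from the pvecm output above; the exact formatting PVE writes may differ slightly, and the only manual change is the transport attribute on the cman line.
Code:
<?xml version="1.0"?>
<cluster name="pm3" config_version="1">
  <!-- transport="udpu" switches corosync from multicast to UDP unicast -->
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu">
  </cman>
  <clusternodes>
    <clusternode name="n1" votes="1" nodeid="1"/>
  </clusternodes>
</cluster>
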
install n2
Code:
apt-get update
apt-get upgrade
root@n2:~# pveversion
pve-manager/3.1-3/dc0e9b0e (running kernel: 2.6.32-23-pve)


root@n2:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback


auto vmbr0
iface vmbr0 inet static
        address 192.168.56.103
        netmask 255.255.255.0
        gateway 192.168.100.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0


root@n2:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.56.103 n2.seb.fr n2 pvelocalhost
192.168.56.102 n1.seb.fr n1


# The following lines are desirable for IPv6 capable hosts


::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts


root@n2:~# ping n1
PING n1.seb.fr (192.168.56.102) 56(84) bytes of data.
64 bytes from n1.seb.fr (192.168.56.102): icmp_req=1 ttl=64 time=2.31 ms
64 bytes from n1.seb.fr (192.168.56.102): icmp_req=2 ttl=64 time=6.87 ms

add n2 to the cluster
Code:
root@n2:~# pvecm nodes
cman_tool: Cannot open connection to cman, is it running ?
root@n2:~#
root@n2:~# pvecm add 192.168.56.102
The authenticity of host '192.168.56.102 (192.168.56.102)' can't be established.
ECDSA key fingerprint is 51:dd:eb:63:4d:82:2d:c3:07:b0:64:d2:6a:cb:6f:7f.
Are you sure you want to continue connecting (yes/no)? yes
root@192.168.56.102's password:
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...^C

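Before going through the logs, here is a sketch of checks that can be run on both nodes to see whether the unicast traffic is actually getting through. This is only a sketch: 5404/5405 are the default corosync/cman UDP ports (adjust if a different port is configured in cluster.conf), and tcpdump may need to be installed first (apt-get install tcpdump).
Code:
# cluster state as cman sees it on this node
cman_tool status
cman_tool nodes

# confirm corosync is bound to its UDP ports
netstat -anu | grep -E ':540[45]'

# watch for unicast corosync packets arriving from the other node
tcpdump -n -i vmbr0 udp port 5404 or udp port 5405
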
logs from n2 and n1
Code:
Oct 19 20:38:48 n2 pmxcfs[1940]: [main] notice: teardown filesystem
Oct 19 20:38:49 n2 pmxcfs[2561]: [quorum] crit: quorum_initialize failed: 6
Oct 19 20:38:49 n2 pmxcfs[2561]: [quorum] crit: can't initialize service
Oct 19 20:38:49 n2 pmxcfs[2561]: [confdb] crit: confdb_initialize failed: 6
Oct 19 20:38:49 n2 pmxcfs[2561]: [quorum] crit: can't initialize service
Oct 19 20:38:49 n2 pmxcfs[2561]: [dcdb] crit: cpg_initialize failed: 6
Oct 19 20:38:49 n2 pmxcfs[2561]: [quorum] crit: can't initialize service
Oct 19 20:38:49 n2 pmxcfs[2561]: [dcdb] crit: cpg_initialize failed: 6
Oct 19 20:38:49 n2 pmxcfs[2561]: [quorum] crit: can't initialize service
Oct 19 20:38:49 n2 pvestatd[2118]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Oct 19 20:38:50 n2 kernel: DLM (built Aug  6 2013 06:53:05) installed
Oct 19 20:38:50 n2 pmxcfs[2561]: [status] crit: cpg_send_message failed: 9
Oct 19 20:38:50 n2 pmxcfs[2561]: [status] crit: cpg_send_message failed: 9
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Corosync Cluster Engine ('1.4.5'): started and ready to provide service.
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Corosync built-in features: nss
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Successfully read config from /etc/cluster/cluster.conf
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Successfully parsed cman config
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Successfully configured openais services to load
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] Initializing transport (UDP/IP Unicast).
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] The network interface [192.168.56.103] is now up.
Oct 19 20:38:52 n2 corosync[2655]:   [QUORUM] Using quorum provider quorum_cman
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Oct 19 20:38:52 n2 corosync[2655]:   [CMAN  ] CMAN 1364188437 (built Mar 25 2013 06:14:01) started
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync CMAN membership service 2.90
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais cluster membership service B.01.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais event service B.01.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais checkpoint service B.01.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais message service B.03.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais distributed locking service B.03.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: openais timer service A.01.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync configuration service
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync profile loading service
Oct 19 20:38:52 n2 corosync[2655]:   [QUORUM] Using quorum provider quorum_cman
Oct 19 20:38:52 n2 corosync[2655]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Compatibility mode set to whitetank.  Using V1 and V2 of the synchronization engine.
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] adding new UDPU member {192.168.56.102}
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] adding new UDPU member {192.168.56.103}
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] New Configuration:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] Members Left:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] Members Joined:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] New Configuration:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] #011r(0) ip(192.168.56.103)
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] Members Left:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] Members Joined:
Oct 19 20:38:52 n2 corosync[2655]:   [CLM   ] #011r(0) ip(192.168.56.103)
Oct 19 20:38:52 n2 corosync[2655]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 19 20:38:52 n2 corosync[2655]:   [QUORUM] Members[1]: 2
Oct 19 20:38:52 n2 corosync[2655]:   [QUORUM] Members[1]: 2
Oct 19 20:38:52 n2 corosync[2655]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.56.103) ; members(old:0 left:0)
Oct 19 20:38:52 n2 corosync[2655]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 19 20:38:55 n2 pmxcfs[2561]: [status] notice: update cluster info (cluster name  pm3, version = 2)
Oct 19 20:38:55 n2 pmxcfs[2561]: [dcdb] notice: members: 2/2561
Oct 19 20:38:55 n2 pmxcfs[2561]: [dcdb] notice: all data is up to date
Oct 19 20:38:55 n2 pmxcfs[2561]: [dcdb] notice: members: 2/2561
Oct 19 20:38:55 n2 pmxcfs[2561]: [dcdb] notice: all data is up to date


== on n1 there is no trace of n2 in /var/log/syslog ==


Oct 19 20:38:46 n1 pmxcfs[1939]: [dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
Oct 19 20:38:47 n1 corosync[2134]:   [QUORUM] Members[1]: 1
Oct 19 20:38:47 n1 pmxcfs[1939]: [status] notice: update cluster info (cluster name  pm3, version = 2)
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] New Configuration:
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] #011r(0) ip(192.168.56.102)
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] Members Left:
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] Members Joined:
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] CLM CONFIGURATION CHANGE
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] New Configuration:
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] #011r(0) ip(192.168.56.102)
Oct 19 20:38:54 n1 corosync[2134]:   [CLM   ] Members Left:
 
I'm having the same issue. Did you find any solution?

EDIT - it seems this is a corosync limitation: you need to restart corosync on all existing nodes when adding a new node to a cluster that uses udpu.
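A sketch of the workaround that follows from this, assuming a short interruption of the cluster stack on the existing node is acceptable: once the new node's clusternode entry is present in cluster.conf, restart cman (which runs corosync) on each existing node so the updated UDPU member list is picked up, then retry the join from the new node.
Code:
# on each existing node (here only n1), after cluster.conf lists the new node:
service cman restart
# if pmxcfs does not reconnect on its own, restart it as well:
service pve-cluster restart

# then on the new node (n2), retry the join
# (--force may be needed if the earlier failed attempt left partial state behind):
pvecm add 192.168.56.102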
 