High Availability/Cluster Issue

Antonino89

Member
Jul 13, 2017
Hi guys,

For some days I have been trying to configure High Availability between two physical Proxmox servers.
I am having a problem keeping it up on the second server.

Both servers are attached to the same Cisco SF-300 switch; one server has IP address 192.168.2.50 (Server1) and the other has 192.168.2.55 (Server2), both /24. The servers are able to ping each other.

Configuration I've done:

Server1# pvecm create CLUSTERTEST

Server1# pvecm add 192.168.2.50

"
root@pve:/# pvecm status
Quorum information
------------------
Date: Thu Jul 13 09:59:53 2017
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/88
Quorate: Yes

Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.2.50 (local)


On Server2:
Server2# pvecm add 192.168.2.50
"node pve already defined
copy corosync auth key
stopping pve-cluster service
backup old database
Job for corosync.service failed. See 'systemctl status corosync.service' and 'journalctl -xn' for details.


root@pve:/etc/corosync# systemctl status corosync.service

● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
Active: failed (Result: exit-code) since Thu 2017-07-13 10:06:28 CEST; 1min 23s ago
Process: 1286 ExecStop=/usr/share/corosync/corosync stop (code=exited, status=0/SUCCESS)
Process: 6770 ExecStart=/usr/share/corosync/corosync start (code=exited, status=1/FAILURE)
Main PID: 1193 (code=killed, signal=ABRT)

Jul 13 10:05:27 pve corosync[6778]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jul 13 10:05:27 pve corosync[6778]: [QB ] server name: quorum
Jul 13 10:05:27 pve corosync[6778]: [TOTEM ] A new membership (192.168.2.55:88) was formed. Members joined: 1
Jul 13 10:05:27 pve corosync[6778]: [QUORUM] Members[1]: 1
Jul 13 10:05:27 pve corosync[6778]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 13 10:05:27 pve corosync[6778]: [TOTEM ] A new membership (192.168.2.50:92) was formed. Members joined: 1
Jul 13 10:06:28 pve corosync[6770]: Starting Corosync Cluster Engine (corosync): [FAILED]
Jul 13 10:06:28 pve systemd[1]: corosync.service: control process exited, code=exited status=1
Jul 13 10:06:28 pve systemd[1]: Failed to start Corosync Cluster Engine.
Jul 13 10:06:28 pve systemd[1]: Unit corosync.service entered failed state.

root@pve:/etc/corosync# pvecm status
Cannot initialize CMAP service
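
For context: pvecm talks to corosync through its CMAP service, so this message just means corosync itself is not running on this node. The actual start failure can be read back from the journal, e.g.:

root@pve:/etc/corosync# journalctl -u corosync -b --no-pager | tail -n 50

which should show the reason corosync refused to start.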

root@pve:/etc/corosync# more corosync.conf (On the second server)
totem {
  version: 2
  secauth: on
  cluster_name: CLUSTERTEST
  config_version: 1
  ip_version: ipv4
  interface {
    ringnumber: 0
    bindnetaddr: 192.168.2.50
  }
}

nodelist {
  node {
    ring0_addr: pve
    name: pve
    nodeid: 1
    quorum_votes: 1
  }
}

quorum {
  provider: corosync_votequorum
}

logging {
  to_syslog: yes
  debug: off
}
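
(For reference: with unique hostnames the generated nodelist would normally contain one entry per node, roughly like the sketch below; the names Server1/Server2 are just placeholders, and config_version is bumped on every change.)

nodelist {
  node {
    ring0_addr: Server1
    name: Server1
    nodeid: 1
    quorum_votes: 1
  }
  node {
    ring0_addr: Server2
    name: Server2
    nodeid: 2
    quorum_votes: 1
  }
}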


Normal ping is working.

root@pve:/etc/corosync# omping -c 10000 -i 0.001 -F -q 192.168.2.50 192.168.2.55
192.168.2.50 : waiting for response msg
192.168.2.50 : waiting for response msg
192.168.2.50 : waiting for response msg
192.168.2.50 : waiting for response msg
192.168.2.50 : waiting for response msg
192.168.2.50 : waiting for response msg
192.168.2.50 : waiting for response msg
192.168.2.50 : waiting for response msg
192.168.2.50 : waiting for response msg
^C
192.168.2.50 : response message never received
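
Note that omping only answers other omping instances, so the exact same command has to be running on every address in the list at the same time. A minimal sketch of the symmetric run, started on the other node (192.168.2.50) while the first one is still waiting:

omping -c 10000 -i 0.001 -F -q 192.168.2.50 192.168.2.55

Once both instances are up, each side prints unicast and multicast loss statistics instead of "waiting for response msg".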

Can someone please help me figure out how to solve this problem and set up High Availability?

Thanks all :)
 
first:
HA with two nodes will not work because of self-fencing (if a node loses quorum, it restarts itself), so you should use at least 3 nodes

"node pve already defined
it seems both hosts have the same hostname? this will not work.
please give each server its own (cluster-)unique hostname
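
A rough sketch of the rename on the second box, assuming the (purely illustrative) name Server2, before re-running pvecm add:

root@pve:~# hostnamectl set-hostname Server2
# then make sure the new name resolves to the node's own address,
# e.g. a line like this in /etc/hosts:
# 192.168.2.55 Server2.localdomain Server2

After that, the join should no longer collide with the existing node entry called 'pve'.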
 
Thaaaanks.

I changed the servers' hostnames and the /etc/hosts file:

root@Server2:/etc# pvecm add 192.168.2.50
copy corosync auth key
stopping pve-cluster service
backup old database
waiting for quorum...OK
generating node certificates
merge known_hosts file
restart services
successfully added node 'Server2' to cluster.

Should it be okay now? How can I test whether High Availability works? :)
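
As a rough sketch of how to check it (the VMID 100 below is only an example, any existing guest works): first confirm the cluster with pvecm status, which should now report 2 nodes and Quorate: Yes, then put a guest under HA control:

ha-manager add vm:100     # manage this guest as an HA resource
ha-manager status         # shows HA resources and which node currently runs them

Then hard-stop the node that runs the guest and the HA stack should recover it on the other node. Keep in mind the earlier remark, though: with only two nodes the surviving node loses quorum and fences itself, so a real failover test needs at least a third node (or a third vote).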