Quorum Error 500

May 9, 2019
Good day,

I am unable to create a post for my issue below.

I am having issues with a quorum 500 error and would like some assistance.


Primary Node (Host)

Quorum information
------------------
Date: Tue Jun 18 08:09:17 2019
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1/12
Quorate: No

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.0.10.2 (local)


root@node1 ~ # systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2019-06-15 01:16:42 SAST; 3 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 24669 (corosync)
Tasks: 2 (limit: 4915)
Memory: 44.8M
CPU: 49min 20.724s
CGroup: /system.slice/corosync.service
└─24669 /usr/sbin/corosync -f

Jun 15 01:21:07 cloud02-pbx corosync[24669]: warning [CPG ] downlist left_list: 1 received
Jun 15 01:21:07 cloud02-pbx corosync[24669]: notice [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jun 15 01:21:07 cloud02-pbx corosync[24669]: notice [QUORUM] Members[1]: 1
Jun 15 01:21:07 cloud02-pbx corosync[24669]: notice [MAIN ] Completed service synchronization, ready to provide service.
Jun 15 01:21:07 cloud02-pbx corosync[24669]: [TOTEM ] A new membership (10.0.10.2:12) was formed. Members left: 2
Jun 15 01:21:07 cloud02-pbx corosync[24669]: [TOTEM ] Failed to receive the leave message. failed: 2
Jun 15 01:21:07 cloud02-pbx corosync[24669]: [CPG ] downlist left_list: 1 received
Jun 15 01:21:07 cloud02-pbx corosync[24669]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Jun 15 01:21:07 cloud02-pbx corosync[24669]: [QUORUM] Members[1]: 1
Jun 15 01:21:07 cloud02-pbx corosync[24669]: [MAIN ] Completed service synchronization, ready to provide service.



Second Node

Quorum information
------------------
Date: Tue Jun 18 08:09:21 2019
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000002
Ring ID: 2/1108
Quorate: No

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 10.0.10.3 (local)


root@node22:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2019-06-15 01:49:42 SAST; 3 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 1690 (corosync)
Tasks: 2 (limit: 11059)
Memory: 43.1M
CPU: 47min 6.818s
CGroup: /system.slice/corosync.service
└─1690 /usr/sbin/corosync -f

Jun 15 01:54:02 server2 corosync[1690]: [QUORUM] Members[1]: 2
Jun 15 01:54:02 server2 corosync[1690]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 15 01:54:03 server2 corosync[1690]: notice [TOTEM ] A new membership (10.0.10.3:1108) was formed. Members
Jun 15 01:54:03 server2 corosync[1690]: warning [CPG ] downlist left_list: 0 received
Jun 15 01:54:03 server2 corosync[1690]: notice [QUORUM] Members[1]: 2
Jun 15 01:54:03 server2 corosync[1690]: notice [MAIN ] Completed service synchronization, ready to provide service.
Jun 15 01:54:03 server2 corosync[1690]: [TOTEM ] A new membership (10.0.10.3:1108) was formed. Members
Jun 15 01:54:03 server2 corosync[1690]: [CPG ] downlist left_list: 0 received
Jun 15 01:54:03 server2 corosync[1690]: [QUORUM] Members[1]: 2
Jun 15 01:54:03 server2 corosync[1690]: [MAIN ] Completed service synchronization, ready to provide service.


OMPING and Ping Between nodes

root@Node1 ~ # omping -c 10000 -i 1 -q 10.0.10.2 10.0.10.3
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : response message never received


root@Node1~# ping 10.0.10.3
PING 10.0.10.3 (10.0.10.3) 56(84) bytes of data.
64 bytes from 10.0.10.3: icmp_seq=1 ttl=64 time=0.220 ms
64 bytes from 10.0.10.3: icmp_seq=2 ttl=64 time=0.221 ms
64 bytes from 10.0.10.3: icmp_seq=3 ttl=64 time=0.285 ms
64 bytes from 10.0.10.3: icmp_seq=4 ttl=64 time=0.246 ms
64 bytes from 10.0.10.3: icmp_seq=5 ttl=64 time=0.343 ms
64 bytes from 10.0.10.3: icmp_seq=6 ttl=64 time=0.268 ms
64 bytes from 10.0.10.3: icmp_seq=7 ttl=64 time=0.236 ms
64 bytes from 10.0.10.3: icmp_seq=8 ttl=64 time=0.279 ms
^C
--- 10.0.10.3 ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 7158ms
rtt min/avg/max/mdev = 0.220/0.262/0.343/0.040 ms
 
root@Node1 ~ # omping -c 10000 -i 1 -q 10.0.10.2 10.0.10.3
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : response message never received

I guess you only ran omping on one of the nodes? It needs to be run in parallel on all nodes at the same time.

Please post the results of both omping commands from https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_cluster_network.
Please use code tags for command-line output.
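For reference, the two omping invocations from that section of the admin guide look like the following. The IPs 10.0.10.2 and 10.0.10.3 are taken from the outputs above; substitute your own node addresses, and start each command on both nodes at the same time:

```shell
# Quick multicast check: 10000 packets at 1 ms intervals (-F allows flooding)
omping -c 10000 -i 0.001 -F -q 10.0.10.2 10.0.10.3

# Longer check (~10 minutes) to catch IGMP snooping querier timeouts
omping -c 600 -i 1 -q 10.0.10.2 10.0.10.3
```

If multicast loss shows up only in the second, longer run, that usually points to an IGMP snooping querier problem on the switch.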

Thanks!
 
root@Node2:~# omping -c 10000 -i 1 -q 10.0.10.3 10.0.10.2
10.0.10.2 : waiting for response msg
10.0.10.2 : joined (S,G) = (*, 232.43.211.234), pinging


root@Node1 ~ # omping -c 10000 -i 1 -q 10.0.10.2 10.0.10.3
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : joined (S,G) = (*, 232.43.211.234), pinging
 
omping -c 10000 -i 0.001 -F -q 10.0.10.2 10.0.10.3


Code:
root@Node2:~# omping -c 10000 -i 0.001 -F -q 10.0.10.2 10.0.10.3
10.0.10.2 : waiting for response msg
10.0.10.2 : joined (S,G) = (*, 232.43.211.234), pinging
10.0.10.2 : given amount of query messages was sent

10.0.10.2 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.090/0.196/0.350/0.031
10.0.10.2 : multicast, xmt/rcv/%loss = 10000/9985/0% (seq>=16 0%), min/avg/max/std-dev = 0.093/0.202/0.357/0.031

root@cloud02-pbx ~ # omping -c 10000 -i 0.001 -F -q 10.0.10.2 10.0.10.3
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : joined (S,G) = (*, 232.43.211.234), pinging
10.0.10.3 : waiting for response msg
10.0.10.3 : server told us to stop

10.0.10.3 :   unicast, xmt/rcv/%loss = 9619/9619/0%, min/avg/max/std-dev = 0.092/0.207/0.345/0.030
10.0.10.3 : multicast, xmt/rcv/%loss = 9619/9619/0%, min/avg/max/std-dev = 0.093/0.211/0.355/0.030

omping -c 600 -i 1 -q 10.0.10.2 10.0.10.3

Code:
root@Node2:~# omping -c 600 -i 1 -q 10.0.10.2 10.0.10.3
10.0.10.2 : waiting for response msg
10.0.10.2 : joined (S,G) = (*, 232.43.211.234), pinging
10.0.10.2 : given amount of query messages was sent

10.0.10.2 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.161/0.265/0.420/0.046
10.0.10.2 : multicast, xmt/rcv/%loss = 600/599/0% (seq>=2 0%), min/avg/max/std-dev = 0.165/0.277/0.436/0.047

root@Node1: ~ # omping -c 600 -i 1 -q 10.0.10.2 10.0.10.3
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : waiting for response msg
10.0.10.3 : joined (S,G) = (*, 232.43.211.234), pinging
10.0.10.3 : given amount of query messages was sent

10.0.10.3 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.130/0.264/0.391/0.034
10.0.10.3 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.134/0.272/0.407/0.036
 
Also,

I changed the default SSH port, so when I want to migrate a VM/CT it gives me an error because port 22 is disabled. How can I change the port the cluster communicates on via SSH? Also, the root user is disabled for SSH.

Note: a fake IP is used below. The IP shown is the external IP, not the separated cluster network IP. Is this correct?

Code:
2019-06-18 13:51:29 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=server2' root@123.3.23.169 /bin/true
2019-06-18 13:51:29 ssh: connect to host 123.3.23.169 port 22: Connection refused
2019-06-18 13:51:29 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted
 
I changed the default SSH port, so when I want to migrate a VM/CT it gives me an error because port 22 is disabled. How can I change the port the cluster communicates on via SSH? Also, the root user is disabled for SSH.
This is not really supported for PVE, since it relies on connecting between cluster nodes with SSH keys on the default port.
* You could disable root logins with a password by setting root access to without-password (see `man sshd_config`).
* For the different port, you might have luck specifying the alternative port in either root's SSH config or in the system-wide one (`man ssh_config`).
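A minimal per-host override in root's SSH config might look like the following. This is a sketch; the hostname `server2` matches the HostKeyAlias in the migration log above, while port `2222` is a placeholder for whatever port you actually moved sshd to:

```
# /root/.ssh/config on the source node (port 2222 is a placeholder)
Host server2
    Port 2222
    User root
```

Keep in mind this is untested territory for PVE: some tooling may build the SSH command line itself and ignore the config file.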

However, as said, this is not a supported setup and thus not a widely tested one.

As an alternative, you could consider setting up a dedicated network for your corosync and migration traffic and configure SSH to listen on port 22 there (and still enable PermitRootLogin without-password).
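For the migration-traffic part of that, a dedicated migration network can be configured cluster-wide in `/etc/pve/datacenter.cfg`. A sketch, assuming the 10.0.10.0/24 subnet from the quorum outputs above is your separate cluster network:

```
# /etc/pve/datacenter.cfg (subnet is an assumption based on the node IPs above)
migration: secure,network=10.0.10.0/24
```

With that in place, migrations connect over the given network instead of the external IPs shown in your error log.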

hope this helps!

P.S. please open a new thread for a new topic!
 
