[SOLVED] Proxmox 4: "No quorum" error

pumko-adm

New Member
Apr 16, 2014
Hi all!
I have several Proxmox servers in a cluster. Today one server dropped out of the cluster and now returns the error "cluster not ready - no quorum".
Could you please tell me where to look and what might solve the problem?

Code:
root@R38P-01-VS3:~# pveversion -v
proxmox-ve: 4.2-64 (running kernel: 4.4.15-1-pve)
pve-manager: 4.2-18 (running version: 4.2-18/158720b9)
pve-kernel-4.4.15-1-pve: 4.4.15-60
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.4.16-1-pve: 4.4.16-64
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-44
qemu-server: 4.0-86
pve-firmware: 1.1-9
libpve-common-perl: 4.0-72
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-57
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-2
pve-container: 1.0-73
pve-firewall: 2.0-29
pve-ha-manager: 1.0-33
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
fence-agents-pve: not correctly installed

Code:
root@:~# systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
   Active: failed (Result: timeout) since Thu 2016-09-08 18:47:00 IRKT; 12min ago
  Process: 5362 ExecStart=/usr/share/corosync/corosync start (code=killed, signal=TERM)

Sep 08 18:45:29 servername corosync[5382]: [MAIN  ] Corosync Cluster Engine ('2.4.0'): started and ready to provide service.
Sep 08 18:45:29 servername corosync[5382]: [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
Sep 08 18:47:00 servername systemd[1]: corosync.service start operation timed out. Terminating.
Sep 08 18:47:00 servername corosync[5362]: Starting Corosync Cluster Engine (corosync):
Sep 08 18:47:00 servername systemd[1]: Failed to start Corosync Cluster Engine.
Sep 08 18:47:00 servername systemd[1]: Unit corosync.service entered failed state.
Hint: Some lines were ellipsized, use -l to show in full.
 
Execute on all nodes:

omping -c 10000 -i 0.001 -F -q node1ip node2ip node3ip
 
root@:~# omping -c 10000 -i 0.001 -F -q 10.38.136.161 10.38.136.162 10.38.136.164
omping: Can't find local address in arguments
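The error suggests that omping did not find this node's own address among its arguments: it expects the full list of cluster node addresses, including the local one, on every node. Judging by the ifconfig output later in this thread, the local node is 10.38.136.163 (on vmbr0), and that address is missing from the command above. Below is a minimal pre-check sketch. It assumes `hostname -I` (from Debian's hostname package) lists the local addresses; the function name `contains_local_addr` and the `LOCAL_ADDRS` override are hypothetical, added only so the check can be tried on a machine that is not a cluster node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (return 0) if at least one argument is an
# address configured on this host, which omping needs to find in its list.
# Local addresses come from `hostname -I`; LOCAL_ADDRS is a hypothetical
# override for testing on machines that are not cluster nodes.
contains_local_addr() {
    local locals addr l
    locals="${LOCAL_ADDRS:-$(hostname -I)}"
    for addr in "$@"; do
        # Intentionally unquoted: split the space-separated address list.
        for l in $locals; do
            if [ "$addr" = "$l" ]; then
                return 0
            fi
        done
    done
    return 1
}
```

Usage with this thread's addresses: `nodes="10.38.136.161 10.38.136.162 10.38.136.163 10.38.136.164"; contains_local_addr $nodes && omping -c 10000 -i 0.001 -F -q $nodes`, started on all nodes at about the same time.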

UPDATE:
Would it be possible to recreate the cluster?
 
Can you please send the output of ifconfig?
 
Code:
root@:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:15:17:77:6a:9c
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:872057 errors:0 dropped:0 overruns:0 frame:0
          TX packets:134368 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:177066103 (168.8 MiB)  TX bytes:40110225 (38.2 MiB)
          Interrupt:18 Memory:b8820000-b8840000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:30834 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30834 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:12115140 (11.5 MiB)  TX bytes:12115140 (11.5 MiB)

vmbr0     Link encap:Ethernet  HWaddr 00:15:17:77:6a:9c
          inet addr:10.38.136.163  Bcast:10.38.139.255  Mask:255.255.252.0
          inet6 addr: fe80::215:17ff:fe77:6a9c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:852311 errors:0 dropped:35 overruns:0 frame:0
          TX packets:130934 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:160151578 (152.7 MiB)  TX bytes:39182819 (37.3 MiB)
 
Problem solved.
The servers are not in a domain, so their hostnames could not be resolved via DNS. I added records for all the servers to /etc/hosts, and then it worked. Thank you!