Unstable cluster

decibel83

Hi.
I have a three-node Proxmox cluster running Proxmox 5.2.
One node in particular keeps failing for no apparent reason and appears red in the web console.
If I recreate the cluster from scratch, all nodes are green for about 30 minutes, then some nodes randomly turn red:

[Attachments: Screenshot 2018-11-29 at 14.56.10.png, Screenshot 2018-11-29 at 14.56.05.png, Screenshot 2018-11-29 at 14.56.00.png]

The PVE version is identical on all nodes. They are all registered with my PVE subscription, have the same date and time, and are fully upgraded.

PVE Versions:
Code:
root@servir03:~# pveversion -v
proxmox-ve: 5.2-3 (running kernel: 4.15.18-9-pve)
pve-manager: 5.2-12 (running version: 5.2-12/ba196e4b)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.18-7-pve: 4.15.18-27
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-2
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-42
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-32
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-30
pve-docs: 5.2-10
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-41
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

Multicast tests:

Code:
root@servir01:~# omping -c 10000 -i 0.001 -F -q servir01 servir02 servir03
servir02 : waiting for response msg
servir03 : waiting for response msg
servir02 : waiting for response msg
servir03 : waiting for response msg
servir02 : joined (S,G) = (*, 232.43.211.234), pinging
servir03 : waiting for response msg
servir03 : joined (S,G) = (*, 232.43.211.234), pinging
servir02 : given amount of query messages was sent
servir03 : waiting for response msg
servir03 : server told us to stop

servir02 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.060/0.124/0.805/0.044
servir02 : multicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.082/0.164/0.790/0.044
servir03 :   unicast, xmt/rcv/%loss = 9196/9196/0%, min/avg/max/std-dev = 0.044/0.096/0.374/0.024
servir03 : multicast, xmt/rcv/%loss = 9196/9196/0%, min/avg/max/std-dev = 0.048/0.108/0.378/0.030

root@servir02:~# omping -c 10000 -i 0.001 -F -q servir01 servir02 servir03
servir01 : waiting for response msg
servir03 : waiting for response msg
servir01 : joined (S,G) = (*, 232.43.211.234), pinging
servir03 : waiting for response msg
servir03 : joined (S,G) = (*, 232.43.211.234), pinging
servir01 : given amount of query messages was sent
servir03 : waiting for response msg
servir03 : server told us to stop

servir01 :   unicast, xmt/rcv/%loss = 10000/9999/0%, min/avg/max/std-dev = 0.059/0.122/0.896/0.044
servir01 : multicast, xmt/rcv/%loss = 10000/9999/0%, min/avg/max/std-dev = 0.079/0.149/0.917/0.044
servir03 :   unicast, xmt/rcv/%loss = 9960/9960/0%, min/avg/max/std-dev = 0.046/0.100/0.351/0.024
servir03 : multicast, xmt/rcv/%loss = 9960/9960/0%, min/avg/max/std-dev = 0.052/0.108/0.353/0.028

root@servir03:~# omping -c 10000 -i 0.001 -F -q servir01 servir02 servir03
servir01 : waiting for response msg
servir02 : waiting for response msg
servir01 : joined (S,G) = (*, 232.43.211.234), pinging
servir02 : joined (S,G) = (*, 232.43.211.234), pinging
servir01 : given amount of query messages was sent
servir02 : given amount of query messages was sent

servir01 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.050/0.123/2.420/0.047
servir01 : multicast, xmt/rcv/%loss = 10000/9991/0% (seq>=10 0%), min/avg/max/std-dev = 0.070/0.161/2.490/0.050
servir02 :   unicast, xmt/rcv/%loss = 10000/10000/0%, min/avg/max/std-dev = 0.055/0.126/1.031/0.041
servir02 : multicast, xmt/rcv/%loss = 10000/9991/0% (seq>=10 0%), min/avg/max/std-dev = 0.073/0.166/1.043/0.041

pvecm status:
Code:
root@servir03:~# pvecm status
Quorum information
------------------
Date:             Thu Nov 29 15:54:47 2018
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000003
Ring ID:          1/3465100
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.2.1
0x00000002          1 192.168.2.2
0x00000003          1 192.168.2.3 (local)

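The "Quorum: 2" value in the status output above follows from simple majority arithmetic. Here is a minimal Python sketch of that calculation, assuming default votequorum behaviour with one vote per node and no special options (for example, no two-node mode). The function names are illustrative only and are not part of any Proxmox or corosync API.

```python
def quorum_threshold(expected_votes):
    """Votes needed for a majority: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes, expected_votes):
    """True when the votes currently present reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Values taken from the pvecm status output: Expected votes 3, Total votes 3.
    print(quorum_threshold(3))   # 2
    print(is_quorate(3, 3))      # True
    print(is_quorate(1, 3))      # False: a single isolated node loses quorum
```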
pvecm nodes:

Code:
root@servir03:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 192.168.2.1
         2          1 192.168.2.2
         3          1 192.168.2.3 (local)

Could you help me please?
Thanks!
 
Let the omping run for longer.

Code:
root@servir01:~# omping -c 600 -i 1 -q servir01 servir02 servir03
servir02 : waiting for response msg
servir03 : waiting for response msg
servir02 : waiting for response msg
servir03 : waiting for response msg
servir02 : joined (S,G) = (*, 232.43.211.234), pinging
servir03 : waiting for response msg
servir03 : joined (S,G) = (*, 232.43.211.234), pinging
servir02 : given amount of query messages was sent
servir03 : given amount of query messages was sent

servir02 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.089/0.217/0.371/0.056
servir02 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.126/0.277/0.465/0.061
servir03 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.063/0.161/1.049/0.062
servir03 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.076/0.185/1.097/0.069

root@servir02:~# omping -c 600 -i 1 -q servir01 servir02 servir03
servir01 : waiting for response msg
servir03 : waiting for response msg
servir01 : joined (S,G) = (*, 232.43.211.234), pinging
servir03 : waiting for response msg
servir03 : joined (S,G) = (*, 232.43.211.234), pinging
servir01 : given amount of query messages was sent
servir03 : given amount of query messages was sent

servir01 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.153/0.235/0.381/0.040
servir01 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.205/0.299/0.392/0.032
servir03 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.098/0.222/0.321/0.033
servir03 : multicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.104/0.237/0.346/0.037

root@servir03:~# omping -c 600 -i 1 -q servir01 servir02 servir03
servir01 : waiting for response msg
servir02 : waiting for response msg
servir02 : joined (S,G) = (*, 232.43.211.234), pinging
servir01 : joined (S,G) = (*, 232.43.211.234), pinging
servir01 : given amount of query messages was sent
servir02 : given amount of query messages was sent

servir01 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.070/0.173/0.290/0.041
servir01 : multicast, xmt/rcv/%loss = 600/261/56% (seq>=2 56%), min/avg/max/std-dev = 0.101/0.234/0.333/0.044
servir02 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.085/0.180/0.412/0.048
servir02 : multicast, xmt/rcv/%loss = 600/261/56% (seq>=2 56%), min/avg/max/std-dev = 0.131/0.239/0.413/0.053

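In the ten-minute run above, servir03 lost 56% of multicast packets from both peers while unicast loss stayed at 0%. Comparing those summary lines by eye is easy to get wrong, so here is a rough Python sketch that parses them and lists peers with multicast-only loss. It assumes the summary format shown in this thread (integer loss percentages); the function names and the threshold parameter are made up for illustration and are not part of omping.

```python
import re

# Matches omping summary lines such as:
#   servir01 : multicast, xmt/rcv/%loss = 600/261/56% (seq>=2 56%), min/avg/max/std-dev = ...
SUMMARY_RE = re.compile(
    r"^(?P<host>\S+)\s*:\s*(?P<kind>unicast|multicast),\s*"
    r"xmt/rcv/%loss\s*=\s*(?P<xmt>\d+)/(?P<rcv>\d+)/(?P<loss>\d+)%"
)


def parse_omping_summary(text):
    """Return a list of (host, kind, sent, received, loss_percent) tuples."""
    rows = []
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            rows.append((
                match["host"], match["kind"],
                int(match["xmt"]), int(match["rcv"]), int(match["loss"]),
            ))
    return rows


def multicast_only_loss(rows, threshold=1):
    """Hosts with multicast loss >= threshold (percent) while unicast loss is 0%."""
    unicast = {host: loss for host, kind, _, _, loss in rows if kind == "unicast"}
    return sorted(
        host for host, kind, _, _, loss in rows
        if kind == "multicast" and loss >= threshold and unicast.get(host) == 0
    )


# Summary lines copied from the servir03 run in this thread.
SAMPLE = """\
servir01 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.070/0.173/0.290/0.041
servir01 : multicast, xmt/rcv/%loss = 600/261/56% (seq>=2 56%), min/avg/max/std-dev = 0.101/0.234/0.333/0.044
servir02 :   unicast, xmt/rcv/%loss = 600/600/0%, min/avg/max/std-dev = 0.085/0.180/0.412/0.048
servir02 : multicast, xmt/rcv/%loss = 600/261/56% (seq>=2 56%), min/avg/max/std-dev = 0.131/0.239/0.413/0.053
"""

if __name__ == "__main__":
    print(multicast_only_loss(parse_omping_summary(SAMPLE)))  # ['servir01', 'servir02']
```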
Further, the corosync traffic needs to be on a separate physical interface to have a stable cluster.

Yes, I will create a dedicated VLAN for this.

Thank you very much.
Bye
 
"Yes, I will create a dedicated VLAN for this."
No, corosync traffic needs to be on a separate physical interface to have a stable cluster.
 
