"Waiting for quorum" in Proxmox VE 4.0 Beta 2

Sep 12, 2015
Hi.

I'm from a small company in Portugal. We use Proxmox VE 3.4 in production with 3 community licenses connected to redundant NFS servers. We are very pleased with Proxmox.
We are thinking about moving to 4 different servers running Proxmox 4 and Ceph.

So, to test Proxmox 4 and Ceph I have a Windows PC with an AMD quad core, 16 GB of RAM and an SSD, and I'm using VMware Workstation 12.0 with nested virtualization.

I tested Beta 1, formed a cluster and all went well. But Beta 1 doesn't have Ceph packages, so I waited for Beta 2.

I did a clean install with Beta 2: four nodes in VMware Workstation, each one with 2.5 GB of RAM, 2 boot disks in ZFS RAID 1 and 6 disks for Ceph. Software-wise, I did an apt-get update && apt-get upgrade on all four nodes and installed open-vm-tools on all of them.

Everything went well until I tried to join the second node to the cluster.

To form the cluster i ran:
Code:
pvecm create cluster
on the first node, and then ran:
Code:
pvecm add 192.168.10.221
in the second node.

The second node waits forever at "waiting for quorum".

And the command:
Code:
pvecm status
Shows "Quorum: 2 Activity blocked"

Can someone help me with this problem? I know VMware products with nested virtualization are not supported for testing a Proxmox cluster, but this problem didn't happen with Beta 1 and it's the only way I have to test Proxmox.

Thanks, and sorry for my poor English. :)
 
Hi.

Just to add some more information about this problem:

I have the Proxmox VE 4.0 Beta 1 ISO, so I installed Beta 1 on the four nodes that I am using, created the cluster on node one and joined the other three nodes to the cluster without any problems. The four nodes were in the cluster and there was quorum.
Then I ran an "apt-get update && apt-get upgrade" on node one to upgrade to Beta 2. At the end I rebooted the first node, and after the boot node one no longer has quorum. I upgraded the other three nodes and rebooted them, but now I don't have quorum on the cluster.

Something in Beta 2 does this to the cluster, possibly only under VMware nested virtualization.

The forum doesn't permit me to post part of the syslog: "You are not allowed to post any kinds of links, images or videos until you post a few times."

Can someone help? What logs can I check to find out what the problem is in Beta 2?

Thanks in advance.

EDIT: I see this in the syslog:

Sep 12 20:46:26 prox1 pmxcfs[1046]: [quorum] crit: quorum_initialize failed: 2
Sep 12 20:46:26 prox1 pmxcfs[1046]: [quorum] crit: can't initialize service
Sep 12 20:46:26 prox1 pmxcfs[1046]: [confdb] crit: cmap_initialize failed: 2
Sep 12 20:46:26 prox1 pmxcfs[1046]: [confdb] crit: can't initialize service
Sep 12 20:46:26 prox1 pmxcfs[1046]: [dcdb] crit: cpg_initialize failed: 2
Sep 12 20:46:26 prox1 pmxcfs[1046]: [dcdb] crit: can't initialize service
Sep 12 20:46:26 prox1 pmxcfs[1046]: [status] crit: cpg_initialize failed: 2
Sep 12 20:46:26 prox1 pmxcfs[1046]: [status] crit: can't initialize service
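If I understand them correctly, these pmxcfs messages just mean pmxcfs cannot connect to corosync, so corosync itself probably did not come up cleanly after the upgrade. What I am trying next (a sketch, with the standard service names of a PVE 4.x install):
Code:
# corosync's own view of quorum and membership
corosync-quorumtool -s
# restart the cluster stack on this node and watch the log
systemctl restart corosync pve-cluster
journalctl -u corosync -f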
 
Last edited:
Hi,

check if your multicast is working; you can use omping to test this.

You can now send links.
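A typical test looks something like this (just an example; run it on all nodes at roughly the same time and adjust the IPs to yours):
Code:
# omping reports unicast and multicast loss separately for every peer
omping -c 600 -i 1 -q 192.168.10.221 192.168.10.222 192.168.10.223 192.168.10.224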
 

Hi, thanks for the answer.

I tested omping on the four nodes, and here are the results:

192.168.10.222 : unicast, xmt/rcv/%loss = 17/17/0%, min/avg/max/std-dev = 0.468/1.254/3.758/0.753
192.168.10.222 : multicast, xmt/rcv/%loss = 17/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
192.168.10.223 : unicast, xmt/rcv/%loss = 14/14/0%, min/avg/max/std-dev = 0.346/1.029/1.869/0.475
192.168.10.223 : multicast, xmt/rcv/%loss = 14/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000
192.168.10.224 : unicast, xmt/rcv/%loss = 12/12/0%, min/avg/max/std-dev = 0.406/6.728/67.749/19.225
192.168.10.224 : multicast, xmt/rcv/%loss = 12/0/100%, min/avg/max/std-dev = 0.000/0.000/0.000/0.000

If I read this correctly, multicast is not working on my test LAN.

Two questions:
- Does the cluster need multicast to work?
- Is this a change from Beta 1? In Beta 1 I had a working cluster, but when I updated or did a clean install of Beta 2, the cluster stopped working. Is multicast required in Beta 2?

Thanks in advance. :)
 
Just to add more information.

I installed Proxmox 4.0 Beta 1 again on the nodes, and in Beta 1 omping shows 100% success with multicast. Then I updated the nodes to Beta 2 and now I have 100% loss on multicast with omping.
Just to reiterate: Beta 1, 100% success with multicast; Beta 2, 100% loss.

The VMware VMs use the e1000 network card.

What can I do now? File a bug in Bugzilla?

Thanks again.
 
Unicast must be set up manually and is not recommended.

Try changing the type of network card.

This looks like your network card does not work correctly.
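Just for reference, unicast means setting the transport by hand in /etc/pve/corosync.conf, roughly like this excerpt of the totem section (a sketch only; as said, not recommended):
Code:
# /etc/pve/corosync.conf - relevant part of the totem section only;
# edit on a node with quorum, bump config_version, then restart corosync on all nodes
totem {
  version: 2
  cluster_name: cluster
  transport: udpu   # UDP unicast instead of multicast
}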
 

Hi,

I tried the following network cards in VMware: "e1000e", "vlance" and "vmxnet3".

None of them allows multicast with Beta 2; only unicast works. Beta 1 works fine.
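For anyone trying to reproduce this, the adapter type corresponds to the virtualDev entry in the VM's .vmx file (shown as a sketch; ethernet0 is assumed to be the adapter that the Proxmox bridge uses):
Code:
# .vmx entry of the virtual NIC (change with the VM powered off)
ethernet0.virtualDev = "e1000"   # also tried "e1000e", "vmxnet3" and "vlance"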

Do you think this might be a VMware bug, even though Beta 1 works fine and Beta 2 doesn't?

Thanks.
 
Please try to downgrade the kernel to 4.1, to make sure it is not a VMware bug.
 
Hi.

I tested with kernel 4.1 and the latest, kernel 4.2.

With Kernel 4.1, 100% success with multicasting:
(screenshot of the omping output)

With Kernel 4.2, 100% loss with multicasting:
(screenshot of the omping output)


A kernel 4.2 bug? What I find strange is that no one else is reporting problems. Could it be a combination of kernel 4.2 and VMware?
If you want, I can run other tests if that helps.

Thanks.
 
There are some HW-related bugs with this kernel. I will inform you when we have a new version to test.
 

Hi. I will be watching this thread and doing "apt-get update && apt-get upgrade" on the nodes.
If you need any other tests, I'm available, since these are not production machines.

Thanks for your support. :)
 
Hi, I am experiencing exactly the same situation as you. Could you please tell me how you downgraded the kernel from 4.2 to 4.1? Is there any document that explains it? Thanks.
 

Hi

Code:
wget http://download.proxmox.com/debian/dists/jessie/pvetest/binary-amd64.beta1/pve-kernel-4.1.3-1-pve_4.1.3-7_amd64.deb
dpkg -i pve-kernel-4.1.3-1-pve_4.1.3-7_amd64.deb

At the GRUB menu, select "Advanced options for Proxmox Virtual Environment GNU/Linux" and then select kernel 4.1.3.
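If you want the 4.1 kernel to be the default on every boot instead of choosing it by hand, you can pin it in GRUB, something like this (the exact menu entry title below is a guess; check /boot/grub/grub.cfg for the real one):
Code:
# find the exact menu entry titles
grep -E "menuentry|submenu" /boot/grub/grub.cfg
# in /etc/default/grub, point GRUB_DEFAULT at "submenu title>entry title"
GRUB_DEFAULT="Advanced options for Proxmox Virtual Environment GNU/Linux>Proxmox Virtual Environment GNU/Linux, with Linux 4.1.3-1-pve"
# apply the change
update-grub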
 
There are some HW-related bugs with this kernel. I will inform you when we have a new version to test.


Hi,

I noticed that you released a new version of the kernel (4.2.0-13).

I tested it and the results are really strange. Running omping between two nodes, multicast worked for a few seconds. Then my first node gave me a kernel panic and rebooted. Since then, only unicast works. :(

Thanks again.
 
Thank you very much, Nemesis11. It works perfectly now and I've been able to create the cluster without any trouble with kernel 4.1.
 
Hi.

I tested with kernel 4.2.1-1 and the problem is the same. No kernel panic, but only unicast works. No multicast inside VMware Workstation, and possibly with other VMware products.

Thanks.
 

One of my 4.0 Beta 2 test clusters inside VMware Workstation 11.1.2 build-2780323 (on Win7) works. The network uses bridge mode.

How do you test multicast? With omping? Which VMware version do you run exactly?
 


Hi. Thanks for the reply. :)

I use VMware Workstation 12.0.0 build-2985596 on Windows 10. Initially I tested multicast with Proxmox clustering between 4 virtual machines. With Beta 1, and with Beta 2 on kernel 4.1, clustering runs fine and omping shows no dropped packets in multicast.
With Beta 2 and the three kernel 4.2 versions, clustering doesn't work and omping fails with multicast. Only unicast works.
With kernel version 4.2.0-13 multicast initially ran fine, but after 5 or 6 multicast packets, virtual machine 1 crashed and rebooted. I think it was a kernel panic or something similar.
Since then, no multicast packets get through with omping, including on kernel 4.2.1-1.
If I boot with kernel 4.1, everything works.

I also use a network bridge. I tried e1000, e1000e, vmxnet3 and vlance as Ethernet cards, but no luck.

I don't know what else to do.

Thanks one more time. :)
 
Not seeing this on real HW with Beta 2, it seems:

root@n2:/# uname -a
Linux n2 4.2.0-1-pve #1 SMP Mon Sep 21 10:49:08 CEST 2015 x86_64 GNU/Linux

n1 : unicast, xmt/rcv/%loss = 301/301/0%, min/avg/max/std-dev = 0.085/0.130/0.258/0.028
n1 : multicast, xmt/rcv/%loss = 301/301/0%, min/avg/max/std-dev = 0.091/0.138/0.251/0.030
 
Hi, I noticed that you released a new version of the kernel (4.2.0-13). I tested it and the results are really strange. Running omping between two nodes, multicast worked for a few seconds.

I have the same problem with kernel 4.2: multicast stops working. I did
Code:
service networking restart
and multicast starts working, but after a few seconds it stops again. So my cluster is broken, nice.

The problem occurs with the bridge (vmbr); I didn't notice any multicast problem on a plain ETH or BOND interface.
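Since it only breaks on the bridge, one thing I still want to check (just a guess on my part, not a confirmed cause) is the multicast snooping state of the bridge, e.g.:
Code:
# current multicast snooping setting of the bridge (1 = on, 0 = off); vmbr0 assumed
cat /sys/class/net/vmbr0/bridge/multicast_snooping
# temporarily disable snooping to see if multicast traffic recovers
echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping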
 
