Serious quorum problem - servers are in production

For my production servers (a 10-node cluster), I use:

2 network cards - active-backup bond - one pair of switches: for my VM LANs + Proxmox host communication
2 network cards - LACP bond - other dedicated switches: for storage

But I use separate VLANs + network ranges for the Proxmox hosts and the VM LAN.
It's mainly for security: I don't want my VMs to have network access to my Proxmox hosts.

Now, if you have only 1 VLAN for both the Proxmox hosts and the VMs (even if you use different IP ranges), all the multicast packets go to all your VMs,
so I'm not sure about the impact on corosync.

As you have unmanaged switches, I think you cannot configure VLANs? So maybe it's better to have dedicated NICs/switches for the Proxmox hosts?
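
For reference, a minimal /etc/network/interfaces sketch of that kind of layout could look roughly like this (interface names, VLAN ID and addresses are only illustrative placeholders, not real values):

Code:
# sketch only - adjust NIC names, VLAN ID and addresses to your site
auto bond0
iface bond0 inet manual
        slaves eth0 eth1
        bond_mode active-backup
        bond_miimon 100

# storage bond (LACP) on the dedicated storage switches
auto bond1
iface bond1 inet static
        address 10.0.20.11
        netmask 255.255.255.0
        slaves eth2 eth3
        bond_mode 802.3ad
        bond_miimon 100

# bridge for guest traffic (no host IP here)
auto vmbr0
iface vmbr0 inet manual
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

# Proxmox host / corosync traffic on its own VLAN of bond0
auto bond0.10
iface bond0.10 inet static
        address 10.0.10.11
        netmask 255.255.255.0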
 

Thanks, spirit, for your prompt response.

But if I have dedicated NICs/switches for the PVE hosts, I will have to cascade the switches in order to manage my PVE hosts
(the switch for workstations cascaded with the switch for PVE hosts), and then corosync will send multicast to all the machines, VMs and workstations on the LAN.

What do you think would be best for me?
 
Hi spirit or Dietmar,

Please see this link:
http://comments.gmane.org/gmane.linux.pve.devel/2988

At this link you will see the conversation between Dietmar Maurer and Alexandre DERUMIER about a multicast bug on bridges: "corosync, multicast problem because of vmbr multicast_snooping enabled"

I am posting this in order to hear their opinions on how to solve my problem (at least temporarily).

Best regards
Cesar

Re-Edit: This conversation ran from 03 March 2013 to 08 March 2013.
 

It should not be a problem for you if you use unmanaged switches (without multicast snooping/filtering). (This is mainly a "bug" with Cisco managed switches.)

(I forgot to say, I'm Alexandre ;)
 

Thanks again for your answer (I feel like I'm talking to great experts).
But I have a bridge on my PVE hosts, and it is used for the cluster communication of my PVE hosts, so can this be a problem?

Re-Edit: Note that I have the same error in my corosync logs: TOTEM... etc. Isn't that too much of a coincidence?

Re-Edit: Later today I will have only 1 hour to run tests, or I'll have to drop it and work on it again another time, so your words are very important to me at this moment.

Re-Edit: And excuse me if I am repeating myself, but I'll restate the previous post:
if I have dedicated NICs/switches for the PVE hosts (as you recommend), I will have to cascade the switches in order to manage my PVE hosts
(the switch for workstations cascaded with the switch for PVE hosts), and then corosync will send multicast to all the machines, VMs and workstations on the LAN.

What do you think would be best for me?

Best regards
Cesar
 

No, I don't think it's a problem, as this is the default Proxmox setup.
But you can try without the bridge, putting the host IP directly on your bond.

The TOTEM error is a corosync error that appears when nodes can't communicate (multicast error, network overload, ...).
It's also possible that you have one node that is a lot slower than the others.

Maybe you can try adding
<totem window_size="300"/>
to your cluster.conf. See:
http://forum.proxmox.com/threads/9743-corosync-364738-TOTEM-Retransmit-List-ca8-ca9-caa-cab
http://www.hastexo.com/resources/hints-and-kinks/whats-totem-retransmit-list-all-about-corosync
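
For anyone who has not edited it before, a rough sketch of where that line sits in /etc/pve/cluster.conf (the cluster name and version number are placeholders; config_version has to be increased whenever the file is changed, and the existing cman/clusternodes/rm sections stay as they are):

Code:
<?xml version="1.0"?>
<cluster name="mycluster" config_version="4">
  <totem window_size="300"/>
  ... existing cman, clusternodes and rm sections unchanged ...
</cluster>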
 

4- Additional recommendations?

First, I know that the following does not really help in your situation, but it is usually a good idea to:

- use similar hardware for all nodes (easier to debug)
- use server hardware
- use hardware RAID
- do not run additional services on the PVE nodes (no storage server)
 
Hi spirit

A question:

If I have only 2 PVE nodes with HA, is it necessary to enable the rgmanager service on all PVE nodes?

Best regards
Cesar
 

Hi Dietmar and spirit,
As I am a fan of PVE, I found the solution to my quorum problem, and I want to contribute my two cents to PVE.

I have three PVE nodes in a cluster, in a production environment. All nodes use an active-backup bond and a bridge, and each node was losing quorum every half hour. My switch is unmanaged, so I applied the solution shown by Alexandre in this link:
http://comments.gmane.org/gmane.linux.pve.devel/2988

As a stress test, during working hours, I even transferred qcow2 files between the PVE nodes using the cluster IP, and corosync.log still shows no errors on any node.
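
(A side note for anyone reproducing this: multicast delivery between the nodes can also be checked directly with omping, if it is installed; node1/node2/node3 stand in for the real hostnames, and the same command should be run on every node at roughly the same time.)

Code:
omping -c 600 -i 1 -q node1 node2 node3

The summary it prints at the end shows the multicast loss percentage towards each of the other nodes.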

Re-Edit: After these tests, I have concluded that there is a bug between corosync and the bridge and/or bond on the PVE node.

Interestingly, using the same hardware (Dell 2900) with my unmanaged switch and PVE 1.8, I set up a balance-alb bond and that configuration works perfectly. But on another Dell 2900 on the same LAN and switch, where I had installed PVE 2.1 (a long time ago), I used the same bond settings and got erratic behavior, so I was forced to change it to an active-backup configuration. I think it is due to the same problem, but I'm not even sure of this; I didn't run tests to confirm it because the servers are in production.

With these clues I think I'll give the developers a bit of work.

And finally I would like to make a request:
if these problems are solved in future versions of PVE, please let me know so that I can make the corresponding changes.

Best regards
Cesar
 
Thanks for the report!

So the solution is to disable multicast_snooping on the bridge?
echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping

Maybe it's a bug in the current kernel with bonding.
Multicast snooping is filtering that keeps multicast packets from going out on all ports, but with an active-backup bond I don't know exactly how it works.

Note that for my production servers, I no longer put the host IP on vmbr0, but directly on the bond.

I'll try to find more information about this bug.
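
To make that setting survive reboots, one option is a post-up line on the bridge stanza in /etc/network/interfaces (a sketch, assuming the cluster traffic runs over vmbr0 and the address shown is just an example):

Code:
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.11
        netmask 255.255.255.0
        gateway 192.168.1.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0
        # re-apply the snooping workaround at every boot
        post-up echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping

The alternative mentioned above (host IP directly on the bond) just means moving the address/netmask/gateway lines from the vmbr0 stanza to the bond0 stanza and leaving vmbr0 without an address.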
 

Yes, this is a temporary solution.
I have an active-backup bond + bridge, but I didn't run bond failover tests to see what happens with corosync (i.e. disconnecting the active NIC); I guess it will work.
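
If it helps, that kind of failover test can be watched without physically pulling a cable; a small sketch (the interface name and the corosync log path are assumptions and may differ on other setups):

Code:
# show the bonding state and the currently active slave
cat /proc/net/bonding/bond0

# simulate a failure of the active slave, then watch corosync
ip link set eth0 down
tail -f /var/log/cluster/corosync.log

# bring the slave back
ip link set eth0 up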

If, with my limited knowledge, I can help with anything, please let me know.

Best regards
Cesar
 
In two weeks I will test a balance-alb/tlb bond + bridge + PVE cluster, with the latest PVE 2.3 updates and my 1 Gb/s unmanaged switch, to see if it works well.

If you want to know the results, please let me know.

Best regards
Cesar
 
Maybe it is related to this:

https://lkml.org/lkml/2012/3/12/65

It could be related to IGMP queries sometimes going out of, or coming in on, the wrong interface (slave), so IGMP snooping randomly cuts the multicast traffic.

Hi spirit,

Can you help me?
I'd be grateful if you'd lend me a hand,
because unfortunately no one has responded.

In this link I ask about problems in the latest versions of PVE with bridge + balance-alb + unmanaged switch:
http://forum.proxmox.com/threads/14684-Error-only-on-the-latest-versions-of-PVE-making-bonding
Notes:
1- My documentation tells me that balance-alb can work with any switch
2- This kernel.org document also says the same, "and does not require any special switch support":
https://www.kernel.org/doc/Documentation/networking/bonding.txt

Then I did these tests:
Test 1:
PC workstations with Realtek NICs, with multicast_snooping disabled on the bridge = I lost the cluster communication
(it may be that Realtek NICs are the worst there is)

Test 2:
Dell R710 and Dell 2900 (the NICs are Broadcom), without disabling multicast_snooping on the bridge = I lost the cluster communication

Test pending:
The same as test 2, but disabling multicast_snooping on the bridge.
As the servers are in production, it is very difficult for me to take them down (I have to ask permission first and arrange the times).
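
(One note that may help with the pending test: multicast_snooping is a runtime sysfs switch, so it can be tried during a short window and reverted immediately, without rebooting the node; vmbr0 is assumed to be the bridge carrying the cluster traffic.)

Code:
# disable snooping on the bridge (runtime only, lost on reboot)
echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping

# revert to the default afterwards if needed
echo 1 > /sys/class/net/vmbr0/bridge/multicast_snooping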

Hoping you can help me, see you soon.

Best regards
Cesar
 
Is anyone still seeing this (pve-manager/3.3-5/bfebec03, running kernel 2.6.32-33-pve)? I have had my cluster of four identical nodes go down 4 times in the past week with this type of error. I'm using a managed 10 Gb switch. I have not found a way to make this reproducible; it seems to be happening at random. Network traffic is not saturated.

I have tried setting totem netmtu and window_size:
Code:
<totem netmtu="8982" window_size="50" />

I only set window_size after my last crash.

I will try the multicast_snooping fix next.
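
As a quick sanity check before and after applying it, the current snooping state of every bridge on a node can be listed (1 means snooping enabled, which is the kernel default; 0 means disabled); a small sketch:

Code:
for b in /sys/class/net/*/bridge/multicast_snooping; do
        echo "$b: $(cat "$b")"
done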
 
