rgmanager deadlock

e100 · Feb 27, 2012

If you do not have redundant switches and network ports for your cluster communications this is an issue you will want to be aware of.

I am communicating over bonded interfaces over redundant switches with redundant power supplies connected to redundant power.
So hopefully this issue never happens to me in real life.

But we all know that sometimes things go wrong so I wanted to know what will happen if my switches both failed at the same time.

So I disconnected all the network cables.
First obvious thing is that quorum is lost, that was expected.
But then some unexpected things happened when I reconnected all the cables.

Quorum came back, but rgmanager is missing!:

Code:

# clustat
Timed out waiting for a response from Resource Group Manager
Cluster Status for kmitestcluster @ Mon Feb 27 12:28:35 2012

Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 vm5                                                                 1 Online
 vm6                                                                 2 Online, Local
 disaster                                                            3 Offline

Then after a few minutes I get kernel messages!

Code:

INFO: task rgmanager:3378 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rgmanager     D ffff8806310f2c80     0  3378      1    0 0x00000000
 ffff88062d7db9d0 0000000000000046 0000000000000000 ffff88062d7db948
 000000018109b3dd ffff880000000000 ffff88063fc0c300 ffff88004179e200
 0000000000000069 ffff8806310f3220 ffff88062d7dbfd8 ffff88062d7dbfd8
Call Trace:
 [<ffffffff8104d31d>] ? check_preempt_curr+0x6d/0x90
 [<ffffffff814ff235>] rwsem_down_failed_common+0x95/0x1d0
 [<ffffffff814ff3c6>] rwsem_down_read_failed+0x26/0x30
 [<ffffffff8126b8c4>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffff814fe8b4>] ? down_read+0x24/0x30
 [<ffffffffa052872d>] dlm_clear_proc_locks+0x3d/0x2a0 [dlm]
 [<ffffffff811a63ef>] ? destroy_inode+0x4f/0x60
 [<ffffffff811a26a5>] ? __d_free+0x45/0x60
 [<ffffffffa0533c66>] device_close+0x66/0xc0 [dlm]
 [<ffffffff8118cea5>] __fput+0xf5/0x280
 [<ffffffff8118d055>] fput+0x25/0x30
 [<ffffffff811885dd>] filp_close+0x5d/0x90
 [<ffffffff8106dbbf>] put_files_struct+0x7f/0xf0
 [<ffffffff8106dc83>] exit_files+0x53/0x70
 [<ffffffff8106f86d>] do_exit+0x1ad/0x920
 [<ffffffff81070038>] do_group_exit+0x58/0xd0
 [<ffffffff81086606>] get_signal_to_deliver+0x1f6/0x470
 [<ffffffff8100a335>] do_signal+0x75/0x800
 [<ffffffff8125e131>] ? cpumask_any_but+0x31/0x50
 [<ffffffff810b2ddb>] ? sys_futex+0x7b/0x170
 [<ffffffff8100ab50>] do_notify_resume+0x90/0xc0
 [<ffffffff8100b451>] int_signal+0x12/0x17

I attempted all sorts of things to recover from this but the bottom line is that rgmanager can not be stopped so there is not way to recover other than rebooting every single node one by one.
Turn off as many services as possible.
Then run reboot twice, once will not work since it will hang trying to stop rgmanager.

dietmar · Feb 28, 2012

And fencing is working? Any hints in /var/log/cluster/*

e100 · Feb 28, 2012

dietmar said:
And fencing is working? Any hints in /var/log/cluster/*

Fencing works, we are using APC PDU.
In this case, since quorum is lost, I would not expect fencing to kick in.

I looked through the logs and did not see any clues.
Very easy to reproduce, just disrupt the cluster communication between all nodes until quorum is lost then restore the communication.

dietmar · Feb 28, 2012

e100 said:
Very easy to reproduce, just disrupt the cluster communication between all nodes until quorum is lost then restore the communication.

Would you mind to file a bug at bugzilla.proxmox.com?

e100 · Feb 29, 2012

https://bugzilla.proxmox.com/show_bug.cgi?id=105

dietmar · Feb 29, 2012

e100 said:
https://bugzilla.proxmox.com/show_bug.cgi?id=105

Thanks.

Lee · Apr 5, 2012

I've come across the same issue whilst testing with a 2 node cluster, it seems if you lose network on either node, you end up having to reboot? :S

e100 · Apr 5, 2012

Lee said:
I've come across the same issue whilst testing with a 2 node cluster, it seems if you lose network on either node, you end up having to reboot? :S

I have only been able to repeat this when all nodes can not talk to each other.
With a 2 node cluster if one fails, then all nodes can not talk good reason to have three nodes at a minimum.

Lee · Apr 5, 2012

Yeah...our production set up has 22 nodes, have just been trying out 2.0 with some spare servers in the office - spose I'll have to find an extra one.

giuseppe.torrisi · Sep 16, 2012

Hi,
i have the same problem "deadlock rgmanager" if all nodes lose connection between themes.
when there will be a patch for debian ?
Greetings
Giuseppe

dietmar · Oct 4, 2012

I verified with developers, this is 'expected' behavior.

See https://bugzilla.proxmox.com/show_bug.cgi?id=105

e100 · Oct 4, 2012

Having an explanation of why this happens is helpful.

Essentially rgmanager expects that all nodes need fenced if quorum is lost but fence never happens because there is no quorum so you need to do it yourself.

dietmar · Oct 4, 2012

e100 said:
Having an explanation of why this happens is helpful.

In short, you have several partitions with different fencing state, and state merge is not possible.

Kerrigan · Jun 30, 2014

Any progress on this thread?
I have the exact same problem.
I don't think that it is expected behaviour, when you have to restart your whole cluster in order to get rgmanager working again.

I'd be grateful for any hints.

cesarpk · Jul 1, 2014

dietmar said:
I verified with developers, this is 'expected' behavior.

See https://bugzilla.proxmox.com/show_bug.cgi?id=105

Dietmar, correct that this is the 'expected' behavior, but i think that rgmanager need a subsystem of auto recuperation with a set of logical operations for get a successful recovery in this case (be solid as a rock).

For other hand, i don't know if i will have some problems, but soon i will have to move all my PVE servers that are in a PVE Cluster, of a locality to other locality, and obviously PVE servers will be turned off for do this move.

Some recommendations for the moment of turn it on?
Notes:
- My PVE servers in cluster PVE have versions 2.3, 3.1 and 3.2
- The Backup of VMs are with NFS and between these same servers (crusaders backups)

Best regards
Cesar

spirit · Jul 1, 2014

Hi,

I see that redhat have replaced rgmanager by pacemaker in rhel7, I don't known if they are advantages for this problem.

cesarpk · Jul 1, 2014

spirit said:
I see that redhat have replaced rgmanager by pacemaker in rhel7, I don't known if they are advantages for this problem.

Good change for RHEL, but PVE will continue using openais?

spirit · Jul 1, 2014

cesarpk said:
Good change for RHEL, but PVE will continue using openais?

proxmox use corosync, not openais.

the change in rhel7 is corosync + pacemaker , instead corosync + rgmanager

cesarpk · Jul 2, 2014

spirit said:
proxmox use corosync, not openais.

the change in rhel7 is corosync + pacemaker , instead corosync + rgmanager

Spirit, thanks for the answer, but i have in my PVE nodes these programs:
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3

What is the mission of each one?

spirit · Jul 2, 2014

cesarpk said:
Spirit, thanks for the answer, but i have in my PVE nodes these programs:
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3

What is the mission of each one?

OH yes, sorry, openais is used too. I found an explain on internet:

"clustering include two part:1.cluster resource management
2.infrastructure with massaging layer
legacy heartbeat is broken into heartbeat message layer and pacemaker so pacemaker is CRM.
and we have two option on message layer:heartbeat,openais. openais/corosync is preferred as:http://comments.gmane.org/gmane.linux.highavailability.user/32355
There are, however, features in Pacemaker that require OpenAIS which will work only with Corosync, not Heartbeat. Those features are concerned with the distributed lock managers used by cLVM (but not regular LVM), GFS/GFS2, and OCFS2. If you need that functionality, you must select OpenAIS/Corosync. If you do not, you're free to choose.
as: http://www.clusterlabs.org/wiki/FAQ
Originally Corosync and OpenAIS were the same thing. Then they split into two parts... the core messaging and membership capabilities are now called Corosync, and OpenAIS retained the layer containing the implementation of the AIS standard.
Pacemaker itself only needs the Corosync piece in order to function, however some of the applications it can manage (such as OCFS2 and GFS2) require the OpenAIS layer as well.
so i went to openais/corosync and integrate it with pacemaker.
"

rgmanager deadlock

Renowned Member

Proxmox Staff Member

Renowned Member

Proxmox Staff Member

Renowned Member

Proxmox Staff Member

New Member

Renowned Member

New Member

New Member

Proxmox Staff Member

Renowned Member

Proxmox Staff Member

New Member

Well-Known Member

Distinguished Member

Well-Known Member

Distinguished Member

Well-Known Member

Distinguished Member