Proxmox 3.0 Cluster corosync running system out of memory

Odie1 (New Member) · Jun 24, 2013
We've been running Proxmox non-clustered for a while and recently clustered up a number of machines. Clustering went fairly smoothly until we started noticing a number of OOM events, with the oom-killer taking down host systems. After some digging we've isolated the issue to corosync running a system completely out of memory. The corosync lists don't turn up anything, so maybe others here have some tips or suggestions.

We've stopped corosync on the critical boxes and isolated it to running on just 3 machines to maintain quorum.
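For reference, a quick sanity check that the reduced cluster is still quorate on the 3.x tooling (output fields may differ slightly between versions):

# Should report 3 nodes and a quorate cluster
pvecm status | grep -iE 'nodes|quorum'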

Memory consumption after roughly 11 hours of uptime, with corosync at ~80% memory utilization on each node:
pmox1:
USER PID %CPU %MEM VSZ RSS STAT ELAPSED COMMAND
root 218488 0.2 81.2 3412584 3273512 S<Lsl 11:34:40 corosync -f


pmox2:
USER PID %CPU %MEM VSZ RSS STAT ELAPSED COMMAND
root 204975 0.2 81.8 3437340 3298664 S<Lsl 11:39:36 corosync -f


pmox3:
USER PID %CPU %MEM VSZ RSS STAT ELAPSED COMMAND
root 358776 0.2 80.3 3435464 3296348 S<Lsl 11:38:54 corosync -f
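In case anyone wants to reproduce the measurement, this is a minimal sketch that logs corosync's resident set size (in KiB) every 10 minutes; the log path and interval are arbitrary choices:

# Append a timestamped RSS sample for the corosync process
while true; do
    echo "$(date '+%F %T') $(ps -C corosync -o rss=)" >> /root/corosync-rss.log
    sleep 600
done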

Debian Wheezy

Proxmox versions:
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-20-pve
proxmox-ve-2.6.32: 3.0-100
pve-kernel-2.6.32-20-pve: 2.6.32-100
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-22
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1

I should add that when I turn off corosync I don't run into any OOM issues, but then obviously I can't cluster and I lose the nice integrated management.

Cluster.conf:
<?xml version="1.0"?>
<cluster config_version="18" name="clrdev">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <clusternodes>
    <clusternode name="int-proxmox2" nodeid="1" votes="1"/>
    <clusternode name="int-proxmox1" nodeid="2" votes="1"/>
    <clusternode name="proxmox4" nodeid="3" votes="1"/>
    <clusternode name="proxmox3" nodeid="4" votes="1"/>
    <clusternode name="proxmox7" nodeid="5" votes="1"/>
    <clusternode name="proxmox6" nodeid="6" votes="1"/>
  </clusternodes>
  <rm/>
</cluster>




The corosync.log is fairly chatty; I've put it up at the link below so someone can confirm whether this is normal chatter: http://pastebin.com/zf13srf5

If anything else is needed or helpful please let me know.
 
Hello Dietmar,

I accidentally included the old cluster.conf; the correct one only has 3 members. I wasn't sure whether the messages from corosync were normal, but your comment leads me to believe this isn't normal behavior.

It's unclear to me why they would join/leave the cluster, but this looks more like a network problem now. It's also surprising to me that corosync just keeps consuming more memory until it eventually OOMs; I'm not sure what's going on there, but I'll kick it over to the corosync list.

We are using a Juniper switch that serves other multicast traffic, but I'll try testing with another switch to see if I can reproduce or isolate the issue.
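If it helps anyone following along, the usual way to check whether multicast actually makes it through the switch is omping, run simultaneously on all nodes (the hostnames below are placeholders for your own nodes):

apt-get install omping
# Heavy loss on the multicast lines points at the switch/network
omping -c 600 -i 1 -q node1 node2 node3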

Thank you for the response.

Regards,

O
 
It's unclear to me why they would join/leave the cluster, but this looks more like a network problem now.

Yes, this usually indicates a problem with the network (multicast). I guess the memory consumption is related to that, although that is clearly a bug (memory leak) in its own right.
 
I have that problem with the current version 3.2 too.

Hello Shim,

I investigated the issue and traced it to what may be a bug within corosync itself; the root cause in my case was identified by using a combination of strace and tcpdump to find out exactly what was happening on the wire.

The issue was a bit of a pain to isolate and eventually led us to a combination of switch and router configs. In our case the switch was stomping on the multicast TTL, causing ridiculous amounts of chatter, effectively a self-induced DoS. Once we addressed the switch configs the issue went away. We also identified some inconsistent multicast configs on the routers; the fix on that side involved sorting out multicast routing and trying the various PIM modes (sparse and dense) until we found what worked best.
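For anyone chasing something similar, this is roughly how the TTL stomping shows up on the wire (5405 is the stock corosync port; check your cluster.conf if you've changed it):

# -v prints the IP header, including the TTL, so a TTL that arrives
# rewritten to 1 by the switch is immediately visible
tcpdump -n -v -i eth0 udp port 5405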

Ours was a lab setup, and as with many lab setups the configuration can be inconsistent, but it did help expose an interesting corner case. No bug has been filed with corosync, since I've moved on to other projects.

I'm curious to hear what your investigation turns up, and I highly recommend turning up your logging to isolate the issue.
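For the logging part, on the cman-based 3.x stack something like the fragment below in cluster.conf should do it; treat it as a sketch, and remember to bump config_version when you change the file:

<!-- add inside <cluster>, next to the <cman> element -->
<logging debug="on"/>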

Good luck and hope this helps.

Regards,
Omar
 
Thanks for your reply,

I don't use multicast and I don't have access to the switch configs. Is there any other way to fix it?
 

This may be a hack, but when I was proving this out I set up an iptables rule to increase the TTL on corosync traffic. In my case that was multicast, but I was also able to prove it out by increasing the TTL on unicast.
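As a sketch, assuming the cman/corosync defaults (239.192.0.0/16 multicast range, ports 5404-5405); adjust to whatever tcpdump shows on your wire:

# Raise the TTL on outgoing corosync traffic so an intermediate hop
# can no longer decrement it to zero
iptables -t mangle -A OUTPUT -d 239.192.0.0/16 -p udp --dport 5404:5405 -j TTL --ttl-set 8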

You could also try sending the traffic over loopback; that defeats the purpose of clustering, but it could help prove things out in the meantime.

Another last resort is a crossover cable between the two servers (yeah, not ideal) with a dedicated interface for the cluster traffic, or a cheap switch if you have more than two nodes; something like the interfaces stanza below.
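Roughly, on Debian the dedicated link would look like this (eth1 and the 10.10.10.0/24 subnet are placeholders):

# /etc/network/interfaces on the first node
auto eth1
iface eth1 inet static
    address 10.10.10.1
    netmask 255.255.255.0

The node names in cluster.conf then need to resolve to these dedicated addresses (e.g. via /etc/hosts) so corosync binds to that link.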

I've done all of the above, since it took a while for our network admin to implement changes.

Regards,
Omar
 
