Hello All,
We've been using Proxmox in standalone mode for a while and recently moved to a clustered setup, following the guides on the website. The setup went without any problems and the unified management looks good. The problem we started having after the setup is the servers running out of memory, with the OOM killer striking at random on all of them. After some digging we isolated the issue to corosync running the systems out of memory. Searching the corosync lists and newsgroups turned up nothing, so I'm curious whether anyone here has any tips or suggestions for dealing with this issue.
Because of the OOM problems, I've turned off corosync on all but 3 nodes while I continue investigating.
Distro: Wheezy
Memory consumption over 12 hours (VSZ and RSS in KiB):
pmox1:
USER PID %CPU %MEM VSZ RSS STAT ELAPSED COMMAND
root 218488 0.2 81.2 3412584 3273512 S<Lsl 11:34:40 corosync -f
pmox2:
USER PID %CPU %MEM VSZ RSS STAT ELAPSED COMMAND
root 204975 0.2 81.8 3437340 3298664 S<Lsl 11:39:36 corosync -f
pmox3:
USER PID %CPU %MEM VSZ RSS STAT ELAPSED COMMAND
root 358776 0.2 80.3 3435464 3296348 S<Lsl 11:38:54 corosync -f
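To put those figures in perspective, here is a rough Python sketch that turns the ps lines above into resident size in GiB and an average growth rate. It assumes VSZ/RSS are in KiB (the ps default) and that corosync started near zero and grew steadily over its ELAPSED time, which is only an approximation. The helper names (parse_elapsed, summarize) are made up for illustration.

```python
# ps output copied from the three nodes above (VSZ and RSS assumed to be KiB).
PS_LINES = {
    "pmox1": "root 218488 0.2 81.2 3412584 3273512 S<Lsl 11:34:40 corosync -f",
    "pmox2": "root 204975 0.2 81.8 3437340 3298664 S<Lsl 11:39:36 corosync -f",
    "pmox3": "root 358776 0.2 80.3 3435464 3296348 S<Lsl 11:38:54 corosync -f",
}


def parse_elapsed(text):
    """Convert a ps ELAPSED value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:          # pad missing hours (and minutes) with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def summarize(line):
    """Return RSS in GiB, runtime in hours and average growth in MiB/hour."""
    fields = line.split()
    rss_kib = int(fields[5])                 # RSS column
    elapsed_s = parse_elapsed(fields[7])     # ELAPSED column
    hours = elapsed_s / 3600
    return {
        "rss_gib": rss_kib / 1024 ** 2,
        "hours": hours,
        # Assumes linear growth from roughly zero at start-up.
        "mib_per_hour": (rss_kib / 1024) / hours,
    }


if __name__ == "__main__":
    for node, line in PS_LINES.items():
        s = summarize(line)
        print(f"{node}: {s['rss_gib']:.2f} GiB after {s['hours']:.1f} h "
              f"(~{s['mib_per_hour']:.0f} MiB/h)")
```

On all three nodes this works out to a bit over 3 GiB resident after about 11.5 hours, i.e. growth on the order of a few hundred MiB per hour.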
Proxmox Version:
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-20-pve
proxmox-ve-2.6.32: 3.0-100
pve-kernel-2.6.32-20-pve: 2.6.32-100
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-22
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1
cluster.conf:
<?xml version="1.0"?>
<cluster config_version="18" name="clrdev">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
<clusternodes>
<clusternode name="int-proxmox2" nodeid="1" votes="1"/>
<clusternode name="int-proxmox1" nodeid="2" votes="1"/>
<clusternode name="proxmox4" nodeid="3" votes="1"/>
<clusternode name="proxmox3" nodeid="4" votes="1"/>
<clusternode name="proxmox7" nodeid="5" votes="1"/>
<clusternode name="proxmox6" nodeid="6" votes="1"/>
</clusternodes>
<rm/>
</cluster>
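In case it helps when reading the config, a small Python sketch that lists the nodes and vote counts from the cluster.conf above. The majority figure at the end assumes the usual simple-majority rule (half the total votes, rounded down, plus one); it is computed here purely for illustration and is not taken from any cluster output.

```python
import xml.etree.ElementTree as ET

# cluster.conf exactly as posted above.
CLUSTER_CONF = """
<?xml version="1.0"?>
<cluster config_version="18" name="clrdev">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
<clusternodes>
<clusternode name="int-proxmox2" nodeid="1" votes="1"/>
<clusternode name="int-proxmox1" nodeid="2" votes="1"/>
<clusternode name="proxmox4" nodeid="3" votes="1"/>
<clusternode name="proxmox3" nodeid="4" votes="1"/>
<clusternode name="proxmox7" nodeid="5" votes="1"/>
<clusternode name="proxmox6" nodeid="6" votes="1"/>
</clusternodes>
<rm/>
</cluster>
""".strip()  # the XML declaration must be the very first thing in the string


def read_nodes(conf_text):
    """Return (nodeid, name, votes) tuples sorted by nodeid."""
    root = ET.fromstring(conf_text)
    nodes = [
        (int(n.get("nodeid")), n.get("name"), int(n.get("votes", "1")))
        for n in root.iter("clusternode")
    ]
    return sorted(nodes)


if __name__ == "__main__":
    nodes = read_nodes(CLUSTER_CONF)
    for nodeid, name, votes in nodes:
        print(f"node {nodeid}: {name} (votes={votes})")
    total = sum(votes for _, _, votes in nodes)
    # Assumption: simple majority rule, floor(total / 2) + 1.
    print(f"total votes: {total}, simple majority: {total // 2 + 1}")
```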
Partial cluster/daemon.log:
http://pastebin.com/zf13srf5
Thanks,
Omar