Best strategy to handle strange JVM errors inside VPS

Confirmed for me too.
Zimbra runs flawlessly and rock solid for 60 hours with only the kernel downgraded to 2.6.32-4; all the rest is still Proxmox 1.9.

pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.9-43
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-43
qemu-server: 1.1-32
pve-firmware: 1.0-13
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.28-1pve5
vzdump: 1.2-15
vzprocps: 2.0.11-2
vzquota: 3.0.11-1dso1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6
 
I created a topic about this problem on the OpenVZ forum: http://forum.openvz.org/index.php?t=rview&goto=43571&th=10025#msg_43571
...with no response so far...

But it seems there is perhaps a fix for this problem on the latest (updated) Proxmox 1.9.
I assigned 2 CPUs to each OpenVZ container running a Java web app. Now apps like Alfresco and Liferay run without crashing or hanging; with only 1 CPU, Liferay dies after an hour. (1)
I will monitor the containers over the next few days to see whether this is a real fix, or whether the death of the JVM is only delayed.

Our planned migration from VMware to Proxmox will be delayed until this problem is really solved.

(1) Test scenario:
i) Create a Debian container (Proxmox template) - 2GB RAM and only 1 CPU
ii) apt-get install openjdk-6-jre
iii) Download Liferay Portal Community Edition with Tomcat from http://www.liferay.com/de/downloads/liferay-portal/available-releases
iv) unzip liferay-portal-tomcat-6.0.6-20110225.zip to /opt
v) Start the app: /opt/liferay-portal-6.0.6/tomcat-6.0.29/bin/startup.sh
vi) Access the portal at http://container-ip:8080
... wait for an hour - the system will crash.
Then search for the JVM dump, e.g.: /opt/liferay-portal-6.0.6/tomcat-6.0.29/bin/hs_err_pid7650.log
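For convenience, here are the same steps condensed into a shell session (run inside the container; this sketch assumes the Liferay zip from step iii has already been downloaded to the current directory):
Code:
# condensed repro of steps ii) - vi) above
apt-get install -y openjdk-6-jre unzip
unzip liferay-portal-tomcat-6.0.6-20110225.zip -d /opt
/opt/liferay-portal-6.0.6/tomcat-6.0.29/bin/startup.sh
# access http://container-ip:8080, wait ~1 hour, then look for the crash dump:
ls /opt/liferay-portal-6.0.6/tomcat-6.0.29/bin/hs_err_pid*.log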
 
Just tried it; I immediately got a failcount on privvmpages, which means out of memory.
 
Hello,

I've tried Pezi's workaround, and I can confirm it works for us as well. To answer Tom: in our case, we've always had a failcount of 0.

1. We start the container with only 1 CPU; the JVM dies.
Code:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (objectMonitor.cpp:1559), pid=1996, tid=140506417796864
#  guarantee(_recursions == 0) failed: invariant
#
# JRE version: 6.0_26-b03
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.1-b02 mixed mode linux-amd64 compressed oops)
# An error report file with more information is saved as:
# /root/hs_err_pid1996.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Aborted
and
Code:
       uid  resource                     held              maxheld              barrier                limit              failcnt
            kmemsize                 18055460             23142400  9223372036854775807  9223372036854775807                    0
            privvmpages                 60263               563969              1048576              1061076                    0
            physpages                   88778               407884                    0  9223372036854775807                    0
            vmguarpages                     0                    0              1048576  9223372036854775807                    0
            oomguarpages                14097               329674              1048576  9223372036854775807                    0
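(The table above is an excerpt of /proc/user_beancounters; a quick way to check it yourself - 101 is just a hypothetical container ID:)
Code:
# inside the container:
cat /proc/user_beancounters
# or from the host, for a specific container:
vzctl exec 101 cat /proc/user_beancounters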

2. Now I stop the container, change the number of CPUs to 2 (or more), and start it again: the JVM works flawlessly.
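In vzctl terms, the workaround is roughly (again with a hypothetical container ID of 101):
Code:
vzctl stop 101
vzctl set 101 --cpus 2 --save   # give the container a second CPU
vzctl start 101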
 
Hello,

I've tried Pezi's workaround, and I can confirm it works for us as well. ...

Did I miss something? What is the workaround?
 
OK - the memory is too small. During my short tests there was no failcount on privvmpages.
Please change the memory to a higher value, e.g. 3GB RAM / 3GB swap - tested now.
No failcount on privvmpages, but the JVM still crashes or hangs after one hour.
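(If you prefer the CLI over the Proxmox GUI, one way to raise that limit directly - the page count assumes 4KB pages, and 101 is a hypothetical container ID:)
Code:
# 3GB = 786432 pages of 4KB; raise the privvmpages barrier/limit
vzctl set 101 --privvmpages 786432:786432 --save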

On the other hand, the failcount on privvmpages should only lead to an out-of-memory-related error, not a crash. Is this correct?
 
If you run out of memory, the kernel kills processes (OOM killer). The upcoming 2.0 beta (available very soon) will support vswap; it would be interesting to see how this works here and whether it solves some of the issues with container memory handling and Java.
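(To tell an OOM kill apart from a genuine JVM crash, the kernel log is the place to look:)
Code:
# look for OOM-killer activity on the host
dmesg | grep -iE "oom|killed process"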
 
Tom, the problem described here is not related to failcounts; it's something else.
Please see my first post above.

Using a vswap-enabled configuration (based on the samples in /etc/vz/conf), I manage to get vswap and no failcount at all (of course, since most limits are set to unlimited), but with kernel 2.6.32-6 the JVM simply stops responding (after varying periods of time), while with kernel 2.6.32-4 everything works rock solid. In my case there is no message at all, neither on the host nor in the container: it simply stops answering.
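For reference, a vswap-style config looks roughly like this (example values, not the exact sample shipped in /etc/vz/conf):
Code:
# /etc/vz/conf/<CTID>.conf (excerpt)
PHYSPAGES="0:262144"    # 1GB of RAM (4KB pages)
SWAPPAGES="0:262144"    # 1GB of swap
# most other beancounters are left unlimited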

For some, the JVM/apps stop responding (my case - all processes are still there but nothing works); for others, the JVM/apps crash.

If you take Pezi's use case and either set high enough RAM/swap or use my vswap-based config, you'll see the problem.
All failcounts stay at 0, but the JVM dies or stops responding.

Two workarounds exist for this right now:
- roll back the kernel to 2.6.32-4 (tested this, Zimbra still runs flawlessly after 6 days; see the sketch below)
- set 2 CPUs in the container (not tested myself)
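For the first option, the rollback on a 1.9 host is essentially this (the package name is as in the pveversion output above; the boot-loader step depends on your setup):
Code:
# install the older kernel alongside the current one (if not already present)
apt-get install pve-kernel-2.6.32-4-pve
# then make 2.6.32-4-pve the default boot entry
# (in /boot/grub/menu.lst or grub.cfg, depending on your installation) and reboot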
 
Is there any progress on this bug? A use case has been defined; do you need another one? (I could describe a Zimbra installation that triggers the problem, but it's a bit more complicated than Pezi's use case.)
 
Has anyone tested on the 2.0 beta? (It uses vswap.)

As there is a workaround for 1.9, I think we should concentrate on vswap in 2.0.
 
I haven't tested 2.0 (no spare machine), nor the second workaround (2 CPUs per VM with Java), only the kernel downgrade (not a real workaround, though). I already use a vswap-enabled config with 1.8 and 1.9, but the JVM problem also exists with this configuration and the -6 kernel. Will this be different with 2.0?
 
Yes, only 2.0 uses vswap.

For 1.9 with 2.6.32-6, just give the container full CPU power (the number depends on your hardware), as it effectively had with 2.6.32-4 (and 2.6.32-5). Again, those older kernels always assigned all available CPUs to the container; only 2.6.18 respected the CPU setting.
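(In vzctl terms, something like this restores the old behaviour; 101 is a hypothetical container ID:)
Code:
# give the container as many CPUs as the host has
vzctl set 101 --cpus $(grep -c ^processor /proc/cpuinfo) --save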
 
Tom,

As you asked, I've done some quick tests.

I've done a vzdump (on Proxmox 1.8) and a vzrestore on Proxmox 2.0 of some VMs (with and without a JVM).
The host has 8 cores (2x Intel Xeon) and 16GB of memory, and only one VM runs on the host at a time.
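(The dump/restore itself is just the standard pair of commands; the archive path below is a placeholder:)
Code:
# on the 1.8 host: dump the container (101 is an example ID)
vzdump --compress 101
# copy the resulting archive to the 2.0 host, then restore it:
vzrestore /path/to/vzdump-openvz-101.tgz 101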

Proxmox 2.0 was installed on a clean Debian Squeeze installation (not from the Proxmox CD-ROM).
Code:
root@blade400:~# uname -a
Linux blade400 2.6.32-6-pve #1 SMP Mon Sep 26 10:35:47 CEST 2011 x86_64 GNU/Linux
root@blade400:~# pveversion -v
pve-manager: 2.0-7 (pve-manager/2.0/de5d8ab1)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 2.0-46
pve-kernel-2.6.32-6-pve: 2.6.32-46
lvm2: 2.02.86-1pve1
clvm: 2.02.86-1pve1
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-1
libqb: 0.5.1-1
redhat-cluster-pve: 3.1.7-1
pve-cluster: 1.0-9
qemu-server: 2.0-2
pve-firmware: 1.0-13
libpve-common-perl: 1.0-6
libpve-access-control: 1.0-1
libpve-storage-perl: 2.0-4
vncterm: 1.0-2
vzctl: 3.0.29-3pve2
vzdump: 1.2.6-1
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.1-1

I'm able to run the VMs without a JVM (Zabbix, MySQL server, or Drupal, for example) without trouble, but not the VMs with a JVM.
As long as I stay with 1 CPU, the JVM crashes or freezes, even with 8GB RAM and 8GB of swap, or with 16GB of RAM. If I add a second CPU, it works.
 
I am seeing the same thing: with 2.0, all kinds of weird things happen with JVM/Tomcat workloads. When I add a second CPU to the instance, it works fine.


pve-manager: 2.0-38 (pve-manager/2.0/af81df02)
running kernel: 2.6.32-7-pve
proxmox-ve-2.6.32: 2.0-60
pve-kernel-2.6.32-6-pve: 2.6.32-55
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve1
clvm: 2.02.88-2pve1
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-1
pve-cluster: 1.0-23
qemu-server: 2.0-25
pve-firmware: 1.0-15
libpve-common-perl: 1.0-17
libpve-access-control: 1.0-17
libpve-storage-perl: 2.0-12
vncterm: 1.0-2
vzctl: 3.0.30-2pve1
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-5
ksm-control-daemon: 1.1-1
root@proxmox2:~# uname -a
Linux proxmox2 2.6.32-7-pve #1 SMP Thu Feb 16 09:00:32 CET 2012 x86_64 GNU/Linux
root@proxmox2:~#
 
I can confirm this issue. I wish I had found this post earlier, as I have spent quite a bit of time with my application vendor trying to solve this problem.

I have a Debian 6 container with Sun Java 6 and a Crashplan ProE server. After upgrading Proxmox from version 1.8 to 1.9, I ran into some update issues with Java; after fixing those, Crashplan ProE still did not work. No errors appeared in the application logs, but I noticed that the application was not listening on the required ports.
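(A quick way to verify that symptom from inside the container; this assumes net-tools is installed:)
Code:
# list listening TCP sockets and the owning process
netstat -tlnp | grep java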

After increasing the number of CPUs from 1 to 2, the app ran with no problems!! Thank you, Pezi, for the workaround!!

Hopefully this problem will be fixed in the next kernel update! Thanks everyone for your posts!
 
