
Best strategy to handle strange JVM errors inside VPS

Discussion in 'Proxmox VE 1.x: Installation and configuration' started by pezi, Sep 10, 2011.

  1. pezi

    pezi New Member

    I am playing around with Proxmox/OpenVZ to migrate a couple of VMware instances. Most of the applications are Java-based and can easily be migrated to OpenVZ. Each customer/project gets its own VPS.

    Now I am trying to migrate Alfresco, an open-source document management system. The actual problem: the JVM that runs the Tomcat container dies after about an hour. I tested with 3 different JVMs (two versions of the Sun JDK and OpenJDK), all with the same behaviour. Searching the Alfresco forum gave no hint about this problem, so it seems to be a JVM/OpenVZ problem. Another VPS with Open-Xchange (a Java-based groupware) works fine.

    Environment:
    pve-manager: 1.8-23 (pve-manager/1.8/6533)
    running kernel: 2.6.32-6-pve
    proxmox-ve-2.6.32: 1.8-42
    pve-kernel-2.6.32-6-pve: 2.6.32-42
    qemu-server: 1.1-31
    pve-firmware: 1.0-13
    libpve-storage-perl: 1.0-19
    vncterm: 0.9-2
    vzctl: 3.0.28-1pve5
    vzdump: 1.2-15
    vzprocps: 2.0.11-2
    vzquota: 3.0.11-1
    pve-qemu-kvm: 0.15.0-1
    ksm-control-daemon: 1.0-6

    VPS settings
    (screenshot attached: 105.jpg)

    Question: Is it allowed to use VZ templates from OpenVZ, or are we restricted to the (tested) VZ templates provided by Proxmox?

    Any idea to handle such problems?

    Best regards,
    Peter

    JVM dump:
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  Internal Error (synchronizer.cpp:1401), pid=672, tid=6302576
    #  guarantee(mid->header()->is_neutral()) failed: invariant
    #
    # JRE version: 6.0_22-b22
    # Java VM: OpenJDK Client VM (20.0-b11 mixed mode, sharing linux-x86 )
    # Derivative: IcedTea6 1.10.2
    # Distribution: Ubuntu 11.04, package 6b22-1.10.2-0ubuntu1~11.04.1
    # If you would like to submit a bug report, please include
    # instructions how to reproduce the bug and visit:
    #   https://bugs.launchpad.net/ubuntu/+source/openjdk-6/
    #

    ---------------  T H R E A D  ---------------

    Current thread (0x08873800):  VMThread [stack0x00582000,0x00603000] [id=675]
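Since the same failure showed up with three different JVMs, comparing their crash logs side by side is one way to check whether they die at the same place. A minimal sketch that pulls the key header fields out of a HotSpot hs_err_pid<pid>.log like the dump above; the function name and the chosen fields are illustrative, and the patterns assume the `#`-prefixed header layout shown in this dump.

```python
import re

# Header fields of a HotSpot hs_err_pid<pid>.log, matching the layout of the dump above.
PATTERNS = {
    "error": r"^#[ \t]+Internal Error \(([^)]+)\)",         # source file:line of the failed check
    "guarantee": r"^#[ \t]+(guarantee\(.*\) failed: .*)$",  # the invariant that was violated
    "jre_version": r"^# JRE version: (.+)$",
    "java_vm": r"^# Java VM: (.+)$",
}


def summarize_hs_err(text: str) -> dict:
    """Return the selected header fields (None when a field is missing)."""
    summary = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text, re.MULTILINE)
        summary[key] = match.group(1).strip() if match else None
    return summary
```

Running it over the hs_err files from each JVM and comparing the `error` field shows at a glance whether they all fail in synchronizer.cpp or in different places.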

     

  2. tom

    tom Proxmox Staff Member
    Staff Member

  3. tom

    tom Proxmox Staff Member
    Staff Member

    You can use whatever template you want, but the Debian 6 templates are preferred here (they can also be created with dab).
     
  4. pezi

    pezi New Member

    Thanks for the quick response. Yes, there are failcounts:
    privvmpages 26283 562327 655360 667860 14
    There are some hints about this "problem"; I will try to solve it with them and post my results.
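The failcount row above follows the /proc/user_beancounters column order: resource, held, maxheld, barrier, limit, failcnt. privvmpages is counted in memory pages, so the numbers only become meaningful once converted. A minimal sketch that parses such a row and converts page counts to MiB, assuming 4 KiB pages (as on x86); the class and function names are hypothetical.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


@dataclass
class Beancounter:
    resource: str
    held: int
    maxheld: int
    barrier: int
    limit: int
    failcnt: int


def parse_beancounter(line: str) -> Beancounter:
    """Parse one resource row of /proc/user_beancounters."""
    fields = line.split()
    if fields[0].endswith(":"):
        # The first row of each container block starts with its ID, e.g. "101:".
        fields = fields[1:]
    name, *values = fields
    return Beancounter(name, *(int(v) for v in values))


def pages_to_mib(pages: int) -> float:
    """Convert a page-counted resource (privvmpages, physpages, ...) to MiB."""
    return pages * PAGE_SIZE / 2**20


bc = parse_beancounter("privvmpages 26283 562327 655360 667860 14")
print(f"{bc.resource}: barrier {pages_to_mib(bc.barrier):.0f} MiB, "
      f"limit {pages_to_mib(bc.limit):.0f} MiB, failcnt {bc.failcnt}")
# privvmpages: barrier 2560 MiB, limit 2609 MiB, failcnt 14
```

Resources such as kmemsize are counted in bytes rather than pages, so `pages_to_mib` should only be applied to the page-counted rows.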
     
  5. tom

    tom Proxmox Staff Member
    Staff Member

    The Java VM does have problems calculating the available memory inside an OpenVZ container. This issue is known and has been discussed several times in connection with running Zimbra on OpenVZ; just search for it.
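A common workaround for this memory-detection issue is to not rely on the JVM's automatic heap sizing and instead pass an explicit maximum heap with `-Xmx`, sized from the container's own limit (for example the privvmpages barrier converted to MiB) rather than from what the JVM thinks the host has. A minimal sketch; the 50% default heap fraction is an assumption meant to leave room for non-heap JVM memory and other processes, not an official recommendation.

```python
def heap_flag(container_limit_mib: int, heap_fraction: float = 0.5) -> str:
    """Return an explicit -Xmx flag sized from the container's memory limit.

    heap_fraction is an assumption: the remainder is left for non-heap JVM memory
    (thread stacks, PermGen, code cache) and the other processes in the container.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    return f"-Xmx{int(container_limit_mib * heap_fraction)}m"


# e.g. a container whose privvmpages barrier corresponds to 2560 MiB
print(heap_flag(2560))        # -Xmx1280m
print(heap_flag(2560, 0.75))  # -Xmx1920m
```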
     
  6. pezi

    pezi New Member

    My newest results: the privvmpages failcounts were fixed by increasing the memory! But this didn't fix the main problem: the JVM's page fault.
    I played around with the JVM's GC parameters and created a Debian template (instead of Ubuntu) with Alfresco. No luck: the JVM still dies after a while.

    As a last test I moved this template to the test node of the cluster. Surprise: there is no problem with the JVM there, and Alfresco has been running for two days!

    Master node:
    model name : Intel(R) Core(TM) i3-2100T CPU @ 2.50GHz
    cpu MHz : 2499.544
    16 GB RAM

    Second node:
    vendor_id : AuthenticAMD
    model name : AMD Athlon(tm)64 X2 Dual Core Processor 4200+
    4 GB RAM

    Both nodes have been updated to Proxmox 1.9. Very strange problem! Another VZ container on the master node has been running an open-source Java stack (Open-Xchange) for three weeks without problems.
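Since the same template behaves differently on the Intel and the AMD node, a quick way to see what actually differs between the two CPUs is to compare their /proc/cpuinfo entries, especially the feature flags. A minimal sketch that works on the text of /proc/cpuinfo from each node (read locally on both, or copied over); the function names are hypothetical.

```python
def cpu_summary(cpuinfo: str) -> dict:
    """Return vendor_id, model name and flags of the first processor in /proc/cpuinfo."""
    wanted = {"vendor_id", "model name", "flags"}
    summary = {}
    for line in cpuinfo.splitlines():
        if not line.strip():
            if summary:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key in wanted:
            summary[key] = value.strip()
    return summary


def flag_diff(a: dict, b: dict) -> tuple:
    """CPU feature flags present on only one of the two nodes."""
    flags_a = set(a.get("flags", "").split())
    flags_b = set(b.get("flags", "").split())
    return flags_a - flags_b, flags_b - flags_a
```

For example, `cpu_summary(open("/proc/cpuinfo").read())` on each node, then `flag_diff(master, second)` lists the instruction-set extensions available on only one of them.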
     
  7. cadiolis

    cadiolis New Member

    I am having a similar problem. I just updated two Proxmox installs from 1.8 to 1.9, and now various Java VEs have strange problems similar to the ones you report above. One runs our custom Java webapp, which just seems to freeze after a while: no errors, no high CPU, nothing. It just stops working. We also had a Hudson build server (a Java app) that won't run since the 1.9 upgrade. It starts fine with no errors, but then, same thing, it just seems to freeze. Every once in a while it segfaults as well.
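When a JVM freezes with no errors and no CPU load, a thread dump is usually the most informative thing to capture: sending SIGQUIT to a HotSpot JVM makes it print the stack of every thread to its stdout (for Tomcat, typically catalina.out) without terminating it. A minimal sketch that finds java processes by their command line and requests a dump; the helper names are hypothetical, and it assumes it runs inside the affected container with permission to signal the process.

```python
import os
import signal
from typing import List


def java_pids(proc_root: str = "/proc") -> List[int]:
    """PIDs whose executable name (first word of /proc/<pid>/cmdline) is 'java'."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv0 = f.read().split(b"\0", 1)[0]
        except OSError:
            continue  # process exited in the meantime, or not readable
        if os.path.basename(argv0) == b"java":
            pids.append(int(entry))
    return sorted(pids)


def request_thread_dump(pid: int) -> None:
    """SIGQUIT makes a HotSpot JVM print a thread dump to its stdout without exiting."""
    os.kill(pid, signal.SIGQUIT)
```

Taking two or three dumps a few seconds apart while the application hangs shows whether threads are stuck on the same lock or in the same native call.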

    Do you have any additional thoughts on what is going on here?
     
  8. pezi

    pezi New Member

    I gave up trying to get Alfresco running on the master node (Proxmox 1.8 at the start of this thread, now the latest Proxmox 1.9 including PVE test).

    I tried various JVM parameters, but on the master node Alfresco dies after a while. Most JVM dump messages were related to internal memory management.

    On the other hand, on the old test PC (test node) the JVM (Alfresco) runs. I think this is a timing problem: the just-in-time compiler (which produces different results for different CPUs) in combination with new hardware (Intel(R) Core(TM) i3-2100T CPU) and OpenVZ.
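If the JIT theory is right, running the same application with the JIT compiler disabled (`-Xint`, interpreter only) should make the crash disappear, at a large performance cost. A minimal sketch that builds one command line per experiment, changing one variable at a time; the variant set is an assumption, and for Tomcat-based applications such as Alfresco the extra flags would go into JAVA_OPTS or CATALINA_OPTS instead of a `java -jar` command line.

```python
import shlex
from typing import Dict, List

# Hypothetical experiment matrix: each variant changes one thing relative to the baseline.
VARIANTS: Dict[str, List[str]] = {
    "baseline": [],
    "no-jit": ["-Xint"],                # interpreter only: rules the JIT compiler in or out
    "serial-gc": ["-XX:+UseSerialGC"],  # simplest collector: rules the GC choice in or out
}


def build_commands(base_args: List[str], java: str = "java") -> Dict[str, str]:
    """Return one shell-quoted java command line per variant."""
    return {name: shlex.join([java, *extra, *base_args]) for name, extra in VARIANTS.items()}


for name, cmd in build_commands(["-jar", "app.war"]).items():
    print(f"{name:10} {cmd}")
```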
     
  9. cadiolis

    cadiolis New Member

    Ughh... this is incredibly frustrating. Java apps seem to be running fine but then just stop responding.

    I guess I'll try rolling back to 1.8
     
  10. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    The new kernel enforces the CPU limits set in the VM configuration, so it may help to assign more CPU power.
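On OpenVZ these CPU settings are stored in the container's config file, /etc/vz/conf/<CTID>.conf, under keys such as CPUUNITS, CPULIMIT and CPUS, and can be changed with `vzctl set`. A minimal sketch that reads those keys and builds the command to raise the number of CPUs; the CTID and CPU count are placeholders, and on Proxmox the same value corresponds to the 'CPUs' option in the web interface.

```python
import re
from typing import Dict, List

CPU_KEYS = re.compile(r'^(CPUS|CPULIMIT|CPUUNITS)="?([^"\n]*)"?', re.MULTILINE)


def read_cpu_settings(conf_text: str) -> Dict[str, str]:
    """Extract CPU-related KEY="value" settings from an OpenVZ container config."""
    return {m.group(1): m.group(2) for m in CPU_KEYS.finditer(conf_text)}


def vzctl_set_cpus(ctid: int, cpus: int) -> List[str]:
    """Command that changes the number of CPUs available to the container and saves it."""
    return ["vzctl", "set", str(ctid), "--cpus", str(cpus), "--save"]


# Placeholder container 101; pass the list to subprocess.run(...) on the host to apply it.
print(vzctl_set_cpus(101, 2))
```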
     
  11. iti-asi

    iti-asi Member

    We've got exactly the same problem: after upgrading from 1.8 to 1.9, all our virtual machines running JVM applications (JBoss, Tomcat, Nuxeo) stopped working.

    We did these upgrades (Proxmox 1.8 to 1.9):

    Kernel 2.6.18 -> kernel 2.6.32-6
    Kernel 2.6.24 -> kernel 2.6.32-6
    Kernel 2.6.32-4 -> kernel 2.6.32-6

    The virtual machines (with JVM) were working fine before the upgrade; after upgrading to 2.6.32-6, they stopped working (the JVM crashes or stops responding).
    After rebooting the hosts into their original kernels (2.6.18, 2.6.24, 2.6.32-4), everything works fine again.
    We also migrated a virtual machine with this problem from 2.6.32-6 to a cluster still on Proxmox 1.8 with kernel 2.6.32-4, and it works fine there.
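Because OpenVZ containers share the host kernel, a JVM start script inside a container can check which kernel it is actually running on and warn when it is the one reported in this thread. A minimal sketch; the suspect list only reflects the reports above (2.6.32-6-pve failing, 2.6.32-4-pve working) and is not an official list of affected kernels.

```python
import platform
from typing import Optional

# Based on reports in this thread: JVM hangs/crashes on 2.6.32-6-pve,
# while 2.6.32-4-pve and the older 2.6.18/2.6.24 kernels worked.
SUSPECT_KERNELS = {"2.6.32-6-pve"}


def kernel_is_suspect(release: Optional[str] = None) -> bool:
    """Check the given kernel release, or the running one, against the suspect list.

    OpenVZ containers share the host kernel, so inside a container this reports the host's kernel.
    """
    if release is None:
        release = platform.release()
    return release in SUSPECT_KERNELS


if kernel_is_suspect():
    print(f"warning: running on {platform.release()}, reported to break JVM workloads")
```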
     
  12. cadiolis

    cadiolis New Member

    Good to know we have a real bug here. Dietmar, I assume you mean bumping up the 'CPUs' option in the VE web config. I will try this, but then I will need to downgrade to 1.8 (or boot into the old kernel), as I need these machines operational ASAP.
     
  13. ChristOff

    ChristOff New Member

    +1 for me.

    I use Zimbra in a Lucid OpenVZ container (configuration based on ve-vswap-1024m.conf-sample in /etc/vz/conf, so all parameters except PHYSPAGES, SWAPPAGES, KMEMSIZE and LOCKEDPAGES are unlimited). All failcnt values are 0, but after "some time" (5 minutes, 6 hours, 15 hours), Zimbra stops responding with no error message at all (I checked all the logs in /var/log on the host and in the container, and also in /opt/zimbra/log). SSH connections are still possible when this occurs; the JVM/Zimbra simply stops answering. The only way to get it back is to reboot the container.

    Yesterday evening it got worse: the whole host stopped answering (ping OK, but no HTTPS, no SSH, and no access to any container). After rebooting the host I cannot find anything in the logs either ("grep -Ri error /var/log" shows nothing interesting; cron jobs ran past the point where all services became unavailable but were unable to communicate with the outside world).

    I'll try downgrading the kernel to 2.6.32-4 and see if it helps.

    Host: Core i5-2400, 16 GB RAM

    lspci:
    00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
    00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
    00:16.0 Communication controller: Intel Corporation 6 Series/C200 Series Chipset Family MEI Controller #1 (rev 04)
    00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
    00:1a.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #2 (rev 05)
    00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 05)
    00:1c.0 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 1 (rev b5)
    00:1c.3 PCI bridge: Intel Corporation 6 Series/C200 Series Chipset Family PCI Express Root Port 4 (rev b5)
    00:1d.0 USB Controller: Intel Corporation 6 Series/C200 Series Chipset Family USB Enhanced Host Controller #1 (rev 05)
    00:1f.0 ISA bridge: Intel Corporation H67 Express Chipset Family LPC Controller (rev 05)
    00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller (rev 05)
    00:1f.3 SMBus: Intel Corporation 6 Series/C200 Series Chipset Family SMBus Controller (rev 05)
    01:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 10)
    03:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)

    pveversion -v:
    running kernel: 2.6.32-6-pve
    proxmox-ve-2.6.32: 1.9-43
    pve-kernel-2.6.32-4-pve: 2.6.32-33
    pve-kernel-2.6.32-6-pve: 2.6.32-43
    qemu-server: 1.1-32
    pve-firmware: 1.0-13
    libpve-storage-perl: 1.0-19
    vncterm: 0.9-2
    vzctl: 3.0.28-1pve5
    vzdump: 1.2-15
    vzprocps: 2.0.11-2
    vzquota: 3.0.11-1dso1
    pve-qemu-kvm: 0.15.0-1
    ksm-control-daemon: 1.0-6
     
  14. pezi

    pezi New Member

    I downgraded the kernel to
    pve-kernel-2.6.32-4-pve: 2.6.32-33

    Alfresco has now been running for 6 hours without a crash. :p I will keep monitoring it to verify that the old kernel fixes the Java/OpenVZ problem!
     
  15. pezi

    pezi New Member

    Hi!

    Switching to the prior kernel "fixes" the JVM crash problem!

    During my tests with Alfresco we discovered 3 types of JVM/application misbehaviour:
    - JVM crash (problem no. 1)
    - JVM runs at 100% CPU, but the application can handle HTTP requests
    - JVM seems to be still alive, but the application doesn't respond
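When monitoring for these three failure types, it helps to record each observation in the same terms so runs on different kernels can be compared. A minimal sketch that maps one observation (is the process alive, its CPU usage, does the HTTP check succeed) to the categories listed above; how the observations are collected is left out, and the 95% CPU threshold is an assumption.

```python
from enum import Enum


class Symptom(Enum):
    CRASH = "JVM crash"
    CPU_BOUND = "JVM at 100% CPU"
    UNRESPONSIVE = "JVM alive, application not responding"
    HEALTHY = "no problem observed"


def classify(process_alive: bool, cpu_percent: float, http_ok: bool,
             busy_threshold: float = 95.0) -> Symptom:
    """Map one observation of the JVM to the failure types (threshold is an assumption)."""
    if not process_alive:
        return Symptom.CRASH
    if cpu_percent >= busy_threshold:
        return Symptom.CPU_BOUND
    if not http_ok:
        return Symptom.UNRESPONSIVE
    return Symptom.HEALTHY
```

The checks are ordered from most to least severe, so a JVM that is both at 100% CPU and not answering HTTP is reported as CPU-bound.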

    You wrote
    http://forum.proxmox.com/threads/7023-Proxmox-VE-1.9-released!?p=40248#post40248
    So I believe finding the exact problem will be difficult.

    Can we do anything to help fix this problem (testing, etc.)?

    Best regards,
    Peter
     
  16. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    It would be great if you could find an easy way to reproduce that bug. You can also report the bug on the OpenVZ forum; maybe someone there has an idea.
     
  17. dik23

    dik23 Member

    Can someone confirm that this issue is caused solely by the new kernel? Is the rest of the 1.9 update safe to use with the JVM?
     
  18. cadiolis

    cadiolis New Member

    I downgraded two machines to 1.8 (I had some other errors trying to boot into the original kernel) and everything is working again.

    I think this will be a difficult bug to track down. To reproduce it you could probably do what I did when trying to rebuild my build server. I just created a new Debian VE, installed Java, downloaded Jenkins (or Hudson) and ran it with 'java -jar jenkins.war'
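Following that recipe, the interesting measurement is how long the JVM survives before it exits or stops answering. A minimal harness sketch that starts the command, polls an HTTP URL, and returns the elapsed time plus the exit code (None meaning "still running but hung"); Jenkins listens on port 8080 by default when started with `java -jar jenkins.war`, while the poll interval, timeout and startup grace period are assumptions.

```python
import subprocess
import time
import urllib.error
import urllib.request
from typing import List, Optional, Tuple


def responds(url: str, timeout: float) -> bool:
    """True if the server sends any HTTP response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # an error status is still an answer from the JVM
    except OSError:
        return False  # refused, timed out, reset: treat as not responding


def time_until_failure(cmd: List[str], url: str, interval: float = 30.0,
                       timeout: float = 10.0, startup_grace: float = 120.0
                       ) -> Tuple[float, Optional[int]]:
    """Start cmd, poll url, and return (seconds until failure, exit code or None if hung)."""
    proc = subprocess.Popen(cmd)
    started = time.monotonic()
    try:
        time.sleep(startup_grace)  # assumption: long enough for the app to start listening
        while proc.poll() is None and responds(url, timeout):
            time.sleep(interval)
        return time.monotonic() - started, proc.poll()
    finally:
        if proc.poll() is None:
            proc.kill()  # still running but unresponsive: clean up before the next run
            proc.wait()
```

For example, `time_until_failure(["java", "-jar", "jenkins.war"], "http://localhost:8080/")`, repeated on the 2.6.32-6 and 2.6.32-4 kernels, would give comparable survival times.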
     
  19. pezi

    pezi New Member

    I will try to find an easily reproducible test case for the JVM failure.

    Before posting on the OpenVZ forum: what is the relationship between the PVE kernel and the official OpenVZ kernel?
    http://download.openvz.org/kernel/branches/rhel6-2.6.32/current/
    Is the PVE kernel = OpenVZ kernel + some modifications, e.g. newer drivers?
     
  20. tom

    tom Proxmox Staff Member
    Staff Member

    The latest 2.6.32-6 kernel is based on the stable OpenVZ branch (RHEL6), but with some small modifications and a bunch of newer drivers for NICs and RAID controllers.
     
