VPS randomly crashing without any warning

SPQRInc

Member
Jul 27, 2015
Hi there,

I'm using Proxmox to run several VPSs, each running Debian 7.8 with a LAMP stack.

For some reason I don't understand, these boxes freeze randomly. When that happens, I cannot connect via SSH and all services are down.

If I connect via VNC, I can see an error message like `kernel hung - task XYZ blocked for more than 120 seconds`.

The affected tasks are MySQL, qmgr, logstash-forwarder, apache, and so on.


The problem is: the servers are otherwise fine. The monitoring shows that they idle all day. Then, at random, they become unreachable until I reboot them.

I cannot even find the reason, because the system stops logging at that moment. The monitoring cannot capture the reason (e.g. high load) either, because its connection via SSH fails.

This is the information about my proxmox-setup:


Code:
proxmox-ve-2.6.32: 3.4-157 (running kernel: 3.2.0-4-amd64)
pve-manager: 3.4-6 (running version: 3.4-6/102d4547)
pve-kernel-2.6.32-39-pve: 2.6.32-157
pve-kernel-2.6.32-37-pve: 2.6.32-150
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-18
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Any idea what I could do here?
 
`kernel hung - task XYZ blocked for more than 120 seconds`:

Based on my experience, this could mean that your machines are starved for I/O at certain times, or are swapping heavily
(CPU is not the only resource a process needs in order to run).

I remember having this every day at 6:00 am in my Xen cluster, when 12 VMs started running their daily cron jobs on the same physical host.
I solved this by adding a random delay before starting the daily cron jobs.
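
The delay idea above can be sketched roughly as follows. This is only an illustration, not the setup that was used on that Xen cluster: the helper name `jitter_seconds` and the 30-minute window are assumptions, and it swaps the truly random delay for a hash of the hostname, so each guest gets a different but repeatable offset.

```python
import hashlib
import socket


def jitter_seconds(hostname, max_delay=1800):
    """Return a stable delay in [0, max_delay) derived from the hostname.

    max_delay (1800 s) is an illustrative assumption, not a recommended value.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and fold it into range.
    return int.from_bytes(digest[:8], "big") % max_delay


if __name__ == "__main__":
    # Print the delay for this machine; a cron wrapper could sleep that long
    # before starting the daily jobs.
    print(jitter_seconds(socket.gethostname()))
```

A wrapper around the daily cron run could sleep for the printed number of seconds first, which spreads the I/O peak of the guests over the chosen window.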

To debug this problem, you can try the following:
* At what time does the problem happen?
* Test the storage of your VM with bonnie++ while the hang is happening;
  see http://www.jamescoyle.net/how-to/599-benchmark-disk-io-with-dd-and-bonnie for how to do that.
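
If bonnie++ is not installed, a very rough sequential-write check can be sketched in Python. This is not a replacement for bonnie++ or dd; the function name `write_throughput_mb_s` and the 64 MiB test size are assumptions chosen for illustration, and the directory passed in should be on the VM's real disk rather than a tmpfs.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64):
    """Write size_mb MiB to a temp file in directory, fsync it, return MiB/s."""
    chunk = b"\0" * (1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(size_mb):
                handle.write(chunk)
            handle.flush()
            # Force the data to the storage layer so the page cache
            # does not hide a slow disk.
            os.fsync(handle.fileno())
        elapsed = time.monotonic() - start
    finally:
        os.remove(path)
    return size_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    # The temp directory is only a default; point this at the disk under test.
    print("%.1f MiB/s" % write_throughput_mb_s(tempfile.gettempdir()))
```

Comparing the figure during normal operation with the figure shortly before or during a hang would show whether the storage is the bottleneck.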
 
Hi manu,

Thanks for your answer. But wouldn't all boxes be slow at that moment? All the other boxes are working fine; only this one is stuck.

The most recent incidents (each on a different VPS) occurred at:

- 2 am (Friday)
- 6 am (Sunday)
- 9 am (today, Monday)
 
I would advise opening a terminal on the machine and keeping it open; the next time it is stuck, run `iotop` and `free` to see what's going on.
Even better, install Munin and you will get graphs of that.
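
Because the box stops logging once it hangs, it can also help to record memory and swap figures continuously so that the last entries before the freeze survive a reboot. The following is a minimal sketch of that idea, assuming a Linux guest with `/proc/meminfo`; the log path `/var/log/memwatch.log`, the 10-second interval, and the function names are hypothetical.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dictionary."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_kib(info):
    """Return the amount of swap in use, in kiB."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


if __name__ == "__main__":
    # LOG_PATH and INTERVAL are illustrative assumptions.
    LOG_PATH = "/var/log/memwatch.log"
    INTERVAL = 10
    with open(LOG_PATH, "a") as log:
        while True:
            with open("/proc/meminfo") as src:
                info = parse_meminfo(src.read())
            log.write("{0} MemFree={1}kB SwapUsed={2}kB\n".format(
                time.strftime("%Y-%m-%d %H:%M:%S"),
                info.get("MemFree", 0),
                swap_used_kib(info)))
            # Flush and fsync each line so it is on disk before a hang.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(INTERVAL)
```

After the next freeze and reboot, the tail of the log shows whether free memory was shrinking or swap usage was climbing just before the hang.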

Since only one box is affected, I am 99% sure your problem has nothing to do with Proxmox, but with what you're running inside the VM.
 
Hi there,

Well, it's not just one box that is affected. It's one box at a time, but all boxes have this problem at different times.
 
Hi again,
OK. But we still need some raw bonnie++ and free output to go any further.
 
Okay, I'll run atop and free for a few hours now and see what happens the next time it crashes.
 
