KVMs stopped by themselves...

baskin

New Member
Nov 19, 2013
Hello to the community.

Although this is my first post, I have been using Proxmox for some time now (and I'm also trying to persuade my supervisors to get a subscription, because I believe in paying for good work). My issues so far have been trivial, but today I faced a very strange issue that happened twice.

I'm running a cluster of 5 machines (not HA, just for central management, backups and offline migrations), and on one node I have 3 Ubuntu 12.04 KVM guests (with the 3.8.0 kernel). Two of them stopped (not at the same time) without any apparent reason or any trace in the logs (at least where I looked).

When one of the three stopped, the others continued to work as expected, and the stopped one started again without issues. Some hours later another KVM (not the same as the first time) stopped and I had to start it again.

So I'm looking for any hints on how to debug this.

The KVMs are running on local fs and my versions are:

Code:
# pveversion -v
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-24 (running version: 3.1-24/060bd5a6)
pve-kernel-2.6.32-24-pve: 2.6.32-111
pve-kernel-2.6.32-25-pve: 2.6.32-113
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-9
libpve-access-control: 3.0-8
libpve-storage-perl: 3.0-18
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1

Thanks in advance for any hints, and if more information is needed, I will try to provide it.

Keep up the good work.
 
I haven't found anything so far, and after happening twice (on different VMs) the issue hasn't re-appeared (at least until now).
 
baskin, do you back up these VMs?

Yes. I thought about that. The first VM stop happened at a time when no backup was running; the second coincided with the backup window. So I couldn't draw any conclusion from that, but any hint will be helpful, even if it has to do with backups.
 

My VMs hang during a "shutdown" mode backup if they have no disks to back up but are not excluded from the backup task.
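Excluding such a VM from the job should avoid it; roughly something like this from the CLI (VMID 105 and the storage name are just examples, adjust to your own job):

Code:
# back up everything except VMID 105 (the ID and storage name are examples)
vzdump --all --exclude 105 --mode snapshot --storage local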
 
That scenario doesn't fit my case, especially for the first occurrence of the issue, and I'm also using snapshot mode backups, but thanks for the info, I will keep it in mind.
 
Yes, they are not HA. Those VMs are on standalone servers, "clustered" over an OpenVPN connection just for central management.

I was monitoring the server at the exact time of the last incident. There were no errors in the logs inside the KVM or on the host node (at least in the logs I have searched). It would be very helpful if you could point me to where I can search for any trace.

Thanks.
 

Are all the guests which have this issue running the same kernel? How active are the guests? Is there any type of load when this takes place?

I wonder if it's ACPI related at all.

I would start with the obvious logs.

/var/log/syslog
/var/log/kern.log
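
If nothing obvious shows up, something along these lines on the host might catch anything around the time of the stop (the date, VMID and tap interface name are placeholders):

Code:
# look for qemu/kvm or OOM messages on the host around the incident (adjust the date)
grep -iE 'kvm|qemu|oom' /var/log/syslog /var/log/kern.log | grep 'Nov 19'
# the guest's tap interface (tap<vmid>i0) should show when its NIC left the bridge
grep 'tap101i0' /var/log/syslog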
 
Yes, I thought of those logs too. Unfortunately, they were completely clean. The only trace is the virtual network interface of the KVM going down, just like on a normal shutdown or a manual stop of the machine.

The VMs are all Ubuntu 12.04 LTS with kernel 3.8.0-33 (the restart after the last incident made that one pick up 3.8.0-34, but when the issue happened it was running the same kernel as the other two). There is load on the VMs, but nothing extreme; I would call it normal. The main services are a LAMP stack.
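
One more thing I can check is the node's task history, to see whether something actually issued a stop or shutdown for the VM rather than the process dying on its own; roughly like this (VMID 101 is a placeholder, and I'm assuming the task index keeps one UPID line per task):

Code:
# a qmstop/qmshutdown entry here would point to a commanded stop rather than a crash
grep -E ':(qmstop|qmshutdown):' /var/log/pve/tasks/index | grep ':101:'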
 
Could you provide a bit more detail on your setup?

- Hardware
- Disk IO storage
- Amount of memory in host and guest
- Guest disk format (raw/qcow?)
 
The hardware is a Hewlett Packard DL360e G8 with 2x Intel Xeon E5-2420 and 32 GB of RAM.
Storage is a local filesystem on the server, a RAID1 array of 2x 2000 GB SATA 3.5" 7,200 rpm disks on an HP Smart Array P410.

The guest machines (3 Ubuntu 12.04 LTS KVMs) use raw disks and have 4-11 GB dynamic memory assignment (KSM is active and working).

The host node has never gone above 22 GB of memory usage in all the time this setup has been running.
 

How do you know it hasn't gone over 22 GB? Are you monitoring this with SNMP or something? If you are just going by the GUI, I don't think that is the best bet.

I honestly would try setting all 3 VMs to a static amount of RAM as a first step.
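
Something along these lines should do it, assuming your qemu-server version exposes the balloon option (VMID 101 and 4096 MB are just examples):

Code:
# give the guest a fixed 4 GB and disable ballooning (VMID and size are examples)
qm set 101 --memory 4096 --balloon 0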

This looks similar to your issue.

https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/1022901
 
I'm using Zabbix to monitor the host node and the VMs, so I'm checking it from there.

Should a static amount of RAM disable KSM or not?

Thanks for the link I will check it.
 

I wonder if the host doesn't show the consumed RAM because the VM wasn't able to get it, and it dies. Similar to what they are saying in the bug report. I could be way off here, just a shot in the dark.
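
If that were happening, the host kernel would normally log an OOM kill against the kvm process, so a quick check might be worthwhile (standard Debian log locations assumed):

Code:
# an OOM kill of the guest's kvm process would normally leave a trace in the host logs
grep -iE 'out of memory|oom-killer|killed process' /var/log/kern.log /var/log/syslog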
 
