Proxmox 3.4-13: KVM guest completely freezes every couple of days, nothing in the logs

itsjustme
Apr 6, 2016

Hello everyone,

Hopefully someone here can shed some light on what may be going on with the only KVM VPS I'm running. I mainly use OpenVZ on Proxmox, but for this particular VPS I had to use KVM because it runs CloudLinux. The VPS ran fine for almost a year, but now it freezes every couple of days for no apparent reason. I have no way of telling what is going on because nothing is recorded in the log files. The VPS freezes completely: there's no CPU/IO activity, and if I open the console it shows the login screen but doesn't accept input. I have to forcefully stop and start the VPS again. I've tried updating everything, including the kernels, on both the host and the VPS, but it still happens.

Considering there are no log entries at all, how can I diagnose this and try to fix the problem?

I'm running a fully updated Proxmox 3.4-13 with kernel 2.6.32-45-pve.

The VPS runs CloudLinux Server release 6.7 (kernel 2.6.32-673.8.1.lve1.4.3.el6.x86_64) with the following settings:
4GB Memory
2 virtio drives, format raw, no cache, no throttling
1 virtio network device in bridged mode, no rate limit

Any help would be greatly appreciated as this is happening on a production server.
 
A small update: I noticed that the host logs these messages every now and then:

Apr 5 06:54:43 xx kernel: kvm: 3843: cpu0 unhandled rdmsr: 0xce
Apr 5 06:54:43 xx kernel: kvm: 3843: cpu1 unhandled rdmsr: 0xce

The timestamps, however, do not match the times when the KVM guest freezes, so I'm not sure they are related. Still, they are the only KVM-related entries I can find in the logs that seem relevant.
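One way to check whether these messages line up with the freezes is to extract their timestamps from the host's syslog and compare them with the times the guest was found frozen. Below is a minimal Python sketch. It assumes the classic syslog line format shown above, which omits the year, so the year is passed in. It also assumes you keep a hand-written list of freeze times. The 10-minute window and the function names are hypothetical choices, not anything Proxmox provides.

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix, e.g. "Apr 5 06:54:43 xx kernel: kvm: ..."
# Assumption: the host logs in this format, which does not include the year.
SYSLOG_RE = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+(.*)$")


def rdmsr_events(lines, year):
    """Return (timestamp, message) pairs for 'unhandled rdmsr' kernel lines."""
    events = []
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m and "unhandled rdmsr" in m.group(2):
            # Syslog omits the year, so the caller supplies it.
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2)))
    return events


def near_freezes(events, freeze_times, window=timedelta(minutes=10)):
    """Keep only events that fall within `window` of a recorded freeze."""
    return [(ts, msg) for ts, msg in events
            if any(abs(ts - f) <= window for f in freeze_times)]


if __name__ == "__main__":
    log = [
        "Apr 5 06:54:43 xx kernel: kvm: 3843: cpu0 unhandled rdmsr: 0xce",
        "Apr 5 06:54:43 xx kernel: kvm: 3843: cpu1 unhandled rdmsr: 0xce",
    ]
    # Hypothetical freeze time, noted by hand when the guest was found hung.
    freezes = [datetime(2016, 4, 5, 14, 30)]
    events = rdmsr_events(log, 2016)
    print(len(events), "rdmsr events,", len(near_freezes(events, freezes)), "near a freeze")
```

If the matches consistently come back empty across several freezes, that supports the suspicion that the rdmsr messages are unrelated noise.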
 
Sorry for bumping this, but I can't find help anywhere and the problem seems to be getting worse: it happened twice on the same day just last week and once again today. Each time, I can't find a single indication in the logs, the server load or the memory usage of what the problem might be.

Is there anything I can do to even diagnose what's happening?
 
Hey,

As the VPS ran fine for almost a year, you would guess that "something" has changed. Can you recall any changes to the host or guest before the issue started?

Having nothing in the logs of either the host or the guest is strange, and very unfortunate of course.

So, as a stab in the dark, perhaps try changing the guest's CPU type to host (if it isn't already). You may want to read more about it here: http://pve.proxmox.com/wiki/Allow_Guests_Access_to_Host_CPU

Out of interest, how many vCPUs does the guest have, and have you installed the QEMU Guest Agent?
 
It currently has 1 socket and 2 cores assigned, with the default kvm64 CPU type. I will definitely try the host type as soon as I have a chance to reboot the VM.

As for changes: it is a VM with cPanel installed, so the cPanel packages are updated automatically, but the kernel was never updated. I only updated it after this started happening, in the hope that it would fix the problem. Other than that, the only change I can remember is adding another virtio hard disk (same as the first disk).

Oh, and no, we don't have the agent installed.
 
No real change to speak of then. Very challenging without any logs.

As mentioned, it is really a stab in the dark. When you do make the change, please note the following remark on the wiki: stop then start the VM (a reboot is not enough to apply the change).
 
OK, I was able to reboot the VM today. I changed the CPU type, and I also switched the NIC and disks from VirtIO to Intel and SATA; you never know...

I'll let you know how it goes, so far so good.
 
Hi,
does only the OpenVZ CT freeze? Is the node running fine?

When a node freezes without any logs, the cause is often hardware such as memory, the PSU or the CPU (thermal issues).

Which storage is the CT running on? Local or NFS?

Udo
 
No, the OpenVZ VMs never stopped working, only the KVM one stopped. The Node never had any problems either.

I'm using local storage.
 
Hi,
and which OS is running inside the KVM VM? Some distros (Ubuntu, for example) have a special kernel for running inside a VM.
I had an Ubuntu server that also hung after a few days; after switching the kernel, the VM was stable for years.

Udo
 
I have the same problem: the VM (KVM) loses network.
Don't use kernel 2.6.32-45-pve!

The VMs are CentOS 6.x or Windows 2008 R2.

I will provide more info if I can.
 
Hi,
this issue isn't limited to 2.6.32-45. I have seen the same on many 2.6.32 pve kernels (not often, but every few weeks on different VMs).

When this happens, disconnecting the NIC and reconnecting it in the GUI brings the VM back to a working state.
This was one reason for us to switch to the 3.10 kernel on PVE 3.x, and to move some of our hosts to PVE 4.

Udo
 
OK, I was hopeful that dropping VirtIO and changing the CPU type to host had fixed the problem, because it took longer to happen... but it just happened again. And once more, there is nothing at all in the log files.

Is there any sort of debug mode or something similar that I can use to try and get more information on the problem?
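One low-tech way to get at least some information out of a guest that dies silently is a heartbeat. A small process inside the guest appends a timestamp to a file every few seconds and forces it to disk. After the forced stop and start, the last line shows roughly when the guest stopped running. You can then compare that time with host-side events such as cron jobs, backups or the rdmsr messages. Below is a minimal Python sketch. The file path and the 30-second interval are arbitrary assumptions. The script still has to be started by something like cron or an init script, which is not shown.

```python
import os
import time
from datetime import datetime, timezone

# Hypothetical location inside the guest; any path on a persistent disk works.
HEARTBEAT_FILE = "/var/log/heartbeat.log"


def write_heartbeat(path, now=None):
    """Append one ISO-8601 timestamp and force it to disk.

    The fsync matters: after a hard freeze and a forced stop, anything
    still sitting in the guest's page cache is lost.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(now.isoformat() + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def last_heartbeat(path):
    """Return the most recent parseable timestamp in the file, or None."""
    last = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            try:
                last = datetime.fromisoformat(line.strip())
            except ValueError:
                # Skip blank or half-written lines (e.g. cut off by the freeze).
                continue
    return last


def run(path=HEARTBEAT_FILE, interval=30.0):
    """Write a heartbeat every `interval` seconds until the process is killed."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)


if __name__ == "__main__":
    run()
```

Appending rather than overwriting is deliberate: if the guest hangs in the middle of a truncate-and-rewrite, the file could end up empty. With an append-only file, earlier lines survive.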
 
Sorry for bumping this thread, but this keeps happening. I've now switched to Proxmox 4.2-2 (kernel 4.4.6-1-pve), but it still happens. The most frustrating part is that there are no logs that could help me identify the problem. It only seems to happen on CloudLinux servers.

Right now, this is the uptime of our CloudLinux server running under KVM:
[Image: uptime of the CloudLinux KVM server]


That's how frequently the server reboots. I asked before, but is there any sort of debug mode or something similar that I can use to get more information on the problem?
 
I also noticed something odd on our CloudLinux servers, although they don't go down. It may help if you check it as well.

Install iperf and check the network performance.

On CentOS, Debian and Ubuntu KVM servers on the same node, this is the performance we get:

[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 13.8 MBytes 115 Mbits/sec
[ 3] 1.0- 2.0 sec 10.0 MBytes 83.9 Mbits/sec
[ 3] 2.0- 3.0 sec 11.4 MBytes 95.4 Mbits/sec
[ 3] 3.0- 4.0 sec 11.2 MBytes 94.4 Mbits/sec
[ 3] 4.0- 5.0 sec 11.2 MBytes 94.4 Mbits/sec
[ 3] 5.0- 6.0 sec 11.2 MBytes 94.4 Mbits/sec
[ 3] 6.0- 7.0 sec 11.2 MBytes 94.4 Mbits/sec
[ 3] 7.0- 8.0 sec 11.4 MBytes 95.4 Mbits/sec
[ 3] 8.0- 9.0 sec 10.0 MBytes 83.9 Mbits/sec
[ 3] 9.0-10.0 sec 11.2 MBytes 94.4 Mbits/sec
[ 3] 0.0-10.0 sec 113 MBytes 94.5 Mbits/sec

But for some reason, this is the performance we get on CloudLinux KVM servers:

[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 3.50 MBytes 29.4 Mbits/sec
[ 3] 1.0- 2.0 sec 3.25 MBytes 27.3 Mbits/sec
[ 3] 2.0- 3.0 sec 3.25 MBytes 27.3 Mbits/sec
[ 3] 3.0- 4.0 sec 3.38 MBytes 28.3 Mbits/sec
[ 3] 4.0- 5.0 sec 3.25 MBytes 27.3 Mbits/sec
[ 3] 5.0- 6.0 sec 3.38 MBytes 28.3 Mbits/sec
[ 3] 6.0- 7.0 sec 3.25 MBytes 27.3 Mbits/sec
[ 3] 7.0- 8.0 sec 3.25 MBytes 27.3 Mbits/sec
[ 3] 8.0- 9.0 sec 3.38 MBytes 28.3 Mbits/sec
[ 3] 9.0-10.0 sec 3.25 MBytes 27.3 Mbits/sec
[ 3] 0.0-10.1 sec 33.2 MBytes 27.7 Mbits/sec
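To put a number on the difference between the two runs above, you can parse the iperf reports and compare them. Below is a small Python sketch. It assumes report lines formatted exactly as pasted here, with bandwidth in Mbits/sec, and treats the line spanning the longest interval as the end-of-test summary. The function names are illustrative only.

```python
import re

# Matches report lines like "[ 3] 0.0- 1.0 sec 13.8 MBytes 115 Mbits/sec".
REPORT_RE = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+)\s+sec\s+([\d.]+)\s+MBytes\s+([\d.]+)\s+Mbits/sec"
)


def parse_report(text):
    """Return (start, end, mbytes, mbits_per_sec) tuples for each report line."""
    return [tuple(float(g) for g in m.groups()) for m in REPORT_RE.finditer(text)]


def summary_mbits(text):
    """Bandwidth from the summary line, i.e. the row covering the longest span."""
    rows = parse_report(text)
    return max(rows, key=lambda r: r[1] - r[0])[3]


# First interval and summary line from each run quoted above.
OTHER = """[ 3] 0.0- 1.0 sec 13.8 MBytes 115 Mbits/sec
[ 3] 0.0-10.0 sec 113 MBytes 94.5 Mbits/sec"""
CLOUDLINUX = """[ 3] 0.0- 1.0 sec 3.50 MBytes 29.4 Mbits/sec
[ 3] 0.0-10.1 sec 33.2 MBytes 27.7 Mbits/sec"""

if __name__ == "__main__":
    ratio = summary_mbits(OTHER) / summary_mbits(CLOUDLINUX)
    print(f"{ratio:.1f}x")  # 94.5 / 27.7 is about 3.4
```

On these numbers, the CloudLinux guests get roughly a third of the throughput of the other guests on the same node.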

So I think it's a CloudLinux issue, not a Proxmox one: we run different versions of Proxmox, even some older ones, and I tested KVM guests on random nodes hoping I would be wrong.

I'm going to set up fresh installs of CloudLinux and CentOS now as well, to be 100% sure I'm right.

Also note that I tested from one city to another. We are in South Africa, so I tested from our Cape Town test box to our Johannesburg servers, which are in a different data center.
 
