CPU usage 100%. Virtual machine hangs

Miguel

Member
Nov 27, 2017
Hi,

I migrated two virtual machines from Proxmox 4.4 to 5.1 and took the opportunity to move to ZFS. The old server had SATA disks, while the new one has SSDs.

I have only two VPSs running on this server, both on CentOS 6.9: one with Plesk and the other with cPanel.

The Plesk machine has been running without issues since the migration. It has only 8 GB of RAM and two CPUs.

The cPanel machine had 2 CPUs on the old server, and I increased it to 4 CPUs after the migration. Today, after two crashes, I decided to go back to 2 CPUs just in case. RAM has always been 17 GB.

cPanel support can't find the issue, and since the other VPS is working like a charm, I can't blame Proxmox for this.

So my questions:

- I haven't found anything in /var/log/messages on the Proxmox host reporting the high CPU usage (100% this morning). Where should I look?

- Is there any way to restart the VPS automatically when it hangs? E.g.: if the CPU is at 100% for, say, X minutes, restart the VPS.

Thanks!

Miguel
 
When you have high CPU usage, you can try the following. The "l" trigger prints a stack backtrace for all active CPUs, so you should get a backtrace in dmesg.
Code:
echo 1 > /proc/sys/kernel/sysrq
echo l > /proc/sysrq-trigger
https://www.kernel.org/doc/html/v4.13/admin-guide/sysrq.html

Also check your PVE system utilization; maybe there is something else going on too.
 
I have the high CPU usage in the virtual machine, not on the host, so I am not sure how this can help (cPanel runs 2.4.x kernels).

The virtual machine is unresponsive and doesn't log anything; I have to reset or stop it (shutdown doesn't work) and start it again.

Where do I look for system utilization logs?

I was looking for at least an automatic way to restart this VM when it hangs because of high CPU usage.
 
Cpanel runs 2.4.x kernels
Well, you run a very old kernel. There might be some KVM settings that help (e.g. CPU type), but you need to test.

Where do I look for system utilization logs?
On PVE you have the graphs on the summary page of the VM. Inside the VM, a performance monitoring system needs to be installed to get graphs over time.

I was looking for at least an automatic way to restart this VM when it hangs because of high CPU usage.
As a simple hack, you can use a shell script, triggered by cron, that resets the machine when it stops answering a ping or a URL request.
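
A minimal sketch of such a cron-driven watchdog, to be run on the PVE host. The VM ID, the IP address, the failure threshold, and the state-file path are all hypothetical placeholders, and requiring several consecutive failures before resetting is my own assumption to avoid reacting to a single lost ping. It relies on the standard "qm reset" command.

```shell
#!/bin/sh
# Watchdog sketch: reset a VM after several consecutive failed pings.
# VMID, VM_IP, MAX_FAILS and STATE_FILE are hypothetical example values.
VMID="${VMID:-100}"
VM_IP="${VM_IP:-192.0.2.10}"
MAX_FAILS="${MAX_FAILS:-3}"
STATE_FILE="${STATE_FILE:-/var/tmp/vm-watchdog-$VMID.count}"

# Print the new consecutive-failure count.
# $1 = previous count, $2 = latest check result (0 = reachable).
next_count() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

# Succeed when the failure count ($1) has reached the threshold ($2).
should_reset() {
    [ "$1" -ge "$2" ]
}

# One watchdog pass: ping, update the counter, reset the VM if needed.
run_check() {
    prev=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
    if ping -c 3 -W 2 "$VM_IP" >/dev/null 2>&1; then rc=0; else rc=1; fi
    count=$(next_count "$prev" "$rc")
    if should_reset "$count" "$MAX_FAILS"; then
        logger -t vm-watchdog "VM $VMID unreachable $count times, resetting"
        qm reset "$VMID"
        count=0
    fi
    echo "$count" > "$STATE_FILE"
}

# Only act when called with "run", so the helpers can be sourced safely.
if [ "${1:-}" = "run" ]; then
    run_check
fi
```

Saved under a name of your choice and called from cron once a minute with the "run" argument, this would reset the VM after roughly three minutes without a ping reply. If the guest needs longer than that to boot, the threshold should be raised so the watchdog does not reset it again during startup.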
 
You can use the qemu-guest-agent. Its ping only tells you whether the guest agent is running, not whether the machine is responding. But with a combination of the agent ping and a network ping or a wget, you might be able to get reliable information.
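
One way to combine the two signals, as a rough sketch: probe the guest agent with "qm agent <vmid> ping" and fetch a URL with wget, then classify the result. The VM ID and URL are hypothetical, the three state names are my own labels, and the agent probe only works if qemu-guest-agent is installed in the guest and enabled in the VM options.

```shell
#!/bin/sh
# Health probe sketch combining the guest agent and an HTTP check.
# VMID and CHECK_URL are hypothetical example values.
VMID="${VMID:-100}"
CHECK_URL="${CHECK_URL:-http://192.0.2.10/}"

# Classify the guest from two exit codes.
# $1 = agent ping result, $2 = HTTP result (0 = success for both).
classify() {
    if [ "$2" -eq 0 ]; then
        echo ok             # the service answers, nothing to do
    elif [ "$1" -eq 0 ]; then
        echo service-down   # guest alive, but the web service is not
    else
        echo hung           # neither answers: candidate for a reset
    fi
}

# Run both probes with short timeouts and print the resulting state.
probe() {
    if timeout 10 qm agent "$VMID" ping >/dev/null 2>&1; then
        agent_rc=0
    else
        agent_rc=1
    fi
    if wget -q -T 5 -t 1 -O /dev/null "$CHECK_URL"; then
        http_rc=0
    else
        http_rc=1
    fi
    classify "$agent_rc" "$http_rc"
}

# Only probe when called with "run".
if [ "${1:-}" = "run" ]; then
    probe
fi
```

Separating "service-down" from "hung" means a reset is only considered when the guest itself stops answering, not when just the web server has failed.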
 
I used the ping, and at least today when the VM hung, it was restarted without manual intervention. The cPanel people are going to look into it, but in the meantime someone in the Proxmox Facebook group said that it could be caused by using the SCSI driver on ZFS instead of virtio. I thought SCSI was the recommended driver.

If I have to move to virtio, can I just change it or do I have to make any change on the CentOS guest OS?

Thanks!

Miguel
 
If I have to move to virtio, can I just change it or do I have to make any change on the CentOS guest OS?
The question is whether, and in what version, virtio is available in the 2.4 kernel. You may need to make some adjustments, as the boot device changes.
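
A small sketch for checking this from inside the guest, assuming the distribution ships its kernel config as /boot/config-<version> (CentOS does). The helper name config_state is made up for illustration. It reports whether each virtio option is built in (y), built as a module (m), or absent.

```shell
#!/bin/sh
# Report how a kernel config file sets an option: y, m, or "absent".
# $1 = path to the config file, $2 = option name (e.g. CONFIG_VIRTIO_BLK).
config_state() {
    value=$(sed -n "s/^$2=\([ym]\)$/\1/p" "$1" 2>/dev/null | head -n 1)
    echo "${value:-absent}"
}

# When called with "run", check the running kernel's config.
if [ "${1:-}" = "run" ]; then
    cfg="/boot/config-$(uname -r)"
    for opt in CONFIG_VIRTIO_PCI CONFIG_VIRTIO_BLK \
               CONFIG_VIRTIO_NET CONFIG_SCSI_VIRTIO; do
        echo "$opt: $(config_state "$cfg" "$opt")"
    done
fi
```

If the drivers are only modules, they also need to be in the initramfs before the guest can boot from a virtio disk. With virtio-blk the disk shows up as /dev/vdX instead of /dev/sdX, so any fstab or bootloader entry that names the device directly would need adjusting.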
 
But should I move to virtio? How do I know which version of virtio is included in 2.4 kernels?
 
I have had two crashes in the last week. The script pinging the machine detected that it was down and restarted it automatically. I don't see any high CPU or RAM usage now. In fact, there is nothing in the VM's logs that gives any idea of why the machine freezes.

Is there any log on the Proxmox host that I should look at? Anything that I can enable?
 
The crash is happening inside the VM, so there might be nothing to find on the host. But it is always good to go through the syslog and dmesg output and see if anything shows up in them.
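
As a starting point for going through those logs, here is a hedged sketch that filters a log file for a few well-known kernel messages (soft lockups, blocked tasks, OOM kills, call traces). The pattern list is my own selection and far from exhaustive, and /var/log/syslog is assumed to be the host's syslog location.

```shell
#!/bin/sh
# Print lines from a log file ($1) that match common kernel trouble messages.
scan_log() {
    grep -E -i 'soft lockup|hung_task|blocked for more than|out of memory|oom-killer|call trace' "$1"
}

# When called with "run", scan the kernel ring buffer and the syslog.
if [ "${1:-}" = "run" ]; then
    tmp=$(mktemp)
    dmesg > "$tmp"
    echo "== dmesg =="
    scan_log "$tmp"
    echo "== /var/log/syslog =="
    scan_log /var/log/syslog
    rm -f "$tmp"
fi
```

No output from either section does not prove the host is healthy; it only means none of these particular messages were logged.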

For the VM settings, it can be anything related to the drivers, CPU type, RAM ballooning, networking, or disks. You need to try what works best (after all, you can clone the VM). The major problem, though, is that the kernel (2.4) is very old, and I assume the best way to fix it is to upgrade the VM.
 
This VM with this old kernel has been running for about three years without problems (by the way, CentOS and Red Hat are using those old kernels).

The issue started when I migrated from Proxmox 4.4 to 5.1 and adopted ZFS.

Yesterday I upgraded to the latest version of Proxmox (5.1-43; I was on 5.1-36). Let's see what happens.
 
