CPU stuck

cyberbootje

Nov 29, 2009
Hi

I am getting "CPU stuck for 110 seconds" (and similar) messages on a KVM Debian machine.
There are 4 Debian machines on the node, and all of them except one have shown "CPU stuck for xxx seconds" on the console.

The master has 2 KVM machines, also Debian, but they have not shown a warning yet.

Master:
1 x Core2Duo 24
Kernel: 2.6.24-8-pve

Node:
2 x Intel Nehalem Xeon 5520 Quad Core 2.26 GHz
Kernel: 2.6.24-9-pve

The node was installed recently and brought up to date with apt-get update / apt-get dist-upgrade right from the start.
I also run snmpd on the node, and it stops completely; I have to restart snmpd.

Any ideas?
 

It's always a good idea to use the latest version when you start tracking down issues.

Take a look here: http://pve.proxmox.com/wiki/Proxmox_VE_Kernel
decide which kernel is right for you, and then upgrade.
 
I did take a look, but my problem is: is it safe to do that on a production machine?
And if it breaks, can I revert?

It's always a good idea to have a test environment to get familiar with the update process. For example, you can install Proxmox VE inside Proxmox VE, or inside a desktop virtualization product (if you have no spare server), to practice the update process.

If you run a standard system installed from the ISO with no custom settings, it should go painlessly, but there are no guarantees in life ...

BTW, tomorrow we release a new 2.6.18 kernel; maybe wait for that.
 

I think proxmox-ve-2.6.32 is OK for me, but the master has the other (older) kernel.
Is that a problem?
 
Hi,

I've got the same problem here on PVE 1.5. The error was "CPU stuck for 10 seconds", displayed on the console of a VM.
On the PVE host, the message log shows errors related to the CD drive:
Jan 17 06:17:43 dvmh003 kernel: ata1.00: qc timeout (cmd 0xa0)
Jan 17 06:17:43 dvmh003 kernel: cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Jan 17 06:17:43 dvmh003 kernel: res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout)
Jan 17 06:17:48 dvmh003 kernel: ata1: port is slow to respond, please be patient (Status 0xd0)
Jan 17 06:17:53 dvmh003 kernel: ata1: device not ready (errno=-16), forcing hardreset
Jan 17 06:17:53 dvmh003 kernel: ata1: soft resetting link
Jan 17 06:17:54 dvmh003 kernel: ata1.00: configured for UDMA/25
Jan 17 06:17:54 dvmh003 kernel: ata1: EH complete
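If you want to check whether your own host shows the same CD drive timeouts, one option is to scan its syslog for the libata timeout messages above. Below is a minimal Python sketch, assuming the standard syslog line layout seen in this log. The names `find_ata_timeouts` and `ATA_PATTERN` are hypothetical, and the pattern only covers the three timeout messages quoted here; it is not a complete list of libata errors.

```python
import re

# Excerpt copied from the hypervisor log quoted above.
SAMPLE_LOG = """\
Jan 17 06:17:43 dvmh003 kernel: ata1.00: qc timeout (cmd 0xa0)
Jan 17 06:17:43 dvmh003 kernel: cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Jan 17 06:17:48 dvmh003 kernel: ata1: port is slow to respond, please be patient (Status 0xd0)
Jan 17 06:17:53 dvmh003 kernel: ata1: device not ready (errno=-16), forcing hardreset
Jan 17 06:17:53 dvmh003 kernel: ata1: soft resetting link
Jan 17 06:17:54 dvmh003 kernel: ata1.00: configured for UDMA/25
Jan 17 06:17:54 dvmh003 kernel: ata1: EH complete
"""

# Hypothetical filter: matches only the timeout messages seen in the post.
# Groups: syslog timestamp, ATA port (e.g. "ata1"), and the message kind.
ATA_PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+)(?:\.\d+)?:\s+"
    r"(?P<msg>qc timeout|port is slow to respond|device not ready)"
)


def find_ata_timeouts(lines):
    """Return (timestamp, port, message) tuples for ATA timeout lines."""
    hits = []
    for line in lines:
        match = ATA_PATTERN.match(line)
        if match:
            hits.append((match.group("ts"), match.group("port"), match.group("msg")))
    return hits


if __name__ == "__main__":
    # On a real host, pass the lines of /var/log/messages instead of the sample.
    for ts, port, msg in find_ata_timeouts(SAMPLE_LOG.splitlines()):
        print(ts, port, msg)
```

On the excerpt above, this reports the three timeout lines for ata1 between 06:17:43 and 06:17:53. The reset and recovery lines are deliberately not matched.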

Some of the VMs had a CD drive connected to the physical CD drive. I searched around, and somebody else had the same issue:
some CD drives with poor firmware can lock up the Linux kernel when it polls the drive for a newly inserted CD. (I can't find the website where I found this.)

I disconnected the virtual CD drives from all my VMs, and the problem went away. (Then I unloaded the cdrom module on the hypervisor, just to be sure.)

Hope it can help you.
 
After upgrading to 1.5 AND switching to the 2.6.32 kernel, the problem has been gone for about 2 weeks now.
So for everyone else who has this problem, that was the fix for me!

P.S: KSM works great!
 
I installed Proxmox 1.5 with kernel 2.6.32-1 three weeks ago and started experiencing CPU stuck problems right away. Yesterday, when I woke up, one of the server's disks had gone read-only. The reason? Millions of CPU stuck messages in the syslog. And I lost all the data on that disk: 450 GB gone after running fsck for half an hour.

I have also had problems with virtio-net crashing under high load, and so on. I have already posted about this on the forum.

Proxmox is bye-bye for me. It has a nice GUI, but the stability of this product is not good.
 
Hi,

Do you have any unusual error messages in /var/log/messages on the hypervisor host?
I've been using Proxmox for about 2 years and have not experienced any major problems.

Hope we can still help you.
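To answer that question quickly, it can help to count how often the "CPU stuck" messages appear in a log and how long the longest stall was. The Python sketch below does this. It assumes the soft-lockup watchdog wording used by 2.6-era kernels ("BUG: soft lockup - CPU#N stuck for Ns!"); check that against your own log first, since the exact text varies between kernel versions. The function names and the default path /var/log/messages (taken from the question above) are illustrative assumptions.

```python
import re
import sys
from collections import Counter

# Assumed 2.6-era watchdog wording, e.g. "BUG: soft lockup - CPU#1 stuck for 110s!"
LOCKUP_PATTERN = re.compile(r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!")


def summarize_lockups(lines):
    """Count soft-lockup reports per CPU and return the longest reported stall."""
    per_cpu = Counter()
    longest = 0
    for line in lines:
        match = LOCKUP_PATTERN.search(line)
        if match:
            per_cpu[int(match.group("cpu"))] += 1
            longest = max(longest, int(match.group("secs")))
    return per_cpu, longest


def main(argv):
    # Hypothetical CLI: optional log path argument, defaulting to /var/log/messages.
    path = argv[1] if len(argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as log:
            per_cpu, longest = summarize_lockups(log)
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    for cpu, count in sorted(per_cpu.items()):
        print(f"CPU#{cpu}: {count} soft lockup report(s)")
    print(f"longest reported stall: {longest}s")


if __name__ == "__main__":
    main(sys.argv)
```

The script can be run against the host log or a guest's syslog. If the counts spike at the same times as other kernel errors (disk, ATA, network), that correlation is usually more informative than the lockup messages alone.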
 
Maybe the problem is that the disk or controller isn't functioning correctly.

Correct me if I am wrong, but a disk going read-only often happens when the disk is about to die.
 
It's like being in a restaurant where the food ain't so fantastic. They always ask if it was good, and you just say, yes, it was okay. Then you never go back to that restaurant.
Proxmox has been replaced. No need to troubleshoot this issue.

We have the same problem at work: tons of CPU stuck errors on both of our virtualization machines when the load is high. Luckily, no data has been lost at work yet. We are currently planning a migration to something else.