I don't know about others, but what I did to resolve this problem was to downgrade the CPU from a Xeon E5-2620 v2 to a Xeon E5-2620.
Since then, we haven't had this error anymore. I would like to know whether this problem exists with the latest Intel Xeon v3 CPUs.
I can confirm this; it still happens with exactly the same symptoms that e100 described.
I have 2 sites running Dell R730xd servers with 2 x E5-2630 v3 processors, and this issue still manifests on heavily loaded VMs. NUMA is enabled, disk and network are set to VirtIO, and the SCSI controller type is the default LSI.
We are talking about version 3.4-11 with the 3.10.0-13-pve kernel installed. There is no pattern to this problem, but from what I can tell: on one site all VMs (dedicated, not OpenVZ) run an updated Debian 7.9 (local SSD storage via a HW RAID controller), and on the second site they run CentOS 6.7 with kernel 2.6.32-573.8.1.el6.x86_64 (over NFS, also tested over iSCSI) on a dedicated 10G network to a central storage solution. So the fuss about local versus remote storage is pointless now.
An interesting point, which I see nobody has replied to, is nanonettr's post; I will give that change a try.
The issue and what I have tested are described in more detail here (problem #2):
http://forum.proxmox.com/threads/24277-VM-high-vCPU-usage-issues
On the other hand, I have some additional information, which I have tested on both setups:
- when the VM is in the locked state and prints the kernel hung task timeout messages on the console, adding another disk through the Proxmox GUI (the site and storage type don't matter) immediately drops the I/O wait of the locked CPU thread from 100% to 0, and everything returns to normal.
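For anyone who wants to spot the locked state from a saved console or dmesg capture rather than by watching the console, here is a minimal Python sketch. It assumes the guest prints the usual kernel wording "INFO: task NAME:PID blocked for more than N seconds". The function name and the sample line are made up for illustration, not taken from my VMs.

```python
import re

# Matches the usual hung task warning printed by the Linux kernel.
# Assumption: the guest kernel uses this standard wording.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task_name, pid, seconds) tuples found in log_text."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    sample = "[12345.678] INFO: task jbd2/vda1-8:250 blocked for more than 120 seconds."
    for name, pid, secs in find_hung_tasks(sample):
        print(f"task {name} (pid {pid}) blocked for more than {secs}s")
```

Feeding it the text of a console capture gives a quick list of which guest tasks were stuck, which may help when comparing the two sites.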
So: 2 different setups, different network and storage designs, different KVM guests and guest kernels!
Therefore adding another disk, only to remove it after the VM calms down (no format or other operation is needed on the drive), somehow triggers a refresh of the VM's disk/configuration in qemu that snaps the VM out of the locked state.
I tested whether any add/remove of a component on the affected VMs has this effect, by mounting/unmounting an ISO image and adding/removing a network card, but it only reacts to adding/removing a hard disk.
Any other clues?
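I only did the add/remove through the GUI, but the same steps can probably be done from the host shell with `qm set`. The Python sketch below only builds the two command lines and does not run anything. The VM ID 101, the storage name `local`, the free slot `virtio1` and the 1 GB size are all assumptions for illustration. I have not verified that the CLI route unlocks the VM the way the GUI action does.

```python
def build_disk_toggle_commands(vmid, storage="local", slot="virtio1", size_gb=1):
    """Build the qm command lines to add a scratch disk and detach it again.

    All defaults are hypothetical placeholders; pick a slot that is
    actually free on the VM and a storage that exists on the host.
    """
    # "STORAGE:SIZE" asks Proxmox to allocate a new volume of SIZE GB.
    add_cmd = ["qm", "set", str(vmid), f"--{slot}", f"{storage}:{size_gb}"]
    # Detaching leaves the volume listed as an unused disk on the VM;
    # it still has to be deleted afterwards to free the space.
    detach_cmd = ["qm", "set", str(vmid), "--delete", slot]
    return add_cmd, detach_cmd


if __name__ == "__main__":
    for cmd in build_disk_toggle_commands(101):
        print(" ".join(cmd))
```

Printing the commands first and running them by hand seems safer than scripting the whole thing, since picking a slot that is already in use would touch a real disk.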