As promised, I'm back with more info on the topic, to shed some light on the research I've managed to gather so far.
Since my last post, I have focused on the internal disk scheduler settings, changing them from the defaults to deadline on all VMs. This improved stability quite a bit, but it was not enough to stop the bug from manifesting.
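For reference, the change itself is nothing more than writing "deadline" into the block device's sysfs scheduler attribute inside each guest. Below is a minimal C sketch of that, assuming the guest disk shows up as vda (adjust the device name to your setup; the usual shell one-liner does the same thing):
Code:
/* set_deadline.c - minimal sketch: switch a block device's I/O scheduler
 * to "deadline" by writing the sysfs attribute (same effect as
 * echo deadline > /sys/block/vda/queue/scheduler). Needs root.
 * The device name "vda" is only an example - adjust for your guest. */
#include <stdio.h>

int main(void)
{
    const char *path = "/sys/block/vda/queue/scheduler";
    FILE *f = fopen(path, "w");

    if (!f) {
        perror("fopen");
        return 1;
    }
    fputs("deadline\n", f);  /* the kernel parses the token and switches the elevator */
    fclose(f);
    return 0;
}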
Since then, I have been constantly monitoring the status of the VMs and their resource utilisation via my NMS. What I observed is that at the moment a VM gets stuck, even a lightly loaded one, the memory buffers and cached values start to rise steadily (even if the lock is cleared afterwards via the add/del disk method described earlier). While the vCPUs are in this "locked" state, the host's context switches, system interrupts and load average skyrocket on the graphs, and the system's I/O activity freezes completely.
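For the curious, the values I'm watching are the plain /proc counters that any monitoring agent ends up reading. A quick C sketch of where they come from (assuming a standard /proc layout):
Code:
/* watch_lock.c - rough sketch of the counters graphed in the NMS:
 * Buffers/Cached from /proc/meminfo, context switches and interrupts
 * from /proc/stat, and the load average from /proc/loadavg. */
#include <stdio.h>
#include <string.h>

static void grep_file(const char *path, const char *prefix)
{
    char line[256];
    FILE *f = fopen(path, "r");

    if (!f)
        return;
    while (fgets(line, sizeof(line), f))
        if (!prefix || !strncmp(line, prefix, strlen(prefix)))
            fputs(line, stdout);
    fclose(f);
}

int main(void)
{
    grep_file("/proc/meminfo", "Buffers:");
    grep_file("/proc/meminfo", "Cached:");
    grep_file("/proc/stat", "ctxt");   /* total context switches since boot */
    grep_file("/proc/stat", "intr ");  /* total interrupts since boot */
    grep_file("/proc/loadavg", NULL);  /* 1/5/15 minute load average */
    return 0;
}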
This started to look more like a memory leak, so the first step was to check what is new/different in the v3 processors compared with older generations.
The following link provided a starting point:
https://software.intel.com/en-us/bl...ies-on-the-latest-intel-xeon-are-you-ready-to
Of all the technologies described on that site, VMCS Shadowing is the one that turns up the most kernel error reports for the kernel branches currently in use. Digging further, the "kvm_vm_ioctl" KVM kernel function appears to be the central point of all sorts of misbehaviour.
Below I have added a few useful links I could find related to this:
https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.10.47 - search for commit 264f8746aa6ebf1a62588c653a5e3c4891f69fee
http://www.gossamer-threads.com/lists/linux/kernel/2207193
http://stackoverflow.com/questions/33192729/vmwrite-error-when-updating-vmcs-from-kvm-vm-ioctl
https://bugzilla.kernel.org/show_bug.cgi?id=93251 - affecting 3.19 branch
So, as I understand it (I'm not a kernel developer), the "leak" comes from the following logic:
Running a virtual machine -> the configured vCPUs are allocated to the KVM process (through the vCPU and ioctl interfaces), and I/O scheduling is set up in the qemu-kvm instance. The vCPUs are tied to the physical CPUs and RAM, and that binding relies on the KVM-specific instruction set opening ioctl system calls. These, in turn, create a set of file descriptors in the current process, including the ones involved in disk access.
The lockup only occurs when kvm_vm_ioctl tries to free up memory resources that were previously allocated.
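To make that file-descriptor chain a bit more concrete, here is a bare-bones C sketch of the documented /dev/kvm ioctl sequence that QEMU goes through (this is not Proxmox or QEMU code, just the minimal API calls; the VM-level ioctls on the second descriptor are the ones the kernel dispatches through kvm_vm_ioctl):
Code:
/* kvm_fds.c - minimal sketch of the fd chain behind a KVM guest. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);  /* fd #1: the KVM module itself */
    if (kvm < 0) { perror("open /dev/kvm"); return 1; }

    int vmfd = ioctl(kvm, KVM_CREATE_VM, 0UL);       /* fd #2: one VM instance */
    if (vmfd < 0) { perror("KVM_CREATE_VM"); return 1; }

    /* back the guest "RAM" with an anonymous mapping and register it;
     * this ioctl is handled by kvm_vm_ioctl() in the kernel */
    size_t mem_size = 0x200000;                      /* 2 MB, just for the sketch */
    void *mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    struct kvm_userspace_memory_region region = {
        .slot = 0,
        .guest_phys_addr = 0,
        .memory_size = mem_size,
        .userspace_addr = (unsigned long)mem,
    };
    if (ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region) < 0)
        perror("KVM_SET_USER_MEMORY_REGION");

    int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0UL);  /* fd #3: one vCPU */
    if (vcpufd < 0) { perror("KVM_CREATE_VCPU"); return 1; }

    printf("kvm=%d vm=%d vcpu=%d\n", kvm, vmfd, vcpufd);
    /* resource teardown (e.g. deleting a slot by re-registering it with
     * memory_size = 0) also goes through kvm_vm_ioctl - that is the
     * "free up previously allocated resources" path mentioned above */
    return 0;
}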
What is most interesting is that there is no 3.16.x kernel to test with in the pve repository. The bug was supposedly fixed in the 3.10 branch, but we don't know how 3.10.47 relates to the pve-kernel-3.10.0-13-pve revision, or whether that revision includes the fix. This might explain why robhost is running stable on the 3.16.x branch from backports; the issue also supposedly got fixed in the 4.1.x kernel branch.
Regarding Spirit's CPU model compared to mine, it is well known that each CPU family/revision (mine entry-to-mid range, his high end) shares the same major architecture but has minor differences across the high-to-low range, as well as differences in the CPU microcode support included in each BIOS update shipped by the hardware/motherboard vendors.
Currently, I'm doing stability testing with the 3.16.x kernel from backports, which forces me to drop the stable 3.10.x pve kernel release (perhaps until a 3.16 pve kernel appears - although I doubt it will, since 3.10 reaches end of life next year and work on the 4.x branch for the Proxmox 4 versions is too far along for anybody to reconsider bug fixing on a dead-end kernel version/product).
Hopefully my logic and explanations are close to right and this will help others in the future.