New Windows VM keeps dying

Cecil

Sep 22, 2017
I just set up a brand new, clean Windows Server 2016 VM (cloned from a template).

It runs great, but the next day the console just shows the "Start boot option" BIOS screen and the VM is no longer working, although it still shows a running status.

I cannot stop, reset, or shut it down; after a very long time I just get errors like:
trying to acquire lock...TASK ERROR: can't lock file '/var/lock/qemu-server/lock-1025.conf' - got timeout
TASK ERROR: VM quit/powerdown failed

Every time, I have to go into /var/lock/qemu-server/ and delete the lock file before I can start it again.

When I look at the lock file I see:
root@pve:/var/lock/qemu-server# lsof lock-1025.conf
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
task\x20U 111255 root 6wW REG 0,25 0 15 lock-1025.conf
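
A minimal sketch for checking which process still holds the lock before deleting the file, assuming a Linux host with /proc mounted. It does roughly what the lsof call above does using only /proc; `lock_holders` is a hypothetical helper name, not a Proxmox tool:

```shell
#!/bin/sh
# Print "PID command-line" for every process that has the given file open.
# Assumption: Linux with /proc; run as root to see other users' processes.
lock_holders() {
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Each entry under /proc/PID/fd is a symlink to the open file.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}
            pid=${pid%%/*}
            cmd=$(2>/dev/null tr '\0' ' ' < "/proc/$pid/cmdline")
            printf '%s %s\n' "$pid" "$cmd"
        fi
    done | sort -u
}
```

Run as root, for example `lock_holders /var/lock/qemu-server/lock-1025.conf`. If the holder turns out to be a hung task worker, as the "task" entry in the lsof output suggests, inspecting that process may tell more than removing the lock file does.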

Does anyone have ideas for how I can diagnose what is going wrong?
 
I have some more info: it turns out that Windows is crashing with:
The bugcheck was: 0x00000109 (0xa39fdadeac097720, 0xb3b6e764fe8a8a51, 0x0000034000000000, 0x0000000000000017)

I googled that and found:

This problem occurs because the system detects a Critical MSR modification, and then it crashes.

Specifically for VMs, they recommend:
This is a known issue that affects ESXi 5.0.x. For more information, contact VMWare.
To work around this issue, manually create a CPUID mask for the affected virtual machines.
(see: https://support.microsoft.com/en-us...-structure-corruption-on-a-vmware-virtual-mac)
Is there a Proxmox equivalent, or something else that I might have set wrong?

I have the config exactly the same as my other Server 2016 VM, except that one has 2 sockets with 6 cores each, whereas this one has 1 socket and 4 cores.
I turned on NUMA for both VMs, but maybe I should disable it for a single socket?
 
OK, more updates, haha. There is already a thread: https://forum.proxmox.com/threads/blue-screen-with-5-1.37664/page-10

And it seems the latest kernel, 4.13.8-26, fixes this?
What is weird is that I have 4 Server 2016 VMs (only 1 is in production so far; the others are test), and the first one runs perfectly with no BSODs or crashes, while the 4th one, which I'm trying to do testing with, is crashing.

I guess I'll try changing my sources to pve-no-subscription and see if I can get the kernel updated and test again. (Hopefully it doesn't break things!)
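
For reference, a sketch of what that sources change could look like. It assumes Proxmox VE 5.x on Debian stretch (an assumption based on the linked 5.1 thread), and the file name is arbitrary:

```
# /etc/apt/sources.list.d/pve-no-subscription.list (file name is arbitrary)
# Assumes Proxmox VE 5.x on Debian stretch; adjust the suite for other releases.
deb http://download.proxmox.com/debian/pve stretch pve-no-subscription
```

After adding the line, `apt update` followed by `apt dist-upgrade` should pull in the newer kernel package, and the host needs a reboot to actually run it.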

Using the pve test repo, I updated and had 3 Server 2016 VMs running fine all night long :)