Blue screen with 5.1

cybermcm

Hi,

I'm running a small lab at home (old PC hardware, no special server hardware). Until yesterday I used version 5.0; today I did an in-place upgrade to version 5.1. Since 5.1 my Server 2016 VMs are getting blue screens (Server 2016 Core and Server 2016 with GUI). With 5.0 everything ran stable.
The blue screen is caused by ntoskrnl.exe, bugcheck code 0x00000109.
The host itself seems to run stable.
The guest systems have the virtio-win-0.1.141 drivers installed.

Any idea where to look for a solution?
 
Can you try to boot the previous kernel from the GRUB boot menu? It should be a 4.10 kernel, while PVE 5.1 uses version 4.13.
If that solves it there may be a regression in a kernel module, maybe KVM.

Did you also keep 5.0 updated, or was it on an older state, e.g. the one from the ISO installer? Just trying to rule things out here.
 
I did apply updates regularly; I was on 4.10 before the upgrade, and it was stable with 4.10.
Is there perhaps a log file that would help track down the issue?
 
Hmm, look into the journal (journalctl) or dmesg to see if there is anything resembling a kernel error or stack trace from around the time when the VM bluescreens.
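For example (the timestamp is just a placeholder, adjust it to around the time of the crash):

journalctl -k --since "2017-10-30 08:00"    # kernel messages from the journal
dmesg -T | grep -iE "error|warn|kvm"        # dmesg with readable timestamps, filtered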

Else, it could also be a bad coincidence and a memory or storage (hardware) problem...
 
I didn't find anything in the journalctl log; my dmesg output is attached, but I'm not really sure whether a problem is visible there. Maybe you can take a look at it. If this doesn't help, I'll revert to the old kernel to see if the system is stable again.
 

Attachments

  • dmesg.txt (56.7 KB)
Did memory testing today, seems fine. The SMART values of the hard disks are OK.
I'm now trying the old kernel (Linux host04 4.10.17-3-pve #1 SMP PVE 4.10.17-23 (Tue, 19 Sep 2017 09:43:50 +0200) x86_64 GNU/Linux) again via the advanced GRUB startup. Let's see if it is stable again.
 
Update: Stable again with the old kernel (no blue screen during the night). Is there anything on my side I can do to track down the issue?
Another question: how can I modify GRUB to start 4.10? Currently 4.13 is starting, which isn't useful at the moment.
I tried
GRUB_DEFAULT=saved
GRUB_SAVEDEFAULT=true
and update-grub, but this doesn't work.
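Maybe I have to point GRUB_DEFAULT at the exact submenu entry instead; the entry names below are only a guess and would have to match what is actually listed in /boot/grub/grub.cfg:

grep menuentry /boot/grub/grub.cfg    # list the exact entry titles
# then in /etc/default/grub, for example:
GRUB_DEFAULT="Advanced options for Debian GNU/Linux>Debian GNU/Linux, with Linux 4.10.17-3-pve"
# and apply it:
update-grub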
 
Hi, fresh clean install of 5.1.
New Dell T330 hardware.
VM with Windows 2012r2, restored from a 5.0 backup.

Continuous BSOD: CRITICAL_STRUCTURE_CORRUPTION

Other VMs tested with version 5.0 were working well.

proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
pve-kernel-4.13.4-1-pve: 4.13.4-25
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90

Any clue?

Regards.
 
Same situation here on a lab installation: a mix of Debian and CentOS VMs and one Windows 10 VM. While the Linux VMs run stable, the Windows 10 VM gets a regular CRITICAL_STRUCTURE_CORRUPTION blue screen (0x00000109) after a few hours in operation. It was previously running on PVE 4 and was upgraded to PVE 5; the blue screens started shortly after starting the VM.
The VMs run on an SSD, PVE 5.1 itself on a regular HDD.
The Windows VM was running an older virtio-win driver; upgrading to the virtio-win-0.1.141 drivers did not help.

Important detail:
the unstable situation happens while running the VM on a single SSD. After I moved the VM to a regular HDD within the PVE 5.1 node, everything was stable as expected.
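For reference, I did the move with the "Move disk" function in the GUI; on the CLI something like this should be equivalent (VM ID, disk name and target storage are just examples from my setup):

qm move_disk 100 virtio0 local-hdd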
 
My servers are using kvm64 as well.
Host CPU:
root@host04:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Model name: Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
Stepping: 7
CPU MHz: 2504.885
CPU max MHz: 2499.0000
CPU min MHz: 2003.0000
BogoMIPS: 5009.77
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 3072K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority dtherm
 
Interesting, with a Core 2.

I found this note for CentOS:

"
Limited CPU support for Windows 10 and Windows Server 2016 guests

On a Red Hat Enterprise 6 host, Windows 10 and Windows Server 2016 guests can only be created when using the following CPU models:

* the Intel Xeon E series
* the Intel Xeon E7 family
* Intel Xeon v2, v3, and v4
* Opteron G2, G3, G4, G5, and G6

For these CPU models, also make sure to set the CPU model of the guest to match the CPU model detected by running the "virsh capabilities" command on the host. Using the application default or hypervisor default prevents the guests from booting properly.

To be able to use Windows 10 guests on Legacy Intel Core 2 processors (also known as Penryn) or Intel Xeon 55xx and 75xx processor families (also known as Nehalem), add the following flag to the Domain XML file, with either Penryn or Nehalem as MODELNAME:

<cpu mode='custom' match='exact'>
    <model>MODELNAME</model>
    <feature name='erms' policy='require'/>
</cpu>

Other CPU models are not supported, and both Windows 10 guests and Windows Server 2016 guests created on them are likely to become unresponsive during the boot process.
"

I need to dig a little bit more
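If that is really the cause, a crude way to experiment on the Proxmox side might be a raw CPU override via the args line in the VM config (untested; the flag is taken from the note above and <vmid> is a placeholder):

# /etc/pve/qemu-server/<vmid>.conf
args: -cpu Penryn,+erms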
 
I tried to run my VMs with the CPU in host mode, but same result -> blue screen.
Back on kernel 4.10, no problems...
 
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Xeon(R) CPU E3-1225 v6 @ 3.30GHz
Stepping: 9
CPU MHz: 3300.000
CPU max MHz: 3700.0000
CPU min MHz: 800.0000
BogoMIPS: 6624.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp


Yesterday I did some tests; with "CPU HOST" no problems.

Last night I restored the VM again with "CPU KVM64" and we had a BSOD at first boot.

I'm going to do the same but with "CPU HOST" and I'll post the results.

Thanks to all.
 
BSOD again, CRITICAL_STRUCTURE_CORRUPTION, this time using "CPU HOST".

CPU usage on the VM is 100%, and in addition I got:

TASK ERROR: VM quit/powerdown failed - got timeout when I tried to shut down the machine.

With "STOP" I got: trying to acquire lock...TASK ERROR: can't lock file '/var/lock/qemu-server/lock-101.conf' - got timeout

Then I restarted the pve-cluster service and finally it stopped.
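In hindsight, maybe forcing the stop with the lock check skipped would have been enough instead of restarting pve-cluster, something like:

qm stop 101 --skiplock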

Perhaps I could try the old kernel, but where can I download it from, coming from a clean, fresh 5.1 PVE installation?

Thanks.
 
The pve-no-subscription repository also contains all the old packages:
http://download.proxmox.com/debian/dists/stretch/pve-no-subscription/binary-amd64/
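With that repository configured, installing and booting the older kernel should be possible along these lines (the package version is taken from the 4.10 kernel mentioned earlier in the thread; double-check the exact name in the repository index):

apt-get update
apt-get install pve-kernel-4.10.17-3-pve
reboot    # then pick the 4.10 entry under "Advanced options" in GRUB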
 
