Blue screen with 5.1

Discussion in 'Proxmox VE: Installation and configuration' started by cybermcm, Oct 24, 2017.

  1. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    90
    Likes Received:
    10
    Hi,

    I'm running a small lab at home (old PC hardware, no special server hardware). Until yesterday I used version 5.0; today I did an in-place upgrade to version 5.1. Since version 5.1 my Server 2016 VMs are getting blue screens (Server 2016 Core and 2016 with GUI). With 5.0 everything ran stable.
    The blue screen is caused by ntoskrnl.exe, bugcheck code 0x00000109 (CRITICAL_STRUCTURE_CORRUPTION).
    The host itself seems to run stable.
    The guest systems have the virtio-win-0.1.141 drivers installed.

    Any idea where to look for a solution?
     
  2. t.lamprecht

    t.lamprecht Proxmox Staff Member
    Staff Member

    Joined:
    Jul 28, 2015
    Messages:
    1,245
    Likes Received:
    176
    Can you try to boot the previous kernel from the GRUB boot menu? It should be a 4.10 kernel, while PVE 5.1 uses one in version 4.13.
    If that solves it, there may be a regression in a kernel module, maybe KVM.

    Did you keep your 5.0 installation updated, or was it on an older state, e.g. the one from the ISO installer? Just trying to rule things out here.
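    To see which kernels are still installed, something like this should work (just a sketch; the exact versions on your host will differ):

    # currently running kernel
    uname -r
    # all installed PVE kernel packages
    dpkg -l 'pve-kernel-*'

    The older 4.10 kernel can then be picked at boot time from the "Advanced options" submenu in GRUB.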
     
  3. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    90
    Likes Received:
    10
    I applied updates regularly; I was on kernel 4.10 before the upgrade, and it was stable with 4.10.
    Is there perhaps a log file which would help to track down the issue?
     
  4. t.lamprecht

    t.lamprecht Proxmox Staff Member
    Staff Member

    Joined:
    Jul 28, 2015
    Messages:
    1,245
    Likes Received:
    176
    Hmm, look into the journal (journalctl) or dmesg and check whether you see anything resembling a kernel error or stack trace from around the time the VM bluescreens.

    Otherwise, it could also be a bad coincidence and a memory or storage (hardware) problem...
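    Roughly something like this (adjust the time window to when the VM bluescreened; the timestamps are only an example):

    # kernel messages from the current boot
    journalctl -k -b
    # or narrow it down to the time around the crash
    journalctl --since "2017-10-24 20:00" --until "2017-10-24 22:00"
    # dmesg with readable timestamps, filtered for suspicious entries
    dmesg -T | grep -iE 'kvm|error|trace|bug'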
     
  5. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    90
    Likes Received:
    10
    I didn't find anything in the journalctl log. My dmesg output is attached; I'm not really sure whether a problem is visible there. Maybe you can take a look at it. If this doesn't help, I'll revert to the old kernel to see if the system is stable again.
     

    Attached Files:

  6. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    90
    Likes Received:
    10
    I did memory testing today, and it seems fine. The SMART values from the hard disks are OK.
    I'm now trying the old kernel (Linux host04 4.10.17-3-pve #1 SMP PVE 4.10.17-23 (Tue, 19 Sep 2017 09:43:50 +0200) x86_64 GNU/Linux) again via the advanced GRUB startup. Let's see if it is stable again.
     
  7. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    90
    Likes Received:
    10
    Update: Stable again with the old kernel (no blue screen during the night). Is there anything on my side I can do to track down the issue?
    Another question: how can I modify GRUB to start 4.10? Currently 4.13 is starting, which isn't useful at the moment.
    I tried
    GRUB_DEFAULT=saved
    GRUB_SAVEDEFAULT=true
    and update-grub, but this doesn't work.
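    What I have not tried yet is pinning the exact menu entry in /etc/default/grub (just a sketch; the entry titles would have to be copied from /boot/grub/grub.cfg on the host):

    # /etc/default/grub
    GRUB_DEFAULT="Advanced options for Proxmox Virtual Environment GNU/Linux>Proxmox Virtual Environment GNU/Linux, with Linux 4.10.17-3-pve"

    # then regenerate the GRUB config
    update-grub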
     
    #7 cybermcm, Oct 26, 2017
    Last edited: Oct 26, 2017
  8. Pascual

    Pascual New Member

    Joined:
    Oct 26, 2017
    Messages:
    9
    Likes Received:
    0
    Hi, fresh clean install of 5.1.
    New Dell T330 hardware.
    VM with Windows 2012 R2, restored from a 5.0 backup.

    Continuous BSODs: CRITICAL_STRUCTURE_CORRUPTION.

    The other VMs I tested were working well with version 5.0.

    proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
    pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
    pve-kernel-4.13.4-1-pve: 4.13.4-25
    libpve-http-server-perl: 2.0-6
    lvm2: 2.02.168-pve6
    corosync: 2.4.2-pve3
    libqb0: 1.0.1-1
    pve-cluster: 5.0-15
    qemu-server: 5.0-17
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-20
    libpve-guest-common-perl: 2.0-13
    libpve-access-control: 5.0-7
    libpve-storage-perl: 5.0-16
    pve-libspice-server1: 0.12.8-3
    vncterm: 1.5-2
    pve-docs: 5.1-12
    pve-qemu-kvm: 2.9.1-2
    pve-container: 2.0-17
    pve-firewall: 3.0-3
    pve-ha-manager: 2.0-3
    ksm-control-daemon: 1.2-2
    glusterfs-client: 3.8.8-1
    lxc-pve: 2.1.0-2
    lxcfs: 2.0.7-pve4
    criu: 2.11.1-1~bpo90
    novnc-pve: 0.6-4
    smartmontools: 6.5+svn4324-1
    zfsutils-linux: 0.7.2-pve1~bpo90

    Any clue?

    Regards.
     
  9. sumsum

    sumsum Member
    Proxmox Subscriber

    Joined:
    Oct 26, 2009
    Messages:
    157
    Likes Received:
    2
    Same situation here on a lab installation: a mix of Debian and CentOS VMs and one Windows 10 VM. While the Linux VMs run stable, the Windows 10 VM gets regular CRITICAL_STRUCTURE_CORRUPTION blue screen errors (0x00000109) after a few hours in operation. I was previously running PVE 4 and upgraded to PVE 5; the blue screens started shortly after starting the VM.
    The VMs run on an SSD, PVE 5.1 itself on a regular HDD.
    The Windows VM was running with an older virtio-win driver; upgrading to the virtio-win-0.1.141 drivers did not help.

    Important detail:
    The unstable situation happens while running the VM on a single SSD. After I moved the VM to a regular HDD within the PVE 5.1 node, everything was stable as expected.
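    In case it helps someone, the move itself was just a disk move (sketch; VM ID, disk name and storage name are placeholders from my setup):

    # move the disk of VM 100 from the SSD storage to an HDD-backed storage
    qm move_disk 100 virtio0 hdd-storage --delete 1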
     
  10. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    90
    Likes Received:
    10
    @sumsum: I'll do a test with a Linux system later. My Windows machines are on an SSD, with virtio drivers 0.1.141.
     
  11. aderumier

    aderumier Member

    Joined:
    May 14, 2013
    Messages:
    203
    Likes Received:
    18
  12. sumsum

    sumsum Member
    Proxmox Subscriber

    Joined:
    Oct 26, 2009
    Messages:
    157
    Likes Received:
    2
    For my part, the Windows 10 VM is running with the default CPU type (kvm64).
     
  13. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    90
    Likes Received:
    10
    My servers are using kvm64 as well.
    Host CPU:
    root@host04:~# lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 4
    On-line CPU(s) list: 0-3
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s): 1
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 23
    Model name: Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
    Stepping: 7
    CPU MHz: 2504.885
    CPU max MHz: 2499.0000
    CPU min MHz: 2003.0000
    BogoMIPS: 5009.77
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 3072K
    NUMA node0 CPU(s): 0-3
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority dtherm
     
  14. Pascual

    Pascual New Member

    Joined:
    Oct 26, 2017
    Messages:
    9
    Likes Received:
    0
    My case: default kvm64.
    I'll change it to host and post the results.
    I'm now checking with one VM, and the boot time is very fast now.
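    For the change itself I'm doing something like this (VM ID 101 is just my test VM):

    # switch the CPU type from the default kvm64 to host
    qm set 101 --cpu host
    # the VM config /etc/pve/qemu-server/101.conf then contains: cpu: host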

    Regards.
     
  15. aderumier

    aderumier Member

    Joined:
    May 14, 2013
    Messages:
    203
    Likes Received:
    18
    Interesting, with a Core 2.

    I found this note for CentOS:

    "
    Limited CPU support for Windows 10 and Windows Server 2016 guests

    On a Red Hat Enterprise 6 host, Windows 10 and Windows Server 2016 guests can only be created when using the following CPU models:

    * the Intel Xeon E series
    * the Intel Xeon E7 family
    * Intel Xeon v2, v3, and v4
    * Opteron G2, G3, G4, G5, and G6

    For these CPU models, also make sure to set the CPU model of the guest to match the CPU model detected by running the "virsh capabilities" command on the host. Using the application default or hypervisor default prevents the guests from booting properly.

    To be able to use Windows 10 guests on Legacy Intel Core 2 processors (also known as Penryn) or Intel Xeon 55xx and 75xx processor families (also known as Nehalem), add the following flag to the Domain XML file, with either Penryn or Nehalem as MODELNAME:

    <cpu mode='custom' match='exact'>
    <model>MODELNAME</model>
    <feature name='erms' policy='require'/>
    </cpu>

    Other CPU models are not supported, and both Windows 10 guests and Windows Server 2016 guests created on them are likely to become unresponsive during the boot process.
    "

    I need to dig a little bit more
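    If someone wants to try the equivalent of that libvirt snippet on Proxmox, a rough and untested translation would be to force the Penryn model (plus the erms flag) through the args line of the VM config, e.g. for a hypothetical VM 101:

    # /etc/pve/qemu-server/101.conf
    # appends an extra -cpu definition; whether it cleanly overrides the
    # -cpu that qm generates is untested
    args: -cpu Penryn,+erms

    Or, without the extra flag, simply: qm set 101 --cpu Penryn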
     
  16. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    90
    Likes Received:
    10
    I tried to run my VMs with the CPU in host mode, but same result -> blue screen.
    Back on kernel 4.10, no problems...
     
  17. aderumier

    aderumier Member

    Joined:
    May 14, 2013
    Messages:
    203
    Likes Received:
    18
    What is your physical CPU model?
     
  18. Pascual

    Pascual New Member

    Joined:
    Oct 26, 2017
    Messages:
    9
    Likes Received:
    0
    lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 4
    On-line CPU(s) list: 0-3
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s): 1
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 158
    Model name: Intel(R) Xeon(R) CPU E3-1225 v6 @ 3.30GHz
    Stepping: 9
    CPU MHz: 3300.000
    CPU max MHz: 3700.0000
    CPU min MHz: 800.0000
    BogoMIPS: 6624.00
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 8192K
    NUMA node0 CPU(s): 0-3
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp


    Yesterday I did some tests; with CPU "host" there were no problems.

    Last night I restored the VM again with CPU "kvm64" and we had a BSOD at the first boot.

    I'm going to do the same but with CPU "host", and I'll post the results.

    Thanks to all.
     
  19. Pascual

    Pascual New Member

    Joined:
    Oct 26, 2017
    Messages:
    9
    Likes Received:
    0
    BSOD again, CRITICAL_STRUCTURE_CORRUPTION, this time using CPU "host".

    CPU usage on the VM is at 100%, and in addition I got:

    TASK ERROR: VM quit/powerdown failed - got timeout
    when I tried to shut down the machine.

    With "Stop" I got: trying to acquire lock...TASK ERROR: can't lock file '/var/lock/qemu-server/lock-101.conf' - got timeout

    Then I restarted the pve-cluster service and it finally stopped.
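    In case someone hits the same hang: what probably would have worked instead of restarting pve-cluster is killing the stuck KVM process directly (rough sketch, VM ID 101 as in my case):

    # PID of the QEMU/KVM process for VM 101
    PID=$(cat /var/run/qemu-server/101.pid)
    kill $PID       # try a normal terminate first
    kill -9 $PID    # force it if the process does not exit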

    Perhaps I could try the old kernel, but where can I download it from, coming from a clean, fresh 5.1 PVE installation?

    Thanks.
     
  20. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,269
    Likes Received:
    505
    The pve-no-subscription repository also contains all the old packages:
    http://download.proxmox.com/debian/dists/stretch/pve-no-subscription/binary-amd64/
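    For the kernel mentioned earlier in this thread that should boil down to something like (the exact version string may differ):

    apt-get update
    apt-get install pve-kernel-4.10.17-3-pve

    Afterwards the 4.10 entry shows up in the GRUB "Advanced options" submenu.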
     