
Blue screen with 5.1

Discussion in 'Proxmox VE: Installation and configuration' started by cybermcm, Oct 24, 2017.

  1. DerMerowinger

    DerMerowinger New Member

    Joined:
    Nov 5, 2017
    Messages:
    3
    Likes Received:
    0
    Oh that was fast.
    Thank you very much!
     
  2. cpierr03

    cpierr03 New Member

    Joined:
    Nov 1, 2017
    Messages:
    4
    Likes Received:
    0
    pve-kernel-4.10.17-5 works well.

    FWIW, the problem was MOST pronounced on ZFS drives with the 4.13 kernel. As in, Windows was seldom able to boot without BSODs.

    The error occurred on my XFS drive as well, but only a handful of times, and performance there seemed unaffected. On ZFS drives with the 4.13 kernel, however, there was a severe degradation in VM I/O performance.
     
  3. DerMerowinger

    DerMerowinger New Member

    Joined:
    Nov 5, 2017
    Messages:
    3
    Likes Received:
    0
    Unfortunately I am a Linux noob. As you can see here:
    "GRUB_CMDLINE_LINUX_DEFAULT="quiet" put scsi_mod.use_blk_mq=n

    root@pve:~# update-grub
    /usr/sbin/grub-mkconfig: 9: /etc/default/grub: put: not found"
    What am I missing?
     
  4. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    52
    Likes Received:
    5
    It should be:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet scsi_mod.use_blk_mq=n"
    but it didn't work, at least not for me; the bug still exists with the 4.13 kernel.
    Also, don't forget to run
    update-grub
    before rebooting.
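    The whole flow is roughly this (a sketch of the standard Debian GRUB procedure; nano is just an example editor):
      # edit the kernel command line so the parameter sits INSIDE the quotes:
      #   GRUB_CMDLINE_LINUX_DEFAULT="quiet scsi_mod.use_blk_mq=n"
      nano /etc/default/grub
      # regenerate the GRUB configuration so the change is picked up
      update-grub
      # reboot to boot with the new kernel command line
      reboot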
     
  5. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    2,473
    Likes Received:
    113
    Does anyone have a good test case to reproduce the Windows blue screen?
    On my machine it can take up to 24 hours for Windows to crash,
    and debugging is hard under these conditions.
     
  6. brwainer

    brwainer New Member

    Joined:
    Jun 20, 2017
    Messages:
    15
    Likes Received:
    1
    I am also experiencing the CPU-flag-related issue with kernel 4.13, not the VirtIO-related one.
    • Two VMs, both 2012R2 fully updated as of 11/9/17, no VirtIO drivers installed in either; they are more or less identical, as they are AD DCs. I have verified that the VM configurations are identical (config below), but that doesn't matter much anyway, because it is always the one on the Xeons that crashes. I've tried the host, kvm64, and qemu64 CPU types; no difference between them with respect to crashes.
      balloon: 0
      boot: dcn
      bootdisk: ide0
      cores: 4
      cpu: qemu64
      ide0: HDDs:vm-113-disk-1,size=127G
      ide2: none,media=cdrom
      memory: 1024
      name: NETSERV2
      net0: e1000=00:15:5D:01:87:02,bridge=vmbr0
      numa: 0
      onboot: 1
      ostype: win8
      smbios1: uuid=73e9a13f-9e97-48d5-8ef0-443d0b16c3df
      sockets: 1
      startup: order=2
    • I have two hosts, one with 2x Opteron 6220, the other with 2x Xeon L5420. On the Xeon system, either VM will BSOD with Critical_Structure_Corruption after a few minutes up to a few hours. On the Opteron system, both VMs are stable. Both Proxmox systems were fully updated on 10/24 and are running:
      proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
      pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
      pve-kernel-4.13.4-1-pve: 4.13.4-25
      libpve-http-server-perl: 2.0-6
      lvm2: 2.02.168-pve6
      corosync: 2.4.2-pve3
      libqb0: 1.0.1-1
      pve-cluster: 5.0-15
      qemu-server: 5.0-17
      pve-firmware: 2.0-3
      libpve-common-perl: 5.0-20
      libpve-guest-common-perl: 2.0-13
      libpve-access-control: 5.0-7
      libpve-storage-perl: 5.0-16
      pve-libspice-server1: 0.12.8-3
      vncterm: 1.5-2
      pve-docs: 5.1-12
      pve-qemu-kvm: 2.9.1-2
      pve-container: 2.0-17
      pve-firewall: 3.0-3
      pve-ha-manager: 2.0-3
      ksm-control-daemon: 1.2-2
      glusterfs-client: 3.8.8-1
      lxc-pve: 2.1.0-2
      lxcfs: 2.0.7-pve4
      criu: 2.11.1-1~bpo90
      novnc-pve: 0.6-4
      smartmontools: 6.5+svn4324-1
      zfsutils-linux: 0.7.2-pve1~bpo90
    I have just updated the Xeon system to the pve-kernel-4.10.17-5-pve_4.10.17-25_amd64.deb package (and everything zfs related to 0.7.3) and am about to reboot it to see if that resolves the issue. I have not tried updating the microcode, and would like more details about that before I try it.
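    For anyone else wanting to do the same downgrade, it roughly amounts to this (a sketch; the .deb is the package named above, download location omitted):
      # install the 4.10 kernel package on the host
      dpkg -i pve-kernel-4.10.17-5-pve_4.10.17-25_amd64.deb
      # refresh the boot entries (you may need to pick the 4.10 kernel in the
      # GRUB menu, or remove the 4.13 kernel package, so it is the one booted)
      update-grub
      reboot
      # after the reboot, confirm which kernel is actually running
      uname -r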
     
  7. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    2,473
    Likes Received:
    113
  8. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    52
    Likes Received:
    5
    @wolfgang: I still have the issue with the 4.13 kernel, and normally Windows crashes within one hour. How can I assist you in tracking this down?
     
  9. brwainer

    brwainer New Member

    Joined:
    Jun 20, 2017
    Messages:
    15
    Likes Received:
    1
    Thanks. For an apples-to-apples comparison, I have done the following:
    • on the Xeon system, installed intel-microcode, and all updates except the kernel
    • on the Opteron system, installed amd64-microcode, and all updates including the kernel
    The Xeon L5420 microcode was 0xa0b and the Opteron 6220 microcode was 0x600063d; neither changed after the install and reboot. However, per Blue screen with 5.1, I don't expect this to make a significant difference even if there had been an update. Also, the BIOS is the latest for each board, so maybe that's why the microcode was already up to date. At least now I can offer a direct comparison between 4.10.17-5 and 4.13.4-1. If I don't post again, you can assume that the VM hasn't crashed with the Critical_Structure_Corruption BSOD; if it does, I'll report it. I'll be keeping an eye on this thread either way.

    Edit: I confirmed with dmesg that the microcode update driver did indeed run during boot on both systems.
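    (For reference, this is roughly how I checked; nothing Proxmox-specific about it:)
      # current microcode revision as reported by the kernel, one line per distinct value
      grep microcode /proc/cpuinfo | sort -u
      # confirm that the microcode update driver actually ran during boot
      dmesg | grep -i microcode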
     
  10. vankooch

    vankooch New Member

    Joined:
    Nov 5, 2017
    Messages:
    7
    Likes Received:
    0
    @wolfgang

    I'll try to find a system, if we have enough spare parts around. I'll check that out on monday and give you feedback.

    Regards
     
  11. brwainer

    brwainer New Member

    Joined:
    Jun 20, 2017
    Messages:
    15
    Likes Received:
    1
    @wolfgang is there some command we could run to inventory the CPU features, to help determine what the least common denominator for this issue is?
     
  12. vankooch

    vankooch New Member

    Joined:
    Nov 5, 2017
    Messages:
    7
    Likes Received:
    0
    Have you tried this?

    cat /proc/cpuinfo
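    To compare just the feature flags between the two hosts, something like this should work (a rough sketch; the output file names are only examples):
      # dump this host's CPU flags, one per line, sorted
      grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort > /tmp/flags-$(hostname).txt
      # copy both files to one machine and diff them, e.g.
      diff /tmp/flags-xeon.txt /tmp/flags-opteron.txt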
     
  13. brwainer

    brwainer New Member

    Joined:
    Jun 20, 2017
    Messages:
    15
    Likes Received:
    1
    I’m aware of that, but I don’t know if that would specifically help @wolfgang see the least common denominator for CPU features.
     
  14. due

    due New Member

    Joined:
    Oct 17, 2017
    Messages:
    4
    Likes Received:
    0
    Based on what I've tried on my W10 VM and one of the Proxmox hosts, here is a short report from my side.
    1. upgraded to virtio-win-0.1.141 --> blue screen appears again
    2. upgraded the Intel microcode to 0x20 --> blue screen appears again
    3. downloaded, installed, and booted the 4.10.17-5-pve kernel --> the blue screen has not appeared again... fingers crossed.
    The Windows upgrade from 1703 to 1709 was not possible at steps 1 and 2 because of the repeated blue screens.
    Only after step 3 was I able to upgrade Windows from version 1703 to 1709, and it is still stable.

    @wolfgang: regarding the provided special kernel, what do you think, can we expect a proper fix (maybe in the near future)? Then we would be able to use the standard apt-get upgrade process again, with all the standard components from the pve-no-subscription repository.

    Many thanks for your effort.

    cheers
     
  15. morph027

    morph027 Active Member

    Joined:
    Mar 22, 2013
    Messages:
    327
    Likes Received:
    36
    I migrated a physical Windows machine to a VM last week and tracked my BSOD down to the qxl driver.
     
  16. FastLaneJB

    FastLaneJB Member

    Joined:
    Feb 3, 2012
    Messages:
    64
    Likes Received:
    4
    Interesting. I've upgraded two hosts so far with no issues, but I don't use the qxl driver in Windows at all.
     
  17. Serverhamster

    Serverhamster New Member

    Joined:
    Nov 5, 2017
    Messages:
    2
    Likes Received:
    0
    For me, running that 4.10 kernel is the only solution with proven stability. I've achieved a week of uptime now, instead of 2 or 3 blue screens every day.
     
  18. cybermcm

    cybermcm Member

    Joined:
    Aug 20, 2017
    Messages:
    52
    Likes Received:
    5
  19. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    2,473
    Likes Received:
    113
    @cybermcm It looks like a problem in the MMU of KVM.
     
  20. Sean Brackeen

    Sean Brackeen New Member

    Joined:
    Thursday
    Messages:
    1
    Likes Received:
    0
    I encountered this on a fresh 5.1 install with a Windows Server 2012 R2 VM, and on an upgraded system with another Windows Server 2012 R2 VM. I am currently downgrading them both to kernel 4.10.

    New system:
    • Dual Xeon E5-2620V4s
    • RAID backed storage on an Adaptec 8805 HBA
    • SuperMicro X10-DRW-i mainboard
    Old system, upgraded:
    • Dell Poweredge R520
    • RAID backed storage on a PERC H710 HBA
    • Dual Xeon E5-2430 V0s
    Fortunately the SMC system isn't in production yet. It definitely BSOD'd at least once under heavy I/O load. If there's any more information I can provide, please let me know.

    Edit: Looking through the dmesg output, I noticed a ton of messages regarding linux_edac scrolling by that don't appear on 4.10: a bunch of PCI IDs, then a complaint about not being able to find a Broadcom device. I don't have a full output, unfortunately.
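    (Next time it happens I'll capture the output with something like this; a rough sketch, and the file names are just examples:)
      # save the EDAC-related kernel messages for the report
      dmesg | grep -i edac > edac-messages.txt
      # keep the full ring buffer too, in case the context around the Broadcom line matters
      dmesg > dmesg-full.txt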
     
    #120 Sean Brackeen, Nov 16, 2017 at 00:44
    Last edited: Nov 16, 2017 at 03:35
