I am also experiencing the CPU-flag-related issue with kernel 4.13, not the VirtIO-related one.
I have two VMs, both Windows Server 2012 R2, fully updated as of 11/9/17, with no VirtIO drivers installed in either. They are more or less identical, as both are AD DCs. I have ensured that the VM configurations are identical, but that doesn't matter anyway, because it is always the VM running on the Xeon host that crashes. I've tried the host, kvm64, and qemu64 CPU types; none of them makes any difference to the crashes.
I have two hosts, one with 2x Opteron 6220, the other with 2x Xeon L5420. On the Xeon system, either VM will BSOD with Critical_Structure_Corruption after a few minutes up to a few hours. On the Opteron system, both VMs are stable. Both Proxmox systems were fully updated on 10/24 and are running:
I have just updated the Xeon system to the pve-kernel-4.10.17-5-pve_4.10.17-25_amd64.deb package (and everything zfs related to 0.7.3) and am about to reboot it to see if that resolves the issue. I have not tried updating the microcode, and would like more details about that before I try it.
Thanks. For an apples-to-apples comparison, I have done the following:
on the Xeon system, installed intel-microcode, and all updates except the kernel
on the Opteron system, installed amd64-microcode, and all updates including the kernel
The Xeon L5420 microcode was 0xa0b and the Opteron 6220 microcode was 0x600063d; neither changed after the install and reboot. However, per "Blue screen with 5.1", I don't expect this to make a significant difference even if there had been an update. Also, the BIOS is the latest for each board, so maybe that's why the microcode was already up to date. At least now I can offer a direct comparison between 22.214.171.124 and 126.96.36.199. If I don't post again, you can assume that the VM hasn't crashed with the Critical_Structure_Corruption BSOD; if it does, I'll report it. I'll be keeping an eye on this thread either way.
Edit: I confirmed with dmesg that the microcode update driver did indeed run during boot on both systems.
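For anyone who wants to compare microcode revisions before and after installing the microcode package, here is a minimal sketch. It assumes an x86 Linux host where /proc/cpuinfo prints one "microcode" line per logical CPU; the function name microcode_revisions is just an illustrative helper, not part of any Proxmox tool.

```python
import re


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU contributes a line such as "microcode : 0xa0b".
        match = re.match(r"microcode\s*:\s*(0x[0-9a-fA-F]+)", line)
        if match:
            revisions.add(match.group(1).lower())
    return revisions
```

Calling microcode_revisions(open("/proc/cpuinfo").read()) before the install and again after the reboot, and comparing the two sets, should show whether the revision actually changed. More than one value in the set would mean the CPUs are not all on the same revision.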
Based on what I've tried on my W10 VM and one of the Proxmox hosts, here is a short report from my side.
1. Upgraded to virtio-win-0.1.141 --> blue screen appears again
2. Upgraded the Intel microcode to 0x20 --> blue screen appears again
3. Downloaded, installed, and booted the 4.10.17-5-pve kernel --> blue screen has not appeared again... fingers crossed.
The Windows upgrade from 1703 to 1709 was not possible after steps 1 and 2 because of repeated blue screens.
Only after step 3 was I able to upgrade Windows from version 1703 to 1709, and it is still stable.
@wolfgang: regarding the special kernel that was provided, do you think we can expect a solution, maybe in the near future? Then we would again be able to use the standard apt-get upgrade process with all the standard components from the pve-no-subscription repository.
I encountered this with a Windows Server 2012 R2 VM on a fresh 5.1 install, and with a Windows Server 2012 R2 VM on an upgraded system. I am currently downgrading them both to kernel 4.10.
New system, fresh install:
Dual Xeon E5-2620V4s
RAID backed storage on an Adaptec 8805 HBA
SuperMicro X10-DRW-i mainboard
Old system, upgraded:
Dell Poweredge R520
RAID backed storage on a PERC H710 HBA
Dual Xeon E5-2430 V0s
Fortunately, the SMC system isn't in production yet. It definitely BSOD'd at least once under heavy I/O load. If there's any more information I can provide, please let me know.
Edit: Looking through the dmesg output, I noticed a ton of messages regarding linux_edac scrolling by that don't appear on 4.10: a bunch of PCI IDs, and then a complaint about not being able to find a Broadcom device. I don't have the full output, unfortunately.
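If someone wants to pin down which messages are new on the newer kernel, one rough approach is to save the dmesg output under each kernel and compare the two. The sketch below assumes the default dmesg format with a "[seconds.micros]" timestamp prefix; new_messages and the keyword "edac" are illustrative choices, not anything from the Proxmox tooling.

```python
import re

# dmesg prefixes each line with a "[seconds.micros]" timestamp by default.
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def new_messages(old_log, new_log, keyword):
    """Return lines containing keyword that are in new_log but not in old_log.

    Timestamps are stripped first so identical messages logged at
    different times are treated as the same line.
    """
    def normalize(text):
        return [_TIMESTAMP.sub("", line).strip() for line in text.splitlines()]

    seen = set(normalize(old_log))
    result = []
    for line in normalize(new_log):
        if keyword.lower() in line.lower() and line not in seen:
            result.append(line)
    return result
```

With the two logs saved to files, new_messages(old_text, new_text, "edac") would list only the EDAC-related lines that show up on the newer kernel, which is easier to post here than a full dmesg dump.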