VMs crashing on Ivy Bridge Xeons

nz_monkey

Renowned Member
Jan 17, 2013
Hi,

We have some new Dell C8220 sleds with E5-2680 v2 Xeon processors. These have been added alongside our sleds with E5-2680 v1 processors.

When loading Proxmox 3.2 onto them with the 3.10.0-1-pve kernel, we notice that Windows guests will either not boot or will cause the host to kernel panic.

If we start these guests on our Sandy Bridge hosts, they are fine.

Has anyone else experienced this problem?
 
The 3.10 kernel is not stable yet.

You can have a look at the Red Hat Bugzilla:

https://bugzilla.redhat.com/buglist.cgi?quicksearch=rhel7 kvm&list_id=2393042

That may be the case, but we have Proxmox 3.0 with the 3.10 kernel running on our C8220s with Sandy Bridge Xeons, each with around 80 Windows VMs, for around 6 months now, and it has been stable.

It is just on the Ivy Bridge Xeons that we have problems.

I will try a custom kernel build and see if we can make progress.
 
Hi Spirit.

Yes. We have tested with 2.6.32 (pve-kernel), 3.10 (pve-kernel), 3.10.36 (our build) and 3.15-rc1 (mainline); all exhibit the exact same behavior.

I'm not sure if the problem is the Ivy Bridge Xeons, the 256GB of RAM, or possibly a change in QEMU.
 
OK we have narrowed this down some more.

- Linux guests work perfectly
- Windows 2008 R2 guests blue screen on boot with the OS type set to Windows 2008r2
- Windows 2008 R2 guests get further, but then display "Windows does not support your hardware", if we set the OS type to Other
- Windows 2012 guests blue screen on boot no matter what

The differences between this C8220 and our working production ones are:

Production:
PVE 3.0 with 3.10 kernel and QEMU 1.4.2
2x 8-core Sandy Bridge Xeon processors
192GB ECC RAM

New:
PVE 3.2 with QEMU 1.7.1
2x 10-core Ivy Bridge Xeon processors
256GB ECC RAM

Have the options passed to the guests changed between PVE 3.0 and PVE 3.2?

I can see that PVE 3.0 graphics used to be "-vga cirrus" and PVE 3.2 is now "-vga std", and that it now passes "-cpu kvm64,hv_spinlocks=0xffff,hv_relaxed,+lahf_lm,+x2apic,+sep" where PVE 3.0 did not pass any particular CPU options.

Could the extra "-cpu kvm64,hv_spinlocks=0xffff,hv_relaxed,+lahf_lm,+x2apic,+sep" options be causing this?

Is anyone else running PVE 3.2 on Ivy Bridge Xeon E5s?
 
Could the extra "-cpu kvm64,hv_spinlocks=0xffff,hv_relaxed,+lahf_lm,+x2apic,+sep" options be causing this?

hv_spinlocks=0xffff,hv_relaxed are Hyper-V enlightenments (optimisations) (for Windows 2008 and later)
+sep is needed for Windows 2012 / Windows 8
+lahf_lm is needed for Windows 2012 R2 / Windows 8.1

I don't think it's related, but if you want to test, you can copy and paste the KVM command line and just remove the CPU flags.
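Something like this should work (a sketch; `qm showcmd` is the standard PVE command that prints a VM's full KVM command line, and the abridged command string below is illustrative, not taken from a real host):

```shell
# On the host, print the full KVM command line PVE would use for a VM:
#   qm showcmd 197
# For illustration, an abridged command line carrying the flags from this thread:
cmd='/usr/bin/kvm -id 197 -cpu kvm64,hv_spinlocks=0xffff,hv_relaxed,+lahf_lm,+x2apic,+sep -m 4096'
# Strip the -cpu option and its argument, keeping everything else intact:
stripped=$(printf '%s\n' "$cmd" | sed -E 's/ -cpu [^ ]+//')
echo "$stripped"
# You could then run the stripped command manually to boot the guest without the flags.
```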

 
OK, so I did a bit more testing.

I reloaded one of the Ivy Bridge hosts with Proxmox VE 3.0, installed the 3.10.0-2-pve kernel and rebooted. It boots and runs all Windows VMs perfectly.

So the problem seems to be related to a change that has happened in PVE 3.2.

On PVE 3.2 I tried changing the guests' OS type to "Other" to remove the extra flags that were added in PVE 3.2, but the Windows guests still crash on boot.

We can repeat these crashes consistently on two separate hosts running PVE 3.2.


Is anyone else running PVE 3.2 on E5-XXXv2 hosts, e.g. Ivy Bridge Xeons?
 
...

Is anyone else running PVE 3.2 on E5-XXXv2 hosts, e.g. Ivy Bridge Xeons?

No issues on our E5-2620 v2; all Windows variants are running great.
 
Post your VM config:

> qm config VMID
 
Post your VM config:

> qm config VMID

Hi Tom

Code:
balloon: 512
boot: c
bootdisk: scsi0
cores: 1
description: SPLA via KMS%0Avirtio-scsi disks%0AKVM Tools 13.10%0A%0AUse this template for IaaS customers! %0A
hotplug: 1
ide2: none,media=cdrom
machine: pc-i440fx-1.4
memory: 4096
name: win2k12-ivybridge-test03
net0: virtio=AA:C5:FF:BE:6D:18,bridge=vmbr0,tag=201
ostype: win8
scsi0: tier3-rbd-orbit:vm-197-disk-1,cache=writeback,size=40G
scsihw: virtio-scsi-pci
sockets: 2
tablet: 0

These are cloned from our templates built on PVE 3.0. The VMs are cloned from a sysprepped image, and on boot they blue screen.
We also tried starting Windows VMs that were not sysprepped, i.e. ones already known to work perfectly on PVE 3.0; they also blue screen on boot when started on PVE 3.2.
 
I think you are hitting a problem which is currently under investigation. It seems that using SCSI disks is somehow broken in PVE 3.2. Does your VM work if you change to IDE or virtio instead of SCSI?
 
use virtio (not virtio-scsi).
 
I think you are hitting a problem which is currently under investigation. It seems that using SCSI disks is somehow broken in PVE 3.2. Does your VM work if you change to IDE or virtio instead of SCSI?

OK, I tried two scenarios:

1. Machine previously booted with virtio-scsi and had blue screened. Changed to virtio-blk and started. Result: blue screen as soon as the Windows HAL loads drivers.
2. Clone from template, changed to virtio-blk before first boot, then started. Machine boots, does Windows HAL detection, and then boots correctly.
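For anyone wanting to reproduce scenario 2, the change amounts to editing the disk lines in the VM config (a sketch against the config posted above; virtio0 selects virtio-blk, and the bootdisk entry must be updated to match):

```
# /etc/pve/qemu-server/197.conf (relevant lines only)
# before:
#   bootdisk: scsi0
#   scsi0: tier3-rbd-orbit:vm-197-disk-1,cache=writeback,size=40G
# after:
bootdisk: virtio0
virtio0: tier3-rbd-orbit:vm-197-disk-1,cache=writeback,size=40G
```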

So, the issue appears to be with SCSI disks in guests on PVE 3.2.

Thanks for all the help.

What can I do to help get this issue resolved?
 
I only see this in Windows guests. But why do you use SCSI for Windows? virtio is faster and stable.
 
Yes, this only happens on Windows guests, which are 95% of what we run...

We use virtio-scsi due to its support for TRIM/discard, which frees unused blocks on the backing storage.

virtio-scsi is also just as fast as virtio-blk in our testing and in Red Hat's, and until the latest Proxmox it had been perfectly stable.
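For reference, this is how the discard support is enabled per disk in PVE (a sketch based on the config posted earlier in the thread; `discard=on` is the standard PVE disk option, and the guest OS still has to issue TRIM for blocks to be freed):

```
# /etc/pve/qemu-server/197.conf (relevant lines; discard=on added for illustration)
scsihw: virtio-scsi-pci
scsi0: tier3-rbd-orbit:vm-197-disk-1,cache=writeback,discard=on,size=40G
```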
 