Windows 2003 64-Bit Random Crashes - APIC related?

mazer9

New Member
Oct 25, 2010
I have a Proxmox Server with the following specs:

Version:

pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-28
pve-kernel-2.6.32-4-pve: 2.6.32-28
qemu-server: 1.1-25
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-9
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-2
ksm-control-daemon: 1.0-4

VM Configuration:

name: TS64
ide2: none,media=cdrom
bootdisk: ide0
ostype: w2k3
ide0: data:vm-104-disk-1
memory: 10240
sockets: 1
vlan0: virtio=C6:4C:B3:BB:AD:67
onboot: 1
cores: 4
boot: cad
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1

CPU 2x Xeon Quad Core E5620 2.4GHZ Processors:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
cpu MHz : 2400.323
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 9
cpu cores : 4
apicid : 19
initial apicid : 19
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 4800.19
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

Performance:

CPU BOGOMIPS: 76803.21
REGEX/SECOND: 850066
HD SIZE: 33.96 GB (/dev/mapper/pve-root)
BUFFERED READS: 333.03 MB/sec
AVERAGE SEEK TIME: 6.10 ms
FSYNCS/SECOND: 2948.85
DNS EXT: 131.42 ms
DNS INT: 1.28 ms

I've been successfully running two Windows 2003 32-bit Standard Edition servers on this host for over a month now. Both were migrated from physical servers. However, I keep getting random crashes on a freshly installed Windows 2003 64-bit Standard Edition guest running Terminal Services. The server runs fine for hours under a decent load (20 users) and then crashes with bug check code 0x3B (SYSTEM_SERVICE_EXCEPTION). I opened a ticket with Microsoft and submitted multiple memory dumps, and their engineers replied with the following:

Dump Analyses Result:
===================

What happened is that the OS initiated an APIC/software interrupt. On real hardware this is handled by the APIC. In our virtual environment, where we are using a Linux-based VM platform (Proxmox), the VM implementation has to deliver the interrupt through the virtual APIC with the same latency. The problem is that there is a latency between when the interrupt was initiated and when it was delivered.


Below are the details for understanding the process or concept of APIC interrupts:

What the Local APIC Is
The Local APIC (LAPIC) is a circuit that is part of the CPU chip. It contains these basic elements:
1. A mechanism for generating interrupts
2. A mechanism for accepting interrupts
3. A timer

If you have a multiprocessor system, the APIC's are wired together so they can communicate. So the LAPIC on CPU 0 can communicate with the LAPIC on CPU 1, etc.


What the IO APIC Is

This is a separate chip that is wired to the Local APIC's so it can forward interrupts on to the CPU chips. It is programmed similar to the 8259's but has more flexibility.
It is wired to the same bus as the Local APIC's so it can communicate with them.

Note: In our scenario, all interrupts and calls are virtualized because a hypervisor is in the picture, so we need to contact the VM vendor to check this APIC interrupt latency issue.
------------------------------------------------End of Message----------------------------------



Their engineers are saying that there is a latency issue with APIC calls. I'm not exactly sure how this can be corrected. Is this a known issue, and is there a solution to this problem? I love Proxmox, but my main reason for using it was to upgrade my terminal server to better hardware while also using that hardware for other virtual machines.
 
I'm not sure. However, wouldn't Windows see that as a cpu hardware issue and panic? It's strange because I'm only experiencing this issue on the 64-bit VM.

Anyway, the fastest way to resolve such issues is to report a bug on the qemu bug tracker:

https://bugs.launchpad.net/qemu

including detailed instructions how to reproduce the bug.
 
Any news about this issue? E.g., have you tried kernel 2.6.35, and did it solve the problem?
I have an almost identical situation: a server with 2 physical CPUs (Xeon E5506), a win2003-32 r2-sp2 guest that runs just fine, and a win2003-64 r2-sp2 guest that BSODs randomly.
# pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-30
pve-kernel-2.6.32-4-pve: 2.6.32-30
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4

# cat /etc/qemu-server/102.conf
name: Win2003_OS1
vlan1: virtio=7E:AC:F5:66:9D:3D,e1000=1E:AA:04:26:5B:29,rtl8139=A2:61:F9:70:2F:67
bootdisk: virtio0
ostype: w2k3
memory: 6144
sockets: 1
onboot: 1
description: Win2003-R2-SP2 64 bit SQL 2008R2, IP .7, host srvctrl1
cores: 4
virtio0: sas:vm-102-disk-1
ide0: none,media=cdrom
boot: cad
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
virtio1: sas:vm-102-disk-2
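
To see what the two crashing guests in this thread have in common, the two VM configurations can be compared mechanically. The following is only a minimal sketch, assuming Python is available and that the old-style `/etc/qemu-server/<vmid>.conf` files are plain `key: value` lines as shown in the posts above. The function names `parse_vm_conf` and `shared_settings` are made up for illustration and are not part of Proxmox.

```python
# Sketch: find the settings shared by the two affected VM configs from this thread.
# parse_vm_conf and shared_settings are hypothetical helper names, not Proxmox tools.

TS64_CONF = """\
name: TS64
ide2: none,media=cdrom
bootdisk: ide0
ostype: w2k3
ide0: data:vm-104-disk-1
memory: 10240
sockets: 1
vlan0: virtio=C6:4C:B3:BB:AD:67
onboot: 1
cores: 4
boot: cad
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
"""

WIN2003_OS1_CONF = """\
name: Win2003_OS1
vlan1: virtio=7E:AC:F5:66:9D:3D,e1000=1E:AA:04:26:5B:29,rtl8139=A2:61:F9:70:2F:67
bootdisk: virtio0
ostype: w2k3
memory: 6144
sockets: 1
onboot: 1
description: Win2003-R2-SP2 64 bit SQL 2008R2, IP .7, host srvctrl1
cores: 4
virtio0: sas:vm-102-disk-1
ide0: none,media=cdrom
boot: cad
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
virtio1: sas:vm-102-disk-2
"""


def parse_vm_conf(text):
    """Parse 'key: value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ': ' only, so MAC addresses and descriptions stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            settings[key] = value
    return settings


def shared_settings(a, b):
    """Return the key/value pairs that are identical in both configs."""
    return {key: value for key, value in a.items() if b.get(key) == value}


common = shared_settings(parse_vm_conf(TS64_CONF), parse_vm_conf(WIN2003_OS1_CONF))
for key in sorted(common):
    print(f"{key}: {common[key]}")
```

Both guests turn out to share the single-socket, four-core, `ostype: w2k3` layout, while disk bus and memory size differ. That narrows the list of suspects but does not prove a cause.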

The BSOD is 3B (see the attached image, if I manage to attach it; I don't run proprietary Flash, only gnash).
I'm wondering whether migrating to kernel 2.6.35 would solve it. Otherwise I'm in big trouble and will have to go for a physical server or try to reinstall everything on a 32-bit version.
 

Attachments

  • dump_win.png
I've upgraded proxmox to kernel 2.6.35 from 2.6.32:
proxmox:~# pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.7-9
pve-kernel-2.6.32-4-pve: 2.6.32-30
pve-kernel-2.6.35-1-pve: 2.6.35-9
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4
proxmox:~#
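
For anyone comparing their own host against this one, the package changes between two `pveversion -v` listings can be extracted with a few lines of Python. This is a minimal sketch, not a Proxmox tool: `parse_pveversion` and `diff_versions` are hypothetical names, and the sample data is abridged from the before and after listings posted in this thread.

```python
# Sketch: show which packages changed between two `pveversion -v` outputs.
# parse_pveversion and diff_versions are hypothetical helper names.

BEFORE = """\
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-30
pve-kernel-2.6.32-4-pve: 2.6.32-30
qemu-server: 1.1-28
pve-qemu-kvm: 0.13.0-3
"""

AFTER = """\
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.7-9
pve-kernel-2.6.32-4-pve: 2.6.32-30
pve-kernel-2.6.35-1-pve: 2.6.35-9
qemu-server: 1.1-28
pve-qemu-kvm: 0.13.0-3
"""


def parse_pveversion(text):
    """Map each 'name: version' line to a dict entry; shell prompts are ignored."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ": " not in line:
            continue
        name, value = line.split(": ", 1)
        versions[name] = value
    return versions


def diff_versions(before, after):
    """Return {name: (old, new)} for entries that were added, removed, or changed."""
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


changes = diff_versions(parse_pveversion(BEFORE), parse_pveversion(AFTER))
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

On the abridged data only the kernel-related entries differ and `pve-qemu-kvm` stays at 0.13.0-3, which matches the description of this as a kernel-only upgrade.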

Along the way I also updated the virtio disk driver to the latest version (from 1.1.11 to 1.1.16); the Ethernet adapter is still e1000 with the latest Intel drivers.
I know it would have been better troubleshooting to try the new virtio driver on the old kernel first and only then upgrade the kernel, but I was in big trouble, so I tried everything possible in a last desperate attempt to make it work.
Since then I have had NO MORE crashes (14 days so far), whereas before I had 2-3 crashes a week.
The downside is that I no longer have OpenVZ, but that is less important than reliable KVM guests :)
I hope this helps others.
 
I couldn't afford the instability and decided to migrate the virtual machine to a standalone physical server. The migration was successful and the server is now stable. If you don't mind, I'd be interested to hear any feedback on stability/instability in the future. Perhaps I will be able to go virtual again at some point.
 
Bad news today :( The Win2003-64 server crashed abruptly again, after working fine since my last post on the subject (this time the error is 0x0000003b instead of 7e; I don't know if that is significant).
Did it last so long because the SQL database was moved to physical hardware, so the virtual server was under less stress? Or what else triggered the crash?
My last hope is to grab the newest kernel and kvm 0.14 from the test repository, but that is risky since the other Win2003 VM (32 bit) seems stable (it has never crashed), and I could end up with a totally unusable server. On the other hand, I have to move the db back to the VM sooner or later, and I can't have a SQL server that is this unstable (ok, crashes seem rarer now, but how long will they stay recoverable? Will I end up with a "sorry, too many crashes, your system is irremediably corrupted"?).
My boss is far from happy with this situation, and he will probably look at a different (proprietary) VM solution for the client. Maybe that solution will crash too, but proprietary programs can crash without scandal, while for FOSS it seems that is not permitted :( (of course, personally I will never give up my own use of the wonderful Proxmox)
I'm a bit depressed and in a bad mood, sorry.
If anyone has news or can share their experience, I would be grateful.
 
Again, the engineers at Microsoft who analyzed my dump files told me it was a processor latency issue, which probably explains why the crashes appear to be random.

Until this issue is resolved by QEMU, I wouldn't feel comfortable running a 64-bit 2003 VM on this platform. I migrated the same VM image over to a physical machine and it has not crashed since. This might be your only solution in the meantime.

I still have a couple of production 32-bit 2003 servers that have been running without a hitch since I migrated them to the platform a few months ago.

If only the 64-bit issues could be resolved I would absolutely love Proxmox!

Good luck!
 
I encourage everybody to test the latest release KVM 0.14 (currently pvetest repo) - and give feedback if it fixes the issues for you.
 
Unfortunately, this is a production system: the database has been moved to a physical Win2003-64 server, and we have now prepared a Win2003-32 guest to bring it back in a few days. Testing 0.14 is too risky; if we break the other Win2003-32 VM, the client will fire us asap (I've seen too many regressions in FOSS to be confident about the upgrade). In addition, this is the only dual-socket server I've ever seen in my life, so I doubt I will have a chance to test such a configuration in the near future.
I don't know if you already have one, but maybe there is a need for a "test lab" with some stress tests (I've no idea how to set them up in MsWin) and "advanced" configurations. I mean, kvm in normal usage has had some issues (e.g. XP or Debian with a "frozen" NIC) but has generally worked fine so far. With this installation, though (64 bit, multicore guest, dual socket host, SQL-2008, Domain Controller, etc.), I've been in real trouble: I had to go to e1000 due to really strange connectivity problems with virtio in both machines, then these 64-bit crashes, and the constant fear that if something does not work it is due to kvm... everyone here points their finger at me for every problem, and I don't know M$os well enough to fight back and find misconfigurations or other possible issues. Even ping latency (average 0.658 with e1000) is something I'm not happy with (could timing be critical in some M$win operations? Who knows!).
Btw, I've applied the registry change suggested in the wiki to improve performance; could this create problems as well?
Proxmox is great, but working with a proprietary OS as a guest is not simple (I wonder if some M$Win patch checks whether it is running inside kvm and crashes randomly... it would not be the first time; I remember DR-DOS and the first releases of Windows, or Paradox DB when Access was launched).
 
I understand. And yes, we operate a test lab, but we do not see such issues here. So it would help if you also ran tests in your environment; just use test equipment.
 
I had the same problem: KVM + (Win64 (8 GB RAM) + mssql64), with unexpected reboots 3-5 times per day with a BSOD:
The computer has rebooted from a bugcheck. The bugcheck was: 0x0000003b (0x0000000080000003, 0xfffff80001026cd0, 0xfffffadf5a0f8d50, 0x0000000000000000).

It stopped after applying this hotfix from MS: http://support.microsoft.com/kb/941410
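
For reference, the event-log line above can be split into its bug check code and four parameters. The sketch below assumes Python; `parse_bugcheck` and the two small lookup tables are illustrative only and cover just the codes mentioned in this thread. For bug check 0x3B, Microsoft documents the first parameter as the exception code that caused the bug check, and 0x80000003 is STATUS_BREAKPOINT.

```python
import re

# Sketch: decode a Windows "bugcheck was:" event-log line.
# parse_bugcheck is a hypothetical helper; the tables are deliberately tiny.

BUGCHECK_NAMES = {
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x7E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
}

EXCEPTION_NAMES = {
    0x80000003: "STATUS_BREAKPOINT",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

PATTERN = re.compile(r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)")


def parse_bugcheck(message):
    """Return (code, [param1..param4]) as integers from an event-log message."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("no bugcheck found in message")
    code = int(match.group(1), 16)
    params = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, params


message = (
    "The computer has rebooted from a bugcheck. The bugcheck was: 0x0000003b "
    "(0x0000000080000003, 0xfffff80001026cd0, 0xfffffadf5a0f8d50, 0x0000000000000000)."
)
code, params = parse_bugcheck(message)
print("bug check:", hex(code), BUGCHECK_NAMES.get(code, "unknown"))
# For 0x3B, parameter 1 is the exception code.
print("exception:", hex(params[0]), EXCEPTION_NAMES.get(params[0], "unknown"))
```

Collecting these values from several crashes makes it easier to tell whether they all share the same exception code and faulting address, which is the kind of detail a bug report needs.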

--
Denis Zhuravlev (myinformix@narod.ru)
 
Yes, I have also seen this crash on 64-bit and 32-bit systems. It is due to updates not being installed on the guest system.

Makes me wonder how much memory this 64 bit guest is using running sql?
 
I see that this fix is really old (from 2007). How can it not be included in the Win2003 automatic update procedure? I mean, after installation and registration I usually update M$os with its automatic update procedure, so that fix should already be included in my VM. In any case, I will "resurrect" the VM, try to apply the fix, and see if it crashes again (even though it now doesn't have any load, since I've moved everything to a 32-bit VM plus a physical server for the database).
 
I know this is a very old thread, but this way the people involved in the old issue will be notified about my message :)
We will probably move all VMs to a "backup" Proxmox server, do a clean install of Proxmox 2.1 on the dual-socket server where the issue occurred (also updating the motherboard BIOS), try the BIOS parameter suggested here (if found), and try a Win 2003 64-bit VM installation again.
Can someone here advise whether this is likely to be a good solution, or whether 2.1 brings new (or leaves unresolved) issues in this configuration?
Thanks a lot in advance
 
In any case, try KVM 1.1 (currently available in pvetest only).
 
