Windows 2003 Guest crashing

jigils

New Member, Mar 27, 2012, Crystal Lake, IL
We have one particular Windows 2003 guest that crashes intermittently (less than once a week). There is no BSOD, nothing in the Windows error logs, and no pattern in the date or time; nothing indicates why. The guest just ends up in 'stopped' mode until we manually restart it.

On the same host, we have another Windows 2003 guest that has been running without issue.

Any ideas?
 
Without any details about your system, and without logs and error messages, it will be hard to help.

Please do the following:


  1. make sure you run the very latest packages
  2. provide "pveversion -v"
  3. run "pveperf" and tell details about your physical hardware
  4. cat /etc/pve/qemu-server/VMID.conf
  5. any logs? (see /var/log/syslog)
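
The items requested above can be gathered in one pass. This is only a rough sketch: the function names (run_or_note, show_file, collect_report) and the 50-line syslog tail are illustrative choices, not anything Proxmox ships. It only calls the commands and paths named in the list, and notes any that are missing instead of failing.

```shell
#!/bin/sh
# Sketch: gather the requested diagnostics into one report on stdout.
# Function names and the 50-line tail are assumptions for illustration.

# Run a command if it is installed; otherwise note that it is missing.
run_or_note() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(command not found: $1)"
    fi
    echo
}

# Print a file if it is readable; otherwise note that it is missing.
show_file() {
    echo "== $1 =="
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "(not readable: $1)"
    fi
    echo
}

# Collect everything for one VM; $1 is the VMID.
collect_report() {
    vmid="$1"
    run_or_note pveversion -v
    run_or_note pveperf
    show_file "/etc/pve/qemu-server/${vmid}.conf"
    echo "== last 50 lines of /var/log/syslog =="
    if [ -r /var/log/syslog ]; then
        tail -n 50 /var/log/syslog
    else
        echo "(not readable: /var/log/syslog)"
    fi
}
```

After sourcing the file on the host, something like "collect_report 102 > report.txt" would produce a single file to paste into the thread.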
 
Thanks, Tom, here's what we got:

pveversion -v
--------------
pve-manager: 2.0-12 (pve-manager/2.0/784729f4)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 2.0-53
pve-kernel-2.6.32-6-pve: 2.6.32-53
lvm2: 2.02.86-1pve2
clvm: 2.02.86-1pve2
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-1
libqb: 0.6.0-1
redhat-cluster-pve: 3.1.7-1
pve-cluster: 1.0-12
qemu-server: 2.0-9
pve-firmware: 1.0-13
libpve-common-perl: 1.0-8
libpve-access-control: 1.0-2
libpve-storage-perl: 2.0-8
vncterm: 1.0-2
vzctl: 3.0.29-3pve3
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.1-1



pveperf
-------
CPU BOGOMIPS: 6400.17
REGEX/SECOND: 591278
HD SIZE: 24.85 GB (/dev/mapper/pve-root)
BUFFERED READS: 103.80 MB/sec
AVERAGE SEEK TIME: 5.52 ms
FSYNCS/SECOND: 1448.94
DNS EXT: 123.00 ms
DNS INT: 0.99 ms (fob.com)



cat /etc/pve/qemu-server/102.conf
---------------------------------
cpu: qemu32
net0: e1000=72:80:4E:13:C0:C1,bridge=vmbr0
ide2: cdrom,media=cdrom
name: TH-SRV-MAIL
bootdisk: scsi0
cores: 1
scsi0: SnapPVE:102/vm-102-disk-1.raw
ostype: wxp
memory: 2208
sockets: 2
scsi1: SnapPVE:102/vm-102-disk-2.raw
onboot: 1



Hardware
--------
HP ProLiant DL360 G5
(2) Intel Xeon 1.60GHz
5 GB RAM
Proxmox installed on internal drive
VMs hosted on NFS share (Snap Server 410)
Gigabit connection between server and NFS share.




Syslog excerpt
--------------
Mar 27 07:17:01 th-srv-prxmx01 /USR/SBIN/CRON[987124]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 27 07:32:25 th-srv-prxmx01 rrdcached[1325]: flushing old values
Mar 27 07:32:25 th-srv-prxmx01 rrdcached[1325]: rotating journals
Mar 27 07:32:25 th-srv-prxmx01 rrdcached[1325]: started new journal /var/lib/rrdcached/journal//rrd.journal.1332851545.995433
Mar 27 07:32:25 th-srv-prxmx01 rrdcached[1325]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1332844345.995458
Mar 27 08:15:40 th-srv-prxmx01 kernel: vmbr0: port 3(tap102i0) entering disabled state
Mar 27 08:15:40 th-srv-prxmx01 kernel: vmbr0: port 3(tap102i0) entering disabled state
Mar 27 08:17:01 th-srv-prxmx01 /USR/SBIN/CRON[989407]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 27 08:17:26 th-srv-prxmx01 ntpd[1278]: Deleting interface #24 tap102i0, fe80::f8cb:91ff:fecb:cc9c#123, interface stats: received=0, sent=0, dropped=0, active_time=936000 secs




The syslog is an excerpt from one particular incident this morning. The machine went down right about 8:15. The logs from other occasions are very similar, with no obvious error. Windows event logs show nothing out of the ordinary.
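
To compare incidents, it may help to pull out only the host syslog lines that mention this guest's tap interface (tap102i0 in the excerpt above). A minimal sketch, assuming the tap device is always named tap<VMID>i<N> as it is in this excerpt; the function name vm_tap_events is made up for illustration.

```shell
# Print syslog lines that mention any tap interface of the given VM.
# Usage: vm_tap_events VMID [LOGFILE]
# Assumes tap devices are named tap<VMID>i<N>, as in the excerpt (tap102i0).
vm_tap_events() {
    vmid="$1"
    logfile="${2:-/var/log/syslog}"
    grep -E "tap${vmid}i[0-9]+" "$logfile"
}
```

Running "vm_tap_events 102" on the host lists every time the guest's bridge port changed state, which gives the timestamps to look around for NFS or kernel messages.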
 
You did not follow point 1: use the latest packages. Run 'aptitude update && aptitude full-upgrade'.


The pveperf results look OK.

In 102.conf, do not use scsi for the disks (scsi0, scsi1), especially not for Windows. Switch to virtio, see http://pve.proxmox.com/wiki/Paravirtualized_Block_Drivers_for_Windows#Adding_a_temporary_drive
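
As a rough preview of what the 102.conf posted in this thread might look like after the switch, the sketch below renames the scsiN disk lines and the bootdisk entry to virtioN and prints the result. It does not modify the file, and it is not the migration itself: the linked wiki page describes adding a temporary virtio drive first so that Windows has the driver installed before the boot disk is changed. The function name preview_virtio_conf is hypothetical.

```shell
# Print a VM config with scsiN disk lines and the bootdisk entry renamed to virtioN.
# Reads the file given as $1 and writes to stdout; the original file is left untouched.
# Preview only: the guest needs the virtio block driver installed before this is applied.
preview_virtio_conf() {
    sed -e 's/^scsi\([0-9][0-9]*\):/virtio\1:/' \
        -e 's/^bootdisk: scsi\([0-9][0-9]*\)$/bootdisk: virtio\1/' "$1"
}
```

For example, "preview_virtio_conf /etc/pve/qemu-server/102.conf" would show "virtio0: SnapPVE:102/vm-102-disk-1.raw" and "bootdisk: virtio0" in place of the scsi entries, with all other lines unchanged.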



 
I followed what you were saying in point 1 but I can't do those updates anytime I want. I'll have to schedule a time to get those and the switch to virtio drives done. Thanks again and I'll post back after I've had time to take those steps.
 
I used to have a similar issue with one Windows guest randomly crashing. I ended up fixing it by making a snapshot backup, then deleting the guest and restoring it from the backup. It hasn't happened again in a few months now.
 
yes
 
