Windows BSOD happening during backups

mythicalbox

New Member
Aug 30, 2012
Hello, I'm experiencing some very serious fatal errors with my Windows guests that occur during my nightly Proxmox backups.

The message is as follows:

Code:
Problem signature:
  Problem Event Name:	BlueScreen
  OS Version:	6.0.6002.2.2.0.272.7
  Locale ID:	1033


Additional information about the problem:
  BCCode:	101
  BCP1:	0000000000000030
  BCP2:	0000000000000000
  BCP3:	FFFFFA60005EC180
  BCP4:	0000000000000001
  OS Version:	6_0_6002
  Service Pack:	2_0
  Product:	272_3


Files that help describe the problem:
  C:\Windows\Minidump\Mini090712-01.dmp
  C:\Users\administrator.VINE\AppData\Local\Temp\1\WER-249984-0.sysdata.xml
  C:\Users\administrator.VINE\AppData\Local\Temp\1\WERFB0B.tmp.version.txt


Read our privacy statement:
  http://go.microsoft.com/fwlink/?linkid=50163&clcid=0x0409

I did notice that my proxmox-ve-2.6.32, pve-qemu-kvm, and qemu-server packages are being held back:

Code:
root@proxmox03:~# apt-get upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages have been kept back:
  proxmox-ve-2.6.32 pve-qemu-kvm qemu-server
0 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.

Could that ^^ be causing this? How can I push them through?

This is my current pve version:

Code:
root@proxmox03:~# pveversion
pve-manager/2.1/f32f3f46


In case it helps here are my pveperf stats:

Code:
root@proxmox02:/etc/pve/qemu-server# pveperf
CPU BOGOMIPS:      76597.44
REGEX/SECOND:      737449
HD SIZE:           94.49 GB (/dev/mapper/pve-root)
BUFFERED READS:    226.45 MB/sec
AVERAGE SEEK TIME: 8.33 ms
FSYNCS/SECOND:     1406.62
DNS EXT:           122.96 ms
DNS INT:           1.09 ms (mycompany.org)
 
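If you upgrade, you should follow the upgrade how-tos.

Run either 'aptitude update && aptitude full-upgrade' or 'apt-get update && apt-get dist-upgrade'. A plain 'apt-get upgrade' keeps back any package whose new version needs additional packages installed, which is why those three are held.

To check your packages, post the output of 'pveversion -v'.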
 
Here is the output:

Code:
root@proxmox02:~# pveversion -v
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.1-74
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-14-pve: 2.6.32-74
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-49
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-31
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1

Do you think the older version of my packages could be my problem?
 
Spirit, I'm using the 61.63.103.3000 driver from Red Hat published on 7/3/2012.

For what it's worth, I was already experiencing the blue screens while using IDE, so I switched to the virtio driver to see if it would help fix the problem, but alas it hasn't.
 
Do you use cache=none?

If yes, can you try directsync or writethrough?

I use the default option of none. So that I can understand better, can you tell me what these options do and how they might help with the BSOD issue?

I just found one of my servers stuck on a blue screen with the CPU running at 100%. Here is a screenshot of it:


bsod2.png
 
About cache: I had a lot of blue screens in the past because of a bug in the virtio driver, which did not handle write flushes correctly (so with cache=none or writeback).
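
For readers unfamiliar with the options mentioned here, a rough summary of the QEMU cache modes: cache=none bypasses the host page cache but still lets the guest use the disk write cache; writethrough uses the host page cache and reports a write as complete only once it has reached storage; directsync bypasses the host page cache and also waits for each write to reach storage. On Proxmox VE the mode is set per disk in the VM configuration. A minimal sketch of such a line, assuming a VM with ID 100 whose disk is vm-100-disk-1.raw on a storage named 'local' (the storage name is an assumption, not something stated in this thread):

```text
# /etc/pve/qemu-server/100.conf (fragment; the storage name 'local' is an assumption)
virtio0: local:100/vm-100-disk-1.raw,cache=writethrough
```

The guest normally has to be stopped and started again before a changed cache mode takes effect.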

OK, it seems not to be related to the disk, but to a missing timer interrupt.

Is it Win2003/XP?
Do you use an NTP server, or is the guest a member of a domain with time sync?
Read this wiki article, I think it should help:
http://pve.proxmox.com/wiki/Guest_Time_drift
 
spirit, I'm using Windows 2008 and Windows 2008 R2 guests all connected to a domain. I don't know how to check but I think you are correct that they are all synchronizing their time clocks with the primary domain controller which is also on the same Proxmox VE host.

I applied the bcdedit command to all the 2008 R2 servers. It did not work on the 2008 guest though. I get this error:

Code:
C:\Users\administrator>bcdedit /set {default} USEPLATFORMCLOCK on
The element data type specified is not recognized, or does not apply to the
specified entry.
Run "bcdedit /?" for command line assistance.


hotwired007, The servers are using local storage.



Now that I have applied this command to help with time drift, I will re-enable the backups for tonight and see what happens. I'll keep my fingers crossed, though I expect that at least the Windows Server 2008 Standard guest will still crash, since I could not use the command on it.

Hopefully this does resolve the problems with the other ones, as that would mean 4 out of 5 are now stable!

Also, on a side note: the servers have been stable since I disabled the backups. So that's a good thing, as I was concerned there might be more issues affecting them.

As a workaround, would I be able to lower the priority of the backup process? Or is it such an intensive process that it's going to starve/monopolize the CPU no matter what I do?
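
On the question of lowering the backup's priority: vzdump has two throttling settings, an I/O bandwidth limit (bwlimit, in KB/s) and an I/O priority (ionice). A minimal sketch of node-wide defaults in /etc/vzdump.conf follows; the 20000 KB/s figure is an arbitrary illustration, not a value recommended anywhere in this thread:

```text
# /etc/vzdump.conf (fragment; the values are illustrative assumptions)
bwlimit: 20000
ionice: 7
```

The backup job log posted in this thread shows "ionice priority: 7" already in effect, so the bandwidth limit is the setting more likely to change anything. Note also that ionice only has an effect when the host uses the CFQ I/O scheduler.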
 
The backups didn't go through last night, even though I had them scheduled. I'll look into the error (I think I can handle it on my own) and try again tonight.

Do you back up to the same local disks or to a network share?

Hotwired, the backups are to an NFS share. If it helps any, I did extensive testing and could only get the write speed using dd to about 100 Mbps.
 
I remember seeing that exact same error when I first started using Proxmox in early 2010 for production.

If I remember right, I had to turn off some power-saving feature of the CPU in the BIOS.
I do not remember exactly which setting; I think it was C1E support that I turned off.

This issue can also be caused by overloading the CPU, assigning only one CPU core to the VM might help.
 
Oh yes: disable C1E, put the server in max performance mode in the BIOS, and disable hyperthreading. It should help.


I put the server in Max Performance and disabled C1E early last week and I'm sad to say that the blue screens are still happening.

Here is a copy of the log from the backup job that triggered the crashes last night:

Code:
INFO: starting new backup job: vzdump 100 101 102 103 104 105 --quiet 1 --mode snapshot --compress lzo --storage backupsOnNode3
INFO: Starting Backup of VM 100 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO:   Logical volume "vzsnap-proxmox02-0" created
INFO: creating archive '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-100-2012_09_21-22_00_03.tar.lzo'
INFO: adding '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-100-2012_09_21-22_00_03.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/mnt/vzsnap0/images/100/vm-100-disk-1.raw' to archive ('vm-disk-virtio0.raw')
INFO: lzop: Input/output error: <stdout>
INFO: received signal - terminate process
ERROR: Backup of VM 100 failed - command '/usr/lib/qemu-server/vmtar  '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-100-2012_09_21-22_00_03.tmp/qemu-server.conf' 'qemu-server.conf' '/mnt/vzsnap0/images/100/vm-100-disk-1.raw' 'vm-disk-virtio0.raw'|lzop >/mnt/pve/backupsOnNode3/dump/vzdump-qemu-100-2012_09_21-22_00_03.tar.dat' failed: exit code 1
INFO: Starting Backup of VM 101 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO:   Logical volume "vzsnap-proxmox02-0" created
INFO: creating archive '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-101-2012_09_21-22_37_37.tar.lzo'
INFO: adding '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-101-2012_09_21-22_37_37.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/mnt/vzsnap0/images/101/vm-101-disk-1.raw' to archive ('vm-disk-virtio0.raw')
INFO: Total bytes written: 42949675520 (69.90 MiB/s)
INFO: archive file size: 14.44GB
INFO: Finished Backup of VM 101 (00:09:50)
INFO: Starting Backup of VM 102 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO:   Logical volume "vzsnap-proxmox02-0" created
INFO: creating archive '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-102-2012_09_21-22_47_27.tar.lzo'
INFO: adding '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-102-2012_09_21-22_47_27.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/mnt/vzsnap0/images/102/vm-102-disk-1.raw' to archive ('vm-disk-virtio0.raw')
INFO: Total bytes written: 80530639360 (83.12 MiB/s)
INFO: archive file size: 16.83GB
INFO: delete old backup '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-102-2012_09_16-03_22_55.tar.lzo'
INFO: Finished Backup of VM 102 (00:15:54)
INFO: Starting Backup of VM 103 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO:   Logical volume "vzsnap-proxmox02-0" created
INFO: creating archive '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-103-2012_09_21-23_03_21.tar.lzo'
INFO: adding '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-103-2012_09_21-23_03_21.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/mnt/vzsnap0/images/103/vm-103-disk-1.raw' to archive ('vm-disk-virtio0.raw')
INFO: lzop: Input/output error: <stdout>
INFO: received signal - terminate process
ERROR: Backup of VM 103 failed - command '/usr/lib/qemu-server/vmtar  '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-103-2012_09_21-23_03_21.tmp/qemu-server.conf' 'qemu-server.conf' '/mnt/vzsnap0/images/103/vm-103-disk-1.raw' 'vm-disk-virtio0.raw'|lzop >/mnt/pve/backupsOnNode3/dump/vzdump-qemu-103-2012_09_21-23_03_21.tar.dat' failed: exit code 1
INFO: Starting Backup of VM 104 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO:   Logical volume "vzsnap-proxmox02-0" created
INFO: creating archive '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-104-2012_09_22-01_02_10.tar.lzo'
INFO: adding '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-104-2012_09_22-01_02_10.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/mnt/vzsnap0/images/104/vm-104-disk-1.raw' to archive ('vm-disk-virtio0.raw')
INFO: lzop: Input/output error: <stdout>
INFO: received signal - terminate process
ERROR: Backup of VM 104 failed - command '/usr/lib/qemu-server/vmtar  '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-104-2012_09_22-01_02_10.tmp/qemu-server.conf' 'qemu-server.conf' '/mnt/vzsnap0/images/104/vm-104-disk-1.raw' 'vm-disk-virtio0.raw'|lzop >/mnt/pve/backupsOnNode3/dump/vzdump-qemu-104-2012_09_22-01_02_10.tar.dat' failed: exit code 1
INFO: Starting Backup of VM 105 (qemu)
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO:   Logical volume "vzsnap-proxmox02-0" created
INFO: creating archive '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-105-2012_09_22-01_38_51.tar.lzo'
INFO: adding '/mnt/pve/backupsOnNode3/dump/vzdump-qemu-105-2012_09_22-01_38_51.tmp/qemu-server.conf' to archive ('qemu-server.conf')
INFO: adding '/mnt/vzsnap0/images/105/vm-105-disk-1.raw' to archive ('vm-disk-ide0.raw')
INFO: Total bytes written: 536873472 (256.00 MiB/s)
INFO: archive file size: 2MB
INFO: Finished Backup of VM 105 (00:00:04)
INFO: Backup job finished with errors
TASK ERROR: job errors


You'll notice that it failed. I think that is due to a full disk on the backup server partition. The partition is on a different server than the one the guests are on. I don't know why that would cause the servers to blue screen, though.
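
To see at a glance which guests failed in a job log like the one above, the ERROR lines can be filtered out. A minimal sketch: the function name failed_vms and the log file path are hypothetical, while the backup mount point in the usage comment is taken from the log itself.

```shell
#!/bin/sh
# failed_vms (hypothetical helper): read a vzdump job log on stdin and
# print the ID of every VM whose backup ended with an ERROR line.
failed_vms() {
    sed -n 's/.*ERROR: Backup of VM \([0-9][0-9]*\) failed.*/\1/p'
}

# Usage (the log path is a placeholder, not a real Proxmox location):
#   failed_vms < /path/to/vzdump-job.log
# Free space on the backup target can then be checked on the node with:
#   df -h /mnt/pve/backupsOnNode3
```

Fed the log above, this prints 100, 103 and 104, the three guests whose archives hit the lzop input/output error.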
 
I've had blue screens when backing up Windows Server 2008 R2. In my experience, it was caused by the Proxmox host not having enough memory. The server had 8GB total memory and two KVM guests had 4GB each. When I reduced each VM to 3.5GB, backups ran fine.
Proxmox didn't reboot, but when I logged into the Windows server in the morning, I'd get the unexpected shutdown message, and the time was about 20 minutes into the backup.
 
Was there ever a definitive answer to this? I have a Windows 2003 guest that will often BSOD at the beginning of a backup. It's right down to the minute.
I took screenshots of two recent stop errors. One is a 077, the other is a 07E and cites atapi.sys.

In this thread I see the following possible explanations:
1) Clock drift, corrected by using the RTC
2) An old virtio bug (this guest happens to be using IDE, though)
3) Not enough RAM
4) C1E needs to be disabled in the host BIOS

Did any other possibilities come up since the thread died? Did the OP ever find a fix?
 
Have you tried using a virtio disk? IDE can be slow.
 
Hi,

I'm quite frustrated that this is happening to me too, and far too often (more than once a week).
Some people suggest using virtio, while others suggest that the standard drivers are more stable.
I have the impression (from hints in the Event Viewer) that the backup job puts too much load on the disk, so the running VM gets disconnected from its own virtual disk, and that Proxmox cannot cope with giving backup jobs a background priority.
Looking at the forum, there are quite a few threads complaining about BSODs during nightly backups!
In my setup Proxmox is on one RAID1; of the two machines having issues during backups, one is on that same RAID1 and the other is on another RAID1 on the same controller.
I wanted to use Proxmox precisely for this online backup feature,
and I thought the system would be stable.
Now I really wonder what to do, because I've put myself in a horrible situation,
and I cannot live with restoring blue-screened servers overnight!
 
