Windows BSOD happening during backups

A problem does exist, not sure what it is yet.

Previously I suggested using IDE, but some of my 2003 machines do have virtIO drivers. My memory must be getting bad or I work too much or both.

Anyhow, we have not had blue screens but we have had issues that are triggered by backups.
We are hosting Adobe Connect on two different VMs that are located on different hardware.

On both, many times, when the backup starts I see some errors in the Event Viewer from the virtIO driver.
Then the Java based Adobe Connect application throws an exception and halts.
Connect complains that a health check it performed timed out.
Normally this health check takes just a few ms. When the backup starts, it seems like either the time jumps 1-2 minutes or there is a pause of 1-2 minutes.
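One way to confirm the 1-2 minute gap independently of Connect would be to log timestamps inside the guest at a fixed interval and look for gaps. This is only a rough sketch: `find_pauses` and `sample_clock` are names made up for illustration, and whether a stall shows up as a gap depends on how the guest clock behaves while the VM is held up.

```python
import time


def find_pauses(timestamps, expected_interval, tolerance):
    """Return (start, gap) pairs where two consecutive samples are
    further apart than expected_interval + tolerance (all in seconds)."""
    pauses = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > expected_interval + tolerance:
            pauses.append((prev, gap))
    return pauses


def sample_clock(count, interval, clock=time.time, sleep=time.sleep):
    """Take count + 1 timestamps, sleeping `interval` seconds between them."""
    stamps = [clock()]
    for _ in range(count):
        sleep(interval)
        stamps.append(clock())
    return stamps


# Hypothetical usage inside the guest, started shortly before the backup:
#   stamps = sample_clock(count=1800, interval=1.0)
#   for start, gap in find_pauses(stamps, expected_interval=1.0, tolerance=0.5):
#       print(time.ctime(start), "gap of %.1f s" % gap)
```

If the gaps line up with the backup start time, that would at least show the stall is visible to the whole guest and not only to the Connect health check.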

This did not occur in 3.0; it started with 3.1.

Our backup is scheduled at 12:10 AM in the Proxmox GUI, and the Connect VM is the first to get backed up:
virtio-events.png

Then Connect Dies:
connectdies.png

Data from VirtIO Event:
Source: viostor
Category: None
EventID: 129
Description:
The description for Event ID ( 129 ) in Source ( viostor ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: \Device\RaidPort0.
Data:
0000: 0f 00 10 00 01 00 68 00   ......h.
0008: 00 00 00 00 81 00 04 80   ......€
0010: 04 00 00 00 00 00 00 00   ........
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........
0028: 00 00 00 00 00 00 00 00   ........
0030: 00 00 00 00 81 00 04 80   ......€
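Since Event Viewer cannot show the description, the raw data can at least be read as little-endian 32-bit words. The sketch below only re-parses the dump posted above. Reading it as DWORDs is my assumption about the layout, and it makes no claim about what each field means to viostor.

```python
import struct

# Hex dump copied from the event data above (ASCII column dropped).
DUMP = """\
0000: 0f 00 10 00 01 00 68 00
0008: 00 00 00 00 81 00 04 80
0010: 04 00 00 00 00 00 00 00
0018: 00 00 00 00 00 00 00 00
0020: 00 00 00 00 00 00 00 00
0028: 00 00 00 00 00 00 00 00
0030: 00 00 00 00 81 00 04 80
"""


def parse_dump(text):
    """Turn 'offset: xx xx ...' lines into a bytes object."""
    data = bytearray()
    for line in text.splitlines():
        if ":" not in line:
            continue
        _, hex_part = line.split(":", 1)
        data.extend(int(b, 16) for b in hex_part.split())
    return bytes(data)


def dwords(data):
    """Decode the buffer as little-endian unsigned 32-bit words."""
    count = len(data) // 4
    return list(struct.unpack("<%dI" % count, data[:count * 4]))


raw = parse_dump(DUMP)
words = dwords(raw)
# Print only the non-zero words with their byte offsets.
for offset, value in zip(range(0, len(raw), 4), words):
    if value:
        print("0x%02x: 0x%08x" % (offset, value))
```

The word 0x80040081 appears twice, and its low 16 bits are 0x81, which is 129 and matches the EventID shown above.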

These are running an older driver.
I attempted to update one to the latest virtio-win-0.1-65.iso but it blue-screened on reboot.
I *think* that happened because I let Windows "choose the best driver"; I should have told it specifically to use the driver in wnet.

The other one I updated to virtio-win-0.1-59.iso with no problem.
This particular one seems to have the problem less often so I am unsure if the new driver fixed anything.

There is one thing that maybe has something to do with this.
The one that has issues the most is the first VM on that server to get backed up.
I had to restore it from backup after the botched driver update. It is now the last to get backed up, and there were no problems last night.

The one that has had issues less often is the third VM to get backed up on that server.
 
Hi! And now, again, two system crashes / BSODs that I will have to repair overnight!

Thanks for the answers, BUT:
- Spirit: is the issue related to virtio? No, I'm not using it...
- e100: is this issue related to 3.1? No, this thread was opened last year...

I agree with you, as I also see errors in the system events,
like ntfs (filesystem structure is corrupt) or atapi (disk device didn't respond in time).
These errors were not there before and are definitely related to the BSODs and OS corruption that happen during live backups!
I fear the problem might not be related only to backups, though!
It's certainly the first time I have seen something like this, and I never had similar issues with VMware.
It's a very serious and fatal issue for a server!
Unbelievable, really.
 
I've been running 2003 VMs in Proxmox since version 1.5, however long ago that is.
I had not really had any issues until updating to 3.1, and still no BSOD, just some sort of time delay/issue when the backup is started.

Do you have screen shots of the BSOD?
Info from Event Viewer about the BSOD?

What do your system resources look like during a backup?
Free RAM, swap usage, load average, CPU usage, disk I/O, etc.
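For anyone who wants to capture those numbers on the Proxmox host while the backup runs, here is a minimal sketch that reads `/proc/loadavg` and `/proc/meminfo`. The function names are made up for illustration, and it assumes the usual Linux layout of those two files; CPU usage and disk I/O are left out.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo text (values are usually in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def summarize(loadavg_text, meminfo_text):
    """Pick out load, free RAM and swap in use from the two files."""
    load = parse_loadavg(loadavg_text)
    mem = parse_meminfo(meminfo_text)
    return {
        "load1": load[0],
        "mem_free_kb": mem.get("MemFree"),
        "swap_used_kb": mem.get("SwapTotal", 0) - mem.get("SwapFree", 0),
    }


# Hypothetical usage on the host, e.g. once a minute during the backup window:
#   with open("/proc/loadavg") as f, open("/proc/meminfo") as g:
#       print(summarize(f.read(), g.read()))
```

Logging one line per minute with a timestamp should be enough to see whether load or swap usage spikes when the backup starts.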

What are your system specifications?
RAID array? Disks, RAM, CPU, etc.

Does it BSOD when the backup starts, ends, or sometime in the middle?

Doing a snapshot backup is a resource-intensive task.
Maybe your system is just too slow to handle doing a backup without causing problems for Windows.
 
Well, I'm fairly certain now that the issues I'm observing correlate with Windows VMs and overloaded disk I/O on the node.

BSODs are the most recent thing, but I have had other anomalies like corrupt files. I had a Windows 7 system that I had to restore from backup, and two instances of two different 2003 servers unable to boot due to a corrupt registry, as well as other problems with damaged files. Nearly all of my issues were happening on a Dell R515 with a PERC H200 RAID controller. The PERC H200 is a very slow piece of junk. After moving most of the VMs to other nodes the trouble went away. I knew that machine was not quick, but I didn't think slow would equate to broken. Linux guests didn't seem to have any trouble. They were definitely running very slowly and would often seem to just "pause" waiting for I/O, but otherwise they worked fine.

It's not simply one piece of broken hardware, because I can make trouble start happening if I move too many Windows VMs onto a node that doesn't have enough I/O performance to support them.
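A rough way to put a number on "not enough I/O performance" is to compare two readings of `/proc/diskstats` on the node and see how much of the interval each disk spent busy. This is just a sketch with made-up function names. It assumes the standard Linux column order, where the 13th column is the milliseconds spent doing I/O, and the sample lines are invented, not taken from my machines.

```python
def parse_diskstats(text):
    """Map device name -> milliseconds spent doing I/O
    (13th column of /proc/diskstats)."""
    busy = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 13:
            busy[fields[2]] = int(fields[12])
    return busy


def utilization(before, after, elapsed_ms):
    """Fraction of elapsed_ms each device was busy between two snapshots."""
    return {
        dev: (after[dev] - before[dev]) / elapsed_ms
        for dev in after
        if dev in before
    }


# Invented sample lines, one second apart, for a single disk.
BEFORE = "8 0 sda 100 0 800 50 200 0 1600 900 0 1000 950\n"
AFTER = "8 0 sda 150 0 1200 80 400 0 3200 1700 3 1900 2500\n"

busy = utilization(parse_diskstats(BEFORE), parse_diskstats(AFTER), 1000)
print(busy)
```

A value that sits close to 1.0 for the whole backup would suggest the array is saturated, which is the situation where my Windows guests started having trouble.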

I can definitely work around this now that I know, but it was not expected.

- e100: I'm not sure if my trouble is related to your Event Viewer messages. I haven't observed that, but most of mine are using IDE. My BSODs could happen at any time really, but most often at the beginning of a backup.
 
