Some Windows guests hanging every night

Hmm. Maybe it started when we updated / upgraded PVE, but these days we don't migrate VMs very often. I don't know.

Christophe.
 
These errors are mostly logged when the storage is too slow and lagging.

Ah yes! But:
- we (very) regularly watch IO delay in Proxmox (quad Xeon IBM) and on the EqualLogic 10GbE iSCSI SANs. Everything is OK (IO delay has never reached 4%), with no lag at all.
- why the hell would possibly slow storage crash a VM with virtio drivers, yet NEVER crash exactly the same VM with IDE / e1000?
- why have nearly identical VMs (Windows, virtio) never crashed on the same platform?
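For anyone who wants to cross-check the IO delay figure outside the web GUI, here is a minimal Python sketch that samples `/proc/stat` twice and reports the share of CPU time spent in iowait. It assumes that Proxmox's "IO delay" roughly corresponds to iowait; the exact formula the GUI uses may differ. The function names are illustrative only.

```python
import time

# Assumption: Proxmox's "IO delay" roughly equals the iowait share of CPU time.
# Linux-only: reads the aggregate "cpu" line of /proc/stat.


def read_cpu_times(stat_text: str) -> list:
    """Return the first 8 counters of the aggregate cpu line:
    user, nice, system, idle, iowait, irq, softirq, steal."""
    fields = stat_text.splitlines()[0].split()
    if fields[0] != "cpu":
        raise ValueError("first line of /proc/stat should start with 'cpu'")
    # guest/guest_nice (if present) are already included in user/nice,
    # so they are skipped to avoid double counting.
    return [int(v) for v in fields[1:9]]


def iowait_percent(before: list, after: list) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def sample_iowait(interval: float = 1.0) -> float:
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open("/proc/stat") as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_cpu_times(f.read())
    return iowait_percent(before, after)
```

Logging `sample_iowait(5)` every few minutes overnight could show whether iowait spikes around the times the guests hang.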

Indeed, the VM disk becomes unreachable before the VM becomes unresponsive: logs are NOT recorded on the VM disk, but they still arrive via network syslog for about 5 minutes! But it is not because of SAN slowness or lag: other VMs work like a charm...


Thank you,

Christophe.
 
Seems like Christophe and I are seeing the exact same problem.

Another data point: my one problem VM, on the host running the latest 2.6.32-33-pve kernel, just hung while using IDE disks. It was using a VirtIO NIC, which I am now going to change to see if it helps.
 
This same VM (running on the latest 2.6 PVE kernel) hung AGAIN using no VirtIO anything! (The balloon driver still loads, so I can see the actual memory usage, but the VM is set to a fixed size.)
 
Hi Pegasus,

No more crashed / unresponsive VMs here with IDE / e1000, the balloon driver loaded but a fixed RAM size, and maximum performance in the power settings.

Christophe.
 
And now one VM on the host running the previous kernel also just hung! It is using IDE disks but a VirtIO NIC, and had been fine for 5 days.

WHAT. THE. HELL?!

How can I roll back the qemu package? I've had about enough of this.
 
Okay people. The VM in question has had IDE disk, e1000 NIC and LSI controller for the past two days and it STILL hung just a half hour ago! Again, nothing is logged in Windows' system log. What do you suggest at this point?
 
1. Post VM config here.
2. Try to monitor iops on the host.
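One way to watch IOPS on the host without extra tools is to sample the completed-read and completed-write counters in `/proc/diskstats`. A minimal sketch, assuming a Linux host; the function names are hypothetical, and `iostat -x 1` from the sysstat package reports the same counters if it is installed.

```python
import time

# Linux-only sketch. Each /proc/diskstats line starts with:
# major minor name reads_completed reads_merged sectors_read ms_reading
# writes_completed ...


def parse_diskstats(text: str) -> dict:
    """Map device name -> (reads_completed, writes_completed)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        stats[fields[2]] = (int(fields[3]), int(fields[7]))
    return stats


def compute_iops(before: dict, after: dict, interval: float) -> dict:
    """Map device name -> (read IOPS, write IOPS) over `interval` seconds."""
    result = {}
    for dev, (reads, writes) in after.items():
        if dev in before:
            r0, w0 = before[dev]
            result[dev] = ((reads - r0) / interval, (writes - w0) / interval)
    return result


def sample_iops(interval: float = 1.0) -> dict:
    """Read /proc/diskstats twice, `interval` seconds apart."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read())
    return compute_iops(before, after, interval)
```

Running `sample_iops(5)` every few minutes overnight, filtered to the DRBD or iSCSI devices backing the VM disks, could show whether IOPS spike when a guest hangs.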
 
I had a similar issue with Windows 7 and 2008 and the virtio 0.1.81 drivers on my Dell servers. After changing everything from virtio to virtio-scsi, every Win7 / 2008 machine is stable again.
 
I'm not using the latest virtio drivers yet, but here is an important general recommendation: check the exact version of the installed drivers and compare it with what is in the driver directories, because it is possible to install, for example, a Windows 7 driver on Windows 2003 (and vice versa), which can make the server very unstable.
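A sketch of that comparison, assuming you have noted the Driver Version that Device Manager shows for the device and have the `.inf` file from the driver directory meant for your guest OS. It reads the `DriverVer=date,version` line from the INF `[Version]` section. The function names are hypothetical, and some INF files are UTF-16 encoded, so read them with the right encoding before passing the text in.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "DriverVer=01/31/2014,1.2.3.4" in the INF [Version] section.
DRIVERVER_RE = re.compile(
    r"^\s*DriverVer\s*=\s*[0-9/]+\s*,\s*([0-9]+(?:\.[0-9]+)*)",
    re.IGNORECASE | re.MULTILINE,
)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "1.2.3.4" into (1, 2, 3, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def inf_driver_version(inf_text: str) -> Optional[Tuple[int, ...]]:
    """Return the version from the INF's DriverVer line, or None if absent."""
    match = DRIVERVER_RE.search(inf_text)
    return parse_version(match.group(1)) if match else None


def installed_matches_inf(installed_version: str, inf_text: str) -> bool:
    """True if the version Device Manager reports equals the INF's DriverVer."""
    expected = inf_driver_version(inf_text)
    return expected is not None and parse_version(installed_version) == expected
```

A mismatch does not prove the wrong OS build was installed, but it is a cheap first check before digging further.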
 
Sorry for the delay everyone. Thanks for your patience.

1. Post VM config here.
This is the one that still occasionally freezes (about every two days):
Code:
boot: cdn
bootdisk: ide0
cores: 1
ide0: vms2-drbd:143/vm-143-disk-1.qcow2,format=qcow2,size=22G
ide2: none,media=cdrom
memory: 2048
name: FileData1
net0: e1000=BE:B6:AD:68:D4:4B,bridge=vmbr0
onboot: 1
ostype: win7
sockets: 1
startup: order=1

2. Try to monitor iops on the host.

I do see strange things in the Disk IO RRD graphs on the VMs. They seem to regularly have excursions into the mega, giga and even peta ranges! These excursions do coincide with the troubled VM freezing, but I see them on all VMs on the system even though only the one is freezing.

Can I trust that these graphs are accurate?
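One rough sanity check: a VM's disk throughput cannot plausibly reach the peta range, so samples far above what the storage path can carry are more likely graphing or counter artefacts than real I/O. A minimal sketch, assuming the RRD samples have been exported as `(timestamp, bytes_per_second)` pairs and that a 10 Gbit/s link is the bottleneck; both are assumptions, so adjust the ceiling to your own storage. The function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def link_ceiling_bytes_per_s(gbit_per_s: float) -> float:
    """Raw line rate in bytes/s (1 Gbit/s = 125,000,000 bytes/s), ignoring protocol overhead."""
    return gbit_per_s * 1e9 / 8


def implausible_samples(
    samples: Iterable[Tuple[int, float]], ceiling: float
) -> List[Tuple[int, float]]:
    """Return (timestamp, bytes_per_s) samples that are negative or above `ceiling`.

    NaN values (RRD "unknown") are skipped rather than flagged.
    """
    flagged = []
    for ts, value in samples:
        if math.isnan(value):
            continue
        if value < 0 or value > ceiling:
            flagged.append((ts, value))
    return flagged
```

Comparing the flagged timestamps with the times the troubled VM froze could help separate graphing artefacts from real load.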

As for disk I/O contention, I just restarted the host and two of the VMs needed to install software updates on startup while the third was CHKDSKing (so lots of I/O all around) yet all three operated normally. The one VM will freeze even in the middle of the night when there's no activity, so I don't think it's disk I/O related.

(This host reboot was to upgrade the kernel to 2.6.32-34-pve (from 2.6.32-32-pve) so I'll see if the situation improves with the newer kernel.)
 
Hello again. Okay, it looks like using RAW format on the system drives has fixed all of my problems. (Some of my VMs had continued to run fine with qcow2 system drives, until I upgraded the kernel; then they were affected too. Changing their system drives to RAW fixed the problem there as well.)
Next I will change the drivers back to VirtIO and see what happens.
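For reference, a qcow2 image can be converted to RAW offline with `qemu-img convert`. A minimal Python sketch that builds (and optionally runs) the command; the function names and paths are hypothetical placeholders, the VM should be shut down first, and the disk line in the VM config has to be pointed at the new raw image afterwards.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def qcow2_to_raw_cmd(src: Path, dst: Path) -> List[str]:
    """Build a qemu-img command converting a qcow2 image to raw.

    -p shows progress, -f names the input format, -O the output format.
    """
    return ["qemu-img", "convert", "-p", "-f", "qcow2", "-O", "raw", str(src), str(dst)]


def convert(src: str, dst: str, dry_run: bool = True) -> None:
    """Print the command (default) or run it; the VM must be shut down first."""
    cmd = qcow2_to_raw_cmd(Path(src), Path(dst))
    if dry_run:
        print(shlex.join(cmd))
        return
    if Path(dst).exists():
        # Refuse to overwrite an existing image by accident.
        raise FileExistsError(dst)
    subprocess.run(cmd, check=True)
```

With the default `dry_run=True`, `convert("/path/to/vm-143-disk-1.qcow2", "/path/to/vm-143-disk-1.raw")` only prints the command so it can be checked before running it for real.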
 