VMs go into pause state and can't be continued

cstrav

New Member
Jan 29, 2010
I am having severe problems with some of my VMs, and there doesn't seem to be any commonality. Some of the VMs are Linux guests, one is Windows. Some use SCSI disks, some use IDE. The issue may be related to high I/O to the qcow2 disk, but I don't know how to troubleshoot the problem. It happens with either kernel 2.6.24 or 2.6.32. Here is the issue:

I load a VM from a snapshot, stop the VM, then start it. The VM comes up fine and works for anywhere from several minutes to an hour or so, but at some point its CPU usage drops to 0%. "info status" in the QEMU monitor shows that the VM is paused. I type "cont", but the VM immediately pauses again. As far as I can tell, there are no log entries under /var/log indicating why the VM is pausing, and there is nothing in dmesg.
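For reference, this is roughly what a monitor session looks like for me (VM ID 101 is just an example; I attach via the qm wrapper):
Code:
# attach to the guest's QEMU monitor (example VM ID)
qm monitor 101
# inside the monitor:
info status    # reports the VM as paused
cont           # VM resumes, then pauses again almost immediately
info block     # lists the attached drives, in case one of them is the culprit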

Any ideas on how to debug this? Thanks in advance.
 
Hi,
can you resume the VM from the console with
Code:
qm resume 101
Perhaps you will see a hint.
Also try converting the qcow2 file to raw and change the filename in the config file under /etc/qemu-server/ (a rough example below).
Which network card do you use in the guest? There are some issues with virtio; perhaps try the e1000.
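For the conversion, something along these lines (the VM ID, paths, and the ide0/scsi0 config key are only examples; adjust them to your setup and make a backup first):
Code:
# convert the disk image on the storage (example VM ID 101)
cd /var/lib/vz/images/101
qemu-img convert -O raw vm-101-disk-1.qcow2 vm-101-disk-1.raw
# then edit /etc/qemu-server/101.conf and point the disk line
# (e.g. ide0 or scsi0) at vm-101-disk-1.raw instead of the .qcow2 file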

Udo
 
I tried " qm resume 114", and I get the same result. The command returns with a zero return code and nothing is printed, but the VM is still paused.

I will try converting the disk to raw and let you know the results. I had already converted the NIC driver to e1000, but it made no difference. FYI - I also tried switching to the 2.6.18 kernel and I am seeing the same behavior. Thanks for the reply.
 
I tried " qm resume 114", and I get the same result. The command returns with a zero return code and nothing is printed, but the VM is still paused.

I will try converting the disk to raw and let you know the results. I had already converted the NIC driver to e1000, but it made no difference. FYI - I also tried switching to the 2.6.18 kernel and I am seeing the same behavior. Thanks for the reply.

I tried to convert the disk image to raw using the following command:

qemu-img convert -O raw vm-109-disk-1.qcow2 vm-109-disk-1.raw

This results in the error:

qemu-img: error while writing

It does create a raw file, but if I try to use it, the VM just reboots after the boot menu and about 20 seconds of high CPU utilization.

It seems as if the qcow2 format is not so reliable, since I have three VMs in this same state. I'm not sure if taking a snapshot with one host kernel version and then restoring it with another could be the problem. If so, it might be nice to track the kernel version a snapshot was created with and warn when it is used with a different kernel version.
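To see whether the images themselves are corrupt, I was going to try something like this (assuming this version of qemu-img already has the check subcommand):
Code:
# show format, virtual size and backing file of the image
qemu-img info vm-109-disk-1.qcow2
# consistency check of the qcow2 metadata (newer qemu-img only)
qemu-img check -f qcow2 vm-109-disk-1.qcow2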
 
Hi,
this message looks like a damaged disk! If the qcow file were bad, the conversion would have trouble reading it - but not writing!
You should make a backup very quickly and test your disks (is this a RAID? Recalculate the checksum, check for bad blocks - a rough example below).
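Something like this, for example (the device names are only placeholders - use your real disk and md device):
Code:
# read-only surface scan for bad blocks (example device)
badblocks -sv /dev/sda
# for Linux software RAID: trigger a consistency check and watch the progress
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat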

Udo

It all makes sense now. The /var/lib/vz partition was full. Doh! It would be nice if there were some warning about this in the web interface. There were no logs or any other indication other than the VMs randomly locking up. Thanks for the hint that it was a write issue; I should have found this sooner.
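For anyone who runs into the same thing, this is all it took to spot it once I knew to look (the images path is the default local storage on my host):
Code:
# free space on the storage that holds the disk images
df -h /var/lib/vz
# which VM images are taking up the space
du -sh /var/lib/vz/images/*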
 
Hi,
you can see the graph in the storage menu of the GUI... but without warnings. I think this is OK if you use raw - the files don't grow (a small monitoring sketch below).
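Until the GUI warns about it, a small cron job can do the job - a rough sketch, assuming mail delivery to root works and 90% is an acceptable threshold:
Code:
#!/bin/sh
# example hourly cron script: warn when /var/lib/vz is over 90% full
USED=$(df -P /var/lib/vz | awk 'NR==2 {sub("%","",$5); print $5}')
if [ "$USED" -gt 90 ]; then
    echo "/var/lib/vz is ${USED}% full" | mail -s "storage warning" root
fi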

Udo
 
