VMs go into pause state and can't be continued

cstrav

New Member
Jan 29, 2010
I am having severe problems with some of my VMs, and there doesn't seem to be any commonality. Some of the VMs are Linux guests, one is Windows. Some use SCSI disks, some use IDE. The issue may be related to high I/O to the qcow2 disk, but I don't know how to troubleshoot the problem. It happens with either kernel 2.6.24 or 2.6.32. Here is the issue:

I load a VM from a snapshot, stop the VM, then start it. The VM comes up fine and works for anywhere from several minutes to an hour or so, but at some point its CPU usage drops to 0%. "info status" in the QEMU monitor shows that the VM is paused. I type "cont", but the VM immediately pauses again. As far as I can tell, there are no log entries under /var/log indicating why the VM is pausing, and there is nothing in dmesg.
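For reference, this is roughly what a monitor session looks like for me (VM ID 101 is just an example; I attach via the qm wrapper):
Code:
# attach to the guest's QEMU monitor (example VM ID)
qm monitor 101
# inside the monitor:
info status    # reports the VM as paused
cont           # VM resumes, then pauses again almost immediately
info block     # lists the attached drives, in case one of them is the culprit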

Any ideas on how to debug this? Thanks in advance.
 
Hi,
can you resume the VM from the console with
Code:
qm resume 101
Perhaps you will see a hint.
Also try converting the qcow2 file to raw and change the filename in the config file under /etc/qemu-server/ (a rough example below).
Which network card do you use in the guest? There are some issues with virtio; perhaps try the e1000.
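For the conversion, something along these lines (the VM ID, paths, and the ide0/scsi0 config key are only examples; adjust them to your setup and make a backup first):
Code:
# convert the disk image on the storage (example VM ID 101)
cd /var/lib/vz/images/101
qemu-img convert -O raw vm-101-disk-1.qcow2 vm-101-disk-1.raw
# then edit /etc/qemu-server/101.conf and point the disk line
# (e.g. ide0 or scsi0) at vm-101-disk-1.raw instead of the .qcow2 file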

Udo
 
I tried " qm resume 114", and I get the same result. The command returns with a zero return code and nothing is printed, but the VM is still paused.

I will try converting the disk to raw and let you know the results. I had already converted the NIC driver to e1000, but it made no difference. FYI - I also tried switching to the 2.6.18 kernel and I am seeing the same behavior. Thanks for the reply.
 
I tried " qm resume 114", and I get the same result. The command returns with a zero return code and nothing is printed, but the VM is still paused.

I will try converting the disk to raw and let you know the results. I had already converted the NIC driver to e1000, but it made no difference. FYI - I also tried switching to the 2.6.18 kernel and I am seeing the same behavior. Thanks for the reply.

I tried to convert the disk image to raw using the following command:

qemu-img convert -O raw vm-109-disk-1.qcow2 vm-109-disk-1.raw

This results in the error:

qemu-img: error while writing

It does create a raw file, but if I try to use it, the VM just reboots after the boot menu and about 20 seconds of high CPU utilization.

It seems as if the qcow2 format is not so reliable, since I have three VMs in this same state. I'm not sure if taking a snapshot with one host kernel version and then restoring it with another could be the problem. If so, it might be nice to track the kernel version a snapshot was created with and warn when it is used with a different kernel version.
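To see whether the images themselves are corrupt, I was going to try something like this (assuming this version of qemu-img already has the check subcommand):
Code:
# show format, virtual size and backing file of the image
qemu-img info vm-109-disk-1.qcow2
# consistency check of the qcow2 metadata (newer qemu-img only)
qemu-img check -f qcow2 vm-109-disk-1.qcow2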
 
Hi,
this message looks like a damaged disk! If the qcow file were bad, the conversion would have trouble reading it - but not writing!
You should make a backup very quickly and test your disks (is this a RAID? Recalculate the checksum, check for bad blocks - a rough example below).
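Something like this, for example (the device names are only placeholders - use your real disk and md device):
Code:
# read-only surface scan for bad blocks (example device)
badblocks -sv /dev/sda
# for Linux software RAID: trigger a consistency check and watch the progress
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat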

Udo

It all makes sense now. The /var/lib/vz partition was full. Doh! It would be nice if there were some warning about this in the web interface. There were no logs or any other indication other than the VMs randomly locking up. Thanks for the hint that it was a write issue; I should have found this sooner.
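For anyone who runs into the same thing, this is all it took to spot it once I knew to look (the images path is the default local storage on my host):
Code:
# free space on the storage that holds the disk images
df -h /var/lib/vz
# which VM images are taking up the space
du -sh /var/lib/vz/images/*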
 
Hi,
you can see the graph in the storage menu of the GUI... but without warnings. I think this is OK if you use raw - the files don't grow (a small monitoring sketch below).
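Until the GUI warns about it, a small cron job can do the job - a rough sketch, assuming mail delivery to root works and 90% is an acceptable threshold:
Code:
#!/bin/sh
# example hourly cron script: warn when /var/lib/vz is over 90% full
USED=$(df -P /var/lib/vz | awk 'NR==2 {sub("%","",$5); print $5}')
if [ "$USED" -gt 90 ]; then
    echo "/var/lib/vz is ${USED}% full" | mail -s "storage warning" root
fi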

Udo
 
