[SOLVED] Force stop a VM no matter what

Ralms

New Member
May 16, 2019
Hi there,

At work I'm mostly a Hyper-V user and I like the idea of being in control of my VMs.
Something that I find really disappointing about Proxmox is that it sometimes fails to perform "Stop", which should be the equivalent of pulling the plug on a machine. Instead, Proxmox tries to be all nice and stuff, and sometimes fails at something this simple.

I found out that Proxmox doesn't have virsh installed, which makes sense. However, I need a way to force stop my VM, since right now Proxmox has 0 control over it: "QEMU Guest Agent" is set to "Enable" but the agent is currently unresponsive.

Any ideas please?

Thank you,
Ralms.
 
Something that I find really disappointing about Proxmox is that it sometimes fails to perform "Stop", which should be the equivalent of pulling the plug on a machine. Instead, Proxmox tries to be all nice and stuff, and sometimes fails at something this simple.

Huh, shutdown or stop? If stop fails, that would point to a big issue in your setup: stop sends a SIGKILL if the VM hasn't stopped after a few minutes (already unlikely), and on Linux that fails only, really only, if the KVM process hangs in uninterruptible sleep. Normally this means the VM uses NFS storage that isn't available anymore (shutdown, network cut, ...). The NFS protocol is a bit, well, special, so the Linux kernel cannot do anything but wait for it. That's not nice, but it really shouldn't happen regularly; if it does, move to something like CIFS, Ceph or iSCSI, which can handle that much better.

Shutdown is another story: that one is graceful and will "fail" (i.e., refuse to kill the VM and risk possible data loss from badly written programs).
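
Whether the KVM process is actually stuck in uninterruptible sleep can be checked from the host. Below is a minimal Python sketch, not Proxmox tooling: it reads the `State:` line of `/proc/<pid>/status` and reports whether the state is `D`. The PID file location `/var/run/qemu-server/<vmid>.pid` and all function names are assumptions made for illustration, so adjust them to your system.

```python
from pathlib import Path


def parse_state(status_text: str) -> str:
    """Return the one-letter process state from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)".
            return line.split(":", 1)[1].split()[0]
    raise ValueError("no 'State:' line found")


def is_uninterruptible(status_text: str) -> bool:
    """True if the process is in uninterruptible sleep (state D)."""
    return parse_state(status_text) == "D"


def vm_process_state(vmid: int, pid_dir: str = "/var/run/qemu-server") -> str:
    """Look up the state of the process behind a VM.

    Assumption: a PID file named <vmid>.pid exists in pid_dir.
    """
    pid = int(Path(pid_dir, f"{vmid}.pid").read_text().strip())
    return parse_state(Path(f"/proc/{pid}/status").read_text())
```

If `vm_process_state(201)` returns `D` for a long time, the process is waiting on I/O and a SIGKILL will not take effect until that wait ends.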

I found out that Proxmox doesn't have virsh

Comparing virsh with Proxmox VE is apples to oranges.

Proxmox has 0 control over the VM as "QEMU Guest Agent" is set to "Enable" but is currently unresponsive.

Do you have the QEMU guest agent installed in the VM? Setting it to enabled cannot install it automatically in the guest, on whatever operating system runs there.

Actually, if you just enabled it without installing the agent, that would explain why the task fails on shutdown (again, not stop): there's no agent in the VM reacting to it, so the task runs into a timeout.
 
Huh, shutdown or stop? If stop fails, that would point to a big issue in your setup: stop sends a SIGKILL if the VM hasn't stopped after a few minutes (already unlikely), and on Linux that fails only, really only, if the KVM process hangs in uninterruptible sleep. Normally this means the VM uses NFS storage that isn't available anymore (shutdown, network cut, ...). The NFS protocol is a bit, well, special, so the Linux kernel cannot do anything but wait for it. That's not nice, but it really shouldn't happen regularly; if it does, move to something like CIFS, Ceph or iSCSI, which can handle that much better.
Yes, I meant stop. However, it was not working for this VM because the QEMU guest agent was enabled and unreachable, so it just returned the typical "TASK ERROR: can't lock file '/var/lock/qemu-server/lock-201.conf' - got timeout"

Also, the VM is not using NFS, everything is local for this one.

Comparing virsh with Proxmox VE is apples to oranges.
Well, isn't Proxmox KVM based?

Do you have the QEMU guest agent installed in the VM? Setting it to enabled cannot install it automatically in the guest, on whatever operating system runs there.

Actually, if you just enabled it without installing the agent, that would explain why the task fails on shutdown (again, not stop): there's no agent in the VM reacting to it, so the task runs into a timeout.
No, I didn't. The change to enable it in the VM Options was pending for some reason, so when I rebooted the VM it got enabled, and I was left in this grey zone where I couldn't turn it off.

Anyway, after 10 attempts of doing everything possible, Proxmox finally managed to perform the "STOP" command and not throw the "can't lock file" error.

Thank you
 
the typical "TASK ERROR: can't lock file '/var/lock/qemu-server/lock-201.conf' - got timeout"

That means another task for that VM is running, not that stop itself fails. You can cancel that other task and then stop the VM.
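
To see which task is in the way, it helps to pick the entries for one VM out of a list of task identifiers (UPIDs). The following is a minimal Python sketch under an assumption: that a UPID has the colon-separated layout `UPID:<node>:<pid hex>:<pstart hex>:<starttime hex>:<type>:<id>:<user>:`. The `Task` type, the function names and the sample values are made up for illustration.

```python
from typing import Iterable, List, NamedTuple


class Task(NamedTuple):
    node: str
    pid: int
    task_type: str
    vmid: str
    user: str


def parse_upid(upid: str) -> Task:
    """Split a UPID string into its fields (layout is an assumption, see above)."""
    parts = upid.strip().split(":")
    if len(parts) < 8 or parts[0] != "UPID":
        raise ValueError(f"not a UPID: {upid!r}")
    return Task(
        node=parts[1],
        pid=int(parts[2], 16),  # the worker PID is encoded as hex
        task_type=parts[5],
        vmid=parts[6],
        user=parts[7],
    )


def tasks_for_vm(upids: Iterable[str], vmid: int) -> List[Task]:
    """Return only the tasks that belong to the given VM ID."""
    wanted = str(vmid)
    return [task for task in map(parse_upid, upids) if task.vmid == wanted]
```

A `qmshutdown` entry for VM 201 that is still running would be the task to cancel before the stop can take the lock.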

Well, isn't Proxmox KVM based?

And how is that related to virsh? KVM/QEMU are completely independent of virsh. virsh can do some basic management of KVM/QEMU VMs, while Proxmox VE is a full-blown manager for KVM/QEMU and a lot of other things. It's like saying that both NTFS and ext4 use hard disks to save their data.

No, I didn't. The change to enable it in the VM Options was pending for some reason, so when I rebooted the VM it got enabled, and I was left in this grey zone where I couldn't turn it off.

FYI: You can revert pending changes.

Enabling the QEMU Guest Agent (QGA) is a pending change because it needs to add a serial port to talk with the QGA daemon (hopefully) running in the VM; that's why it cannot be applied "live".

Anyway, after 10 attempts of doing everything possible, Proxmox finally managed to perform the "STOP" command and not throw the "can't lock file" error.
So you used shutdown first. That task "blocked" the stop task, and only after it ran into the timeout could the stop go through.

I've had two "improvements" for such or similar débâcles in mind for a while, but did not get around to implementing them:
  1. VM shutdown should also send an ACPI shutdown when the QGA is enabled in the config but not reachable (either ping it first, or do that after it times out)
  2. Allow an "overwrite most locks" force stop, which interrupts currently running tasks and force stops the VM ASAP
I mean, as said, you should be able to cancel the other running tasks (the task list is at the bottom of the web interface, running tasks are always at the top, and you can double-click one to open it, where there is a "Cancel" button). But point 1 in particular seems sensible in general, and point 2 seems like a nice-to-have convenience method (also for me, when testing/developing some weird stuff).
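
Until something like point 2 exists, a similar effect can be approximated from the host by running the two steps one after the other instead of in parallel. This is only a sketch in Python, not how Proxmox VE implements anything: it calls `qm shutdown <vmid> --timeout <seconds>` and, if that does not succeed, `qm stop <vmid>`. It assumes `qm shutdown` returns a non-zero exit code when it times out; the function names and the injectable `run` parameter are made up for illustration.

```python
import subprocess
from typing import Callable, List, Sequence


def _run(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def stop_with_fallback(
    vmid: int,
    grace_seconds: int = 60,
    run: Callable[[Sequence[str]], int] = _run,
) -> List[List[str]]:
    """Try a graceful shutdown, then hard-stop the VM if that fails.

    Returns the commands that were executed, in order.
    """
    executed: List[List[str]] = []

    shutdown = ["qm", "shutdown", str(vmid), "--timeout", str(grace_seconds)]
    executed.append(shutdown)
    if run(shutdown) == 0:
        return executed

    # The shutdown command has returned by now, so its task no longer
    # holds the VM lock when the stop is issued.
    stop = ["qm", "stop", str(vmid)]
    executed.append(stop)
    run(stop)
    return executed
```

Calling `stop_with_fallback(201, grace_seconds=30)` on the host waits at most 30 seconds for a clean shutdown before pulling the plug. Because the stop is only issued after the shutdown command has finished, the two tasks do not compete for the lock.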
 
That means another task for that VM is running, not that stop itself fails. You can cancel that other task and then stop the VM.

So you used shutdown first. That task "blocked" the stop task, and only after it ran into the timeout could the stop go through.

Yes, and I only noticed the multiple tasks afterwards.
I was looking at the Console for the VM, pressed "Shutdown" and nothing happened, so I was like, OK, this is unresponsive for some reason, let's just stop it.

I would suggest improving the error message for these cases: if it checked whether another task is already running and said so, it would be more intuitive. The current error is super generic.

And how is that related to virsh? KVM/QEMU are completely independent of virsh. virsh can do some basic management of KVM/QEMU VMs, while Proxmox VE is a full-blown manager for KVM/QEMU and a lot of other things. It's like saying that both NTFS and ext4 use hard disks to save their data.

Thank you for the insights. I thought that virsh was the "official" one for KVM, my bad.



I've had two "improvements" for such or similar débâcles in mind for a while, but did not get around to implementing them:
  1. VM shutdown should also send an ACPI shutdown when the QGA is enabled in the config but not reachable (either ping it first, or do that after it times out)
  2. Allow an "overwrite most locks" force stop, which interrupts currently running tasks and force stops the VM ASAP
I mean, as said, you should be able to cancel the other running tasks (the task list is at the bottom of the web interface, running tasks are always at the top, and you can double-click one to open it, where there is a "Cancel" button). But point 1 in particular seems sensible in general, and point 2 seems like a nice-to-have convenience method (also for me, when testing/developing some weird stuff).

Yeah, I feel that those 2 would be great improvements in terms of user experience.

The cancel option is not intuitive, as you can't do it directly from the list. You have to open a specific task and then you get the "Stop" button, something I didn't even know it had until now :)
I always looked at the "Tasks" at the bottom more as log entries; I didn't even know I could open them for more information or stop them.
 
@t.lamprecht
Is it possible to have the QEMU guest agent in an OSX VM?
If I am not wrong, it seems it is not available for OSX.

If so, can we detect when the OSX VM shuts down and then shut down Proxmox VE in a way that is safe for the VM's data?
Thank you
 
Yes, I meant stop. However, it was not working for this VM because the QEMU guest agent was enabled and unreachable, so it just returned the typical "TASK ERROR: can't lock file '/var/lock/qemu-server/lock-201.conf' - got timeout"

Also, the VM is not using NFS, everything is local for this one.


Well, isn't Proxmox KVM based?


No, I didn't. The change to enable it in the VM Options was pending for some reason, so when I rebooted the VM it got enabled, and I was left in this grey zone where I couldn't turn it off.

Anyway, after 10 attempts of doing everything possible, Proxmox finally managed to perform the "STOP" command and not throw the "can't lock file" error.

Thank you
Just rm -rf the lock file at the command line, then go back and stop the VM, and it will work.
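
Before deleting the file, it may be worth checking whether some task still holds the lock, since removing the file does not end that task. A minimal Python sketch for that check follows. It assumes the lock is taken with `flock(2)` on the lock file, which is my assumption and not something confirmed in this thread, and the function name is made up.

```python
import fcntl
import os


def lock_is_held(path: str) -> bool:
    """Return True if another open file description holds an flock on path."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        # No lock file at all, so nothing can be holding it.
        return False
    try:
        # Non-blocking attempt: fails immediately if someone else has the lock.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

For the VM in this thread that would be `lock_is_held("/var/lock/qemu-server/lock-201.conf")`. If it returns True, another task is still working on the VM, and cancelling that task is probably the cleaner fix.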
 
