Why Not Integrate `pct stop <containerID> --kill` into the Web GUI?

Lonnie · Jul 13, 2024

I have been a dedicated user of Proxmox for several versions, and I have consistently noticed an issue with the web interface's ability to immediately stop a virtual machine or container. Specifically, the "Stop Immediately" option in the right-click menu has never worked for me.

Today, while using Proxmox 8.2.2, I encountered a significant problem with a Debian container. The summary status indicated that the container was running, but I couldn't SSH into it. Additionally, the console provided by the web interface couldn't log in, similar to the SSH issue. I tried both the "Shutdown" and "Stop" options, but neither could stop the container. When I attempted to reboot the server, it hung indefinitely. Ultimately, I had to physically shut down the node before I could successfully bring it back online.

The error messages in the web UI cluster log mentioned an inability to acquire a lock, but when I tried to review these logs later, the sub-logs were blank, preventing me from relaying the exact messages.

After all of this, I referred to my notes and found that the command

Code:

pct stop <containerID> --kill

has effectively stopped containers for me in the past. This leads me to question why this reliable method isn't integrated into the web UI. The "Stop" option is supposed to achieve this, but when it fails, it would be beneficial to have an alternative. Perhaps "Stop" could be wired to this command or an additional "Kill" option could be introduced to ensure users can stop containers without leaving the web GUI, which excels in other areas.

I appreciate your attention to this matter and look forward to any improvements you can make to address this issue.

fiona · Jul 15, 2024

Hi,

Lonnie said:
I have been a dedicated user of Proxmox for several versions, and I have consistently noticed an issue with the web interface's ability to immediately stop a virtual machine or container. Specifically, the "Stop Immediately" option in the right-click menu has never worked for me.

Today, while using Proxmox 8.2.2, I encountered a significant problem with a Debian container. The summary status indicated that the container was running, but I couldn't SSH into it. Additionally, the console provided by the web interface couldn't log in, similar to the SSH issue. I tried both the "Shutdown" and "Stop" options, but neither could stop the container. When I attempted to reboot the server, it hung indefinitely. Ultimately, I had to physically shut down the node before I could successfully bring it back online.

Lonnie said:
The error messages in the web UI cluster log mentioned an inability to acquire a lock, but when I tried to review these logs later, the sub-logs were blank, preventing me from relaying the exact messages.

The lock is there to prevent multiple tasks from accessing the same config and potentially causing inconsistencies. In Proxmox VE 8.2, it's possible to override shutdown tasks when doing a hard stop. Did you use that checkbox?
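On the command line, that checkbox appears to correspond to the `--overrule-shutdown` option of `pct stop` (and of `qm stop` for VMs). A minimal sketch, assuming Proxmox VE 8.2 or later and a root shell on the node; `hard_stop_ct` is a hypothetical helper name, not a Proxmox command:

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): hard-stop a container,
# first aborting any active shutdown task for it.
# Assumes Proxmox VE >= 8.2, where pct stop accepts --overrule-shutdown.
hard_stop_ct() {
    ctid=$1
    if [ -z "$ctid" ]; then
        echo "usage: hard_stop_ct <containerID>" >&2
        return 2
    fi
    pct stop "$ctid" --overrule-shutdown 1
}

# Example (as root on the node):
# hard_stop_ct 123
```

This is still a regular Proxmox stop task, so it goes through the normal locking rather than around it.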

Lonnie said:
After all of this, I referred to my notes and found that the command

Code:

pct stop <containerID> --kill

has effectively stopped containers for me in the past.

There is no kill option for the pct stop command. Do you mean lxc-stop?
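For reference, `lxc-stop` (part of the LXC userspace tools on a Proxmox VE node) has a `-k`/`--kill` option that kills the container's processes instead of requesting a clean shutdown. A minimal last-resort sketch that runs it and then confirms the state with `pct status`; `kill_ct` is a hypothetical helper name. Because this runs outside the Proxmox task system, it carries the full data-loss risk of an unclean stop:

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): last-resort kill of an LXC
# container. lxc-stop --kill kills the container's tasks instead of
# asking its init to shut down cleanly, so unsaved data may be lost.
kill_ct() {
    ctid=$1
    if [ -z "$ctid" ]; then
        echo "usage: kill_ct <containerID>" >&2
        return 2
    fi
    # On Proxmox VE the LXC container name is the numeric container ID.
    lxc-stop -n "$ctid" --kill || return 1
    # Ask Proxmox what it now thinks, e.g. "status: stopped".
    pct status "$ctid"
}

# Example (as root on the node):
# kill_ct 123
```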

Lonnie said:
This leads me to question why this reliable method isn't integrated into the web UI. The "Stop" option is supposed to achieve this, but when it fails, it would be beneficial to have an alternative. Perhaps "Stop" could be wired to this command or an additional "Kill" option could be introduced to ensure users can stop containers without leaving the web GUI, which excels in other areas.

Because, depending on what is going on in the container, this can lead to data loss. It should only be a last resort, not a routine UI action.

Lonnie said:
I appreciate your attention to this matter and look forward to any improvements you can make to address this issue.

Lonnie · Jul 16, 2024

@fiona, thanks for the reply. I didn't actually stop the container with the command I specified. I found that command in my notes after I had already dealt with the situation by physically powering off the server. Here are the commands from my notes, which were probably written years ago:

Bash:

# List containers and their IDs
pct list

# Kill Container by ID:
pct stop <containerID> --kill

# Get the process ID of the virtual machine you want to kill:
qm list

# Kill that virtual machine's Process
kill -9 <pid>
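The last two VM steps in those notes can be combined. A sketch, assuming the default `qm list` layout in which PID is the last column and a stopped VM shows a PID of 0; `vm_pid` is a hypothetical helper name. `qm stop <vmid>` should still be tried first, with `kill -9` kept as a last resort:

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): print the host PID of a
# running VM, read from `qm list`. Assumes PID is the last column and
# that stopped VMs report 0 there.
vm_pid() {
    vmid=$1
    pid=$(qm list | awk -v id="$vmid" '$1 == id { print $NF }')
    if [ -z "$pid" ] || [ "$pid" = "0" ]; then
        echo "VM $vmid not found or not running" >&2
        return 1
    fi
    echo "$pid"
}

# Example (only after `qm stop 100` has failed):
# pid=$(vm_pid 100) && kill -9 "$pid"
```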

I did indeed see the checkbox you mentioned, and I tried to stop the container both ways (with the box checked and unchecked).

None of the provided options (with or without the box checked) could stop this container. Rebooting the node hung indefinitely, forcing me to power off the server improperly just to stop the container!

I understand the need for caution when it comes to data loss, but consider this: which poses more risk of data loss?

1. Providing a GUI option that can actually kill a runaway container, OR
2. Yanking the power cable out of the wall to stop the container?

See my point? Improperly powering off the node poses no less risk of data loss than a proper kill option.

Therefore, I think the kill option should indeed be available in the web GUI. Upon selecting this option, you can include warnings in red that it should only be used as a last resort and to try other options first. However, I can't see a good reason to omit this option from the Web GUI altogether if all other alternatives risk the same degree of data loss as having the option would.

For example, consider Virtual Machine Manager. Like Proxmox, VMM also provides those less invasive options for stopping a VM. However, when those options don't work, I don't have to do a hard power off of the HOST MACHINE to stop a VM. Instead, VMM has a "Force Off" option that can be used. When you choose this option, it stops the VM immediately.

Surely, providing this option is better than omitting it, because when none of the other options can stop the container, you have to resort to an even more invasive alternative, potentially risking even more disruption or data loss for other running VMs and containers that aren't hung at all. If there is a way to do this with less data loss, those steps could be automated in the handler for the "Force Off" action.

Thanks for considering this.

BobhWasatch · Jul 16, 2024

Lonnie said:
Therefore, I think the kill option should indeed be available in the web GUI. Upon selecting this option, you can include warnings in red that it should only be used as a last resort and to try other options first. However, I can't see a good reason to omit this option from the Web GUI altogether if all other alternatives risk the same degree of data loss as having the option would.

First off, warnings in red don't work. People ignore them, especially when things have gone wrong. See all the threads about people wrecking their system after getting a warning about what they were about to do.

Secondly, if a process is in an uninterruptible wait inside the kernel (state D in ps), it can't be killed. That can happen with driver bugs or, more often, with a hard NFS mount whose server is down. Remember, a container is not a VM. It runs on the host's kernel. My point is that there are cases that can't be recovered.

Something like that is probably what happened in your case. A red button would not be able to fix it. If it happens a lot to one of your containers, maybe that workload should be in a VM instead. Then it doesn't matter so much if the guest kernel is hung.
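To check whether a stuck container is in this situation, one can list the host's processes in uninterruptible sleep (a STAT value starting with `D`) along with the kernel function they are waiting in. A sketch using standard procps `ps` output fields; `list_d_state` is a hypothetical helper name. A process that keeps showing up here across repeated runs generally won't respond to `kill -9` until the kernel operation it is blocked on returns:

```shell
#!/bin/sh
# Hypothetical helper: list processes in uninterruptible sleep (state D).
# These usually cannot be interrupted, even by SIGKILL, until the kernel
# operation they are blocked in (e.g. I/O on a hung hard NFS mount) completes.
list_d_state() {
    # Columns: PID, state, kernel wait channel, command name (no header).
    ps -eo pid=,stat=,wchan=,comm= | awk '$2 ~ /^D/'
}

# Example (on the Proxmox host):
# list_d_state
```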

johnha · Nov 3, 2024

It would be great to have the ability to `pct unlock <lxc-id>` from the GUI. I had this issue today due to a failed scheduled backup and I rebooted the node via GUI (on my phone), but even after the restart, my lxc container wouldn't start and a lock icon was showing for that container. I wasn't able to remotely ssh into the host from my phone and I had to wait until I got back to my desk to run `pct unlock 123`. Would've been great to be able to do this from the GUI. This may be slightly off topic from the OP's request, but I'd say it's in a similar vein.

fiona · Nov 4, 2024

Hi,

johnha said:
It would be great to have the ability to `pct unlock <lxc-id>` from the GUI. I had this issue today due to a failed scheduled backup and I rebooted the node via GUI (on my phone), but even after the restart, my lxc container wouldn't start and a lock icon was showing for that container. I wasn't able to remotely ssh into the host from my phone and I had to wait until I got back to my desk to run `pct unlock 123`. Would've been great to be able to do this from the GUI. This may be slightly off topic from the OP's request, but I'd say it's in a similar vein.

You can use the shell in the UI (select your node and then Shell) instead of SSH. The reason unlock is not prominently exposed is that a leftover lock most often comes from an unexpected failure, so an admin should first investigate whether unlocking is safe.
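From the node's Shell, a cautious version of that sequence might look like this: first show the `lock:` entry in the container's config (for example `lock: backup` after an interrupted backup), and only then clear it with `pct unlock`. `show_ct_lock` and `unlock_ct` are hypothetical helper names, and the interactive confirmation is an illustrative assumption, not a Proxmox feature:

```shell
#!/bin/sh
# Hypothetical helper: print the lock entry of a container's config,
# e.g. "lock: backup". Returns non-zero if no lock line is present.
show_ct_lock() {
    pct config "$1" | grep '^lock:'
}

# Hypothetical helper: clear the lock only after the admin has seen it
# and confirmed that no backup, migration, etc. is still running.
unlock_ct() {
    ctid=$1
    show_ct_lock "$ctid" || { echo "CT $ctid is not locked"; return 0; }
    printf 'Remove this lock from CT %s? [y/N] ' "$ctid"
    read -r answer
    case $answer in
        y|Y) pct unlock "$ctid" ;;
        *) echo "left locked"; return 1 ;;
    esac
}

# Example (node > Shell in the web UI):
# unlock_ct 123
```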
