One Linux container suddenly becomes unresponsive, with very high disk read and high host load

joshuapl

Member
Jun 1, 2023
Hi,

I'm quite new to Proxmox, and for some time now I've been experiencing a rather irritating issue with one of my LXC containers.
I'm running proxmox-ve: 7.4-1 (running kernel: 6.2.11-2-pve).

The container is a single python website using uwsgi.
The symptoms are always the same:
- happens at random moments, usually at night with barely any network traffic
- container is completely unresponsive - can't perform pct enter <vmid> or even pct shutdown / pct stop
- host's CPU usage drops, while IO delay rises (e.g. CPU usage 20% -> 10%, while IO delay goes 5% -> 20%)
- host's load rises dramatically - load is usually around 5; when this happens it rises by 20 every half hour
- container's CPU usage, which is usually 60-70%, drops to around 10%
- extremely high disk IO (diskread) - usually around 1M, goes up to 1.4G
- nothing special in Proxmox's syslog, except "pveproxy: detected empty handle" messages during the issue
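
The combination of falling CPU usage and rising load fits tasks stuck in uninterruptible sleep (state D), since the Linux load average counts those as well as runnable tasks. Below is a minimal sketch for listing such tasks from the host, assuming the container's processes are visible in the host's /proc. The function names (parse_stat, blocked_tasks) are my own for illustration, and only each process's main thread is checked.

```python
#!/usr/bin/env python3
"""List processes in uninterruptible sleep (state D) by reading /proc."""
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of splitting on whitespace.
    """
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def blocked_tasks(proc="/proc"):
    """Yield (pid, comm) for each process whose main thread is in state D."""
    for entry in sorted(os.listdir(proc)):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        path = os.path.join(proc, entry, "stat")
        try:
            with open(path, errors="replace") as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state == "D":
            yield pid, comm


def main():
    for pid, comm in blocked_tasks():
        print(pid, comm)


if __name__ == "__main__":
    main()
```

Running this a few times while the problem is occurring should show whether the list of blocked tasks keeps growing, and which process names are involved.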

The only thing that helps when it is happening is to log in to the UI in the browser and execute Stop from the Shutdown menu.

After such an event I can't identify any cause in the container's system logs - no unusual error messages, network traffic, etc.

I'm not using ZFS (the only threads about high disk IO I found have something to do with ZFS), just Linux LVM.
Other containers/VMs don't get significantly affected.

Here are my questions:
1. How can I see more details on what's actually happening - like checking which process in the container is responsible, listing the processes, or running strace on a process or the whole container? This is crucial, since without that I won't be able to identify the cause. Any possibility to
2. How do I properly force stop a container from the CLI? Will pct stop <vmid> --force do the same as selecting "STOP" from the shutdown menu, or should I use pct shutdown <vmid> --forceStop?
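
To illustrate what question 1 is after, here is a rough sketch that ranks processes by how much they read from storage over a short interval, using the read_bytes counter in /proc/<pid>/io. It assumes it is run as root on the host (other users' io files are not readable otherwise) and that container processes show up in the host's /proc. The helper names and the 5-second interval are arbitrary choices of mine. The cgroup path is printed as-is next to each PID, on the assumption that it identifies the owning container.

```python
#!/usr/bin/env python3
"""Rank processes by bytes read from storage over a short interval."""
import os
import time


def parse_io(text):
    """Parse the "key: value" lines of /proc/<pid>/io into a dict of ints."""
    result = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            result[key.strip()] = int(value)
    return result


def snapshot(proc="/proc"):
    """Map pid -> read_bytes for every process whose io file is readable."""
    out = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "io")) as fh:
                out[int(entry)] = parse_io(fh.read()).get("read_bytes", 0)
        except OSError:
            continue  # process exited, or permission denied (not root)
    return out


def top_readers(before, after, n=10):
    """Return up to n (pid, bytes_read) pairs with a positive delta, largest first."""
    deltas = {pid: after[pid] - before[pid] for pid in after if pid in before}
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return [(pid, delta) for pid, delta in ranked[:n] if delta > 0]


def describe(pid, proc="/proc"):
    """Return (comm, last cgroup line) for a pid, or ("?", "?") if unreadable."""
    try:
        with open(os.path.join(proc, str(pid), "comm")) as fh:
            name = fh.read().strip()
        with open(os.path.join(proc, str(pid), "cgroup")) as fh:
            lines = fh.read().splitlines()
    except OSError:
        return "?", "?"
    return name, (lines[-1] if lines else "?")


def main(interval=5):
    first = snapshot()
    time.sleep(interval)
    for pid, delta in top_readers(first, snapshot()):
        name, cgroup = describe(pid)
        print(f"{pid:>8} {delta / interval / 1e6:8.1f} MB/s  {name}  {cgroup}")


if __name__ == "__main__":
    main()
```

A PID reported here is a host PID, so it could then be inspected from the host, for example with strace -p <pid>, without having to enter the unresponsive container.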

Thanks in advance!