Server DoS (huge load) when a container's disk is full

jinjer

Renowned Member
Oct 4, 2010
Hello,

I'm experiencing a problem caused by disk quota exhaustion on a container.

Simply put, when the quota is reached inside a container, further writes are blocked; however, the writing process is not terminated but hangs indefinitely.

When this happens, the load average on the physical server starts to rise without bound and after a few minutes goes above 100-150. At that point, if I terminate the container with pct stop, the load average slowly returns to normal.

This is a problem because any container that reaches its quota will kill the server.

A different behavior would be more desirable: a container should not be able to affect the host server to the point of causing a DoS.
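For reference, this is roughly how the hang can be reproduced and observed; the container ID 101 and the file name below are just placeholders, not values from my setup.

Inside the container (or via pct exec from the host), write until the quota is hit:

# pct exec 101 -- dd if=/dev/zero of=/root/fill.bin bs=1M

On the host, watch the load average climb while the writer sits in uninterruptible sleep:

# watch -n 5 'uptime; ps axo stat= | grep -c ^D'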

I'm running the following:

Running: pve-manager/4.4-13/7ea56165 (running kernel: 4.4.62-1-pve)
Container: debian 8.8 (amd64)
Storage: local ZFS storage.

# pveversion -v
proxmox-ve: 4.4-88 (running kernel: 4.4.62-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.4.62-1-pve: 4.4.62-88
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-50
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-100
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
fence-agents-pve: not correctly installed
openvswitch-switch: 2.6.0-2
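Since the container disk lives on local ZFS, the size limit is presumably enforced as a quota/refquota on the subvolume dataset. For completeness, this is how it can be inspected; the dataset name below is just the typical default naming, not necessarily mine:

# zfs get quota,refquota,used,available rpool/data/subvol-101-disk-1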
 
jinjer said:
This is a problem because any container that reaches its quota will kill the server.

A huge load will not kill the server. See https://en.wikipedia.org/wiki/Load_(computing):

However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system.[1] This, for example, includes processes blocking due to an NFS server failure or too slow media (e.g., USB 1.x storage devices). Such circumstances can result in an elevated load average which does not reflect an actual increase in CPU use (but still gives an idea of how long users have to wait).
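Such blocked writers show up as state D in ps. A quick way to confirm that the load comes from them rather than from CPU (a rough sketch, adjust the column widths as you like):

# ps axo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
# cat /proc/loadavg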
 
