I/O disk limit

Just checked on 2 containers; after 1+ hour the limit is still there and working.

Code:
Tue Nov 15 15:35:51 CET 2016
root@dreadnought:/# cat /sys/fs/cgroup/blkio/lxc/107/blkio.throttle.read_bps_device
251:19 50000
root@dreadnought:/# stat /sys/fs/cgroup/blkio/lxc/107/blkio.throttle.read_bps_device
  File: ‘/sys/fs/cgroup/blkio/lxc/107/blkio.throttle.read_bps_device’
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 1ah/26d Inode: 201         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2016-11-15 14:31:23.386872849 +0100
Modify: 2016-11-15 14:31:23.386872849 +0100
Change: 2016-11-15 14:31:23.386872849 +0100
Birth: -
The problem happens when you turn a container off and on again: the folder
/sys/fs/cgroup/blkio/lxc/containerX is deleted, along with all the files inside it.
 
This is normal for cgroups: they are created dynamically and parameters are passed at container start (mem, cpuset, etc.). You have to script this yourself or wait for Proxmox to implement this (if they ever do).
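For example, a minimal host-side sketch (assuming container ID 107 and the device/limit values from the output above; adjust to your setup) that starts a container and re-applies the limit:

Bash:
#!/bin/sh
# start the container, then re-apply the blkio read limit,
# since the cgroup directory is recreated on every start
CTID=107        # container ID (assumption)
DEV="251:19"    # MAJ:MIN of the backing block device (assumption)
LIMIT=50000     # bytes per second
pct start "$CTID"
echo "$DEV $LIMIT" > "/sys/fs/cgroup/blkio/lxc/$CTID/blkio.throttle.read_bps_device"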
 
> This is normal for cgroups: they are created dynamically and parameters are passed at container start (mem, cpuset, etc.). You have to script this yourself or wait for Proxmox to implement this (if they ever do).
I have tested without turning it off, and the read and write disk I/O stay the same:

root@ns525472:/sys/fs/cgroup/blkio/lxc/100# cat blkio.throttle.write_bps_device
251:0 500
root@ns525472:/sys/fs/cgroup/blkio/lxc/100# cat blkio.throttle.read_bps_device
251:0 500
root@ns525472:/sys/fs/cgroup/blkio/lxc/100#

and now the I/O test inside the VPS:

root@prueba:/dev# cat /dev/urandom | pv -c - >/bigfile
78.9MiB 0:00:05 [15.7MiB/s] [ <=>
 
No, it's working OK. What you see are reads/writes from/to the cache, and blkio.throttle.read/write_bps_device limits physical access to the device, not the cache; if your data is in the cache you get full speed. Flush the cache and read the file again and you will see that the limit is in place.

The problem with the cache is more apparent with writing, because all writes go straight to the cache, so from the user's perspective the limit appears not to work. The read/write cache lives on the Proxmox machine, so if you have 256GB of RAM there is plenty of space to cache things, and it is hard to see that writes are limited. You can't limit the cache size per container. You can tune how the kernel handles the cache, but in the end the cache has to be synced to the device, and then the container will be crippled.

Proof:
First read:
dd if=/root/yad_0.20.3-1_i386.deb of=/root/test.file
340+1 records in
340+1 records out
174518 bytes (175 kB) copied, 3.60847 s, 48.4 kB/s

but the second attempt reads from the cache, not the device:
174518 bytes (175 kB) copied, 0.00127271 s, 137 MB/s


Just apply the limit and watch it for your container:
watch -n1 -d cat /sys/fs/cgroup/blkio/lxc/[container_id]/blkio.throttle.io_service_bytes

Execute sync in the container and watch the numbers; you will see that physical writes to the device obey the limit.

Copy, let's say, 100MB of data and then just execute sync in the container: it will take ages (50kB/s physical write speed to the device).
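A quick way to reproduce this inside the container (a sketch; file name and size are arbitrary):

Bash:
# the write itself lands in the page cache and looks fast
dd if=/dev/urandom of=/root/100M.bin bs=1M count=100
# forcing the cached data to the throttled device is what takes ages
time sync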
 
1. First read:
Code:
dd if=test.long of=test.long2
9682944 bytes (9.7 MB) copied, 246.865 s, 39.2 kB/s

2. Second read (file is cached):
Code:
dd if=test.long of=test.long2
9682944 bytes (9.7 MB) copied, 0.0677105 s, 143 MB/s

3. Drop the cache and read the file again:
Code:
echo 1 > /proc/sys/vm/drop_caches
dd if=test.long of=test.long2
9682944 bytes (9.7 MB) copied, 212.785 s, 45.5 kB/s

[screenshot: bps_limit.png]
 
First of all, I flushed the cache in the first step:
root@ns525472:/home# echo 1 > /proc/sys/vm/drop_caches
and then:
with this set: 251:0 500
I get this:

root@prueba:/home# dd if=ubuntu-16.10-server-amd64.iso of=/root/ubuntu.iso
1368064+0 records in
1368064+0 records out
700448768 bytes (700 MB, 668 MiB) copied, 2.73712 s, 256 MB/s
root@prueba:/home#
 
@dietmar, I was wondering why some people have this disk structure:
sda 8:0 0 447,1G 0 disk
├─sda1 8:1 0 19,5G 0 part /
├─sda2 8:2 0 1023M 0 part [SWAP]
├─sda3 8:3 0 1K 0 part
└─sda5 8:5 0 426,6G 0 part
└─pve-data 251:0 0 422,6G 0 lvm /var/lib/vz
sdb 8:16 0 447,1G 0 disk
loop0 7:0 0 8G 0 loop

with the loop devices referring to the containers, while others have this:

─pve-data_tdata 251:10 0 821.9G 0 lvm
└─pve-data-tpool 251:11 0 821.9G 0 lvm
├─pve-data 251:12 0 821.9G 0 lvm
├─pve-vm--150--disk--1 251:13 0 40G 0 lvm
├─pve-vm--600251--disk--1 251:14 0 30G 0 lvm
├─pve-vm--100--disk--1 251:15 0 32G 0 lvm
├─pve-vm--4040020--disk--1 251:16 0 30G 0 lvm
 
Systems originally installed with an older version will have different storage setups.

The I/O throttling only works for real or logical block devices - so bind mounts and ZFS are not working atm; Ceph, LVM, and raw images should work.
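For reference, the MAJ:MIN pairs these throttle files expect can be read straight from lsblk (a sketch; device names differ per setup):

Bash:
# list block devices with the MAJ:MIN numbers used in
# blkio.throttle.*_bps_device / *_iops_device settings
lsblk -o NAME,MAJ:MIN,TYPE,MOUNTPOINT
# or query a single logical volume (hypothetical path):
lsblk -no MAJ:MIN /dev/pve/vm-100-disk-1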
 
Hi,
I'm writing here as I have a related question. Let me first clear up a few of the points above. There is a way to make the config persistent (documented in https://pve.proxmox.com/wiki/Manual:_pct.conf):
> It is also possible to add low-level LXC-style configuration directly, for example:
> lxc.init_cmd: /sbin/my_own_init

The correct way to add resource limits to LXC containers in Proxmox PVE is (was) to add the following to /etc/pve/lxc/CID.conf (for example /etc/pve/lxc/101.conf):
lxc.cgroup.blkio.throttle.read_iops_device: 251:7 60
lxc.cgroup.blkio.throttle.write_iops_device: 251:7 60

This will add the above values (LXC adds them at container start) to /sys/fs/cgroup/blkio/lxc/101/blkio.throttle.read_iops_device and /sys/fs/cgroup/blkio/lxc/101/blkio.throttle.write_iops_device.
Just browse /sys/fs/cgroup or search the related docs. For all of this to work there is one MANDATORY requirement:
The PID(s) of the process(es) involved must be present in /sys/fs/cgroup/blkio/lxc/101/tasks and/or /sys/fs/cgroup/blkio/lxc/101/cgroup.procs. The difference is that cgroup.procs contains PIDs, while tasks contains thread IDs as well. LXC fills these two files with the PIDs/thread IDs of all processes running in the container (101 in the example). For cgroups to work, the settings and the tasks must be present in the same directory (the same cgroup). In other words: cgroups apply a limit's settings to all tasks listed in the same directory.
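A quick sanity check along these lines (a sketch for container 101): confirm that the limit and the PIDs live in the same cgroup directory:

Bash:
CG=/sys/fs/cgroup/blkio/lxc/101
cat "$CG/blkio.throttle.read_iops_device"  # the configured limit
cat "$CG/cgroup.procs"                     # PIDs the limit applies to
# if cgroup.procs is empty here, the limit throttles nothing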

Unfortunately the Proxmox team decided to break this (I have no explanation why) with lxc-pve_2.0.5-2_amd64.deb and later versions. Up to version lxc-pve_2.0.5-1_amd64.deb the above functionality works fine. Starting with lxc-pve_2.0.5-2_amd64.deb they added one additional namespace, and now everything goes into the /sys/fs/cgroup/blkio/lxc/101/ns dir. Unfortunately this broke compatibility with their own documentation, and https://pve.proxmox.com/wiki/Manual:_pct.conf is now misleading. If you add the above two lines to /etc/pve/lxc/101.conf, LXC will add the settings to /sys/fs/cgroup/blkio/lxc/101 on startup, but it will then add the PIDs and thread IDs to /sys/fs/cgroup/blkio/lxc/101/ns. As a result the settings (limits) sit at one level of the cgroup hierarchy while the PIDs sit at another (in the ns subdir). You can still add the settings manually (echo "251:7 60" > /sys/fs/cgroup/blkio/lxc/101/ns/blkio.throttle.read_iops_device), but this lacks persistence.
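Until that changes, a host-side workaround (a sketch only, assuming the layout described above) is to write the limits into the ns/ subdirectory after each container start:

Bash:
#!/bin/sh
# apply limits to the cgroup that actually contains the container's
# PIDs (the ns/ subdir with lxc-pve >= 2.0.5-2, the old layout otherwise)
CTID=101
CG="/sys/fs/cgroup/blkio/lxc/$CTID"
[ -d "$CG/ns" ] && CG="$CG/ns"
echo "251:7 60" > "$CG/blkio.throttle.read_iops_device"
echo "251:7 60" > "$CG/blkio.throttle.write_iops_device"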

I will be really grateful to hear some comments from the Proxmox team.
 
Throttling doesn't handle hierarchies (yet? [1]). This also means that a container can always escape its limits by simply creating a subdirectory for itself.
The reason why we added the ns/ subdirectory in the first place was a similar one: with the introduction of cgroup namespaces, containers received full read+write access to their root cgroup, effectively allowing them to change any limit previously imposed by the host.
This is also part of the reason why the blkio values are not accessible via PVE's own config options and the GUI.

[1] `Hierarchical Cgroups` section of https://www.kernel.org/doc/Documentation/cgroup-v1/blkio-controller.txt
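To illustrate the escape: because the throttle is not hierarchical, a process that moves itself into a child cgroup is no longer covered by the parent's limit (a sketch, run inside a container that has write access to its cgroup root):

Bash:
# create a child cgroup and move the current shell into it;
# blkio throttling is flat, so the parent's limit no longer applies
mkdir /sys/fs/cgroup/blkio/escape
echo $$ > /sys/fs/cgroup/blkio/escape/cgroup.procs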
 
Thank you for the fast answer. I partially understand.

As I understand it, the right way is from inside the container. Inside the container I can do:
echo "251:7 60" > /sys/fs/cgroup/blkio/blkio.throttle.read_iops_device
and it is the same as /sys/fs/cgroup/blkio/lxc/101/ns/blkio.throttle.read_iops_device on the host.
To make the settings persistent I need an "inside the container" solution: rc.local (deprecated), systemd (could be), or something custom.
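For the systemd route, a minimal unit sketch inside the container (using the example device numbers and limit from above; the unit name and path are hypothetical):

Code:
# /etc/systemd/system/blkio-limit.service
[Unit]
Description=Apply blkio throttle limits at boot

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo "251:7 60" > /sys/fs/cgroup/blkio/blkio.throttle.read_iops_device'

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable blkio-limit.service.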

Is it possible for the admin to set limits that the container cannot override? Let me ask with an example:
On the host: echo "251:7 60" > /sys/fs/cgroup/blkio/lxc/101/blkio.throttle.read_iops_device
In the container: echo "251:7 80" > /sys/fs/cgroup/blkio/blkio.throttle.read_iops_device
Will it work with 80, or will cgroups set up as you describe (with ns/) limit the container to 60 read IOPS, so that from inside the container I can only go below 60?
 
IO limits do not work on LVM-Thin.
[screenshots: 20170506_JbvEuc3c.png, 20170506_fWw3RPhQ.png]
 
It appears that the IO disk limiting function is found in a guest VM's hard drive settings, under Advanced, in the "IO thread" section:
[screenshot: iothread.png]

How can we determine how many MB/s each of these is currently using, through a graph? The overall PVE summary has graphs like this:
[screenshot: iodelay.png]

But is this number "41.64" measured in MB/s? Or something else?

We have to have some baseline to judge from, so the question is: how do we determine what rates we could or should set these IO thread limit values to?
 
Is there really no way to limit IO on LXC containers that are on a ZFS file system? If so, then to use Proxmox to host client LXC containers, would you need to either dedicate a drive per customer, or just not use Proxmox for hosting customer LXC containers?

We have been using Proxmox for about 2 years and absolutely love it! But we have recently been running into issues where a couple of the LXC containers on some of our nodes do some heavy database reading and slow down MANY of the other LXC containers that share the same ZFS RAID array of disks.

ALSO: if the offending LXC container hammers the disks hard enough during a replication event, the event fails. (If this happens you have to manually intervene to restore replication by removing replication for that container and then re-adding it.)

Bash:
end replication job with error: command 'set -o pipefail && pvesm export local-zfs:subvol-107-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_107-1_1638063540__ | /usr/bin/cstream -t 50000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve3' root@10.9.0.7 -- pvesm import local-zfs:subvol-107-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_107-1_1638063540__ -allow-rename 0' failed: exit code 255

I have not used anything other than Proxmox for VMs/containers, so I am unsure whether the same problems exist in, for example, ESXi. I would assume that if the problem is in the underlying technology, you would have the exact same problem in ESXi, which leads me back to not using one large ZFS pool of disks and instead having small pools allocated per customer. This really does complicate things...

I'd appreciate any feedback from people using Proxmox in production on how you have overcome the issue of LXC containers saturating the IO on your ZFS pool and causing issues for other LXC containers on the same pool.
 
cgroup based IO limits only work with block devices, not on a per-filesystem level. So yes, you need to use a storage that is blockdev based (LVM-Thin for example, or Ceph). ZFS as used with containers in PVE is just bind-mounted into the container.
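One way to check what a given volume maps to on the host is pvesm path (a sketch; the volume IDs are hypothetical): a /dev/... result means a block device that blkio limits can target, while a plain filesystem path (ZFS subvol, bind mount) means they cannot:

Bash:
pvesm path local-lvm:vm-101-disk-0      # -> /dev/... : blockdev, limits can apply
pvesm path local-zfs:subvol-107-disk-1  # -> /rpool/...: filesystem, no blkio limits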
 
