Proxmox 5.4.6 CT or kernel limits - issue/bug on high CT number?

avladulescu

Hello guys,

I have a Dell 2950 server with one 120 GB SSD, an array of eight 1 TB drives in RAID 50 for local storage, 64 GB of RAM and two X5460 CPUs. I have just installed the latest version of PVE, intending to run a test environment for emulating client API calls on a software development project.

There is no external storage; everything runs from the local storage I just described, and this is not a cluster install, just a plain single-server setup.

For this test we need to run around 140-200 CTs on this setup, all with the exact same configuration, based on the latest bionic (Ubuntu 18.04) LTS template downloaded from the repository.

I created one container and, after setting it up, turned it into a template and cloned it 139 times.
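For reference, the cloning was done from the CLI with a simple loop along these lines (a sketch only; the template ID, target ID range, hostname pattern and storage name are placeholders, not the exact commands used):

#!/bin/bash
# Sketch: clone a CT template into a range of new CT IDs on the
# directory storage "vol_containers" (full clones, since raw images
# on directory storage cannot be linked-cloned).
TEMPLATE_ID=101
for ID in $(seq 102 240); do
    pct clone "$TEMPLATE_ID" "$ID" \
        --hostname "ct${ID}.int.example.com" \
        --full --storage vol_containers
done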

The issue arises when I try to start all the CTs: after roughly 105-111 CTs are running, the remaining ones (up to 140) fail to start with the following errors:

unable to fork worker - No space left on device at /usr/share/perl5/PVE/RESTEnvironment.pm line 504.
starting 213
command 'systemctl start pve-container@213' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
starting 214
Job for pve-container@214.service failed because the control process exited with error code.
See "systemctl status pve-container@214.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@214' failed: exit code 1
starting 215
Job for pve-container@215.service failed because of unavailable resources or another system error.
See "systemctl status pve-container@215.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@215' failed: exit code 1
starting 216
Job for pve-container@216.service failed because of unavailable resources or another system error.
See "systemctl status pve-container@216.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@216' failed: exit code 1
starting 217
Job for pve-container@217.service failed because the control process exited with error code.
See "systemctl status pve-container@217.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@217' failed: exit code 1
starting 218
Job for pve-container@218.service failed because the control process exited with error code.
See "systemctl status pve-container@218.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@218' failed: exit code 1
starting 219
Job for pve-container@219.service failed because the control process exited with error code.
See "systemctl status pve-container@219.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@219' failed: exit code 1
starting 220
Job for pve-container@220.service failed because the control process exited with error code.
See "systemctl status pve-container@220.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@220' failed: exit code 1
starting 221
Job for pve-container@221.service failed because the control process exited with error code.
See "systemctl status pve-container@221.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@221' failed: exit code 1
starting 222
Job for pve-container@222.service failed because the control process exited with error code.
See "systemctl status pve-container@222.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@222' failed: exit code 1
starting 223
Job for pve-container@223.service failed because the control process exited with error code.
See "systemctl status pve-container@223.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@223' failed: exit code 1
starting 224
Job for pve-container@224.service failed because the control process exited with error code.
See "systemctl status pve-container@224.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@224' failed: exit code 1
starting 225
Job for pve-container@225.service failed because the control process exited with error code.
See "systemctl status pve-container@225.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@225' failed: exit code 1
starting 226
Job for pve-container@226.service failed because the control process exited with error code.
See "systemctl status pve-container@226.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@226' failed: exit code 1
starting 227
unable to fork worker - No space left on device at /usr/share/perl5/PVE/RESTEnvironment.pm line 504.
starting 228
unable to fork worker - No space left on device at /usr/share/perl5/PVE/RESTEnvironment.pm line 504.
starting 229
unable to fork worker - No space left on device at /usr/share/perl5/PVE/RESTEnvironment.pm line 504.
starting 230
unable to fork worker - No space left on device at /usr/share/perl5/PVE/RESTEnvironment.pm line 504.
starting 231
command 'systemctl start pve-container@231' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
starting 232
unable to fork worker - No space left on device at /usr/share/perl5/PVE/RESTEnvironment.pm line 504.
starting 233
Job for pve-container@233.service failed because of unavailable resources or another system error.
See "systemctl status pve-container@233.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@233' failed: exit code 1
starting 234
unable to fork worker - No space left on device at /usr/share/perl5/PVE/RESTEnvironment.pm line 504.
starting 235
Job for pve-container@235.service failed because the control process exited with error code.
See "systemctl status pve-container@235.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@235' failed: exit code 1
starting 236
unable to fork worker - No space left on device at /usr/share/perl5/PVE/RESTEnvironment.pm line 504.
starting 237
Job for pve-container@237.service failed because the control process exited with error code.
See "systemctl status pve-container@237.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@237' failed: exit code 1
starting 238
Job for pve-container@238.service failed because the control process exited with error code.
See "systemctl status pve-container@238.service" and "journalctl -xe" for details.
command 'systemctl start pve-container@238' failed: exit code 1
starting 239
command 'systemctl start pve-container@239' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
starting 240
unable to fork worker - No space left on device at /usr/share/perl5/PVE/RESTEnvironment.pm line 504.
root@pveds01:/etc#
root@pveds01:/etc# systemctl status pve-container@218.service
pve-container@218.service - PVE LXC Container: 218
Loaded: loaded (/lib/systemd/system/pve-container@.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2019-06-19 23:02:52 EEST; 9s ago
Docs: man:lxc-start
man:lxc
man:pct
Process: 21134 ExecStart=/usr/bin/lxc-start -n 218 (code=exited, status=1/FAILURE)

Jun 19 23:02:52 pveds01 systemd[1]: Starting PVE LXC Container: 218...
Jun 19 23:02:52 pveds01 lxc-start[21134]: lxc-start: 218: tools/lxc_start.c: main: 330 The container failed to start
Jun 19 23:02:52 pveds01 lxc-start[21134]: lxc-start: 218: tools/lxc_start.c: main: 333 To get more details, run the container in foreground mode
Jun 19 23:02:52 pveds01 lxc-start[21134]: lxc-start: 218: tools/lxc_start.c: main: 336 Additional information can be obtained by setting the --logfile and --logpriority options
Jun 19 23:02:52 pveds01 systemd[1]: pve-container@218.service: Control process exited, code=exited status=1
Jun 19 23:02:52 pveds01 systemd[1]: Failed to start PVE LXC Container: 218.
Jun 19 23:02:52 pveds01 systemd[1]: pve-container@218.service: Unit entered failed state.
Jun 19 23:02:52 pveds01 systemd[1]: pve-container@218.service: Failed with result 'exit-code'.
root@pveds01:/etc#

Syslog says:

Jun 19 22:02:27 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:27 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:27 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:27 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:28 pveds01 pvestatd[1895]: fork failed: No space left on device
Jun 19 22:02:28 pveds01 pct[32359]: <root@pam> starting task UPID:pveds01:00002529:00076FAE:5D0A86C4:vzstart:239:root@pam:
Jun 19 22:02:28 pveds01 pct[9513]: starting CT 239: UPID:pveds01:00002529:00076FAE:5D0A86C4:vzstart:239:root@pam:
Jun 19 22:02:28 pveds01 pvestatd[1895]: command 'lxc-info -n 202 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:28 pveds01 pvestatd[1895]: command 'lxc-info -n 134 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:28 pveds01 pvestatd[1895]: command 'lxc-info -n 110 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:28 pveds01 pvestatd[1895]: command 'lxc-info -n 129 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:28 pveds01 pct[9513]: command 'systemctl start pve-container@239' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:28 pveds01 pvestatd[1895]: command 'lxc-info -n 208 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:28 pveds01 pct[32359]: <root@pam> end task UPID:pveds01:00002529:00076FAE:5D0A86C4:vzstart:239:root@pam: command 'systemctl start pve-container@239' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:28 pveds01 pvedaemon[1915]: command 'lxc-info -n 114 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:29 pveds01 lxcfs[914]: bindings.c: 2473: recv_creds: Timed out waiting for scm_cred: Success
Jun 19 22:02:29 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:29 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:29 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:29 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:30 pveds01 pvestatd[1895]: fork failed: No space left on device
Jun 19 22:02:31 pveds01 lxcfs[914]: bindings.c: 2473: recv_creds: Timed out waiting for scm_cred: No such file or directory
Jun 19 22:02:31 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:32 pveds01 pvestatd[1895]: command 'lxc-info -n 178 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:33 pveds01 pvestatd[1895]: status update time (5.680 seconds)
Jun 19 22:02:33 pveds01 lxcfs[914]: bindings.c: 2473: recv_creds: Timed out waiting for scm_cred: No such file or directory
Jun 19 22:02:33 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:34 pveds01 snmpd[1420]: error on subcontainer 'ia_addr' insert (-1)
Jun 19 22:02:35 pveds01 lxcfs[914]: bindings.c: 2473: recv_creds: Timed out waiting for scm_cred: Success
Jun 19 22:02:35 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:37 pveds01 pvestatd[1895]: command 'lxc-info -n 181 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:37 pveds01 pvestatd[1895]: command 'lxc-info -n 195 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 125 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvedaemon[1914]: <root@pam> successful auth for user 'root@pam'
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 206 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 149 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 193 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 170 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 198 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 158 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 166 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 173 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 124 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 122 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 152 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 138 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 144 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 197 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:38 pveds01 pvestatd[1895]: command 'lxc-info -n 186 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:39 pveds01 pvedaemon[1915]: command 'lxc-info -n 114 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jun 19 22:02:39 pveds01 lxcfs[914]: bindings.c: 2473: recv_creds: Timed out waiting for scm_cred: Success
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device
Jun 19 22:02:39 pveds01 lxcfs[914]: fuse: error creating thread: No space left on device

I tried tuning the kernel parameters and rebooted, but couldn't get it past about 111 CTs without the plain PVE install starting to complain about no space left.

Below is the relevant information for the current environment.

If anybody has an idea of what should be changed or adapted, feel free to chime in; help is appreciated.

proxmox-ve: 5.4-1 (running kernel: 4.15.18-16-pve)
pve-manager: 5.4-6 (running version: 5.4-6/aa7856c5)
pve-kernel-4.15: 5.4-4
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.13.13-2-pve: 4.13.13-33
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-10
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-52
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-43
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-37
pve-container: 2.0-39
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-2
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-52
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

root@pveds01:/etc# df -h
Filesystem Size Used Avail Use% Mounted on
udev 31G 0 31G 0% /dev
tmpfs 6.2G 26M 6.2G 1% /run
/dev/mapper/pve-root 89G 6.6G 83G 8% /
tmpfs 31G 43M 31G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 31G 0 31G 0% /sys/fs/cgroup
/dev/sdb1 5.5T 221G 5.3T 4% /storage
/dev/fuse 30M 48K 30M 1% /etc/pve
tmpfs 6.2G 0 6.2G 0% /run/user/0
root@pveds01:/etc# df -hi
Filesystem Inodes IUsed IFree IUse% Mounted on
udev 7.8M 860 7.8M 1% /dev
tmpfs 7.8M 1.7K 7.8M 1% /run
/dev/mapper/pve-root 45M 71K 45M 1% /
tmpfs 7.8M 85 7.8M 1% /dev/shm
tmpfs 7.8M 155 7.8M 1% /run/lock
tmpfs 7.8M 17 7.8M 1% /sys/fs/cgroup
/dev/sdb1 559M 6.0K 559M 1% /storage
/dev/fuse 9.8K 165 9.7K 2% /etc/pve
tmpfs 7.8M 10 7.8M 1% /run/user/0
root@pveds01:/etc#
root@pveds01:/etc# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 20G 0 loop
loop1 7:1 0 20G 0 loop
loop2 7:2 0 20G 0 loop
loop3 7:3 0 20G 0 loop
loop4 7:4 0 20G 0 loop
loop5 7:5 0 20G 0 loop
loop6 7:6 0 20G 0 loop
loop7 7:7 0 20G 0 loop
loop8 7:8 0 20G 0 loop
loop9 7:9 0 20G 0 loop
loop10 7:10 0 20G 0 loop
loop11 7:11 0 20G 0 loop
loop12 7:12 0 20G 0 loop
loop13 7:13 0 20G 0 loop
loop14 7:14 0 20G 0 loop
loop15 7:15 0 20G 0 loop
loop16 7:16 0 20G 0 loop
loop17 7:17 0 20G 0 loop
loop18 7:18 0 20G 0 loop
loop19 7:19 0 20G 0 loop
loop20 7:20 0 20G 0 loop
loop21 7:21 0 20G 0 loop
loop22 7:22 0 20G 0 loop
loop23 7:23 0 20G 0 loop
loop24 7:24 0 20G 0 loop
loop25 7:25 0 20G 0 loop
loop26 7:26 0 20G 0 loop
loop27 7:27 0 20G 0 loop
loop28 7:28 0 20G 0 loop
loop29 7:29 0 20G 0 loop
loop30 7:30 0 20G 0 loop
loop31 7:31 0 20G 0 loop
loop32 7:32 0 20G 0 loop
loop33 7:33 0 20G 0 loop
loop34 7:34 0 20G 0 loop
loop35 7:35 0 20G 0 loop
loop36 7:36 0 20G 0 loop
loop37 7:37 0 20G 0 loop
loop38 7:38 0 20G 0 loop
loop39 7:39 0 20G 0 loop
loop40 7:40 0 20G 0 loop
loop41 7:41 0 20G 0 loop
loop42 7:42 0 20G 0 loop
loop43 7:43 0 20G 0 loop
loop44 7:44 0 20G 0 loop
loop45 7:45 0 20G 0 loop
loop46 7:46 0 20G 0 loop
loop47 7:47 0 20G 0 loop
loop48 7:48 0 20G 0 loop
loop49 7:49 0 20G 0 loop
loop50 7:50 0 20G 0 loop
loop51 7:51 0 20G 0 loop
loop52 7:52 0 20G 0 loop
loop53 7:53 0 20G 0 loop
loop54 7:54 0 20G 0 loop
loop55 7:55 0 20G 0 loop
loop56 7:56 0 20G 0 loop
loop57 7:57 0 20G 0 loop
loop58 7:58 0 20G 0 loop
loop59 7:59 0 20G 0 loop
loop60 7:60 0 20G 0 loop
loop61 7:61 0 20G 0 loop
loop62 7:62 0 20G 0 loop
loop63 7:63 0 20G 0 loop
loop64 7:64 0 20G 0 loop
loop65 7:65 0 20G 0 loop
loop66 7:66 0 20G 0 loop
loop67 7:67 0 20G 0 loop
loop68 7:68 0 20G 0 loop
loop69 7:69 0 20G 0 loop
loop70 7:70 0 20G 0 loop
loop71 7:71 0 20G 0 loop
loop72 7:72 0 20G 0 loop
loop73 7:73 0 20G 0 loop
loop74 7:74 0 20G 0 loop
loop75 7:75 0 20G 0 loop
loop76 7:76 0 20G 0 loop
loop77 7:77 0 20G 0 loop
loop78 7:78 0 20G 0 loop
loop79 7:79 0 20G 0 loop
loop80 7:80 0 20G 0 loop
loop81 7:81 0 20G 0 loop
loop82 7:82 0 20G 0 loop
loop83 7:83 0 20G 0 loop
loop84 7:84 0 20G 0 loop
loop85 7:85 0 20G 0 loop
loop86 7:86 0 20G 0 loop
loop87 7:87 0 20G 0 loop
loop88 7:88 0 20G 0 loop
loop89 7:89 0 20G 0 loop
loop90 7:90 0 20G 0 loop
loop91 7:91 0 20G 0 loop
loop92 7:92 0 20G 0 loop
loop93 7:93 0 20G 0 loop
loop94 7:94 0 20G 0 loop
loop95 7:95 0 20G 0 loop
loop96 7:96 0 20G 0 loop
loop97 7:97 0 20G 0 loop
loop98 7:98 0 20G 0 loop
loop99 7:99 0 20G 0 loop
loop100 7:100 0 20G 0 loop
loop101 7:101 0 20G 0 loop
loop102 7:102 0 20G 0 loop
loop103 7:103 0 20G 0 loop
loop104 7:104 0 20G 0 loop
loop105 7:105 0 20G 0 loop
loop106 7:106 0 20G 0 loop
loop107 7:107 0 20G 0 loop
loop108 7:108 0 20G 0 loop
loop110 7:110 0 20G 0 loop
sda 8:0 0 111.8G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 256M 0 part
└─sda3 8:3 0 111.6G 0 part
├─pve-swap 253:0 0 8G 0 lvm [SWAP]
└─pve-root 253:1 0 88.8G 0 lvm /
sdb 8:16 0 5.5T 0 disk
└─sdb1 8:17 0 5.5T 0 part /storage
sdc 8:32 1 16M 0 disk
sr0 11:0 1 1024M 0 rom
sr1 11:1 1 590M 0 rom
sr2 11:2 1 1024M 0 rom
root@pveds01:/etc#
root@pveds01:/etc# pct config 101
arch: amd64
cores: 2
hostname: 101.int.hosthub.ro
memory: 4096
net0: name=eth0,bridge=vmbr0,hwaddr=76:30:A1:30:03:4B,ip=dhcp,tag=106,type=veth
ostype: ubuntu
rootfs: vol_containers:101/vm-101-disk-0.raw,size=20G
swap: 128
unprivileged: 1
root@pveds01:/etc# free -m
total used free shared buff/cache available
Mem: 63411 10341 25594 295 27476 52064
Swap: 8191 0 8191
root@pveds01:/etc# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 253482
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 253482
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
root@pveds01:/etc# ulimit -Hn
1048576
root@pveds01:/etc# ulimit -Sn
1048576
root@pveds01:/etc# cat /etc/sysctl.conf | grep -v "#" | awk 'NF >0'
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
vm.max_map_count = 262144
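
For completeness, a quick way to confirm these values actually took effect on the running kernel (and to re-apply the file without a reboot) is sketched below:

# Sketch: verify the raised limits are active, then re-apply sysctl.conf if needed.
sysctl fs.inotify.max_queued_events fs.inotify.max_user_instances fs.inotify.max_user_watches vm.max_map_count
sysctl -p /etc/sysctl.conf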

Inside a container (via pct enter):

root@pveds01:/etc# pct enter 101
root@101:/etc#
root@101:/etc# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 20G 1.5G 18G 8% /
none 492K 0 492K 0% /dev
udev 31G 0 31G 0% /dev/tty
tmpfs 31G 0 31G 0% /dev/shm
tmpfs 31G 112K 31G 1% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 31G 0 31G 0% /sys/fs/cgroup
root@101:/etc# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/loop0 1310720 40362 1270358 4% /
none 8116654 21 8116633 1% /dev
udev 8111439 860 8110579 1% /dev/tty
tmpfs 8116654 1 8116653 1% /dev/shm
tmpfs 8116654 129 8116525 1% /run
tmpfs 8116654 2 8116652 1% /run/lock
tmpfs 8116654 17 8116637 1% /sys/fs/cgroup
root@101:/etc#

Thank you.
 
I'm guessing you are out of loop devices...
Do you run into the same issue if you use local storage with LVM?
What does losetup -f show?
 
Thanks for the feedback.

I'm pretty sure I'm not the only one trying to run a high number of CTs on a Proxmox box.

Regarding LVM, I haven't tried it yet. For plain and quick manipulation of image files I'd rather stay on the file (directory) storage backend, because the current storage also holds other data that can't be deleted.

Regards,
Alex
 
New update on the progress:

I added max_loop=255 to the kernel's GRUB boot arguments and rebooted the bare-metal system, trying to raise the maximum number of loop devices and work around this limitation.
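
For reference, the change amounts to something like this on a standard Debian/PVE GRUB setup (a sketch; adjust if your boot configuration differs):

# Sketch: raise the loop device ceiling via the kernel command line.
# 1. Edit /etc/default/grub and append max_loop=255, e.g.:
#      GRUB_CMDLINE_LINUX_DEFAULT="quiet max_loop=255"
# 2. Regenerate the GRUB config and reboot:
update-grub
reboot
# 3. After the reboot, confirm the parameter is active:
cat /proc/cmdline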

Afterwards, container start/stop operations take considerably longer. What changed in dmesg compared to the previous post is the EXT4 MMP warning highlighted below, and the overall system feels laggy, even with no CT/VM running.

[ 5035.290782] IPv6: ADDRCONF(NETDEV_UP): veth207i0: link is not ready
[ 5035.955039] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.
[ 5035.955281] device veth207i0 entered promiscuous mode
[ 5036.118791] eth0: renamed from vethUD8S21
[ 5037.793872] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 5037.793913] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 5038.057448] EXT4-fs warning (device loop107): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.

[...]
root@pveds01:~# dmesg | grep ext4_multi_mount_protect
[ 4378.489333] EXT4-fs warning (device loop93): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4425.517692] EXT4-fs warning (device loop94): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4472.533465] EXT4-fs warning (device loop95): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4519.552257] EXT4-fs warning (device loop96): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4566.639022] EXT4-fs warning (device loop97): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4613.667209] EXT4-fs warning (device loop98): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4660.742909] EXT4-fs warning (device loop99): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4707.917529] EXT4-fs warning (device loop100): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4755.767277] EXT4-fs warning (device loop101): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4802.807550] EXT4-fs warning (device loop102): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4849.857342] EXT4-fs warning (device loop103): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4896.921926] EXT4-fs warning (device loop104): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4944.020764] EXT4-fs warning (device loop105): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 4990.989336] EXT4-fs warning (device loop106): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.
[ 5038.057448] EXT4-fs warning (device loop107): ext4_multi_mount_protect:325: MMP interval 42 higher than expected, please wait.


@Robstarusa:

root@pveds01:~# losetup -f
/dev/loop109
root@pveds01:~#
root@pveds01:~# losetup -a | wc -l
109
root@pveds01:~# pct list | grep running | wc -l
108
root@pveds01:~#

So there is one orphaned loop device, the last one allocated before container starts began failing again.
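
If it helps anyone ending up in the same state: the orphaned loop device can be located and detached manually, along these lines (a sketch; double-check the device really has no running CT behind it before detaching, and the device name below is only a placeholder):

losetup -a                  # lists each /dev/loopN together with its backing image file
losetup -d /dev/loopN       # detach the orphan once identified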

I have also tried setting 'options loop max_loop=255' in the module configuration, rebooted the server and tried again, but that didn't fix the problem or let me spawn more CTs.
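
The module-option attempt looked roughly like this (a sketch; note that if the loop driver is built into the kernel rather than loaded as a module, a modprobe option has no effect and only the kernel command line parameter applies):

# Sketch: set max_loop as a module option.
echo "options loop max_loop=255" > /etc/modprobe.d/loop.conf
update-initramfs -u          # make sure the option is also seen at early boot
reboot
# Check whether loop is actually loaded as a module:
lsmod | grep -w loop || echo "loop seems to be built into the kernel"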

Any more ideas?
 
Hello,

I am returning with an update.

I have reinstalled another Dell 2950 server that was sitting in the closet, with pretty much the same configuration as the current one, using a Proxmox 5.1-32 ISO from an old CD-ROM I had.

I have not run any package upgrade on the system (no apt-get upgrade) and kept all the packages included in the ISO at the time of that release.

proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.13-2-pve: 4.13.13-32
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-18
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9

I applied the sysctl.conf tuning parameters described in the first post.

That said, I downloaded the Ubuntu 16.04 LTS template (bionic was not released at that time, so it would not run on the old system without an upgrade), kept the local-lvm thin-provisioned storage that Proxmox delivers by default, and generated another 140 containers via the CLI, roughly as sketched below.
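
The generation loop on this box was roughly the following (a sketch; the template filename, IDs and resource values are placeholders mirroring the config shown earlier):

#!/bin/bash
# Sketch: mass-create CTs from the Ubuntu 16.04 template onto the
# default local-lvm thin pool.
TEMPLATE=local:vztmpl/ubuntu-16.04-standard_16.04-1_amd64.tar.gz
for ID in $(seq 101 240); do
    pct create "$ID" "$TEMPLATE" \
        --hostname "ct${ID}" \
        --cores 2 --memory 4096 --swap 128 \
        --rootfs local-lvm:20 \
        --net0 name=eth0,bridge=vmbr0,ip=dhcp,type=veth \
        --unprivileged 1
done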

Starting all 140 containers with the backing storage on local-lvm thin worked without any issue, and all containers were functional.

So, for anybody else finding themselves in this situation: go straight to an LVM/LVM-thin setup and don't count on direct directory storage for running a larger number of CTs on the same Proxmox box.
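
For anyone who installed onto a plain directory and wants to follow that advice, a thin pool can be created on a spare/empty disk or partition and registered as container storage roughly like this (a sketch; the device, VG, pool and storage names are placeholders, and the default installer already ships such a pool as "local-lvm"):

# Sketch: create an LVM thin pool on an empty data disk and add it to PVE.
pvcreate /dev/sdX1
vgcreate vgdata /dev/sdX1
lvcreate -L 5T -T vgdata/ctpool          # thin pool for container rootfs volumes
pvesm add lvmthin ct-thin --vgname vgdata --thinpool ctpool --content rootdir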

Therefore I'm still not sure how to overcome the "no space left on device" error thrown by direct directory usage as initially described. I think only a Proxmox developer could explain why, with enough inodes and disk space left, startup runs into this condition while setting up a loop device for each CT, yet it clearly doesn't happen on LVM.

Hope I shed some light for other readers and save them some time in the trial-and-error phase.


P.S. I don't have a Ceph setup around to check whether this also happens with CTs on Ceph, but anybody running high-volume clusters is welcome to contribute to this post and tell us the maximum number of CTs they managed to run on a single node.


Cheers,
Alex
 