unable to fork worker - No space left on device

ferdl9999

Renowned Member
Nov 7, 2015
Hello,

I have a problem on my new PVE 5 host system: sometimes the message "unable to fork worker - No space left on device" appears. But I have over 100 GB left on my storage. Sometimes I cannot log in via SSH, and the message also appears inside the LXC containers.

I have about 50 LXC containers and VMs running. Is there a limitation somewhere? I cannot explain this error message.
 
It could be that you are running into process or open-file limits. Have you checked /var/log/syslog or dmesg at the time of the problem?
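A quick way to check the relevant limits and their current usage (a generic sketch, nothing PVE-specific):
Code:
# per-user inotify limits
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches
# system-wide file handles: allocated, unused, maximum
cat /proc/sys/fs/file-nr
# soft and hard open-file limits of the current shell
ulimit -Sn; ulimit -Hn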
 
I checked and I get this message:
Code:
Jul 14 18:41:00 route03 systemd[1]: Starting Proxmox VE replication runner...
Jul 14 18:41:00 route03 pvesr[2137]: Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 397.
Jul 14 18:41:00 route03 systemd[1]: pvesr.service: Main process exited, code=exited, status=24/n/a
Jul 14 18:41:00 route03 systemd[1]: Failed to start Proxmox VE replication runner.
Jul 14 18:41:00 route03 systemd[1]: pvesr.service: Unit entered failed state.
Jul 14 18:41:00 route03 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Jul 14 18:41:40 route03 rrdcached[1756]: flushing old values
Jul 14 18:41:40 route03 rrdcached[1756]: rotating journals
Jul 14 18:41:40 route03 rrdcached[1756]: started new journal /var/lib/rrdcached/journal/rrd.journal.1563122500.622559
Jul 14 18:41:40 route03 rrdcached[1756]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1563115300.622570


How can I increase the "open file limit"? Is it possible to do that on a running system?
 
Okay, I increased the limit with the following two commands:
Code:
echo fs.inotify.max_user_instances=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p
echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p

I hope it is working now.
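Whether the inotify limit really was the bottleneck can be verified by counting the instances currently in use; inotify file descriptors show up in /proc as symlinks to anon_inode:inotify (a rough sketch, run as root):
Code:
# count open inotify instances across all processes
find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
# compare against the configured maximum
sysctl fs.inotify.max_user_instances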
 
No, the settings I made did not help.
For a while the problem was gone, but now it is happening again:

Code:
Jul 14 20:06:00 route03 systemd[1]: Starting Proxmox VE replication runner...
Jul 14 20:06:01 route03 systemd[1]: Started Proxmox VE replication runner.
Jul 14 20:07:00 route03 systemd[1]: Starting Proxmox VE replication runner...
Jul 14 20:07:01 route03 systemd[1]: Started Proxmox VE replication runner.
Jul 14 20:08:00 route03 systemd[1]: Starting Proxmox VE replication runner...
Jul 14 20:08:00 route03 systemd[1]: Started Proxmox VE replication runner.
Jul 14 20:08:33 route03 pvestatd[2106]: fork failed: No space left on device
Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 154 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 146 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 207 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jul 14 20:08:33 route03 pvestatd[2106]: fork failed: No space left on device
Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 125 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 189 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 200 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 123 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 197 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 166 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.
Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 157 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.



I also checked the space with df -h:
Code:
root@route03:/var/log# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             63G       0   63G    0% /dev
tmpfs            13G     27M   13G    1% /run
/dev/sda2       411G    230G  161G   59% /
tmpfs            63G     43M   63G    1% /dev/shm
tmpfs           5,0M       0  5,0M    0% /run/lock
tmpfs            63G       0   63G    0% /sys/fs/cgroup
/dev/sdb1       440G    335G   83G   81% /ssd2
/dev/sda1       510M    136K  510M    1% /boot/efi
/dev/fuse        30M     28K   30M    1% /etc/pve
 
Here is the output of the command stat -f /:
Code:
  File: "/"
    ID: 3fc16bd59fee11b2 Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 107651314  Free: 47621821   Available: 42147684
Inodes: Total: 27353088   Free: 27305667



And here is the output of the command df -i:
Code:
Filesystem       Inodes    IUsed    IFree IUse% Mounted on
udev           16484024      675 16483349    1% /dev
tmpfs          16489286     1534 16487752    1% /run
/dev/sda2      27353088    47421 27305667    1% /
tmpfs          16489286       85 16489201    1% /dev/shm
tmpfs          16489286       56 16489230    1% /run/lock
tmpfs          16489286       17 16489269    1% /sys/fs/cgroup
/dev/sdb1      29310976       42 29310934    1% /ssd2
/dev/sda1             0        0        0     - /boot/efi
/dev/fuse         10000       71     9929    1% /etc/pve

Please help me, I cannot find any solution on the internet.
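Worth noting at this point: with blocks and inodes both plentiful, a fork() failing with "No space left on device" (ENOSPC) points at an exhausted kernel limit rather than at the disk. Two generic limits that can be ruled out quickly (a sketch; which limit actually bites depends on the kernel and cgroup configuration):
Code:
# system-wide process/thread limits
sysctl kernel.pid_max kernel.threads-max
# per-user process limit of the current shell
ulimit -u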
 
Jul 14 18:41:00 route03 pvesr[2137]: Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 397.

That's what I figured. I ran into this problem this week as well and opened a bug report for it.

Jul 14 20:08:33 route03 pvestatd[2106]: command 'lxc-info -n 157 -p' failed: open3: fork failed: No space left on device at /usr/share/perl5/PVE/Tools.pm line 429.

This implies that you have a problem creating new processes. Please post the output of ulimit -a and ps auxf | wc -l (in CODE tags, please, for better readability).
 
Okay, but I think I'm not affected by this bug, because I'm running PVE 5 stable and your bug report concerned the PVE 6 beta.

Here is the output of the two commands (I also edited my posts above and put the console output in CODE tags).
But I need to note that the problem is currently not occurring.

Code:
root@route03:~# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514917
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 514917
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

And here is the output of the second command. Running it a few times gave slightly different counts:
Code:
root@route03:~# ps auxf | wc -l
1753
root@route03:~# ps auxf | wc -l
1755
root@route03:~# ps auxf | wc -l
1751
root@route03:~# ps auxf | wc -l
1749
root@route03:~# ps auxf | wc -l
1752
root@route03:~# ps auxf | wc -l
1749
root@route03:~# ps auxf | wc -l
1750


Thanks for your help!
 
I have now made these settings in /etc/sysctl.conf, as suggested in this thread.
Code:
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
vm.max_map_count = 262144


The same thread also mentions making these changes to the /etc/security/limits.conf file:
Code:
*       soft    nofile  1048576
*       hard    nofile  1048576
root    soft    nofile  1048576
root    hard    nofile  1048576
*       soft    memlock 1048576
*       hard    memlock 1048576


Should I make these changes too? If so, how can I apply them on a running system without rebooting it?
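For reference: /etc/security/limits.conf is read by PAM at login, so it only affects newly started sessions. The limits of an already-running process can be changed with prlimit from util-linux (a sketch; 2106 is just the pvestatd PID from the logs above):
Code:
# show the current limits of a running process
prlimit --pid 2106
# raise its soft and hard open-file limits
prlimit --pid 2106 --nofile=1048576:1048576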
 
Okay, but I think I'm not affected by this bug, because I'm running PVE 5 stable and your bug report concerned the PVE 6 beta.

Yes, but the inotify limit is the same. You will have problems in your containers if you run out of inotify space.

Your other settings look fine. The total number of processes is well below your (already increased) max user processes limit. The sysctl settings are also fine; running out of max_map_count produces a different error message, but it does happen, I have seen it on big (>256 GB RAM) servers.

But I need to note that the problem is currently not occurring.

We will have to wait, then. In the meantime, I recommend installing telegraf on your node and storing your metrics offsite; the tricky part is deciding which metrics to collect. Also read up on lsof. And regarding your main memory usage: please also post the output of free when the error occurs.
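To make that concrete, here is a small set of commands that captures the relevant state at the moment of the error (free and lsof as mentioned above, plus the kernel's file handle counters; only a suggestion):
Code:
free -m
# allocated file handles, unused, system-wide maximum
cat /proc/sys/fs/file-nr
# open files per PID, heaviest first (can be slow on busy hosts)
lsof | awk '{print $2}' | sort | uniq -c | sort -rn | head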
 
Thanks for your answer!

Okay, I will wait until the problem occurs again and then report the output of the commands. But maybe (*hopefully*) the changes I already made will keep it from happening again.

But should I increase the open file limit? It is currently set to 1024 (ulimit -n).
And how can I change this on a running system without a reboot?

Should I make the changes to the /etc/security/limits.conf file that I mentioned in my last post?
 
Hello again, now I am getting these error messages.
Code:
Jul 16 15:51:47 route03 pveproxy[32326]: failed to accept connection: Too many open files in system
Jul 16 15:57:33 route03 pveproxy[23834]: Can't opendir(/usr/share/libpve-http-server-perl/js): Too many open files in system#012 at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1747.
Jul 16 15:57:33 route03 pveproxy[23832]: Unable to create new inotify object: Too many open files in system at /usr/share/perl5/PVE/INotify.pm line 397.
Jul 16 15:57:33 route03 pveproxy[23834]: Can't opendir(/usr/share/libpve-http-server-perl/fonts): Too many open files in system#012 at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1747.
Jul 16 15:57:33 route03 pveproxy[23834]: Unable to create new inotify object: Too many open files in system at /usr/share/perl5/PVE/INotify.pm line 397.


But the "No space left on device" failure has not recurred yet.
Do I need to restart the system for the changes to take effect?
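"Too many open files in system" (ENFILE) refers to the system-wide file handle table rather than to the per-process ulimit, so it is governed by fs.file-max. A quick comparison when the error occurs (a sketch):
Code:
# allocated handles vs. system-wide maximum
cat /proc/sys/fs/file-nr
sysctl fs.file-max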
 
The error is still happening, and I don't know what else I can do...


Code:
root@route03:~# ps auxf | wc -l
1899
root@route03:~# df -h
Filesystem                                  Size  Used Avail Use% Mounted on
udev                                         63G       0   63G    0% /dev
tmpfs                                        13G     43M   13G    1% /run
/dev/sda2                                   411G    265G  126G   68% /
tmpfs                                        63G     46M   63G    1% /dev/shm
tmpfs                                       5,0M       0  5,0M    0% /run/lock
tmpfs                                        63G       0   63G    0% /sys/fs/cgroup
/dev/sda1                                   510M    136K  510M    1% /boot/efi
/dev/sdb1                                   440G    391G   27G   94% /ssd2
/dev/fuse                                    30M     28K   30M    1% /etc/pve
 
Here is the output of the command systemctl show pveproxy.service | grep Limit (but the error is currently not occurring):

Code:
root@route03:~# systemctl show pveproxy.service | grep Limit
MemoryLimit=18446744073709551615
LimitCPU=18446744073709551615
LimitCPUSoft=18446744073709551615
LimitFSIZE=18446744073709551615
LimitFSIZESoft=18446744073709551615
LimitDATA=18446744073709551615
LimitDATASoft=18446744073709551615
LimitSTACK=18446744073709551615
LimitSTACKSoft=8388608
LimitCORE=18446744073709551615
LimitCORESoft=0
LimitRSS=18446744073709551615
LimitRSSSoft=18446744073709551615
LimitNOFILE=4096
LimitNOFILESoft=1024
LimitAS=18446744073709551615
LimitASSoft=18446744073709551615
LimitNPROC=514917
LimitNPROCSoft=514917
LimitMEMLOCK=65536
LimitMEMLOCKSoft=65536
LimitLOCKS=18446744073709551615
LimitLOCKSSoft=18446744073709551615
LimitSIGPENDING=514917
LimitSIGPENDINGSoft=514917
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=18446744073709551615
LimitRTTIMESoft=18446744073709551615
StartLimitIntervalSec=10000000
StartLimitBurst=5
StartLimitAction=none
root@route03:~#
 
It seems systemd is ignoring the limits you set -- a somewhat known issue in the interplay between legacy Unix limits, PAM, and systemd. See these entries:
Code:
LimitNOFILE=4096
LimitNOFILESoft=1024

I suggest editing /etc/systemd/system.conf: scroll down, uncomment #DefaultLimitNOFILE= and change it to match your limits.conf. (Ignore DefaultLimitNOFILESoft=)

EDIT: Sorry -- the resource limit may be specified in two formats: a single value to set the soft and hard limits to the same value, or soft:hard to set both limits individually.

You should be able to activate the change with:
Code:
systemctl daemon-reexec
or reboot -- YMMV. Hope that helps.
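For illustration, the relevant part of /etc/systemd/system.conf would then look like this (the value is just an example matching the limits.conf settings above):
Code:
[Manager]
DefaultLimitNOFILE=1048576

Alternatively, a single unit can be overridden with systemctl edit pveproxy.service and a [Service] section containing LimitNOFILE=1048576.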
 
Thanks, I edited the file and activated the changes with the command you gave.

The output of systemctl show pveproxy.service | grep Limit is now the following:
Code:
root@route03:~# systemctl show pveproxy.service | grep Limit
MemoryLimit=18446744073709551615
LimitCPU=18446744073709551615
LimitCPUSoft=18446744073709551615
LimitFSIZE=18446744073709551615
LimitFSIZESoft=18446744073709551615
LimitDATA=18446744073709551615
LimitDATASoft=18446744073709551615
LimitSTACK=18446744073709551615
LimitSTACKSoft=8388608
LimitCORE=18446744073709551615
LimitCORESoft=0
LimitRSS=18446744073709551615
LimitRSSSoft=18446744073709551615
LimitNOFILE=1048576
LimitNOFILESoft=1048576
LimitAS=18446744073709551615
LimitASSoft=18446744073709551615
LimitNPROC=514917
LimitNPROCSoft=514917
LimitMEMLOCK=65536
LimitMEMLOCKSoft=65536
LimitLOCKS=18446744073709551615
LimitLOCKSSoft=18446744073709551615
LimitSIGPENDING=514917
LimitSIGPENDINGSoft=514917
LimitMSGQUEUE=819200
LimitMSGQUEUESoft=819200
LimitNICE=0
LimitNICESoft=0
LimitRTPRIO=0
LimitRTPRIOSoft=0
LimitRTTIME=18446744073709551615
LimitRTTIMESoft=18446744073709551615
StartLimitIntervalSec=10000000
StartLimitBurst=5
StartLimitAction=none


I hope it helps!
 
