Garbage collect job fails with "EMFILE: Too many open files"

tbahn

Aug 12, 2024
All of a sudden the garbage collect job started to fail with:
TASK ERROR: update atime failed for chunk/file "/mnt/proxmox-backup/.chunks/69f8/69f82c53064d2eb7795d401a63177ebda5bba778f02db397235e6508a2fee0ed" - EMFILE: Too many open files
This happens consistently at the same chunk, after 85% of the index files have been marked (212 of 249).

Upgraded all packages, rebooted, retried.

Increased the number of open files in /etc/security/limits.conf step by step from 2^16-1 (65,535), doubling each time, up to 2^20-1 (1,048,575); after each change I rebooted the PBS and retried the job.
Code:
* soft nofile 1048575
* hard nofile 1048575

Prune and verify jobs succeed.

stat /mnt/proxmox-backup/.chunks/69f8/69f82c53064d2eb7795d401a63177ebda5bba778f02db397235e6508a2fee0ed works.
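The limit actually in effect for the running PBS daemon can be read from /proc. A quick check (a sketch, assuming the default process name proxmox-backup-proxy and a single instance of it):

Bash:
# Open-file limit of the running PBS proxy process
grep 'open files' /proc/$(pidof proxmox-backup-proxy)/limits

# Limit of the current shell session, for comparison
ulimit -n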
 
Hello and welcome to the Proxmox Community,

What is the output of the following commands:

Code:
sysctl -a | grep inotify

sysctl -a | grep file-max

And the now active value:

Code:
cat /proc/sys/fs/file-max
 
Hi fireon,

thank you for the warm welcome.

Bash:
2024-08-12T20:06:01+02:00: starting garbage collection on store Unraid-Backup
2024-08-12T20:06:01+02:00: Start GC phase1 (mark used chunks)
2024-08-12T20:06:01+02:00: TASK ERROR: unexpected error on datastore traversal: Too many open files (os error 24) - "/mnt/proxmox-backup/vm"

sysctl -a | grep inotify
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 30025
user.max_inotify_instances = 128
user.max_inotify_watches = 30025

sysctl -a | grep file-max
fs.file-max = 9223372036854775807

cat /proc/sys/fs/file-max
9223372036854775807

And directly after a server restart:
Bash:
sysctl -a | grep inotify
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 30025
user.max_inotify_instances = 128
user.max_inotify_watches = 30025

sysctl -a | grep file-max
fs.file-max = 9223372036854775807

cat /proc/sys/fs/file-max
9223372036854775807
 
Tonight the garbage collection job completed successfully?! I don't know why, but I hope this problem simply disappears as suddenly as it came.

To be on the safe side, increase the max user watches and instances. I have already configured this as the default on my servers.

Code:
nano /etc/sysctl.d/custom.conf

Code:
fs.inotify.max_user_watches=5242880
fs.inotify.max_user_instances=1024
fs.inotify.max_queued_events=8388608
user.max_inotify_instances=1024
user.max_inotify_watches=5242880

Set manually so that you do not have to reboot:

Code:
sysctl -w fs.inotify.max_user_watches=5242880
sysctl -w fs.inotify.max_user_instances=1024
sysctl -w fs.inotify.max_queued_events=8388608
sysctl -w user.max_inotify_instances=1024
sysctl -w user.max_inotify_watches=5242880
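Alternatively, sysctl can load the values straight from the new file without a reboot, which avoids repeating each key by hand:

Code:
# Apply the settings from the drop-in file
sysctl -p /etc/sysctl.d/custom.conf

# Or reload every configured sysctl file
sysctl --system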
 
Hello Fireon,

thank you for the pointer to the inotify API.

For those reading this thread: Have a look at https://man7.org/linux/man-pages/man7/inotify.7.html
"The inotify API provides a mechanism for monitoring filesystem events. Inotify can be used to monitor individual files, or to monitor directories."

The section "/proc interfaces" describes the settings Fireon provided:

/proc/sys/fs/inotify/max_queued_events
... is used ... to set an upper limit on the number of events that can be queued to the corresponding inotify instance. ...

/proc/sys/fs/inotify/max_user_instances
This specifies an upper limit on the number of inotify instances that can be created per real user ID.

/proc/sys/fs/inotify/max_user_watches
This specifies an upper limit on the number of watches that can be created per real user ID.

By experiment, I found that setting the fs.inotify.* values also sets the corresponding user.* values to the same value.
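For anyone who wants to reproduce that observation, the check is essentially (a minimal sketch):

Bash:
# Set the fs.inotify value, then read back the corresponding user.* value
sysctl -w fs.inotify.max_user_watches=5242880
sysctl user.max_inotify_watches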

Thanks, Fireon
 
It didn't help. :(

Today the daily garbage collector task failed again with:
Error: unexpected error on datastore traversal: Too many open files (os error 24) - "/mnt/proxmox-backup"

The prune job executed afterwards failed, too:
Pruning failed: EMFILE: Too many open files

Manually starting the garbage collector task fails immediately:
2024-08-15T11:37:11+02:00: starting garbage collection on store Unraid-Backup
2024-08-15T11:37:11+02:00: Start GC phase1 (mark used chunks)
2024-08-15T11:37:13+02:00: TASK ERROR: update atime failed for chunk/file "/mnt/proxmox-backup/.chunks/baf9/baf9db3ad92d8800646788131c75ac5c694aad72f6ff0aeb5cb0163ed81ac526" - EMFILE: Too many open files

lsof | wc -l (list open files, count the rows of the output) returns 2535, which is quite a small number compared to the limits set.
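As an aside, lsof counts entries system-wide (including memory-mapped files and per-thread duplicates), while the EMFILE limit the GC task runs into is per process. A more direct check, assuming the default process name proxmox-backup-proxy, would be something like:

Bash:
# File descriptors currently held by the PBS proxy process
ls /proc/$(pidof proxmox-backup-proxy)/fd | wc -l

# Watch the count while a GC task is running
watch -n1 'ls /proc/$(pidof proxmox-backup-proxy)/fd | wc -l'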
 
It worked for some days, but today there was again the "Too many open files" error.

I can't see any pattern behind which days work and which don't.
 
I am currently experiencing the same issue as you. Did you ever manage to fix it permanently?
 
Hello,


I'm experiencing recurring "Too many open files" (EMFILE, os error 24) errors with Proxmox Backup Server (PBS) version 3.4.1, running on a dedicated Dell R720 bare-metal server (filesystem: XFS).

Error examples:

  • During backup verification:
    can't verify chunk, load failed - store 'backup' [...] - Too many open files (os error 24)
  • During backup jobs:
    POST /fixed_chunk: 400 Bad Request: inserting chunk [...] failed: EMFILE: Too many open files
Repository stats:
  • Number of namespaces (find /backup/ns/ -type d | wc -l): 6024
  • Number of chunk files (find /backup/.chunks/ -type f | wc -l): 36966951

Additional observations:

  • The errors occur both during garbage collection and regular backup jobs.
  • There is no clear pattern; the issue appears sporadically.
  • Disk space, CPU, and RAM usage are within normal limits.

Troubleshooting steps taken:

  • Verified PBS version and updates.
  • Checked disk space (no issues).
  • Monitored system resource usage.

Questions for the community:

  1. Does this scale of stored data (nearly 37 million chunk files) require special tuning?
  2. What are the best practices for configuring PBS for large-scale repositories?
  3. Are there recommended kernel or system limits (e.g., ulimit, fs.file-max) for this scenario?
Any advice on diagnosing or resolving this would be greatly appreciated!
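One avenue worth checking for limits: the PBS daemons are started by systemd, which does not read /etc/security/limits.conf, so their file descriptor limit has to be raised per service via a drop-in. A minimal sketch, assuming the default service name proxmox-backup-proxy.service (the number is illustrative, not a recommendation):

Code:
# Hypothetical drop-in for the PBS proxy service
mkdir -p /etc/systemd/system/proxmox-backup-proxy.service.d
cat > /etc/systemd/system/proxmox-backup-proxy.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=1048576
EOF
systemctl daemon-reload
systemctl restart proxmox-backup-proxy.service

Afterwards, the effective limit can be confirmed with grep 'open files' /proc/$(pidof proxmox-backup-proxy)/limits.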