[SOLVED] Tasks fail with Too many open files (os error 24)

asvetter
EDIT
The solution is to *not* use virtiofs. Instead, pass the block device through to the VM and mount it inside PBS.
/EDIT


I am running PBS 2.4 with all updates in a VM.
An XFS filesystem is passed through to the VM via virtiofs.
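For context: inside the guest, a virtiofs share is mounted by its tag. Assuming a tag of srvbackups (a placeholder; use whatever tag the host exports), roughly:

mount -t virtiofs srvbackups /srv/backups
# or persistently via /etc/fstab:
srvbackups /srv/backups virtiofs defaults 0 0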

Garbage collection fails with:
Datastore: srvbackups
Garbage collection failed: unexpected error on datastore traversal: Too many open files (os error 24) - "/srv/backups/template/cache"

Pruning fails with:
Job ID: s-fe12c161-ac3c
Datastore: srvbackups
Pruning failed: EMFILE: Too many open files

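A rough way to check whether a process is really exhausting its file descriptor limit is to compare its open descriptors against the limit it runs with. A sketch, assuming the EMFILE comes from the PBS proxy (adjust the process name to whichever daemon reports the error):

pid=$(pidof proxmox-backup-proxy)
ls /proc/$pid/fd | wc -l                 # descriptors currently open
grep 'Max open files' /proc/$pid/limits  # soft/hard limit for the process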
Backups worked initially; now they fail with:
100: 2023-04-30 01:00:03 INFO: Starting Backup of VM 100 (qemu)
100: 2023-04-30 01:00:03 INFO: status = running
100: 2023-04-30 01:00:03 INFO: VM Name: adadmin-wpyf139
100: 2023-04-30 01:00:03 INFO: include disk 'scsi0' 'lvm-hdd:vm-100-disk-1' 100G
100: 2023-04-30 01:00:04 INFO: backup mode: snapshot
100: 2023-04-30 01:00:04 INFO: ionice priority: 7
100: 2023-04-30 01:00:04 INFO: snapshots found (not included into backup)
100: 2023-04-30 01:00:04 INFO: creating Proxmox Backup Server archive 'vm/100/2023-04-29T23:00:03Z'
100: 2023-04-30 01:00:04 INFO: issuing guest-agent 'fs-freeze' command
100: 2023-04-30 01:00:10 INFO: issuing guest-agent 'fs-thaw' command
100: 2023-04-30 01:00:10 INFO: started backup task 'fc6f3b4b-42c4-462d-a1b7-70ff6da9b0b1'
100: 2023-04-30 01:00:10 INFO: resuming VM again
100: 2023-04-30 01:00:10 INFO: scsi0: dirty-bitmap status: OK (18.6 GiB of 100.0 GiB dirty)
100: 2023-04-30 01:00:10 INFO: using fast incremental mode (dirty-bitmap), 18.6 GiB dirty of 100.0 GiB total
100: 2023-04-30 01:00:11 INFO: 0% (68.0 MiB of 18.6 GiB) in 1s, read: 68.0 MiB/s, write: 64.0 MiB/s
100: 2023-04-30 01:00:11 ERROR: backup write data failed: command error: write_data upload error: pipelined request failed: inserting chunk on store 'srvbackups' failed for 48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9 - Atomic rename failed for file "/srv/backups/.chunks/48df/48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9" - Too many open files (os error 24)
100: 2023-04-30 01:00:11 INFO: aborting backup job
100: 2023-04-30 01:00:11 INFO: resuming VM again
100: 2023-04-30 01:00:11 ERROR: Backup of VM 100 failed - backup write data failed: command error: write_data upload error: pipelined request failed: inserting chunk on store 'srvbackups' failed for 48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9 - Atomic rename failed for file "/srv/backups/.chunks/48df/48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9" - Too many open files (os error 24)

And similarly for the other VMs.
 
One of the VMs has a disk size of 3.7 TB.

Do I have too many chunks?

On the host:
/srv/backups/.chunks # find . -type f|wc -l
1749601
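For context: the .chunks directory of a PBS datastore consists of 65536 pre-created subdirectories named 0000 through ffff, so find has to descend into every one of them. Counting only the directories:

ls /srv/backups/.chunks | wc -l    # 65536 on an intact datastore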

In the PBS VM, the same command ends with error messages:
find: ‘./ff35’: Too many open files
find: ‘./ff55’: Too many open files
find: ‘./ff75’: Too many open files
find: ‘./ff95’: Too many open files
find: ‘./ffb5’: Too many open files
find: ‘./ffd5’: Too many open files
find: ‘./fff5’: Too many open files
1486421
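For completeness, the limit the failing find runs under can be checked and raised for the current shell session (a sketch; the soft limit can only be raised up to the hard limit):

ulimit -n          # show the current soft limit
ulimit -n 65536    # raise it for this shell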
 
1. Garbage collection failed: unexpected error on datastore traversal: Too many open files (os error 24) - "/srv/backups/template/cache"

Why does GC even see that path? That sounds like you configured your PBS datastore to also contain other data, which is a bad idea!

2. The "too many open files" errors could come from virtiofs; PBS normally doesn't keep files open at all (except a few lock files and files actually being written).

I am not sure virtiofs as a layer in your PBS datastore path is a good idea performance-wise either.
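If virtiofs is kept anyway, note that the host-side virtiofsd opens files on behalf of the guest, so descriptors tend to accumulate there rather than in PBS itself. Depending on the version, virtiofsd accepts an option to raise its own limit; a hedged example (flag names vary between the C and Rust implementations, verify against virtiofsd --help):

virtiofsd --socket-path /run/virtiofsd-backups.sock --shared-dir /srv/backups --cache auto --rlimit-nofile 1048576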
 
1. I used the FS for an NFS storage on PVE before. OK, not a good idea. So I will start with an empty FS.
2. If virtiofs is not fast enough, would NFS be OK? Or do I have to pass a raw partition through to the PBS VM?
 
Best would be to have (fast ;)) local storage, since anything that goes over the network introduces additional latency, which hurts random I/O performance (which PBS needs because of how the chunk store and deduplication work).
 
My mistake: I forgot to remove the NFS storage on PVE. So PVE had two different paths to the same filesystem: NFS to the host and PBS over virtiofs.

I removed both storages in PVE and the virtiofs datastore in PBS, cleaned up the FS on the host, created a new datastore over virtiofs in PBS, and then declared it in PVE. Now starting backups.

I will try more backups, pruning, and garbage collection, and report here.
 
With virtiofs:
The backup went fine.
A garbage collection during the backup failed:
Garbage collection failed: chunk iterator on chunk store 'srvbackups' failed - unable to read subdir '772a' - EMFILE: Too many open files

Is that expected?

Running garbage collection after the backup finished gives the same error:
2023-05-04T15:23:50+02:00: processed 53% (928454 chunks)
2023-05-04T15:23:50+02:00: TASK ERROR: chunk iterator on chunk store 'srvbackups' failed - unable to read subdir '890e' - EMFILE: Too many open files
 
Now passing the block device through to the VM instead of using virtiofs:
Garbage collection works.

So my recommendation is: do not use virtiofs; pass the block device through instead.
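For anyone landing here later, a rough sketch of that setup (the VM ID, device path, and datastore name are placeholders; adjust to your system):

# on the PVE host: pass the whole disk through to the PBS VM
qm set 101 -scsi1 /dev/disk/by-id/ata-EXAMPLE_DISK

# inside the PBS VM: create a filesystem, mount it, register the datastore
mkfs.xfs /dev/sdb                  # the passed-through device as seen in the guest
mkdir -p /srv/backups
mount /dev/sdb /srv/backups        # plus a matching /etc/fstab entry
proxmox-backup-manager datastore create srvbackups /srv/backups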