[SOLVED] Tasks fail with Too many open files (os error 24)

asvetter

EDIT
Solution is to *not* use virtiofs. Instead, pass the block device through to the VM and mount it inside PBS.
/EDIT
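
For anyone landing here via search, a minimal sketch of that setup. The VM ID, device path, and datastore name below are examples, adjust to your system:

# on the PVE host: attach the disk to the PBS VM (VM ID 100 as example)
qm set 100 --scsi1 /dev/disk/by-id/ata-EXAMPLE_DISK

# inside the PBS VM: create and mount the filesystem, then register the datastore
mkfs.xfs /dev/sdb
mkdir -p /srv/backups
mount /dev/sdb /srv/backups
proxmox-backup-manager datastore create srvbackups /srv/backups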


I am running PBS 2.4 with all updates in a VM.
An XFS filesystem is passed through to the VM via virtiofs.

Garbage collection fails with:
Datastore: srvbackups
Garbage collection failed: unexpected error on datastore traversal: Too many open files (os error 24) - "/srv/backups/template/cache"

Pruning fails with:
Job ID: s-fe12c161-ac3c
Datastore: srvbackups
Pruning failed: EMFILE: Too many open files

Backups worked initially, now fail with:
100: 2023-04-30 01:00:03 INFO: Starting Backup of VM 100 (qemu)
100: 2023-04-30 01:00:03 INFO: status = running
100: 2023-04-30 01:00:03 INFO: VM Name: adadmin-wpyf139
100: 2023-04-30 01:00:03 INFO: include disk 'scsi0' 'lvm-hdd:vm-100-disk-1' 100G
100: 2023-04-30 01:00:04 INFO: backup mode: snapshot
100: 2023-04-30 01:00:04 INFO: ionice priority: 7
100: 2023-04-30 01:00:04 INFO: snapshots found (not included into backup)
100: 2023-04-30 01:00:04 INFO: creating Proxmox Backup Server archive 'vm/100/2023-04-29T23:00:03Z'
100: 2023-04-30 01:00:04 INFO: issuing guest-agent 'fs-freeze' command
100: 2023-04-30 01:00:10 INFO: issuing guest-agent 'fs-thaw' command
100: 2023-04-30 01:00:10 INFO: started backup task 'fc6f3b4b-42c4-462d-a1b7-70ff6da9b0b1'
100: 2023-04-30 01:00:10 INFO: resuming VM again
100: 2023-04-30 01:00:10 INFO: scsi0: dirty-bitmap status: OK (18.6 GiB of 100.0 GiB dirty)
100: 2023-04-30 01:00:10 INFO: using fast incremental mode (dirty-bitmap), 18.6 GiB dirty of 100.0 GiB total
100: 2023-04-30 01:00:11 INFO: 0% (68.0 MiB of 18.6 GiB) in 1s, read: 68.0 MiB/s, write: 64.0 MiB/s
100: 2023-04-30 01:00:11 ERROR: backup write data failed: command error: write_data upload error: pipelined request failed: inserting chunk on store 'srvbackups' failed for 48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9 - Atomic rename failed for file "/srv/backups/.chunks/48df/48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9" - Too many open files (os error 24)
100: 2023-04-30 01:00:11 INFO: aborting backup job
100: 2023-04-30 01:00:11 INFO: resuming VM again
100: 2023-04-30 01:00:11 ERROR: Backup of VM 100 failed - backup write data failed: command error: write_data upload error: pipelined request failed: inserting chunk on store 'srvbackups' failed for 48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9 - Atomic rename failed for file "/srv/backups/.chunks/48df/48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9" - Too many open files (os error 24)

And similar for the other VMs.
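
In case it helps with debugging: EMFILE means some process hit its per-process open-files limit. A rough way to check inside the PBS VM, assuming the failing tasks run in the proxmox-backup-proxy daemon (and that there is a single proxy process):

# limit and current descriptor usage of the PBS proxy
PID=$(pidof proxmox-backup-proxy)
grep 'Max open files' /proc/$PID/limits
ls /proc/$PID/fd | wc -l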
 
One of the VMs has a disk size of 3.7 TB.

Do I have too many chunks?

On the host:
/srv/backups/.chunks # find . -type f | wc -l
1749601

In the PBS VM the same command ends with these error messages:
find: ‘./ff35’: Too many open files
find: ‘./ff55’: Too many open files
find: ‘./ff75’: Too many open files
find: ‘./ff95’: Too many open files
find: ‘./ffb5’: Too many open files
find: ‘./ffd5’: Too many open files
find: ‘./fff5’: Too many open files
1486421
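
The fact that the same find succeeds on the host but dies with EMFILE inside the VM points at the virtiofs layer rather than the chunk count: VM backups use 4 MiB fixed-size chunks, so ~1.75 million chunks cover roughly 6.7 TiB before deduplication, which is unremarkable for disks this size. A hedged host-side check (the process name may differ depending on how virtiofsd is started):

# on the host: open descriptors held by each virtiofsd instance
for pid in $(pidof virtiofsd); do
  echo "pid $pid: $(ls /proc/$pid/fd | wc -l) fds open"
  grep 'Max open files' /proc/$pid/limits
done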
 
1. Garbage collection failed: unexpected error on datastore traversal: Too many open files (os error 24) - "/srv/backups/template/cache"

why does GC even see that path? that sounds like you configured your PBS datastore to also contain other data, which is a bad idea!

2. the too many open files could come from virtiofs; PBS normally doesn't keep files open at all (except a few lock files and files actually being written)

I am not sure virtiofs as a layer in your PBS datastore path is a good idea performance-wise either..
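
Two quick checks that follow from this (the config path is standard; the prlimit stopgap assumes it really is virtiofsd's fd table that fills up):

# 1. verify each datastore path is dedicated to PBS only
cat /etc/proxmox-backup/datastore.cfg

# 2. on the host, raise virtiofsd's open-files limit as a stopgap
for pid in $(pidof virtiofsd); do
  prlimit --pid "$pid" --nofile=1048576:1048576
done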
 
1. I used the FS for an NFS storage on PVE before. OK, not a good idea. So I will start with an empty FS.
2. If virtiofs is not fast enough, would NFS be OK? Or do I have to give a raw partition to the PBS VM?
 
best would be to have (fast ;)) local storage, since anything that goes over the network introduces additional latency, which hurts random I/O performance (which PBS needs because of how the chunk store and deduplication work).
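
One way to compare candidate storage backends for a datastore is a small random-I/O benchmark; the fio parameters here are illustrative, not a tuned model of the PBS workload:

fio --name=pbs-randio --directory=/srv/backups --rw=randrw \
    --bs=4k --size=1G --ioengine=libaio --iodepth=16 \
    --runtime=60 --time_based --group_reporting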
 
My error: I forgot to remove the NFS storage on PVE. So PVE had two different paths to the same filesystem: NFS to the host and PBS over virtiofs.

I removed both storages in PVE and the virtiofs-backed datastore in PBS, cleaned up the FS on the host, created a new datastore over virtiofs in PBS, and then declared it in PVE. Now starting backups.

Will try more backups, pruning, and garbage collection, and report here.
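
For reference, removing the duplicate storage entries on the PVE side looks roughly like this (the storage IDs are placeholders for whatever the duplicates were called):

pvesm remove nfs-backups-old
pvesm remove pbs-virtiofs-old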
 
With virtiofs:
Backup went fine.
A garbage collection during the backup failed:
Garbage collection failed: chunk iterator on chunk store 'srvbackups' failed - unable to read subdir '772a' - EMFILE: Too many open files

Is that expected?

Now running Garbage Collection after backup finished, same Error:
2023-05-04T15:23:50+02:00: processed 53% (928454 chunks)
2023-05-04T15:23:50+02:00: TASK ERROR: chunk iterator on chunk store 'srvbackups' failed - unable to read subdir '890e' - EMFILE: Too many open files
 
Now passing the block device through to the VM instead of using virtiofs:
Garbage collection works.

So my recommendation is: do not use virtiofs; pass through the block device instead.
 
