[SOLVED] Tasks fail with Too many open files (os error 24)

asvetter
EDIT
The solution is to *not* use virtiofs. Instead, pass the block device through to the VM and mount it inside PBS.
/EDIT


I am running PBS 2.4 with all updates in a VM.
An XFS filesystem is passed through to the VM via virtiofs.
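For context: inside the guest, a virtiofs share is mounted by its tag. Assuming a tag of srvbackups (a placeholder; use whatever tag the host exports), roughly:

mount -t virtiofs srvbackups /srv/backups
# or persistently via /etc/fstab:
srvbackups /srv/backups virtiofs defaults 0 0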

Garbage collection fails with:
Datastore: srvbackups
Garbage collection failed: unexpected error on datastore traversal: Too many open files (os error 24) - "/srv/backups/template/cache"

Pruning fails with:
Job ID: s-fe12c161-ac3c
Datastore: srvbackups
Pruning failed: EMFILE: Too many open files

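A rough way to check whether a process is really exhausting its file descriptor limit is to compare its open descriptors against the limit it runs with. A sketch, assuming the EMFILE comes from the PBS proxy (adjust the process name to whichever daemon reports the error):

pid=$(pidof proxmox-backup-proxy)
ls /proc/$pid/fd | wc -l                 # descriptors currently open
grep 'Max open files' /proc/$pid/limits  # soft/hard limit for the process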
Backups worked initially; now they fail with:
100: 2023-04-30 01:00:03 INFO: Starting Backup of VM 100 (qemu)
100: 2023-04-30 01:00:03 INFO: status = running
100: 2023-04-30 01:00:03 INFO: VM Name: adadmin-wpyf139
100: 2023-04-30 01:00:03 INFO: include disk 'scsi0' 'lvm-hdd:vm-100-disk-1' 100G
100: 2023-04-30 01:00:04 INFO: backup mode: snapshot
100: 2023-04-30 01:00:04 INFO: ionice priority: 7
100: 2023-04-30 01:00:04 INFO: snapshots found (not included into backup)
100: 2023-04-30 01:00:04 INFO: creating Proxmox Backup Server archive 'vm/100/2023-04-29T23:00:03Z'
100: 2023-04-30 01:00:04 INFO: issuing guest-agent 'fs-freeze' command
100: 2023-04-30 01:00:10 INFO: issuing guest-agent 'fs-thaw' command
100: 2023-04-30 01:00:10 INFO: started backup task 'fc6f3b4b-42c4-462d-a1b7-70ff6da9b0b1'
100: 2023-04-30 01:00:10 INFO: resuming VM again
100: 2023-04-30 01:00:10 INFO: scsi0: dirty-bitmap status: OK (18.6 GiB of 100.0 GiB dirty)
100: 2023-04-30 01:00:10 INFO: using fast incremental mode (dirty-bitmap), 18.6 GiB dirty of 100.0 GiB total
100: 2023-04-30 01:00:11 INFO: 0% (68.0 MiB of 18.6 GiB) in 1s, read: 68.0 MiB/s, write: 64.0 MiB/s
100: 2023-04-30 01:00:11 ERROR: backup write data failed: command error: write_data upload error: pipelined request failed: inserting chunk on store 'srvbackups' failed for 48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9 - Atomic rename failed for file "/srv/backups/.chunks/48df/48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9" - Too many open files (os error 24)
100: 2023-04-30 01:00:11 INFO: aborting backup job
100: 2023-04-30 01:00:11 INFO: resuming VM again
100: 2023-04-30 01:00:11 ERROR: Backup of VM 100 failed - backup write data failed: command error: write_data upload error: pipelined request failed: inserting chunk on store 'srvbackups' failed for 48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9 - Atomic rename failed for file "/srv/backups/.chunks/48df/48dfa3478846326348d44f48b83851aa7bd819bd35127a16c211eb908ffd2aa9" - Too many open files (os error 24)

And similarly for the other VMs.
 
One of the VMs has a disk size of 3.7 TB.

Do I have too many chunks?

On the host:
/srv/backups/.chunks # find . -type f|wc -l
1749601
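For context: the .chunks directory of a PBS datastore consists of 65536 pre-created subdirectories named 0000 through ffff, so find has to descend into every one of them. Counting only the directories:

ls /srv/backups/.chunks | wc -l    # 65536 on an intact datastore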

In the PBS VM, the same command ends with error messages:
find: ‘./ff35’: Too many open files
find: ‘./ff55’: Too many open files
find: ‘./ff75’: Too many open files
find: ‘./ff95’: Too many open files
find: ‘./ffb5’: Too many open files
find: ‘./ffd5’: Too many open files
find: ‘./fff5’: Too many open files
1486421
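For completeness, the limit the failing find runs under can be checked and raised for the current shell session (a sketch; the soft limit can only be raised up to the hard limit):

ulimit -n          # show the current soft limit
ulimit -n 65536    # raise it for this shell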
 
1. Garbage collection failed: unexpected error on datastore traversal: Too many open files (os error 24) - "/srv/backups/template/cache"

Why does GC even see that path? That sounds like you configured your PBS datastore to also contain other data, which is a bad idea!

2. The "too many open files" errors could come from virtiofs; PBS normally doesn't keep files open at all (except a few lock files and files actually being written).

I am not sure virtiofs as a layer in your PBS datastore path is a good idea performance-wise either.
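If virtiofs is kept anyway, note that the host-side virtiofsd opens files on behalf of the guest, so descriptors tend to accumulate there rather than in PBS itself. Depending on the version, virtiofsd accepts an option to raise its own limit; a hedged example (flag names vary between the C and Rust implementations, verify against virtiofsd --help):

virtiofsd --socket-path /run/virtiofsd-backups.sock --shared-dir /srv/backups --cache auto --rlimit-nofile 1048576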
 
1. I used the FS for an NFS storage on PVE before. OK, not a good idea. So I will start with an empty FS.
2. If virtiofs is not fast enough, would NFS be OK? Or do I have to pass a raw partition through to the PBS VM?
 
Best would be to have (fast ;)) local storage, since anything that goes over the network introduces additional latency, which hurts random I/O performance (which PBS needs because of how the chunk store and deduplication work).
 
My mistake: I forgot to remove the NFS storage on PVE. So PVE had two different paths to the same filesystem: NFS to the host and PBS over virtiofs.

I removed both storages in PVE and the virtiofs datastore in PBS, cleaned up the FS on the host, created a new datastore over virtiofs in PBS, and then declared it in PVE. Now starting backups.

I will try more backups, pruning, and garbage collection, and report here.
 
With virtiofs:
The backup went fine.
A garbage collection during the backup failed:
Garbage collection failed: chunk iterator on chunk store 'srvbackups' failed - unable to read subdir '772a' - EMFILE: Too many open files

Is that expected?

Running garbage collection after the backup finished gives the same error:
2023-05-04T15:23:50+02:00: processed 53% (928454 chunks)
2023-05-04T15:23:50+02:00: TASK ERROR: chunk iterator on chunk store 'srvbackups' failed - unable to read subdir '890e' - EMFILE: Too many open files
 
Now passing the block device through to the VM instead of using virtiofs:
Garbage collection works.

So my recommendation is: do not use virtiofs; pass the block device through instead.
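For anyone landing here later, a rough sketch of that setup (the VM ID, device path, and datastore name are placeholders; adjust to your system):

# on the PVE host: pass the whole disk through to the PBS VM
qm set 101 -scsi1 /dev/disk/by-id/ata-EXAMPLE_DISK

# inside the PBS VM: create a filesystem, mount it, register the datastore
mkfs.xfs /dev/sdb                  # the passed-through device as seen in the guest
mkdir -p /srv/backups
mount /dev/sdb /srv/backups        # plus a matching /etc/fstab entry
proxmox-backup-manager datastore create srvbackups /srv/backups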