Big 'host' backup failed

kia0

Mar 6, 2024
Hello
We have a PBS 4.1.0 server that backs up a number of PVE VMs, plus file-level sets via backup agents. It has been working successfully for about a month. The PBS ZFS datastore is 30 TB in total, with 78% free space.
Yesterday I tried to set up a backup script on our mail storage server, which holds a huge set of files (~32M files, 3.7 TB) in a wide directory tree. The mail server is physical (not a VM), runs CentOS 9, and has proxmox-backup-client 4.0.15 installed. I create LVM snapshots to get a consistent backup, mount the snapshots, and run proxmox-backup-client. All backups but one completed successfully. The backup of one big "archive" mail partition ran for 15 h and failed at ~90%. No system errors were logged on either the mail server or the backup server.
Backup script fragment:

Bash:
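  # Briefly remount the live filesystem read-only while the snapshot is taken
  mount -o remount,ro "/var/imap_arch"
  let errors+=$?
  # Create a read-only (-p r) 100G LVM snapshot of the archive LV
  /sbin/lvcreate -ay -L 100G -p r -n "$CYR_ARCH_SNAP_LV" -s "$CYR_ARCH_VG/$CYR_ARCH_LV"
  local result=$?
  # Return the live filesystem to read-write as soon as the snapshot exists
  mount -o remount,rw "/var/imap_arch"
  let errors+=$?
  if [ "$result" -ne 0 ]; then
    let errors+=$result
  else
    # XFS snapshot mount: nouuid allows the duplicate UUID, norecovery skips log replay
    mount -o ro,norecovery,nouuid "/dev/$CYR_ARCH_VG/$CYR_ARCH_SNAP_LV" "$CYR_ARCH_SNAP_MNT"
    let errors+=$?
  fi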
...
if [ "$errors" -eq 0 ]; then
  export PBS_PASSWORD_FILE
  export PBS_FINGERPRINT

  /usr/bin/proxmox-backup-client backup \
    "cyrus-imap-db.pxar:$CYR_DB_SNAP_MNT" \
    "cyrus-imap-main.pxar:$CYR_MAIN_SNAP_MNT" \
    "cyrus-imap-arch.pxar:$CYR_ARCH_SNAP_MNT" \
    --repository "$PBS_TOKEN@backup.solvo.ru:DATA" -ns MAIL --backup-id "$(hostname -s)-cyrus" --backup-type host --change-detection-mode=metadata
  let errors+=$?
fi

Output of script (fragment):
Code:
  Logical volume "imapmain.snap" created.
  Logical volume "imaparch.snap" created.
  Logical volume "imaplib.snap" created.
Starting backup: [MAIL]:host/rhino-cyrus/2026-02-12T21:46:40Z   
Client name: rhino   
Starting backup protocol: Fri Feb 13 00:46:40 2026   
No previous manifest available.   
 ...
Upload directory '/mnt/imaparch.snap' to 'backup-agent@pbs!cyrus-imap@backup.solvo.ru:8007:DATA' as cyrus-imap-arch.mpxar.didx   
processed 4.208 GiB in 1m, uploaded 4.169 GiB
 ...
processed 3.081 TiB in 14h 47m 1s, uploaded 2.754 TiB
processed 3.082 TiB in 14h 48m 1s, uploaded 2.754 TiB
unclosed encoder dropped
closed encoder dropped with state
unfinished encoder state dropped
cyrus-imap-arch.ppxar: had to backup 2.754 TiB of 3.082 TiB (compressed 1.403 TiB) in 53310.74 s (average 54.177 MiB/s)
cyrus-imap-arch.ppxar: backup was done incrementally, reused 335.65 GiB (10.6%)
Error: upload failed: error at "imap/V/user/nladokhin/Support/Bugs"

The failed backup directory looks normal on the original (non-snapshot) LV filesystem. The LVM snapshot has since been removed, but no XFS errors were logged, so the filesystem state does not seem to be the cause. The directory is indeed large: it contains 265,123 files. Could that be a problem for the PBS archive format or for the change detection algorithm?
Any other ideas?
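To see how the failing directory (265,123 files) compares with the rest of the tree, a quick inventory of file counts per directory can help. This is a minimal sketch, not part of the original script: the function name top_dirs_by_file_count is hypothetical, and GNU find (for -printf '%h') is assumed.

```bash
# Hypothetical helper: list the directories that directly contain the most
# regular files under ROOT. Assumes GNU find for -printf.
# Usage: top_dirs_by_file_count ROOT [N]
top_dirs_by_file_count() {
  local root=$1 n=${2:-10}
  # %h prints the directory part of each file path; -xdev stays on one filesystem.
  # uniq -c counts how many files each directory holds, sort -rn ranks them.
  find "$root" -xdev -type f -printf '%h\n' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -n "$n"
}
```

For example, top_dirs_by_file_count /mnt/imaparch.snap 20 would list the 20 most populated directories on the mounted snapshot, each prefixed with its file count.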
 
An addition:
It looks like a client-side error, not a PBS one.
The PBS system log contains just one line:

Feb 13 15:45:37 backup proxmox-backup-proxy[999]: TASK ERROR: backup ended but finished state is not set.
 
Yes, the metadata change detection mode uses a lookahead cache: it keeps file handles of possibly reusable files open until it can decide whether those files can be reused or need to be re-encoded. The lookahead cache capacity is calculated dynamically from the open file limits, which might, however, be problematic for network shares (CIFS in particular). For a workaround, see https://forum.proxmox.com/threads/p...oder-dropped-upload-failed.165596/post-767415

Does forcing the reduced limit as described there help?
 
I set the open files limit for the client to 1K:1K and will try to run the script this weekend.
P.S. The mail server filesystem is local (XFS), not a network share. The default open file limits are 1K:512K.
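Since the lookahead cache size is derived from the open file limits, it may be worth confirming which soft:hard limits a process actually runs with. A minimal Linux-only sketch (the function name show_nofile_limits is hypothetical): it reads the values from the current shell via ulimit, or from /proc/PID/limits for a running process.

```bash
# Hypothetical helper: print the soft and hard open-file (nofile) limits,
# either for the current shell or for a running process given by PID.
# Usage: show_nofile_limits [PID]
show_nofile_limits() {
  if [ -n "$1" ]; then
    # /proc/PID/limits contains a row "Max open files  <soft>  <hard>  files"
    awk '/^Max open files/ {print "soft=" $4, "hard=" $5}' "/proc/$1/limits"
  else
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
  fi
}
```

While a backup is running, show_nofile_limits "$(pgrep -o proxmox-backup-client)" would show the limits of the client process itself, which matters when they are set through prlimit or a wrapper rather than in the calling shell.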
 
No, setting 'prlimit --nofile=1024:1024' does not help. I got the same error in the same directory:

Bash:
prlimit --nofile=1024:1024 /usr/bin/proxmox-backup-client backup \
  "cyrus-imap-db.pxar:$CYR_DB_SNAP_MNT" \
  "cyrus-imap-main.pxar:$CYR_MAIN_SNAP_MNT" \
  "cyrus-imap-arch.pxar:$CYR_ARCH_SNAP_MNT" \
  --repository "$PBS_TOKEN@backup.solvo.ru:DATA" -ns MAIL --backup-id "$(hostname -s)-cyrus" --backup-type host --change-detection-mode=metadata

Code:
Starting backup: [MAIL]:host/rhino-cyrus/2026-02-13T21:45:11Z   
Client name: rhino   
Starting backup protocol: Sat Feb 14 00:45:11 2026   
No previous manifest available.   
 ...
Upload directory '/mnt/imaparch.snap' to 'backup-agent@pbs!cyrus-imap@backup.solvo.ru:8007:DATA' as cyrus-imap-arch.mpxar.didx   
resource limit for open file handles low: 1024   
processed 2.876 GiB in 1m, uploaded 2.841 GiB
processed 6.245 GiB in 2m, uploaded 6.207 GiB
 ...
processed 3.083 TiB in 15h 39m 1s, uploaded 2.756 TiB
processed 3.084 TiB in 15h 40m 1s, uploaded 2.757 TiB
unclosed encoder dropped
closed encoder dropped with state
unfinished encoder state dropped
cyrus-imap-arch.ppxar: had to backup 2.757 TiB of 3.084 TiB (compressed 1.404 TiB) in 56414.76 s (average 51.237 MiB/s)
cyrus-imap-arch.ppxar: backup was done incrementally, reused 335.662 GiB (10.6%)
Error: upload failed: error at "imap/V/user/nladokhin/Support/Bugs"

2) As a test, I tried backing up only the '/mnt/imaparch.snap/imap/V/user/nladokhin' directory and got the same error:
Code:
Starting backup: [MAIL]:host/rhino-cyrus/2026-02-16T14:16:52Z
Client name: rhino
Starting backup protocol: Mon Feb 16 17:16:52 2026
No previous manifest available.
Upload directory '/mnt/imaparch.snap/imap/V/user/nladokhin' to 'backup-agent@pbs!cyrus-imap@backup.solvo.ru:8007:DATA' as cyrus-test.mpxar.didx
resource limit for open file handles low: 1024
processed 4.463 GiB in 1m, uploaded 4.414 GiB
...
processed 67.759 GiB in 16m, uploaded 67.607 GiB
processed 71.285 GiB in 17m, uploaded 71.133 GiB
unclosed encoder dropped
closed encoder dropped with state
unfinished encoder state dropped
cyrus-test.mpxar: had to backup 154.655 MiB of 154.655 MiB (compressed 29.562 MiB) in 1034.15 s (average 153.137 KiB/s)
Error: upload failed: error at "Support/Bugs"

3) Backing up the deeper directory '/mnt/imaparch.snap/imap/V/user/nladokhin/Support' on its own completes with no error:

Code:
Starting backup: [MAIL]:host/rhino-cyrus/2026-02-16T14:46:34Z
Client name: rhino
Starting backup protocol: Mon Feb 16 17:46:34 2026
Downloading previous manifest (Mon Feb 16 17:37:05 2026)
Upload directory '/mnt/imaparch.snap/imap/V/user/nladokhin/Support' to 'backup-agent@pbs!cyrus-imap@backup.solvo.ru:8007:DATA' as cyrus-test2.mpxar.didx
resource limit for open file handles low: 1024
Previous manifest does not contain an archive called 'cyrus-test2.mpxar.didx', skipping download..
Previous manifest does not contain an archive called 'cyrus-test2.ppxar.didx', skipping download..
processed 4.074 GiB in 1m, uploaded 4.074 GiB
processed 7.716 GiB in 2m, uploaded 7.716 GiB
processed 12.082 GiB in 3m, uploaded 12.082 GiB
processed 15.785 GiB in 4m, uploaded 15.783 GiB
processed 18.085 GiB in 5m, uploaded 18.082 GiB
processed 19.649 GiB in 6m, uploaded 19.647 GiB
processed 22.22 GiB in 7m, uploaded 22.217 GiB
cyrus-test2.ppxar: had to backup 24.617 GiB of 24.619 GiB (compressed 3.487 GiB) in 473.45 s (average 53.243 MiB/s)
cyrus-test2.ppxar: backup was done incrementally, reused 1.394 MiB (0.0%)
cyrus-test2.mpxar: had to backup 196.228 MiB of 196.228 MiB (compressed 36.229 MiB) in 473.53 s (average 424.337 KiB/s)
Duration: 473.68s
End Time: Mon Feb 16 17:54:27 2026
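Tests 2) and 3) narrow the failure down by hand; the same idea can be scripted by running one test backup per immediate subdirectory and recording which runs fail. This is a hedged sketch: find_failing_subdirs is a hypothetical helper, and the backup call it would wrap appears only as a commented example whose archive name and backup ID ("cyrus-test") are hypothetical. Test 3) already shows that a subdirectory can succeed on its own while its parent fails, so clean per-subdirectory runs would not rule out a problem that only appears when the whole tree is processed.

```bash
# Hypothetical helper: run CMD once per immediate subdirectory of ROOT
# (the subdirectory path is appended as the last argument) and report failures.
# Hidden (dot) subdirectories are not matched by the glob.
# Usage: find_failing_subdirs ROOT CMD [ARGS...]
find_failing_subdirs() {
  local root=$1; shift
  local dir failed=0
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # glob matched nothing
    dir=${dir%/}                # strip the trailing slash added by the glob
    if ! "$@" "$dir" >/dev/null 2>&1; then
      echo "FAILED: $dir"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]           # non-zero status if any subdirectory failed
}

# Example (not run here): wrap the client call from the script so that each
# subdirectory becomes the archive source.
#   backup_one() {
#     /usr/bin/proxmox-backup-client backup "cyrus-test.pxar:$1" \
#       --repository "$PBS_TOKEN@backup.solvo.ru:DATA" -ns MAIL \
#       --backup-id "$(hostname -s)-cyrus-test" --backup-type host \
#       --change-detection-mode=metadata
#   }
#   find_failing_subdirs /mnt/imaparch.snap/imap/V/user/nladokhin backup_one
```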

P.S. I know a 100 GB mailbox looks a bit oversized :) and I'm asking the user to clean it up. But it's not the biggest one...
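To see which mailboxes are larger than this one, their on-disk sizes can be listed and sorted. A minimal sketch, assuming GNU du and the Cyrus spool layout imap/<letter>/user/<name> seen in the paths above; largest_mailboxes is a hypothetical name.

```bash
# Hypothetical helper: list the N largest mailbox directories (size in MiB)
# under ROOT, assuming the imap/<letter>/user/<name> layout. Requires GNU du.
# Usage: largest_mailboxes ROOT [N]
largest_mailboxes() {
  local root=$1 n=${2:-10}
  # -x: stay on one filesystem, -s: one total per mailbox, -m: MiB units
  du -xsm "$root"/imap/*/user/*/ 2>/dev/null | sort -rn | head -n "$n"
}
```

For example, largest_mailboxes /mnt/imaparch.snap 20 would print the 20 biggest mailboxes on the snapshot, largest first.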