"backup failed: connection error: not connected" when attempting to use PBC with Metadata backup

LDShadowLord

New Member
Dec 6, 2024
Hi all,
I'm attempting to do a file-based backup with proxmox-backup-client (PBC) using the metadata change-detection mode. I have several backups, all targeting different folders on a NAS mounted via CIFS.
I'm running PBC on the PBS machine itself, so it's connecting to "itself."
All of my backups are configured identically, except that each targets a different folder and a different namespace.
One of the backups runs perfectly every single day, without fail. Another runs perfectly 90% of the time, with an occasional error here or there.
The problematic backup, however, completes the _initial_ backup perfectly, and then fails on all subsequent runs, where it needs to compare against the existing metadata archive (mpxar).
I'd like to keep using the metadata mode, as it reduces the backup time from about 2.5 hours down to ~30 minutes. Does anyone know why it keeps failing at the end of the backup?

The script I run - it is triggered by a cron job every day at 01:45.
Bash:
#!/bin/bash

# Token used to authenticate against the PBS API (read from file by the client)
PBS_PASSWORD_FILE=/root/pbsc_token.txt
export PBS_PASSWORD_FILE

# File-based backup of /mnt/appdata into the 'appdata' namespace,
# using metadata-based change detection
proxmox-backup-client backup appdata.pxar:/mnt/appdata \
        --repository 'root@pam!ponder@192.168.1.197:General' \
        --crypt-mode none \
        --change-detection-mode metadata \
        --ns appdata

Code:
...[Previous 5000 lines truncated]....
2024-12-06T01:47:24+00:00: successfully added chunk 9895fd77e974a44ea109109627a1e287eb00e870a5f62dda95885fbe2d7b6f09 to dynamic index 2 (offset 288503899, size 1259373)
2024-12-06T01:47:24+00:00: upload_chunk done: 1259309 bytes, 19298dc012c6f25eb063f95b79aea4e6063c6ec4e975192e8928bb380194b278
2024-12-06T02:12:00+00:00: POST /dynamic_chunk
2024-12-06T02:12:00+00:00: POST /dynamic_chunk
2024-12-06T02:12:00+00:00: upload_chunk done: 1593155 bytes, 37820eb2aba09e64bb13efad350fdcdf209b4a46cc0bc9eb8402109a240810e8
2024-12-06T02:12:00+00:00: PUT /dynamic_index
2024-12-06T02:12:00+00:00: dynamic_append 1 chunks
2024-12-06T02:12:00+00:00: successfully added chunk 37820eb2aba09e64bb13efad350fdcdf209b4a46cc0bc9eb8402109a240810e8 to dynamic index 1 (offset 0, size 1593155)
2024-12-06T02:12:00+00:00: upload_chunk done: 2269439 bytes, b8c5a6763eb16b8b97cc65958e1828b9e83f2a5f0bd49f0cf0a49e10cfe26c38
2024-12-06T02:12:00+00:00: PUT /dynamic_index
2024-12-06T02:12:00+00:00: dynamic_append 2 chunks
2024-12-06T02:12:00+00:00: successfully added chunk 19298dc012c6f25eb063f95b79aea4e6063c6ec4e975192e8928bb380194b278 to dynamic index 2 (offset 289763272, size 1259309)
2024-12-06T02:12:00+00:00: successfully added chunk b8c5a6763eb16b8b97cc65958e1828b9e83f2a5f0bd49f0cf0a49e10cfe26c38 to dynamic index 2 (offset 291022581, size 2269439)
2024-12-06T02:12:00+00:00: POST /dynamic_close
2024-12-06T02:12:00+00:00: Upload statistics for 'appdata.ppxar.didx'
2024-12-06T02:12:00+00:00: UUID: 874bc3bd58f94b0e8bb54319ab3fc43b
2024-12-06T02:12:00+00:00: Checksum: 6209854d34347f7284d297285a0e3b0d47d7e0cec2a2d63f50c50be9359b6208
2024-12-06T02:12:00+00:00: Size: 293292020
2024-12-06T02:12:00+00:00: Chunk count: 220
2024-12-06T02:12:00+00:00: Upload size: 8358586 (2%)
2024-12-06T02:12:00+00:00: Duplicates: 213+3 (98%)
2024-12-06T02:12:00+00:00: Compression: 12%
2024-12-06T02:12:00+00:00: successfully closed dynamic index 2
2024-12-06T02:12:00+00:00: backup failed: connection error: not connected
2024-12-06T02:12:00+00:00: removing failed backup
2024-12-06T02:12:00+00:00: removing backup snapshot "/backups/PONDER/General/ns/appdata/host/ponder/2024-12-06T01:45:01Z"
2024-12-06T02:12:00+00:00: TASK ERROR: removing backup snapshot "/backups/PONDER/General/ns/appdata/host/ponder/2024-12-06T01:45:01Z" failed - Directory not empty (os error 39)

If you need me to provide any other logs, or more information about the environment, let me know.
This has been driving me crazy for about 3 months and I cannot figure out how to fix it.
 
What about the log from the backup client? Does it tell you more? You can try and invoke the command itself directly...

2024-12-06T02:12:00+00:00: successfully closed dynamic index 2
2024-12-06T02:12:00+00:00: backup failed: connection error: not connected
It does seem that the client reached the end of the archive, but then the connection was dropped. What does proxmox-backup-manager version --verbose show? And what client version does proxmox-backup-client version report?
 
Code:
root@ponder:~# proxmox-backup-manager version --verbose
proxmox-backup                     3.3.0         running kernel: 6.8.8-3-pve
proxmox-backup-server              3.3.0-2       running version: 3.3.0     
proxmox-kernel-helper              8.1.0                                   
proxmox-kernel-6.8                 6.8.12-4                                 
proxmox-kernel-6.8.12-4-pve-signed 6.8.12-4                                 
proxmox-kernel-6.8.8-3-pve-signed  6.8.8-3                                 
proxmox-kernel-6.8.4-2-pve-signed  6.8.4-2                                 
ifupdown2                          3.2.0-1+pmx11                           
libjs-extjs                        7.0.0-5                                 
proxmox-backup-docs                3.3.0-1                                 
proxmox-backup-client              3.3.0-1                                 
proxmox-mail-forward               0.3.1                                   
proxmox-mini-journalreader         1.4.0                                   
proxmox-offline-mirror-helper      0.6.7                                   
proxmox-widget-toolkit             4.3.3                                   
pve-xtermjs                        5.3.0-3                                 
smartmontools                      7.3-pve1                                 
zfsutils-linux                     2.2.6-pve1

Code:
root@ponder:~# proxmox-backup-client version
client version: 3.3.0

I tried running the same command with the debug flag, but couldn't get any more information. I have a previous log which I can add, but I'm not able to generate another one immediately. It's chunky, so I'll put it in a spoiler again.
Code:
root@ponder:~# tail -50 /root/pbsc_appdata_debug.log
reusable: "mapValuesSeries.js" at range 9698122177..9698122261 has unchanged metadata.
Offset relative to range start: PayloadOffset(9416797459)
"docker/_deprecated/foundry/resources/app/node_modules/neo-async/memoize.js"
reusable: "memoize.js" at range 9698122261..9698122337 has unchanged metadata.
Offset relative to range start: PayloadOffset(9416797543)
"docker/_deprecated/foundry/resources/app/node_modules/neo-async/nextTick.js"
reusable: "nextTick.js" at range 9698122337..9698122414 has unchanged metadata.
Offset relative to range start: PayloadOffset(9416797619)
"docker/_deprecated/foundry/resources/app/node_modules/neo-async/omit.js"
reusable: "omit.js" at range 9698122414..9698122487 has unchanged metadata.
Offset relative to range start: PayloadOffset(9416797696)
"docker/_deprecated/foundry/resources/app/node_modules/neo-async/omitLimit.js"
reusable: "omitLimit.js" at range 9698122487..9698122565 has unchanged metadata.
Offset relative to range start: PayloadOffset(9416797769)
"docker/_deprecated/foundry/resources/app/node_modules/neo-async/omitSeries.js"
reusable: "omitSeries.js" at range 9698122565..9698122644 has unchanged metadata.
Offset relative to range start: PayloadOffset(9416797847)
"docker/_deprecated/foundry/resources/app/node_modules/neo-async/package.json"
reusable: "package.json" at range 9698122644..9698124052 has unchanged metadata.
Offset relative to range start: PayloadOffset(9416797926)
"docker/_deprecated/foundry/resources/app/node_modules/neo-async/parallel.js"
reusable: "parallel.js" at range 9698124052..9698124129 has unchanged metadata.
Offset relative to range start: PayloadOffset(9416799334)
"docker/_deprecated/foundry/resources/app/node_modules/neo-async/parallelLimit.js"
reusable: "parallelLimit.js" at range 9698124129..9698124211 has unchanged metadata.
Offset relative to range start: PayloadOffset(9416799411)
"docker/_deprecated/foundry/resources/app/node_modules/neo-async/pick.js"
unclosed encoder dropped
closed encoder dropped with state
unfinished encoder state dropped
unfinished encoder state dropped
unfinished encoder state dropped
unfinished encoder state dropped
unfinished encoder state dropped
append chunks list len (126)
append chunks list len (2)
appdata.mpxar.didx: had to backup 1.513 MiB of 1.513 MiB (compressed 285.724 KiB) in 1648.24 s (average 962 B/s)
appdata.mpxar.didx: Reused 0 from 2 chunks.
appdata.mpxar.didx: Average chunk size was 774.847 KiB.
appdata.mpxar.didx: Average time per request: 824120372 microseconds.
append chunks list len (1)
appdata.ppxar.didx: reused 261.322 MiB from previous snapshot for unchanged files (204 chunks)
appdata.ppxar.didx: had to backup 8.746 MiB of 270.068 MiB (compressed 1.995 MiB) in 1648.24 s (average 5.434 KiB/s)
appdata.ppxar.didx: backup was done incrementally, reused 261.322 MiB (96.8%)
appdata.ppxar.didx: Reused 0 from 214 chunks.
appdata.ppxar.didx: Average chunk size was 1.262 MiB.
appdata.ppxar.didx: Average time per request: 7702052 microseconds.
Upload index.json to 'root@pam!ponder@192.168.1.197:8007:General'
Duration: 1648.53s
End Time: Thu Oct 24 20:27:13 2024
 
Could you provide the full log for further investigation, not just the tail?
 
Hi there, sorry this took a minute. I re-ran the backup so the log is up to date. It has more detail than before and seems to indicate that a specific file is the issue, but that file is in a folder of unused files, so it isn't open anywhere. I'm not sure why it would be a problem; I assume that's simply the file it was processing when the error occurred?

I was unable to attach the gzipped log to this post due to its size, so here's a Google Drive link.
https://drive.google.com/file/d/160F06FuQGsjygbeDciCl0B8NVRu-uBtv/view?usp=sharing

Let me know if this has the information you need, thanks for your help.
 
Thanks for providing the logs. Is the Proxmox Backup Client running out of memory? According to the log you uploaded, the error occurs when adding an entry to the lookahead cache used to collect reusable files. Do you see any errors in the systemd journal around the time of the failure?
 
The machine is a VM with 16 GB of RAM; normally only 1-2 GB is used.
I've pulled the systemd logs for the window around the failure. You can see the timestamps; the output is in reverse chronological order (a small journalctl example for producing this kind of filtered view follows the log).
Code:
Dec 06 10:51:49 ponder systemd-logind[584]: Removed session 1902.
Dec 06 10:51:49 ponder systemd[1]: session-1902.scope: Deactivated successfully.
Dec 06 10:51:49 ponder systemd-logind[584]: Session 1902 logged out. Waiting for processes to exit.
Dec 06 10:48:26 ponder systemd-logind[584]: Removed session 1899.
Dec 06 10:48:26 ponder systemd-logind[584]: Session 1899 logged out. Waiting for processes to exit.
Dec 06 10:48:26 ponder systemd[1]: session-1899.scope: Consumed 15min 58.343s CPU time.
Dec 06 10:48:26 ponder systemd[1]: session-1899.scope: Deactivated successfully.
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: removing failed backup
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: backup failed: connection error: not connected
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: successfully closed dynamic index 1
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: Compression: 18%
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: Upload size: 1594559 (100%)
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: Chunk count: 1
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: Size: 1594559
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: Checksum: 24d53eb854a35d3696850076821dc0f5bf9c86ae4fdfc8a49b5e8fb4dcd05dae
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: UUID: 1767a5d2b1334452a0dfde8d0fd8168d
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: Upload statistics for 'appdata.mpxar.didx'
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: successfully added chunk 2363ea5763c051f2dcf1813fd88af0fa7278aeed8155845077e6b5eff7e042be to dynamic index 1 (offset 0, size 1594559)
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: dynamic_append 1 chunks
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: upload_chunk done: 1594559 bytes, 2363ea5763c051f2dcf1813fd88af0fa7278aeed8155845077e6b5eff7e042be
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: successfully added chunk 905fd521c8510eb3cf41e656983ba1fec02ae8543fc621214c9fd362dd9cdfdb to dynamic index 2 (offset 290300699, size 1259309)
Dec 06 10:37:04 ponder proxmox-backup-proxy[104206]: dynamic_append 1 chunks
Dec 06 10:32:39 ponder proxmox-backup-proxy[104206]: rrd journal successfully committed (33 files in 0.029 seconds)
Dec 06 10:27:50 ponder login[128619]: ROOT LOGIN  on '/dev/pts/0'
Dec 06 10:27:50 ponder systemd[1]: Started session-1902.scope - Session 1902 of User root.
Dec 06 10:27:50 ponder systemd-logind[584]: New session 1902 of user root.
Dec 06 10:27:50 ponder login[128614]: pam_unix(login:session): session opened for user root(uid=0) by root(uid=0)
Dec 06 10:27:37 ponder proxmox-backup-proxy[104206]: error during snapshot file listing: 'unable to load blob '"/backups/PONDER/General/ns/appdata/host/ponder/2024-12-06T10:08:09Z/index.json.blob"' - No such file or directory (os error 2)'
Dec 06 10:27:36 ponder proxmox-backup-proxy[104206]: error during snapshot file listing: 'unable to load blob '"/backups/PONDER/General/ns/appdata/host/ponder/2024-12-06T09:34:14Z/index.json.blob"' - No such file or directory (os error 2)'
Dec 06 10:17:01 ponder CRON[128600]: pam_unix(cron:session): session closed for user root
Dec 06 10:17:01 ponder CRON[128601]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Dec 06 10:17:01 ponder CRON[128600]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Dec 06 10:12:31 ponder systemd-logind[584]: Removed session 1900.
Dec 06 10:12:31 ponder systemd-logind[584]: Session 1900 logged out. Waiting for processes to exit.
Dec 06 10:12:31 ponder systemd[1]: session-1900.scope: Deactivated successfully.
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 9895fd77e974a44ea109109627a1e287eb00e870a5f62dda95885fbe2d7b6f09 to dynamic index 2 (offset 289041326, size 1259373)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 16985c61faec3fd4c197c1f1e623bbee0813d5667406d666d9fdccc6499c85b0 to dynamic index 2 (offset 287348845, size 1692481)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk e3b8b88b9186d67b2315587a65ee3457df802980b2d9b3da4e3afda0c8800fe2 to dynamic index 2 (offset 286257798, size 1091047)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk d7b84808045a9369018e767546a8de6b09d2be6c9ec3a9b6d07c723069078bef to dynamic index 2 (offset 285207382, size 1050416)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 7e37e870264fdd707c7fb82cb135c54f152a7bc5cdd2664873da168a341ae1ce to dynamic index 2 (offset 284099317, size 1108065)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 76c85cf4bb110e4bd7da88d2fa53da19817fdaa095e8c72ff4ef72428fefbcfd to dynamic index 2 (offset 283017813, size 1081504)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 2e6ae8e707aa26e7d443e5def80fe468c7c82806fce9142fac4e05637da4425c to dynamic index 2 (offset 281869372, size 1148441)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk e5f487c92bce2453158a6824e437855d5bcfd298175790ca9f618aeef72bdf93 to dynamic index 2 (offset 280819688, size 1049684)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 67434f88c6770c4021afbc4c0e7a6c446f27bbb654910a9067b3025168e88a84 to dynamic index 2 (offset 279739459, size 1080229)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk ba2758321c5481b6a78de7e0aa39cc47996f979bc29ccd6fc0c681df8de606e2 to dynamic index 2 (offset 278633025, size 1106434)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk b91df6d41b42edce9bd8fa984d4c8bbeef1afdce80aa9cfb4eec362e2efe9518 to dynamic index 2 (offset 277540202, size 1092823)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 18d044ff905f550518bc571e87dfb02943cd5fc8bee018fc5593d84e8120eb9c to dynamic index 2 (offset 276481922, size 1058280)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 1638153d2e562a1dc2f63b3c320fedd990b1560441fa8a314d09a003a809da7a to dynamic index 2 (offset 275411581, size 1070341)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 74a108e6aa425cd62722f50678ba2109f3eb6579b04a26fcfddb67923888459f to dynamic index 2 (offset 274340980, size 1070601)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk f5dfdcb3cb91c0bcd2a10d134299c61b75f17fd0549bbc94b705e085064eb140 to dynamic index 2 (offset 273233004, size 1107976)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 261a1a23bb6649609761014b4ea4a6520e8506ca0e74a5e01bafe8f2c315f4e6 to dynamic index 2 (offset 271117436, size 2115568)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: successfully added chunk 0b67aed5acbd70d3ca46db215088549f06a8fc24e442212bf5b448519fc4b45e to dynamic index 2 (offset 269217449, size 1899987)
Dec 06 10:10:33 ponder proxmox-backup-proxy[104206]: upload_chunk done: 1259309 bytes, 905fd521c8510eb3cf41e656983ba1fec02ae8543fc621214c9fd362dd9cdfdb
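For reference, a reverse-ordered excerpt like the one above can be pulled with journalctl; the time window here is only an example matching this particular run:
Bash:
# Reverse-chronological journal for the window around the failure
journalctl -r --since "2024-12-06 10:00" --until "2024-12-06 11:00"
# Optionally restrict the output to the PBS proxy unit
journalctl -r --since "2024-12-06 10:00" --until "2024-12-06 11:00" -u proxmox-backup-proxy.service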
 
Hmm, so no OOM killer message. Could you nevertheless try to run the proxmox-backup-client with a limited cache size? The cache size is adapted based on the resource limit for the allowed number of open files, so running prlimit --nofile=1024:1024 proxmox-backup-client ... will set the limit for the client to 1024 file handles. You should see the message resource limit for open file handles low: 1024 at the start.
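For illustration, here is a minimal sketch of the cron script from earlier in the thread wrapped in prlimit; the 1024 value is just the test limit suggested above, and paths/repository are taken from that script:
Bash:
#!/bin/bash

PBS_PASSWORD_FILE=/root/pbsc_token.txt
export PBS_PASSWORD_FILE

# Cap the soft and hard open-file limits for this invocation only;
# the client derives its lookahead cache size from this limit.
prlimit --nofile=1024:1024 \
    proxmox-backup-client backup appdata.pxar:/mnt/appdata \
        --repository 'root@pam!ponder@192.168.1.197:General' \
        --crypt-mode none \
        --change-detection-mode metadata \
        --ns appdata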

Also please try to stat the file the backup failed upon, stat <archive-root>/docker/_deprecated/foundry/resources/app/node_modules/neo-async/omitLimit.js. Does this file still exist, is there concurrent operations on that file on the network share?
 
Regarding the file omitLimit.js, here are both a cat and a stat of it.
There shouldn't be any operations on that file; it's a random JS file in a deprecated folder that I keep for historical reasons. Nothing is touching it.

Code:
root@ponder:~# cat /mnt/appdata/docker/_deprecated/foundry/resources/app/node_modules/neo-async/omitLimit.js
'use strict';

module.exports = require('./async').omitLimit;
root@ponder:~# stat /mnt/appdata/docker/_deprecated/foundry/resources/app/node_modules/neo-async/omitLimit.js
  File: /mnt/appdata/docker/_deprecated/foundry/resources/app/node_modules/neo-async/omitLimit.js
  Size: 62              Blocks: 8          IO Block: 1048576 regular file
Device: 0,46    Inode: 8422739     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 1985-10-26 09:15:00.000000000 +0100
Modify: 1985-10-26 09:15:00.000000000 +0100
Change: 1985-10-26 09:15:00.000000000 +0100
 Birth: 1985-10-26 09:15:00.000000000 +0100

It takes about half an hour (usually) for the backup client to fail, so we'll see how it goes.
 
So the backup was able to complete successfully using the prlimit command. Do you know what the default is? I'd be curious now to increase it and find the upper limit I can use before it falls over.
I should mention that I didn't see a "resource limit low" message in the log.
 
I should mention that I didn't see a "resource limit low" message in the log.
That message should be present though; otherwise the limit might not have been set as expected, and it might not have been the reason the backup completed successfully this time.

Do you know what the default is?
You can print the current default resource limits via prlimit; this shows both the soft and the hard limits. The Proxmox backup client sets its cache size accordingly, as described in more detail here [0].

[0] https://git.proxmox.com/?p=proxmox-backup.git;a=commit;h=cf75bc0db54d2a18230dc78ba451c0db49b9d774
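For instance, the current limits can be inspected like this (standard util-linux prlimit and bash builtins, nothing PBS-specific):
Bash:
# All current soft/hard resource limits for this shell
prlimit
# Just the open-file (NOFILE) limit
prlimit --nofile
# The same via the bash builtin: soft, then hard
ulimit -Sn
ulimit -Hn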
 
I found it. I used head -100 on the file to read the logs. It was on line 111. The default hard limit appears to be 1048576.
I'll try setting it to something like 65536, which should give it a chunk of memory to work with while not allowing it to go crazy.

Thanks for all your help with this today, Chris - this has been bugging me for months.
 
I found it. I used head -100 on the file to read the logs. It was on line 111.
Ah yes, if you still have the debug output active there can be a lot of output before that message; I didn't take that into consideration in my earlier reply.

I'll try setting it to something like 65536, which should give it a chunk of memory to work with while not allowing it to go crazy.
Please keep us posted on your findings, and whether the issue reappears with lower cache size limits as well. I would really be interested in understanding which limit you are running into, as it does not seem to be the memory limit...

EDIT: Could you find out what the file-handle limit is for your CIFS network share? I would guess that could be the limit you are running into, as it is independent of the client resource limit.
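A possible starting point on the client side is sketched below; this assumes the in-kernel cifs module's /proc interface is available on your system, and note that the authoritative per-share handle limit is configured on the NAS itself and would need to be checked there:
Bash:
# CIFS mount options currently in effect (look for any handle/limit-related options)
mount -t cifs
# Per-share counters from the cifs kernel module, if exposed
cat /proc/fs/cifs/Stats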
 