Subject: Intermittent backup errors and repeated TLS handshake failures

Jly4

New Member
Aug 21, 2025
I am running PBS as a VM on the same Proxmox host where I back up LXC containers and VMs.
Backups usually work, but sometimes they fail with errors like:

Bash:
catalog upload error - pipelined request failed: connection closed because of a broken pipe
Error: error at "usr/include/xercesc/framework/psvi"
Caused by: sending on a closed channel
--or--
catalog upload error - channel closed
Error: pipelined request failed: connection closed because of a broken pipe
This happens intermittently and usually only for certain containers, while other containers back up successfully before and after.


Additionally, PBS is accessed through an Nginx reverse proxy running in a separate LXC container on the same Proxmox host.
In journalctl -f -b I constantly see the following messages:

C-like:
Aug 21 17:31:58 pbs proxmox-backup-proxy[826]: [[::ffff:172.16.100.65]:54132] failed to check for TLS handshake: couldn't peek into incoming TCP stream
Aug 21 17:32:28 pbs proxmox-backup-proxy[826]: [[::ffff:172.16.100.65]:33634] failed to check for TLS handshake: couldn't peek into incoming TCP stream
Aug 21 17:32:58 pbs proxmox-backup-proxy[826]: [[::ffff:172.16.100.65]:43834] failed to check for TLS handshake: couldn't peek into incoming TCP stream
Aug 21 17:33:28 pbs proxmox-backup-proxy[826]: [[::ffff:172.16.100.65]:34628] failed to check for TLS handshake: couldn't peek into incoming TCP stream
Aug 21 17:33:58 pbs proxmox-backup-proxy[826]: [[::ffff:172.16.100.65]:57200] failed to check for TLS handshake: couldn't peek into incoming TCP stream


Nginx config:
NGINX:
server {
    listen 80;
    listen [::]:80;
    server_name domain.com;
    rewrite ^(.*) https://$host$1 permanent;
}
 
server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name domain.com;
    ssl_certificate /etc/letsencrypt/live/domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/domain.com/privkey.pem;
    proxy_redirect off;
    location / {
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_pass https://172.16.100.62:8007;
        proxy_buffering off;
        client_max_body_size 0;
        proxy_connect_timeout  3600s;
        proxy_read_timeout  3600s;
        proxy_send_timeout  3600s;
        send_timeout  3600s;
    }
}


Environment:
  • PBS runs inside a VM on the Proxmox host
  • Reverse proxy: Nginx in a separate LXC container
  • 172.16.100.62 - pbs
  • 172.16.100.65 - lxc with nginx
Bash:
proxmox-backup                     4.0.0        running kernel: 6.14.8-2-pve
proxmox-backup-server              4.0.11-2     running version: 4.0.11
proxmox-kernel-helper              9.0.3
proxmox-kernel-6.14.8-2-pve-signed 6.14.8-2
proxmox-kernel-6.14                6.14.8-2
ifupdown2                          3.3.0-1+pmx9
libjs-extjs                        7.0.0-5
proxmox-backup-docs                4.0.11-2
proxmox-backup-client              4.0.11-1
proxmox-mail-forward               1.0.2
proxmox-mini-journalreader         1.6
proxmox-offline-mirror-helper      0.7.0
proxmox-widget-toolkit             5.0.5
pve-xtermjs                        5.5.0-2
smartmontools                      7.4-pve1
zfsutils-linux                     2.3.3-pve1


Questions:
  1. Could the repeated TLS handshake errors in the PBS journal be caused by my Nginx reverse proxy setup?
  2. Are the intermittent pxar backup failures (broken pipe, catalog upload error) related to the proxy, or are they more likely caused by PBS itself running inside the VM?
  3. What steps can I take to troubleshoot or handle these errors?

Thanks for any help
 
At a first glance, your reverse-proxy setup seems to be fine.

Some basic troubleshooting things:
  1. Does connecting to the web interface via both http://yourdomain.tld and https://yourdomain.tld work?
    (I assume it does, otherwise you'd have already noticed. Right? ;P)

  2. Do the errors go away if you use https://172.16.100.62:8007 directly in your backup config and other places?

  3. Do you have any health monitoring service running or something similar that just checks whether your PBS VM is reachable? Perhaps by just establishing a TCP connection and then sending weird data? Because that would be enough to trigger the errors (see the sketch after this list).

  4. What's the overall resource usage during backups? CPU, RAM, IOPS, etc. of your PVE host(s) and of your CTs and VMs as well. If there are any hangs or longer hiccups, it might explain why you're running into that specific code path.

  5. You mentioned that this happens "usually only for certain containers, while other containers back up successfully before and after"—so for which containers / (and VMs?) does this happen? For which does it not happen? What kind of applications are running in those containers?

  6. Since your Nginx instance runs inside a container, are you also backing up that container itself to your PBS VM? What might happen is that a short interruption of the container causes the connection to hang / get messed up somehow, but that should be very unlikely.
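
To illustrate point 3: a probe that merely opens a TCP connection to port 8007 and drops it again, without ever starting a TLS handshake, should be enough to produce exactly that journal line, while a probe that actually completes a TLS handshake should not. A rough sketch, using the PBS address from your post:

Bash:
# mimics a naive "is the port alive?" health check: connect, send nothing, close.
# this should show up in the PBS journal as
# "failed to check for TLS handshake: couldn't peek into incoming TCP stream"
nc -z 172.16.100.62 8007

# a TLS-aware probe completes the handshake first and should not trigger the message
openssl s_client -connect 172.16.100.62:8007 </dev/null >/dev/null

# or request the login page over HTTPS (-k in case PBS still uses its self-signed certificate)
curl -sk https://172.16.100.62:8007/ >/dev/null
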
That's all I can think of off the top of my head right now. Thanks for reporting this here by the way! (:
 
Thank you for your response.

  1. Yes, it does, it works fine.
  2. The problem is that backup errors do not appear consistently — roughly 1 out of 30 backups fails. Because of this, it is quite hard to confirm if the issue is resolved.
  3. Yes, I found the cause of the TLS handshake errors. It turned out to be the built-in health check in the Nginx UI, which is the web panel for managing Nginx.
  4. The load is usually <25% CPU, <50% I/O delay, <50% RAM on PVE, and <50% CPU, <60% RAM on PBS. I/O load is often around 40%, with occasional peaks. My PVE is not very big — most LXC containers do not exceed 10 GB, and due to incremental and daily backups, the backup size is often under 1GB.
  5. Once the issue happened with the PVE backup script, and a few times with this community LXC script. It is not very important for me, so today I just deleted this LXC. What bothers me more are the recurring TCP handshake errors.
  6. That is an interesting point, but backups of the LXC with Nginx always succeed.
 
Yes, I found the cause of the TLS handshake errors. It turned out to be the built-in health check in the Nginx UI, which is the web panel for managing Nginx.

Okay, that's good!
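
For anyone else who runs into this: such probes are usually easy to spot on the PBS side, since they arrive from the proxy container at the same interval as the journal messages (every 30 seconds here). A quick sketch with tcpdump:

Bash:
# run on the PBS VM: show connections from the nginx container (172.16.100.65) to the PBS port;
# a bare health-check probe appears as a connect followed almost immediately by a close,
# with no payload in between
tcpdump -ni any 'host 172.16.100.65 and tcp port 8007'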

The load is usually <25% CPU, <50% I/O delay, <50% RAM on PVE, and <50% CPU, <60% RAM on PBS. I/O load is often around 40%, with occasional peaks. My PVE is not very big — most LXC containers do not exceed 10 GB, and due to incremental and daily backups, the backup size is often under 1GB.

What do you mean by <50% I/O delay? Do you mean <50ms I/O delay, perhaps?

Once the issue happened with the PVE backup script, and a few times with this community LXC script. It is not very important for me, so today I just deleted this LXC. What bothers me more are the recurring TCP handshake errors.

What PVE backup script are you referring to?


Regarding the TLS handshake errors: Those shouldn't be linked to your problem of backups sporadically failing, but given that they're rather annoying if one has to rely on a monitoring service with that kind of healthcheck mechanism, I'll see if I can improve the overall logic in that code path a little.
 
What do you mean by <50% I/O delay? Do you mean <50ms I/O delay, perhaps?
I've checked there

(screenshot attached: chrome_CaFacUOeNO.png)

What PVE backup script are you referring to?
Script for proxmox-backup-client
Bash:
proxmox-backup-client backup $BACKUP_NAME.$BACKUP_TYPE:$BACKUP_PATH --repository $USER@$REPOSITORY:$PORT:$DATASTORE $ADDITIONAL
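
With placeholder values filled in, that boils down to something like this (repository user, datastore and paths here are just hypothetical examples):

Bash:
# placeholder values: "backup@pbs" is the PBS user (including the realm),
# "datastore1" the target datastore, "root.pxar:/" one archive-name/source-path pair
proxmox-backup-client backup root.pxar:/ \
    --repository backup@pbs@172.16.100.62:8007:datastore1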

Regarding the TLS handshake errors: Those shouldn't be linked to your problem of backups sporadically failing, but given that they're rather annoying if one has to rely on a monitoring service with that kind of healthcheck mechanism, I'll see if I can improve the overall logic in that code path a little.
Yeah, and what's interesting is that the Proxmox host is also managed via the Nginx UI, but there aren’t any TLS handshake errors.
 
I've checked there

(screenshot attached: chrome_CaFacUOeNO.png)

Oh right! My bad. I usually look at different stats.

... So, wait. Your IO delay stays below 50%? How high does it usually spike?

From man 1 iostat:
%iowait
Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.

So, if your IO delay percentage is at around 40% for example, it means that 40% of your CPUs' time was spent waiting on your disk(s). That could very well be the reason why some backups might intermittently fail. The next time a backup fails / hangs / aborts, check if you find anything in journalctl -x and also see what the IO delay was at that time.
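
Something along these lines should be enough to capture both the next time it happens (the time window is just an example; iostat comes with the sysstat package):

Bash:
# journal entries around the time of the failed backup (adjust the window)
journalctl -x --since "2025-08-21 17:00" --until "2025-08-21 18:00"

# sample CPU usage (including %iowait) and extended per-device I/O statistics every 5 seconds
iostat -x 5

# on recent kernels, pressure stall information gives a quick view of I/O stalls
cat /proc/pressure/io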

What kind of hardware are you using for your storage?

Yeah, and what's interesting is that the Proxmox host is also managed via the Nginx UI, but there aren’t any TLS handshake errors.

Yeah, PVE and PBS use different HTTP servers under the hood and the HTTP-to-HTTPS redirects have subtle differences in how they work.

Either way, I'll see if I can improve the error handling we have on the PBS side, since it's clearly distracting / worrying.
 
So, if your IO delay percentage is at around 40% for example, it means that 40% of your CPUs' time was spent waiting on your disk(s). That could very well be the reason why some backups might intermittently fail. The next time a backup fails / hangs / aborts, check if you find anything in journalctl -x and also see what the IO delay was at that time.

What kind of hardware are you using for your storage?
I don’t want to take up too much of your time — looking at it now, it might just be an issue with my hardware. If so, I don’t think it makes sense to pursue this further.

Still, in case it’s useful for you, I’ve made a short video with some details. Google Drive
Yeah, PVE and PBS use different HTTP servers under the hood and the HTTP-to-HTTPS redirects have subtle differences in how they work.

Either way, I'll see if I can improve the error handling we have on the PBS side, since it's clearly distracting / worrying.
I see, thank you, I really appreciate your work!
 