[SOLVED] Sync timeout via Proxmox Backup Server

Tacioandrade

Hello everyone! I have a PBS running locally at my client's site and a second PBS running at OVH that receives a copy of the data from the local PBS.
This environment currently has 8 VMs. Seven of them are smaller and sync without any problems. The largest one, however, is a SQL Server with two 2 TB disks; it fails with the error below, and I have been unable to pull its backups since the 6th:

2025-03-13T05:00:09-03:00: sync snapshot vm/201/2025-03-09T03:00:16Z
2025-03-13T05:00:09-03:00: sync archive qemu-server.conf.blob
2025-03-13T05:00:09-03:00: sync archive drive-sata1.img.fidx
2025-03-13T05:05:35-03:00: downloaded 1.97 GiB (6.214 MiB/s)
2025-03-13T05:05:35-03:00: sync archive drive-sata0.img.fidx
2025-03-13T07:34:55-03:00: removing backup snapshot "/mnt/datastore/pbs-remoto/ns/kacyumara/vm/201/2025-03-09T03:00:16Z"
2025-03-13T07:34:55-03:00: percentage done: 15.00% (1/8 groups, 1/5 snapshots in group #2)
2025-03-13T07:34:55-03:00: sync group vm/201 failed - timed out

I would like to know if anyone knows of a way to increase the timeout, or something similar, so that I can pull this VM's backups!

In this case, the full backup has already arrived. I believe there was some major change to the disk between the 5th and the 6th of this month, and that is what is causing the problem.
 
I've synced some large VMs. Never had this type of issue.
In fact, when it stops in the middle of a sync (reboot or whatever), it appears to go back and get the rest on next sync. (I've not looked at this closely, as I'm only concerned about the most recent backup syncing, not prior failures.)

My advice ... just off the cuff, I'd delete the synced backups from the target server and try again.

And if that doesn't get it, delete all the source backups.
Start over with a fresh backup of your 2 TB machine and see if you can sync that.
 
you don't need to delete anything, just restart the sync (it will still make progress unless you run GC).

the timeout points at a network problem though, maybe something is throttling http2 connections?
 
It seems I have solved it. For some reason my PBS was closing connections when a transfer took too long. I fixed it by applying the following tuning to my Debian/PBS system.

I edited the /etc/sysctl.conf file and added the following content:

Code:
##Kernel
kernel.panic = 10
kernel.watchdog_thresh = 20

##I/O error fix (dirty page writeback thresholds)
vm.dirty_background_ratio = 10
vm.dirty_ratio = 20

##Cache reclaim pressure in RAM
vm.vfs_cache_pressure = 50

##Prefer RAM, swap only reluctantly
vm.swappiness = 10

##Always keep 512 MB of RAM free
vm.min_free_kbytes = 524288

##TIME_WAIT socket handling (reuse disabled, longer FIN timeout)
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_fin_timeout = 120

##Disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

##16 MB per socket
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

##Maximum number of backlogged sockets
net.core.somaxconn = 4096

# Increase the allowed number of pending SYN requests
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1

##Increase throughput (TCP buffers and window scaling)
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_adv_win_scale = 2

# Helps with the network timeout
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 9

After that, I ran sysctl -p to load the new settings and it's working perfectly.
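
A long fragment merged from several sources can easily end up assigning the same key more than once. Since the file is applied line by line, the last assignment is the one that sticks. Here is a minimal Python sketch for checking a fragment before loading it. The function name parse_sysctl and the sample text are made up for illustration and are not part of any PBS or Debian tool:

```python
def parse_sysctl(text):
    """Return (effective, duplicates) for sysctl.conf-style text."""
    effective = {}
    duplicates = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "key = value" line
        key = key.strip()
        if key in effective and key not in duplicates:
            duplicates.append(key)
        effective[key] = value.strip()  # a later line overrides an earlier one
    return effective, duplicates


# Hypothetical fragment, only to exercise the function.
sample = """
# keepalive and FIN timeout
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_fin_timeout = 120
"""

effective, duplicates = parse_sysctl(sample)
print(effective["net.ipv4.tcp_fin_timeout"])  # 120
print(duplicates)  # ['net.ipv4.tcp_fin_timeout']
```

To check the real file, pass the contents of /etc/sysctl.conf to the function instead of the sample string.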

I believe this isn't necessary in 99% of cases, but in my case it solved the problem! I believe what actually fixed it were the net.ipv4.tcp_keepalive_* and TIME_WAIT options. The remaining options are tunings I use to improve PBS performance in my environment, based on documentation from friends, and they seem to be working very well.
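
To put numbers on the keepalive part: the kernel sends the first probe after tcp_keepalive_time seconds of silence, then one probe every tcp_keepalive_intvl seconds, and gives up after tcp_keepalive_probes unanswered probes. A rough Python sketch of that arithmetic follows. The function name is made up, and it assumes the sync connection has SO_KEEPALIVE enabled, because these sysctls only affect sockets that ask for keepalive:

```python
def keepalive_detection_seconds(idle_time, interval, probes):
    """Seconds of silence before the kernel declares the peer dead.

    Assumes the socket has SO_KEEPALIVE set; otherwise the
    tcp_keepalive_* sysctls have no effect on it.
    """
    return idle_time + interval * probes


# Values from the sysctl.conf fragment in this post.
print(keepalive_detection_seconds(300, 60, 9))   # 840

# Linux defaults (7200 / 75 / 9) for comparison.
print(keepalive_detection_seconds(7200, 75, 9))  # 7875
```

So with these settings an idle connection sees its first probe after 5 minutes instead of 2 hours, which may be enough to keep a firewall or NAT device along the path from forgetting the connection.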

In their documentation these values were smaller because their PBSs were on the same network or in the same country. That is not my case: the backup PBS is almost 200 ms away, in another country.
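
The reason distance matters for buffer sizes is the bandwidth-delay product: to keep a link full, a TCP connection needs roughly bandwidth times round-trip time of data in flight. A small Python sketch of that calculation is below. The link speeds are assumptions for illustration only (the real link speed is not stated in this thread), and the function name is made up:

```python
def required_buffer_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes (integer arithmetic)."""
    return bandwidth_bits_per_s * rtt_ms // 8000


# Assumed link speeds; 200 ms is the latency mentioned in this post.
print(required_buffer_bytes(100_000_000, 200))    # 2500000  (100 Mbit/s, 200 ms)
print(required_buffer_bytes(1_000_000_000, 200))  # 25000000 (1 Gbit/s, 200 ms)
print(required_buffer_bytes(1_000_000_000, 1))    # 125000   (1 Gbit/s, 1 ms LAN)
```

On a LAN a small buffer is plenty, while at 200 ms even the 16777216-byte maximum used here would not quite fill a 1 Gbit/s link.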
 
Regarding "the timeout points at a network problem though, maybe something is throttling http2 connections?":
I believe that this was the problem. The timeout was caused by the distance between the PBSs: the main one is in Brazil and the backup one is in Canada, almost 200 ms apart, and during peak link usage the latency was perhaps even higher, which triggered the timeout.

I ended up adjusting Debian to accept longer timeouts before dropping the connection, and it seems that this solved my problem.
 
yeah, that would explain it. HTTP2 is pathologically slow with high latency links
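
A back-of-the-envelope illustration of why: with flow control, a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is capped at window / RTT. The Python sketch below uses the initial window size from the HTTP/2 specification (65,535 bytes). Which window sizes PBS actually negotiates is not stated in this thread, so treat the numbers as illustrative only:

```python
def window_limited_rate(window_bytes, rtt_ms):
    """Upper bound in bytes/s when at most window_bytes may be in flight."""
    return window_bytes * 1000 // rtt_ms


DEFAULT_H2_WINDOW = 65_535  # initial flow-control window in the HTTP/2 spec

print(window_limited_rate(DEFAULT_H2_WINDOW, 200))  # 327675   (about 0.3 MiB/s)
print(window_limited_rate(DEFAULT_H2_WINDOW, 1))    # 65535000 (about 62 MiB/s)
```

The same window that is harmless at 1 ms becomes a hard ceiling at 200 ms, and a latency spike lowers that ceiling further.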
 