Remote (re)sync very slow

Hi all,
incremental backups from all servers take about 15 minutes, but the (re)sync between our two Backup Servers (2.0-9) takes about 9 hours.
The initial (first) sync took about 24 hours for approx. 3 TB.
All machines are on the same subnet (LAN), all connected at 10 Gbit.

I can't see any errors - is this normal???

PRIMARY BACKUP SERVER:
root@pbs2:~# proxmox-backup-client benchmark
Uploaded 484 chunks in 5 seconds.
Time per request: 10438 microseconds.
TLS speed: 401.80 MB/s
SHA256 speed: 204.94 MB/s
Compression speed: 270.26 MB/s
Decompress speed: 537.63 MB/s
AES256/GCM speed: 915.43 MB/s
Verify speed: 147.57 MB/s

SECONDARY BACKUP SERVER:
root@pbs:~# proxmox-backup-client benchmark
Uploaded 479 chunks in 5 seconds.
Time per request: 10477 microseconds.
TLS speed: 400.32 MB/s
SHA256 speed: 281.71 MB/s
Compression speed: 495.79 MB/s
Decompress speed: 970.91 MB/s
AES256/GCM speed: 1564.10 MB/s
Verify speed: 274.31 MB/s

Pings from SECONDARY to PRIMARY
64 bytes from 192.168.10.22 (192.168.10.22): icmp_seq=1 ttl=64 time=0.130 ms
64 bytes from 192.168.10.22 (192.168.10.22): icmp_seq=2 ttl=64 time=0.122 ms
64 bytes from 192.168.10.22 (192.168.10.22): icmp_seq=3 ttl=64 time=0.138 ms
64 bytes from 192.168.10.22 (192.168.10.22): icmp_seq=4 ttl=64 time=0.132 ms
64 bytes from 192.168.10.22 (192.168.10.22): icmp_seq=5 ttl=64 time=0.096 ms
 
Can you post the sync task log?
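(If it helps: assuming the standard CLI, proxmox-backup-manager task list should show the sync job's UPID, and proxmox-backup-manager task log <UPID> prints the full log - it can also be copied from the task viewer in the web UI.)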
 
Mhm... what does the storage of the source server look like? Can you test the network performance via iperf?
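For example (addresses are placeholders): run iperf -s on the destination and iperf -c <destination-IP> -t 10 on the source, then swap the roles to test the reverse direction.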
 
It's a common HP-based RAID 5 (Gen9) with 900 GB drives.

------------------------------------------------------------
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.XX.XX port 50818 connected with 192.168.XX.XX port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0002 sec 10.9 GBytes 9.39 Gbits/sec

TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.XX.XX port 45504 connected with 192.168.XX.XX port 5001 (reverse)
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0030 sec 10.9 GBytes 9.40 Gbits/sec
 
I do suspect the destination server; it has just 4 GB of RAM and is running ZFS (RAID 5).

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --numjobs=2 --iodepth=64 --size=10G --runtime=60 --readwrite=randrw --rwmixread=75 && rm random_read_write.fio

fio-3.25
Starting 2 processes
test: Laying out IO file (1 file / 10240MiB)
Jobs: 2 (f=2): [m(2)][0.1%][r=136KiB/s,w=76KiB/s][r=34,w=19 IOPS][eta 21h:11m:07s]
test: (groupid=0, jobs=1): err= 0: pid=3968622: Mon Sep 13 11:13:32 2021
read: IOPS=26, BW=107KiB/s (109kB/s)(6488KiB/60811msec)
bw ( KiB/s): min= 8, max= 200, per=51.57%, avg=110.65, stdev=39.76, samples=113
iops : min= 2, max= 50, avg=27.66, stdev= 9.94, samples=113
write: IOPS=9, BW=36.3KiB/s (37.2kB/s)(2208KiB/60811msec); 0 zone resets
bw ( KiB/s): min= 8, max= 88, per=52.17%, avg=39.85, stdev=18.91, samples=108
iops : min= 2, max= 22, avg= 9.96, stdev= 4.73, samples=108
cpu : usr=0.03%, sys=0.53%, ctx=1954, majf=0, minf=7
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.7%, 32=1.5%, >=64=97.1%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=1622,552,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
test: (groupid=0, jobs=1): err= 0: pid=3968623: Mon Sep 13 11:13:32 2021
read: IOPS=26, BW=107KiB/s (110kB/s)(6580KiB/61262msec)
bw ( KiB/s): min= 16, max= 184, per=52.04%, avg=111.65, stdev=39.38, samples=114
iops : min= 4, max= 46, avg=27.91, stdev= 9.85, samples=114
write: IOPS=9, BW=38.7KiB/s (39.6kB/s)(2372KiB/61262msec); 0 zone resets
bw ( KiB/s): min= 8, max= 104, per=53.50%, avg=40.42, stdev=21.27, samples=113
iops : min= 2, max= 26, avg=10.11, stdev= 5.32, samples=113
cpu : usr=0.05%, sys=0.52%, ctx=2007, majf=0, minf=8
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.7%, 32=1.4%, >=64=97.2%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=1645,593,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: bw=213KiB/s (218kB/s), 107KiB/s-107KiB/s (109kB/s-110kB/s), io=12.8MiB (13.4MB), run=60811-61262msec
WRITE: bw=74.8KiB/s (76.6kB/s), 36.3KiB/s-38.7KiB/s (37.2kB/s-39.6kB/s), io=4580KiB (4690kB), run=60811-61262ms
 
:D:D:D
read: IOPS=26, BW=107KiB/s (109kB/s)(6488KiB/60811msec)
Fio does what it's being told to do.

What kind of CPU do you have?
Why are you giving an I/O depth of 64? A sequential job with iodepth=2 would submit two sequential I/O requests at a time.
Why are you testing with the libaio engine? I believe this should be a synchronous workload.
Are you sure a 4K block size relates to the sync workload?

I believe your benchmark is unrelated to your workload.
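For what it's worth, a rough sketch of a fio run that is closer to what a PBS sync actually does (it reads and writes chunk files of up to 4 MiB, essentially one request at a time); the filename and sizes are just placeholders:

fio --name=pbs-chunk-sim --ioengine=psync --direct=1 --rw=randrw --rwmixread=75 --bs=4M --iodepth=1 --numjobs=1 --size=10G --runtime=60 --filename=chunk_sim.fio && rm chunk_sim.fio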
 
If the receiving back-end is just pure storage, then I can recommend doing this.


1) Get x2 decent 50 GB SSDs, or whatever SSD you have lying around - anything will do tbh - assuming /dev/sd[a-b]
2) Create 2 partitions of type Solaris/ZFS on each: one for the ZIL SLOG (~5% of the size, partition #1, e.g. sda1), one for the L2ARC (the remaining ~95%, partition #2) - see the partitioning sketch below
3) zpool add BIGPOOL log mirror /dev/sda1 /dev/sdb1
4) zpool add BIGPOOL cache /dev/sda2 /dev/sdb2
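A rough sketch of how the partitioning in step 2 could look with sgdisk (device name, partition sizes and the bf01 Solaris/ZFS type code are just examples - adapt to your SSDs):

sgdisk -n1:0:+3G -t1:bf01 -c1:slog /dev/sda    # small partition for the ZIL SLOG
sgdisk -n2:0:0 -t2:bf01 -c2:l2arc /dev/sda     # rest of the disk for the L2ARC
# repeat for /dev/sdb, then run the zpool add commands from steps 3 and 4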

It will save you money and will also prevent fragmentation of the ZFS pool. ZFS can start to fragment very early; once it does, the pool gets stuck looking for free space for an eternity and you'll want to throw it out of the window.

There is no such hard rule as 1 GB of RAM per TB of storage - SSDs or NVMes will get you all the way to the hundred-TB mark without any issue.

In fact, if you look at what iXsystems does, you'll notice that most of their enterprise systems run on low-cost hardware.
 
Thanks monkaX for the tips - shame, but there is no space for additional drives.
I'm a relative newbie with ZFS - to tell the truth, putting ZFS on this server was a kind of test.
Usually I use HW RAID, but I found one unused server with plain SATA disks.
I'm really curious whether upgrading the RAM will help; I'll let you know here.
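For reference, before adding RAM one can check whether the ARC is actually starved; these are standard OpenZFS-on-Linux stats, so they should apply here:

arc_summary | head -n 40          # from zfsutils-linux, if installed
grep -E '^(size|c_max|hits|misses) ' /proc/spl/kstat/zfs/arcstats

A size that sits pinned at c_max together with a poor hits/misses ratio usually means the ARC wants more memory.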

Here is how the dashboard looks (during sync).

Screenshot from 2021-09-13 12-54-27.png
 
This will help you big time concerning the ZFS file system:

https://www.truenas.com/community/r...bas-and-why-cant-i-use-a-raid-controller.139/
https://forums.unraid.net/topic/102010-recommended-controllers-for-unraid/

As far as I'm concerned, the layout which works rock solid on pretty much anything would be (a pool-creation sketch follows this list):

- 16 GB of RAM
- a 10 Gbit NIC
- x2 small SSDs/NVMes (50 GB will do), ZFS RAID1 mirror for the Proxmox system
- x2 1 TB SSDs/NVMes, same thing, a mirrored pool dedicated to the VMs
- x8 high-capacity spinning drives on an LSI 92x1-8i, supported by 2 small SSDs for the ARC and the ZIL SLOG. This is key to preventing fragmentation, which is bound to happen.
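As a rough sketch of how the big data pool in that layout could be created (pool name, device names, partition numbers and the raidz2 choice are all just assumptions):

zpool create -o ashift=12 BIGPOOL raidz2 sdc sdd sde sdf sdg sdh sdi sdj log mirror sdk1 sdl1 cache sdk2 sdl2

In practice you would use /dev/disk/by-id/ paths instead of the short sdX names, so the pool survives device reordering.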

I would advise against running anything but the system itself on the local pool rpool; it's better that way and will save you a LOT of unnecessary trouble down the road.

This will work with DDR3, DDR4, ECC or non-ECC, i3, i5, Atom... anything really. Hell, even a laptop is a good candidate for a T1 hypervisor - it has a battery after all, so it won't go down that easily :D
 
