[SOLVED] slow restore between all NVME + 10GBit + 16core PBS and PVE ?? 60Mbyte/s ??

eXtremeSHOk

TL;DR: Extremely slow restore speed between two dedicated PBS and PVE servers (all NVMe, 10 Gbit, 16 cores, 128 GB RAM).
A restore of 1.2 TB runs at 60 MB/s. Where does the problem lie?
My thought is that TLS combined with the small chunk size has a high per-request overhead, which limits the restore speed.

-------

We have a constant latency of 25 ms between the two servers. iperf3 reaches 9.8 Gbit/s between them with parallel threads, but only 3.3 Gbit/s with a single thread.


PBS specs: (Dedicated for testing)
CPU: AMD 5950x (16core,32thread)
RAM: 128GB
OS: Latest PBS (Kernel 6.2)
DISKS:
bpool: 6x NVMe 3.84TB (ZFS raidz1) (enterprise)
os/boot: 2x ssd 1TB (ZFS raid1) (enterprise)
Connection: 10GBit

PVE specs: (Dedicated for testing)
CPU: AMD 7950X (16core,32thread)
RAM: 128GB
OS: Latest PVE (Kernel 6.2)
DISKS:
os/boot/vm: 2x NVME 1.92TB (ZFS raid1) (enterprise)
Connection: 2x 10Gbit (LACP bonded)

CPU load is around 3% during a restore, and IO latency is 0.

fio: random read/write, 2 MB block size, 1000 GB of data, iodepth 16

PBS# fio --name=random-write --ioengine=libaio --rw=randwrite --bs=2m --numjobs=1 --size=1000g --runtime=30m --iodepth=16 --filename=/mnt/datastore/bpool/fio
write: IOPS=829, BW=1658MiB/s (1739MB/s)(1000GiB/617596msec); 0 zone resets
PBS# fio --name=random-read --ioengine=libaio --rw=randread --bs=2m --numjobs=1 --size=1000g --runtime=30m --iodepth=16 --filename=/mnt/datastore/bpool/fio
read: IOPS=582, BW=1166MiB/s (1223MB/s)(1000GiB/878242msec)

PVE# fio --name=random-write --ioengine=libaio --rw=randwrite --bs=2m --numjobs=1 --size=1000g --runtime=30m --iodepth=16 --filename=/rpool/fio
write: IOPS=1404, BW=2809MiB/s (2946MB/s)(1000GiB/364509msec); 0 zone resets
PVE# fio --name=random-read --ioengine=libaio --rw=randread --bs=2m --numjobs=1 --size=1000g --runtime=30m --iodepth=16 --filename=/rpool/fio
read: IOPS=1088, BW=2177MiB/s (2283MB/s)(1000GiB/470393msec)


Single connection shown below; with parallel connections it hits 9.8 Gbit/s.
iperf3 -c PBS.hidden.host -P 1 -C bbr
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 341 MBytes 2.86 Gbits/sec 0 32.4 MBytes
[ 5] 1.00-2.00 sec 344 MBytes 2.88 Gbits/sec 1043 22.6 MBytes
[ 5] 2.00-3.00 sec 428 MBytes 3.59 Gbits/sec 0 22.6 MBytes
[ 5] 3.00-4.00 sec 426 MBytes 3.58 Gbits/sec 0 22.6 MBytes
[ 5] 4.00-5.00 sec 411 MBytes 3.45 Gbits/sec 362 22.3 MBytes
[ 5] 5.00-6.00 sec 412 MBytes 3.46 Gbits/sec 981 22.4 MBytes
[ 5] 6.00-7.00 sec 419 MBytes 3.51 Gbits/sec 81 8.75 MBytes
[ 5] 7.00-8.00 sec 386 MBytes 3.24 Gbits/sec 660 22.4 MBytes
[ 5] 8.00-9.00 sec 351 MBytes 2.95 Gbits/sec 417 22.1 MBytes
[ 5] 9.00-10.00 sec 428 MBytes 3.59 Gbits/sec 0 22.3 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 3.85 GBytes 3.31 Gbits/sec 3544 sender
[ 5] 0.00-10.02 sec 3.85 GBytes 3.30 Gbits/sec receiver
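That single-stream ceiling lines up with the bandwidth-delay product. A quick sanity check (Python; the 25 ms RTT and 3.31 Gbit/s figures come from the measurements above, the 10 Gbit line rate from the NIC spec):

```python
# Bandwidth-delay product check for the single-stream iperf3 result above.
rtt = 0.025                        # measured RTT between the servers, seconds
line_rate = 10e9 / 8               # 10 Gbit/s link, in bytes/s

# TCP window needed to fill the link with a single stream
bdp = line_rate * rtt              # ~31.25 MB

# In-flight data implied by the observed 3.31 Gbit/s
observed = 3.31e9 / 8
effective_window = observed * rtt  # ~10.3 MB

print(f"BDP for line rate: {bdp / 1e6:.2f} MB")
print(f"effective window at 3.31 Gbit/s: {effective_window / 1e6:.2f} MB")
```

A single stream would need a ~31 MB window to saturate the link at this RTT; the achieved 3.31 Gbit/s corresponds to only ~10 MB effectively in flight, presumably held down by the retransmits visible in the iperf3 output.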


(proxmox-backup-client benchmark run on the PVE host, targeting the PBS datastore)
Uploaded 682 chunks in 5 seconds.
Time per request: 7393 microseconds.
TLS speed: 567.31 MB/s
SHA256 speed: 2397.22 MB/s
Compression speed: 795.06 MB/s
Decompress speed: 1310.99 MB/s
AES256/GCM speed: 2610.70 MB/s
Verify speed: 847.24 MB/s
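For what it's worth, the benchmark numbers are internally consistent if you assume the 4 MiB fixed chunk size PBS uses for VM images: one chunk per 7393 µs works out to almost exactly the reported TLS speed.

```python
# Sanity check of the proxmox-backup-client benchmark output above.
# Assumption: 4 MiB chunks, the fixed chunk size PBS uses for VM images.
chunk = 4 * 1024 * 1024          # bytes

per_request = 7393e-6            # "Time per request: 7393 microseconds"
tls_rate = chunk / per_request   # bytes/s

print(f"implied TLS throughput: {tls_rate / 1e6:.2f} MB/s")  # ~567 MB/s
```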
 
Your TLS speed is 567 MB/s (4.5 Gbit/s);
the 60 MB/s (0.5 Gbit/s) bottleneck seems to lie elsewhere.
 
Conclusion: the problem is caused by latency, combined with how PBS uses TLS and its small chunk size.

Possible solutions:
PBS using QUIC/HTTP3
Increasing the chunk size 10x
Multi-threaded restore (user-defined thread count)
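The latency hypothesis can be sketched with a simple model: if chunk fetches are not pipelined deeply, each 4 MiB chunk costs roughly one RTT plus its transfer time. This is purely illustrative (real restores pipeline requests and transfer compressed chunks, and the serialized-fetch assumption is mine, not PBS's actual scheduling), but it shows how RTT alone caps a per-chunk protocol far below line rate:

```python
# Illustrative throughput model for latency-bound chunk restores.
# Assumption: 4 MiB chunks fetched one at a time (no pipelining),
# which is a deliberate simplification of PBS's restore path.
CHUNK = 4 * 1024 * 1024  # bytes

def serialized_rate(rtt_s: float, link_bps: float) -> float:
    """Bytes/s when each chunk costs one RTT plus its transfer time."""
    transfer = CHUNK / (link_bps / 8)
    return CHUNK / (rtt_s + transfer)

# 10 Gbit link at 25 ms RTT vs 1 Gbit link at 3 ms RTT
print(f"10G/25ms ceiling: {serialized_rate(0.025, 10e9) / 1e6:.0f} MB/s")
print(f" 1G/ 3ms ceiling: {serialized_rate(0.003, 1e9) / 1e6:.0f} MB/s")
```

Even this crude model puts the 10 Gbit/25 ms ceiling at ~148 MB/s, an order of magnitude below the ~1250 MB/s line rate; per-chunk TLS overhead and imperfect pipelining would push the real figure lower still.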

Here is an actual real-world comparison between two identical PBS backup servers, located in two different datacenters, restoring an identical VM backup consisting of a 1.2 TB disk and an 8 GB disk.

10gbit (25ms latency)
restore image complete (bytes=1342177280000, duration=14452.10s, speed=88.57MB/s)
restore image complete (bytes=8589934592, duration=75.92s, speed=107.91MB/s)
**latency limited**

1gbit (3ms latency)
restore image complete (bytes=1342177280000, duration=9612.51s, speed=133.16MB/s)
restore image complete (bytes=8589934592, duration=40.59s, speed=201.80MB/s)
**bandwidth limited**
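A side note on units: the "MB/s" in those log lines appears to really mean MiB/s, since bytes divided by duration only matches the printed speed with a 1024² divisor. A quick check against the first line:

```python
# Verify the restore log's speed field against bytes/duration.
# bytes=1342177280000, duration=14452.10s, logged speed=88.57 "MB/s"
speed_mib = 1342177280000 / 14452.10 / 1024**2

print(f"{speed_mib:.2f} MiB/s")  # matches the logged 88.57
```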
 
