Help: PBS speed stuck at 1 Gbps

Testani

Hi everyone, I installed a bare-metal PBS on a server with dual 4110 CPUs, 128 GB of RAM and SSDs.
I tried every possible configuration, switching from ZFS to other layouts; as an extreme test I even created a datastore on a single SSD. I'm on a full 10 Gb network: the Proxmox servers have two 10 Gb NICs and the PBS has two 10 Gb NICs, with MTU and network speed tested with iperf.
I attach screenshots of the PBS configuration and the compression used. I cannot reach a backup speed higher than 90-100 MB/s in any way. Where can I check and understand where the bottleneck is? Does PBS have a 1 Gbps limit somewhere?
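A rough way to watch for the bottleneck while a backup runs (a sketch only, assuming the sysstat package is installed; device and interface names will differ on each setup):

# On the PVE node while a backup job is running
iostat -x 2            # per-disk utilisation and latency
sar -n DEV 2           # per-NIC throughput
top                    # check whether a single core is pegged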
 

Attachments

  • p1.png
    p1.png
    182.1 KB · Views: 35
  • p2.png
    p2.png
    158.2 KB · Views: 34
  • p3.png
    p3.png
    145.2 KB · Views: 34
What does your bond status say? What do your ports themselves say? Is the switch configured incorrectly? Is there perhaps a limiter? What does iperf say?
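Something along these lines should show it (a sketch; interface names like bond0/ens1f0 are just examples):

cat /proc/net/bonding/bond0        # bond mode, LACP partner state, per-slave link speed
ethtool ens1f0 | grep -i speed     # negotiated link speed of each port
ip -s link show bond0              # error and drop counters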
 
This is the network test:

[  1] local 10.12.0.249 port 47152 connected with 10.12.0.37 port 5001 (icwnd/mss/irtt=14/1448/106)
[ ID] Interval            Transfer     Bandwidth
[  1] 0.0000-10.0134 sec  6.15 GBytes  5.27 Gbits/sec

root@pbs09:~# iperf -c 10.12.0.37
------------------------------------------------------------
Client connecting to 10.12.0.37, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  1] local 10.12.0.249 port 41052 connected with 10.12.0.37 port 5001 (icwnd/mss/irtt=14/1448/106)
[ ID] Interval            Transfer     Bandwidth
[  1] 0.0000-10.0141 sec  5.80 GBytes  4.98 Gbits/sec
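(These are single-stream runs; with LACP a single TCP flow is hashed onto one physical link, so a parallel run would show whether the bond adds up. A sketch, assuming iperf3 is installed on both ends:)

iperf3 -c 10.12.0.37 -P 4        # four parallel streams
iperf3 -c 10.12.0.37 -P 4 -R     # same, reverse direction (server -> client)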
 
Is it possible that your PVE node is simply at its limit and can't deliver any more? Or have you set a limit on the backup bandwidth so as not to overload your drives? Have you ever run a benchmark on the PBS datastore?
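Two quick things to check for a configured limit (a sketch; paths as on a current PVE install, and traffic control is available in recent PBS versions):

# On the PVE node: any bandwidth limit for vzdump / backup jobs?
grep -i bwlimit /etc/vzdump.conf /etc/pve/jobs.cfg 2>/dev/null

# On the PBS host: any traffic-control rules?
proxmox-backup-manager traffic-control list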
 
Here are screenshots of the speed and the PBS load.
 

Attachments

  • s7.png
    s7.png
    95.9 KB · Views: 27
  • s8.png
    s8.png
    135 KB · Views: 28
It's difficult to help you if you ignore half the questions.

You could have thousands of PVE nodes; what would that change? Nothing.
You have to provide a few more facts, answer the questions, check something or try something out.
 
Thanks, I have done all the tests. What am I missing? There are no speed limits on the nodes, no I/O delay, the VMs run smoothly with 1 GB/s of read/write on local SSD disks, the network bandwidth is fine, and the PBS benchmark returns these values:

Time per request: 13590 microseconds.
TLS speed: 308.61 MB/s
SHA256 speed: 215.52 MB/s
Compression speed: 373.76 MB/s
Decompress speed: 575.69 MB/s
AES256/GCM speed: 1246.04 MB/s
Verify speed: 161.82 MB/s

+===================================+====================+
| Name                              | Value              |
+===================================+====================+
| TLS (maximal backup upload speed) | 308.61 MB/s (25%)  |
+-----------------------------------+--------------------+
| SHA256 checksum computation speed | 215.52 MB/s (11%)  |
+-----------------------------------+--------------------+
| ZStd level 1 compression speed    | 373.76 MB/s (50%)  |
+-----------------------------------+--------------------+
| ZStd level 1 decompression speed  | 575.69 MB/s (48%)  |
+-----------------------------------+--------------------+
| Chunk verification speed          | 161.82 MB/s (21%)  |
+-----------------------------------+--------------------+
| AES256 GCM encryption speed       | 1246.04 MB/s (34%) |
+===================================+====================+

What can I check?
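(For scale: the TLS value of 308 MB/s is the benchmark's per-connection ceiling, roughly 2.5 Gbit/s, so the observed 90-100 MB/s is well below it. A crude sequential-write spot check directly on the datastore would be a next step - a sketch, with the mount point /mnt/datastore/store1 only as an example; note that ZFS compression can inflate a zero-filled test:)

dd if=/dev/zero of=/mnt/datastore/store1/ddtest bs=1M count=4096 conv=fdatasync
rm /mnt/datastore/store1/ddtest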
 
For example, you could investigate why your PBS obviously can't reach 20 Gbit/s via iperf.

Maybe this could help you: https://forum.proxmox.com/threads/lacp-bonding-with-2-x10g-nic-are-giving-10g-traffic-only.111428/

Check your vzdump config to make sure there really is no limit set: https://pve.proxmox.com/pve-docs/chapter-vzdump.html#vzdump_configuration

You haven't told us yet which switches you use. You should check their config to see whether the port is limited by something and whether it is configured correctly.
I also don't know whether you can actually achieve 20 Gbit/s between two nodes.
You also didn't reveal which MTU you set, whether the switch supports it, and whether it is configured correctly on each node. Wrong MTU settings can cause various problems.

There is also no configuration or information about how you integrated the datastore on the PVE side, e.g. whether you use encryption or not.
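A quick way to verify the MTU end-to-end (a sketch; bond0 and the target IP are examples, and -s 8972 assumes an MTU of 9000):

ip -d link show bond0              # MTU actually set on the bond and its slaves
ping -M do -s 8972 10.12.0.37      # don't-fragment ping; fails if any hop has a smaller MTU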
 
Did you find a solution for this?

I've got 2 PBS instances running as VMs on PVE.
Both are hooked up with 2x 25 Gbit/s NICs,
bonded via LACP layer 3+4 and passed through via virtio to the PBS VMs.

I'm observing the following:
verify jobs and sync jobs are somehow throttled to 1 Gbit/s.

I checked the switch interface status, which is up and running at 25 Gbit/s.
ethtool on the host also reports a 25 Gbit/s connection.

When running iperf3 between those 2 PBS instances, I get a solid 10 Gbit/s (as the virtio network adapter correctly supports).
But when PBS is doing its job, I'm running at something like 0.7 to 1 Gbit/s.

vzdump checked, everything is commented out.
BWLimit was never set manually; I checked PBS itself and the jobs to find anything that could throttle... Didn't find anything.

When running proxmox-backup-client against PBS, I get these results:
1741208246212.png

That's not what I'm seeing in real-world performance :(

When observing the traffic on the switches, I can clearly say that it's using the correct QSFP interfaces (MikroTik switches).


Another example: VM 114

Backup running at nearly 2 Gbit/s.
1741208714660.png

But looking at the verify performance:
1741208784528.png


There is a huge performance gap.

This happens when verifying and syncing... as if the datastore would throttle those "secondary tasks" somehow...

Any ideas? :S
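One thing that could still be checked (a sketch, assuming the datastore sits on a ZFS pool on the PVE host; the pool name is an example):

# On the PVE host carrying the PBS VM's disks, while a verify job runs
zpool iostat -v tank 5       # per-vdev read throughput and IOPS
iostat -x 5                  # per-disk utilisation, to see if the HDDs are saturated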
 

Attachments

  • 1741208740851.png
    1741208740851.png
    44.6 KB · Views: 0
Last edited:
This happens when verifying and syncing... as if the datastore would throttle those "secondary tasks" somehow...
What kind of storage are you using as datastore? The disk type and local versus network can make a huge difference
 
Last edited:
Is reading the source disk faster than your network? What are your read/write disk speeds on both sides for a single thread? If you run fio, you often end up testing multi-core performance, while something like PBS only uses a single thread per job. You can schedule multiple jobs simultaneously to use more CPU/bandwidth, or you could (like I did once) schedule them all at the same time and have 21 nodes trying to stream over 2x 10 Gbps, at which point less than a gigabit per node is expected; I have since spread my schedule out better.

The other thing I noticed is that backups only back up changes, so while it is looping over your disks it may not need to stream more than a gigabit's worth of data over the network. You may also need to enable fleecing, depending on your disk workloads.

So there are lots of variables; I would benchmark both sides first and set expectations accordingly.
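If it helps, a minimal single-thread test along those lines might look like this (a sketch only - the datastore path is an example, 4M roughly matches PBS chunk sizes, and the numbers will differ on every pool):

# Single writer, roughly what one backup/verify worker sees
fio --name=pbs-single --directory=/mnt/datastore/store1 --rw=write --bs=4M \
    --size=8G --numjobs=1 --iodepth=1 --end_fsync=1

# Eight writers, to see how much parallelism the pool actually offers
fio --name=pbs-multi --directory=/mnt/datastore/store1 --rw=write --bs=4M \
    --size=8G --numjobs=8 --iodepth=1 --end_fsync=1 --group_reporting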
 
Last edited:
  • Like
Reactions: Johannes S
Thank you everyone for your replies <3

Let me answer your questions:

1. Yes, you are right: it was 1.8 GiB while backing up, not Gbit/s. Sorry :)

2. Both storage nodes are running 24x Exos enterprise SAS HDDs each in a ZFS RAIDZ2, totalling nearly 330 TB + 290 TB (yes, I know, only enterprise SSDs are recommended). The performance should be well over 1 Gbit/s? From the experience of a friendly company of mine running the same setup, I know it should read/write at around 200-250 MB/s, which would be the observed 1.8 Gbit/s.

Why would the backup process achieve that bandwidth, but the verify process be limited to 1 Gbit/s?

3. PBS is running in a VM; the ZFS RAID is hosted by the PVE host and passed through via virtio to the local PBS VM on that host. Therefore PBS is accessing the disks "locally"?

4. @guruevi: I don't understand yet why you assume that the read speed is faster than the network.
Your point was good that multiple tasks at the same time could limit the bandwidth of each process - but while I observe the verify process at 1 Gbit/s, there is nothing else running on or to that PBS :(

If I run all backup jobs at the same time, the Ceph cluster muscles up and pumps the data - but due to retransmissions and the lack of write speed on the PBS nodes (they can't handle the NVMe + SSD traffic), I timed the backups to have pauses in between.

Could you give me a hand and explain how I could do the benchmarks according to your instructions? :) <3


Thank you everyone, I'm lost without you!
 
Last edited:
24 disks in a single RAIDZ2 is not recommended; I would split them up into 3x 8-disk RAIDZ2 vdevs and, if possible, add a spare or two. A single vdev will have roughly the throughput of its slowest disk, which is truthfully about 0.8-1.6 Gbps (100 MB/s for 5400 RPM, 200 MB/s for 15k RPM) on spinning disks; three vdevs will have the speed of three vdevs, and so on.
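A layout like that could look roughly like this (a sketch only - pool and device names are placeholders, and the spares obviously need extra drives beyond the 24):

zpool create tank \
  raidz2 sda sdb sdc sdd sde sdf sdg sdh \
  raidz2 sdi sdj sdk sdl sdm sdn sdo sdp \
  raidz2 sdq sdr sds sdt sdu sdv sdw sdx \
  spare  sdy sdz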
 
Last edited:
  • Like
Reactions: UdoB and Johannes S
HDDs are too slow for PBS, even more so with ZFS.
Perhaps RAID10 with mdadm or hardware RAID10 will give some boost,
but verify jobs and garbage collection will still be slow.
BTW, double check that there are no overlapping tasks, e.g. a backup during a GC or verify task...
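To rule out overlapping tasks, the task list can be checked on the PBS host (a sketch; the GUI task view shows the same information):

proxmox-backup-manager task list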
 
  • Like
Reactions: Johannes S
You can set up mirrors with ZFS as well; as I said before, a single vdev will have the speed of the slowest disk. Spreading your load over more vdevs will improve it, so yes, having 12 mirror vdevs will be faster than a single 24-disk RAIDZ2. Over 3x 8-disk RAIDZ2 on 7200 RPM drives you can definitely push ~2 Gbps; add a few SSDs for cache and SLOG. I was able to get 14 nodes backed up by spreading the load over 24 hours.
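Adding those devices would look roughly like this (a sketch only - pool and device names are placeholders, and an SLOG only helps synchronous writes, so it's worth testing whether it changes anything for PBS's mostly asynchronous chunk writes):

zpool add tank log mirror nvme0n1 nvme1n1     # mirrored SLOG
zpool add tank cache nvme2n1                  # L2ARC read cache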
 
Last edited: