Dedicated server OVHCloud SCALE-3 - Private bandwidth 12 Gbps - vRack - Guaranteed and expected network speed not reached

John_Smith

New Member
Nov 10, 2021
Hello,

We have three dedicated OVHCloud SCALE-3 servers. The first is in the Roubaix (France) data center (RBX8), and the other two are in the Strasbourg (also France) data center (SBG3).

We have subscribed to the private 12 Gbps in/out bandwidth option on all three SCALE-3 dedicated servers.

Reference: https://www.ovhcloud.com/en/bare-metal/scale/scale-3/

Proxmox VE 7.0 (a Debian-based GNU/Linux distribution) is installed on all servers (we had the same problem with Proxmox 6).

All three servers are in the same vRack (reference: https://us.ovhcloud.com/products/networking/vrack-private-network).

The problem is that we are unable to achieve an effective 12 Gbps of private bandwidth between our servers.

After many discussions with OVH support and numerous on-site interventions by their teams, it turns out that the problem is not hardware-related and the on-site network configuration is correct.

The problem would therefore be on the software side, more precisely a network configuration issue in our Proxmox setup.

OVH support managed to reach the expected speed (12 Gbps incoming/outgoing) via their rescue mode (which is based on Debian 8…), using their own parameters (which were not communicated to us in detail…). The test they used to confirm it is a simple:

iperf3 -c 192.168.1.10 -i 10 -t 100 -P 16 | grep "SUM"
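
(For completeness: the command above assumes an iperf3 server is already listening on the target machine, typically started with something like the following; this is our assumption, OVH did not share their exact receiving-side setup.)

Code:
iperf3 -s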

Here is the solution offered by OVH support:

“Set the MTU to 9000 and optimize the TCP window size according to latency and bandwidth.
You need to know the average latency between your servers.
Here is an example calculation for a bandwidth of 10 Gbps and a latency of 80.1 ms:

- Link bandwidth in bits/s * latency in seconds = window size in bits; divide by 8 to get the ideal size in bytes.
- 10,737,418,240 bits/s * 0.0801 s = 860,067,201 bits; / 8 = 107,508,400 bytes (≈102 MBytes).

Under Linux, you will need to modify the "TCP buffer" settings via "sysctl tcp_rmem, tcp_wmem and tcp_mem".
/!\ We advise you to back up your data before making any changes, as a precaution.
After modifying the MTU as well as the TCP buffers, your bandwidth should be much greater.”
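
For reference, here is a sketch of what those sysctl and MTU checks can look like in practice. The buffer values below are our own illustration, sized for roughly 12 Gbit/s at the ~10 ms RTT between the DCs, not OVH's exact settings, and the target IP is the SBG3 server's vRack address from the configs posted further down:

Code:
# Persist larger TCP buffers (BDP ≈ 12e9 bit/s * 0.010 s / 8 ≈ 15 MB, rounded up to 32 MB)
cat >> /etc/sysctl.d/90-tcp-tuning.conf <<'EOF'
net.core.rmem_max = 33554432
net.core.wmem_max = 33554432
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
EOF
sysctl --system

# Check that jumbo frames really pass end-to-end through the vRack
# (8972 = 9000 MTU - 20 bytes IP header - 8 bytes ICMP header)
ping -M do -s 8972 -c 3 192.168.11.21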

We have run several tests changing these values as indicated; they do modulate the bandwidth between the servers, but we still cannot stably reach the 12 Gbps guaranteed by our option on the SCALE-3.

Here are the details of our tests (as a reminder, the three SCALE-3 servers are in the same vRack and we have subscribed to the 12 Gbps option on all three):

Between the SCALE-3 in RBX8 and the SCALE-3 in SBG3 we have a very poor throughput of 2 Gbps incoming/outgoing. All our attempts to tune the bandwidth have failed (it remains well below 12 Gbps and is unstable).

Between the two SCALE-3s in SBG3 we have a standard 6 Gbps (so still not the 12 Gbps “guaranteed” by our option), BUT with the modifications suggested by OVH a theoretical bandwidth of 12 Gbps could be reached (via the iperf3 test).
I say theoretical because in practice the file and VM transfers within our Proxmox remain slow (they do not use 12 Gbps).

For the SCALE-3 in RBX8, the only time it reaches 12 Gbps is on the OVH support side, in rescue mode, with their parameters and their theoretical iperf3 test. In practice on our Proxmox, despite the various tests, the bandwidth remains catastrophic (even the standard 6 Gbps is not reached; we stay at a mediocre 2.2 Gbps).

We are starting to lose our minds a bit, so if someone has faced this scenario before (not specifically on OVHCloud dedicated servers):

Would it be possible to know the exact network configuration to apply in order to reach the private 12 Gbps bandwidth between several dedicated servers running Proxmox (so Debian-based Linux) in the same vRack?

Thank you.
 
do you use the same test as OVH support?
"iperf3 -c 192.168.1.10 -i 10 -t 100 -P 16" ?

mtu && window size should help a little bit (like going from 9 Gbit/s to 10 Gbit/s), because of TCP header overhead.
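
rough numbers behind that, assuming 52 bytes of TCP/IP headers (with timestamps) plus 38 bytes of Ethernet framing and inter-frame gap per packet:

- MTU 1500: 1448 payload bytes per 1538 bytes on the wire ≈ 94% of line rate (~9.4 Gbit/s on a 10 Gbit link)
- MTU 9000: 8948 payload bytes per 9038 bytes on the wire ≈ 99% of line rate (~9.9 Gbit/s)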

with a standard 10 Gbit link, no special tuning, and a simple iperf command, you should easily reach 7-8 Gbit/s

but if you only have 2 Gbit/s, something is really wrong.

(what is the physical link speed? I have never seen a 12 Gbit/s link, so I would like to know if it's some kind of QoS on their side, or an aggregation of multiple 2.5 Gbit/s links?)


can you send the /etc/network/interfaces of 2 different servers, one in each DC?
 
Seems like Scale-3 servers use at least a 25 Gbps interface for the vRack network, as that is the max speed supported in that server for the vRack.
The output of lshw -class net could help get the hardware details but, as no special setup is needed in the OS, I believe it will be a single interface.

Our biggest vRack has 4 Gbps and we do get the full performance, both between servers in the same site and with remote ones. Those servers have 10 Gbps interfaces and get limited at their switch.
 
@spirit
If we use the same test as OVH support, "iperf3 -c 192.168.1.10 -i 10 -t 100 -P 16", we get 10.5 Gbits/sec
If we use the same command but with "-P 1", we get 500 Mbits/sec instead
If we do an scp between our server in RBX8 and the one in SBG3: we get 180 MBytes/sec
If we do an scp between our two servers in SBG3: we get 490 MBytes/sec

Here is the network configuration of the two servers we are testing:

Code:
## RBX8 server

root@proxmox-1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp133s0f0np0 inet manual
  mtu 9000
iface enp133s0f1np1 inet manual
  mtu 9000
iface enp193s0f0np0 inet manual
iface enp193s0f1np1 inet manual
iface enxb6270ec3364f inet manual

auto bond0
iface bond0 inet manual
      bond-slaves enp133s0f0np0 enp133s0f1np1
      bond-miimon 100
      bond-mode 802.3ad
      bond-xmit-hash-policy layer2+3
      mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 192.168.11.2/24
    gateway 192.168.11.254
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
        mtu 9000

Code:
## SBG3 server

root@proxmox-2:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp193s0f0np0 inet manual

iface enxce6fd28b41b9 inet manual

iface enp193s0f1np1 inet manual

iface enp133s0f0np0 inet manual
  mtu 9000

iface enp133s0f1np1 inet manual
  mtu 9000

auto bond0
iface bond0 inet manual
      bond-slaves enp133s0f0np0 enp133s0f1np1
      bond-miimon 100
      bond-mode 802.3ad
      bond-xmit-hash-policy layer2+3
      mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 192.168.11.21/24
    gateway 192.168.11.254
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
        mtu 9000

@VictorSTS Here is the output of the command you asked for; all of our servers share the same characteristics (@spirit, it also answers your question about the physical link speed).
Code:
## RBX8 server

root@proxmox-1:~# lshw -class net
  *-network:0
       description: Ethernet interface
       product: MT27800 Family [ConnectX-5]
       vendor: Mellanox Technologies
       physical id: 0
       bus info: pci@0000:85:00.0
       logical name: enp133s0f0np0
       logical name: /dev/fb0
       version: 00
       serial: 62:d3:ff:5d:df:9d
       capacity: 25Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd autonegotiation fb
       configuration: autonegotiation=on broadcast=yes depth=32 driver=mlx5_core driverversion=5.11.22-7-pve duplex=full firmware=16.31.1014 (MT_0000000425) latency=0 link=yes mode=1024x768 multicast=yes slave=yes visual=truecolor xres=1024 yres=768
       resources: iomemory:2000-1fff irq:349 memory:2004c000000-2004dffffff memory:bb100000-bb1fffff memory:2004e800000-2004effffff
  *-network:1
       description: Ethernet interface
       product: MT27800 Family [ConnectX-5]
       vendor: Mellanox Technologies
       physical id: 0.1
       bus info: pci@0000:85:00.1
       logical name: enp133s0f1np1
       version: 00
       serial: 62:d3:ff:5d:df:9d
       capacity: 25Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=mlx5_core driverversion=5.11.22-7-pve duplex=full firmware=16.31.1014 (MT_0000000425) latency=0 link=yes multicast=yes slave=yes
       resources: iomemory:2000-1fff irq:414 memory:2004a000000-2004bffffff memory:bb000000-bb0fffff memory:2004e000000-2004e7fffff
  *-network:0 DISABLED
       description: Ethernet interface
       product: MT27800 Family [ConnectX-5]
       vendor: Mellanox Technologies
       physical id: 0
       bus info: pci@0000:c1:00.0
       logical name: enp193s0f0np0
       version: 00
       serial: 04:3f:72:c0:8a:9a
       capacity: 25Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=mlx5_core driverversion=5.11.22-7-pve firmware=16.31.1014 (MT_0000000248) latency=0 link=no multicast=yes
       resources: iomemory:1800-17ff irq:219 memory:1801c000000-1801dffffff memory:c0100000-c01fffff memory:1801e800000-1801effffff
  *-network:1 DISABLED
       description: Ethernet interface
       product: MT27800 Family [ConnectX-5]
       vendor: Mellanox Technologies
       physical id: 0.1
       bus info: pci@0000:c1:00.1
       logical name: enp193s0f1np1
       version: 00
       serial: 04:3f:72:c0:8a:9b
       capacity: 25Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=mlx5_core driverversion=5.11.22-7-pve firmware=16.31.1014 (MT_0000000248) latency=0 link=no multicast=yes
       resources: iomemory:1800-17ff irq:284 memory:1801a000000-1801bffffff memory:c0000000-c00fffff memory:1801e000000-1801e7fffff
  *-network:0
       description: Ethernet interface
       physical id: 1
       logical name: bond0
       serial: 62:d3:ff:5d:df:9d
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=5.11.22-7-pve duplex=full firmware=2 link=yes master=yes multicast=yes
  *-network:1
       description: Ethernet interface
       physical id: 2
       logical name: vmbr0
       serial: 62:d3:ff:5d:df:9d
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=192.168.11.2 link=yes multicast=yes
 
ok.

1)
first, you should use "layer3+4" for your LACP bond.

LACP keeps 1 TCP connection always on 1 interface.
if you have multiple TCP connections, it'll load-balance with a hash based on:

layer2+3: srcmac dstmac srcip dstip.
that means that between 2 servers, it'll always use 1 interface

with layer3+4: srcip dstip srcport dstport:
that means that between 2 servers, it'll load-balance each TCP connection (different src/dst ports) across different interfaces.

of course, if you do a simple "scp", it only uses 1 link (and of course, storage can limit performance too).
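
A minimal sketch of that change, reusing the bond stanza from the config posted above (the reload/verify commands are the standard ifupdown2 ones on Proxmox, not something specific to OVH):

Code:
auto bond0
iface bond0 inet manual
      bond-slaves enp133s0f0np0 enp133s0f1np1
      bond-miimon 100
      bond-mode 802.3ad
      bond-xmit-hash-policy layer3+4
      mtu 9000

# apply and confirm the active policy:
#   ifreload -a
#   grep "Hash Policy" /proc/net/bonding/bond0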


2)

Now, with iperf, using -P creates more parallel TCP connections.
I have done tests with my Mellanox card (ConnectX-5 too); I'm able to reach 10 Gbit/s without any special iperf tuning, with "-P 1".
So, maybe OVH is doing some rate limiting per TCP connection/stream?
I'm really not sure, you should ask OVH.

Edit: I'm doing it locally, with MTU 9000. Not sure about the latency impact between the 2 DCs.

Code:
 iperf -c X.X.X.X -i 10 -t 100
------------------------------------------------------------
Client connecting to kvm1.odiso.net, TCP port 5001
TCP window size: 1.00 MByte (default)
------------------------------------------------------------
[  3] local 10.3.98.142 port 38890 connected with 10.3.99.141 port 5001
^C[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 8.1 sec  9.07 GBytes  9.57 Gbits/sec
 
1)
Already tested previously, without any impact on the performance.
Furthermore, the message we got from OVH about this specific configuration was that using "layer3+4" was "not supported by their switch". We tested it anyway and miraculously it worked perfectly (?!) (but unfortunately still with zero impact on performance).
We even tried to install a specialised network card driver but we just managed to crash the server ^^'

We are completely out of ideas, and like you we find it more and more likely that a rate limit somewhere is at fault, despite OVH's insistence to the contrary. The latency between our two DCs is 10 ms; I don't think that is enough to explain why we don't even get 1 Gbps. We will try to get something out of OVH and will keep you posted if anything comes of it; if you have any more ideas to explore until then, we will continue investigating.

Thank you for all of your input so far.
 
1)
Already tested previously, without any impact on the performance.
Furthermore, the message we got from OVH about this specific configuration was that using "layer3+4" was "not supported by their switch". We tested it anyway and miraculously it worked perfectly (?!) (but unfortunately still with zero impact on performance).
We even tried to install a specialised network card driver but we just managed to crash the server ^^'
the layer3+4 hash applies to packets going out of your server.
on the other side, packets incoming to the target server are the outgoing packets from the OVH physical switch (so maybe they don't support it because they use cheap switches).

but both sides can be configured differently.



We are completely out of ideas, and like you we find it more and more likely that a rate limit somewhere is at fault, despite OVH's insistence to the contrary. The latency between our two DCs is 10 ms; I don't think that is enough to explain why we don't even get 1 Gbps. We will try to get something out of OVH and will keep you posted if anything comes of it; if you have any more ideas to explore until then, we will continue investigating.
I think in the current situation, it really depends on the applications running: whether you have a lot of parallel "small" TCP connections (<500 Mbit/s each),
or whether you really need high bandwidth on a single TCP connection.
 
