This is a paste from another forum where I posted better information about my issue. I hope this helps me figure out what is going on.
Hi folks! I'm neck deep in a network performance tuning issue, and I'm hopeful that someone will have some insight. The overall goal is to tune my 10GbE interfaces for the best possible throughput, though I believe the condition I have noticed is what is causing the odd results below.
My setups:
I have 6 different FreeNAS machines interfacing with 6 different Proxmox machines, split between two sites. The first site is in production (unfortunately) and does exhibit the problem that I'm seeing (low network performance). My second site is still in development and has not been brought online yet, so I can easily make changes on multiple systems at a whim (Woohoo!). Within this second site, I have two specific machines that I am focusing on. Both started with fresh installations of the latest FreeNAS and Proxmox respectively. This is basically my test bed within the development site. On FreeNAS there is no defined storage, no NFS shares, no iSCSI. On Proxmox, there are no virtual machines, and the NICs in question do not operate as bridges. They are directly defined in interfaces. There is no traffic beyond the traffic that I create while testing.
The hardware:
All systems are using Intel network cards of various flavors. All of the servers themselves are on Xeon CPUs with a minimum of 32 GB of memory.
I'm using Netgear M4300-8X8F switches. **Currently not running LACP**; I have disabled it during testing. I'm only using a single switch at a time, and I have replicated the issue on two different switches. The Netgear switches are on the most current firmware as of 2 weeks ago. I have flow control enabled symmetrically on the switch ports.
Some Tests:
Tests have been performed with iperf3.
First, a good test. FreeNAS running the iperf3 server (iperf3 -s), Proxmox running the client (iperf3 -c 10.200.108.65):
Connecting to host 10.200.108.65, port 5201
[ 5] local 10.200.108.45 port 44200 connected to 10.200.108.65 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1.09 GBytes 9.40 Gbits/sec 0 1.28 MBytes
[ 5] 1.00-2.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.28 MBytes
[ 5] 2.00-3.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.28 MBytes
[ 5] 3.00-4.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.28 MBytes
[ 5] 4.00-5.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.28 MBytes
[ 5] 5.00-6.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.28 MBytes
[ 5] 6.00-7.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.28 MBytes
[ 5] 7.00-8.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.28 MBytes
[ 5] 8.00-9.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.28 MBytes
[ 5] 9.00-10.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.28 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.9 GBytes 9.38 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 10.9 GBytes 9.38 Gbits/sec receiver
A decent test for sure. The connection speed is stable, there were no retransmits (the Retr column), and the CWND stays consistent.
Now, let's try FreeNAS to FreeNAS. In this case I will use my test bed machine (10.200.108.65) as the client (it was the server in the first test) and a second FreeNAS machine (10.200.108.15) as the server:
Connecting to host 10.200.108.15, port 5201
[ 5] local 10.200.108.65 port 44268 connected to 10.200.108.15 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 989 MBytes 8.28 Gbits/sec 0 1.62 MBytes
[ 5] 1.00-2.00 sec 1.09 GBytes 9.37 Gbits/sec 0 1.62 MBytes
[ 5] 2.00-3.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.62 MBytes
[ 5] 3.00-4.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.62 MBytes
[ 5] 4.00-5.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.62 MBytes
[ 5] 5.00-6.00 sec 1.09 GBytes 9.39 Gbits/sec 0 1.62 MBytes
[ 5] 6.00-7.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.62 MBytes
[ 5] 7.00-8.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.62 MBytes
[ 5] 8.00-9.00 sec 1.09 GBytes 9.38 Gbits/sec 0 1.62 MBytes
[ 5] 9.00-10.00 sec 1.09 GBytes 9.39 Gbits/sec 0 1.62 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.8 GBytes 9.27 Gbits/sec 0 sender
[ 5] 0.00-10.00 sec 10.8 GBytes 9.27 Gbits/sec receiver
Pretty decent on this test as well. I'm happy with anything above 9 Gbits/sec when it is this stable. No retransmits, consistent CWND.
And now let me break things. This test uses the same FreeNAS client (10.200.108.65), with the Proxmox machine from the first test (10.200.108.45) as the server:
Connecting to host 10.200.108.45, port 5201
[ 5] local 10.200.108.65 port 44270 connected to 10.200.108.45 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 557 MBytes 4.67 Gbits/sec 1 204 KBytes
[ 5] 1.00-2.00 sec 873 MBytes 7.32 Gbits/sec 0 285 KBytes
[ 5] 2.00-3.00 sec 1.01 GBytes 8.65 Gbits/sec 1 82.7 KBytes
[ 5] 3.00-4.00 sec 809 MBytes 6.78 Gbits/sec 0 262 KBytes
[ 5] 4.00-5.00 sec 997 MBytes 8.37 Gbits/sec 0 331 KBytes
[ 5] 5.00-6.00 sec 1.00 GBytes 8.60 Gbits/sec 1 204 KBytes
[ 5] 6.00-7.00 sec 872 MBytes 7.31 Gbits/sec 0 285 KBytes
[ 5] 7.00-8.00 sec 1.02 GBytes 8.72 Gbits/sec 0 348 KBytes
[ 5] 8.00-9.00 sec 835 MBytes 7.01 Gbits/sec 1 257 KBytes
[ 5] 9.00-10.00 sec 718 MBytes 6.02 Gbits/sec 1 230 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 8.55 GBytes 7.35 Gbits/sec 5 sender
[ 5] 0.00-10.15 sec 8.55 GBytes 7.24 Gbits/sec receiver
Phew. This is really bad for me, since stable and reliable transfer is going to be super important once I run storage on this network. My issues with this test are the low speed (7.35 Gbits/sec on average versus 9.27 to 9.38 in the other two tests), the large swings in speed from one second to the next, the number of retransmits (yes, I know it's only 5), and the unstable CWND.
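To put numbers on "low speed", "fluctuation", and "unstable CWND" when comparing runs, something like the following could be used. It is only a minimal sketch: the names `parse_intervals` and `summarize` are made up for illustration, and it assumes the plain-text iperf3 client output shown above, with a Retr and a Cwnd column on every per-second line. It treats byte units as binary (1 KByte = 1024 bytes) and bit units as decimal (1 Gbit = 10^9 bits), which matches the numbers in the output above.

```python
import re
import statistics

# Matches per-interval lines of iperf3 client output, for example:
# [  5]   0.00-1.00   sec   557 MBytes  4.67 Gbits/sec    1    204 KBytes
# Summary lines (ending in "sender"/"receiver") and headers do not match.
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+"
    r"(?P<start>\d+\.\d+)-(?P<end>\d+\.\d+)\s+sec\s+"
    r"(?P<xfer>[\d.]+)\s+(?P<xfer_unit>[KMG]?Bytes)\s+"
    r"(?P<rate>[\d.]+)\s+(?P<rate_unit>[KMG]?bits)/sec\s+"
    r"(?P<retr>\d+)\s+"
    r"(?P<cwnd>[\d.]+)\s+(?P<cwnd_unit>[KMG]?Bytes)\s*$"
)

BYTE_UNITS = {"Bytes": 1, "KBytes": 1024, "MBytes": 1024**2, "GBytes": 1024**3}
BIT_UNITS = {"bits": 1.0, "Kbits": 1e3, "Mbits": 1e6, "Gbits": 1e9}


def parse_intervals(text):
    """Return one dict per per-second interval line found in the text."""
    rows = []
    for line in text.splitlines():
        m = INTERVAL_RE.search(line)
        if m is None:
            continue
        rows.append({
            "gbps": float(m["rate"]) * BIT_UNITS[m["rate_unit"]] / 1e9,
            "retr": int(m["retr"]),
            "cwnd_bytes": float(m["cwnd"]) * BYTE_UNITS[m["cwnd_unit"]],
        })
    return rows


def summarize(text):
    """Condense one iperf3 run into a few comparable numbers."""
    rows = parse_intervals(text)
    if not rows:
        raise ValueError("no iperf3 interval lines found")
    rates = [r["gbps"] for r in rows]
    cwnds = [r["cwnd_bytes"] for r in rows]
    return {
        "intervals": len(rows),
        "mean_gbps": statistics.mean(rates),
        "stdev_gbps": statistics.pstdev(rates),  # spread across intervals
        "min_gbps": min(rates),
        "max_gbps": max(rates),
        "total_retr": sum(r["retr"] for r in rows),
        "cwnd_min_bytes": min(cwnds),
        "cwnd_max_bytes": max(cwnds),
    }


if __name__ == "__main__":
    import sys
    # Usage idea: iperf3 -c <server> | python3 this_script.py
    for key, value in summarize(sys.stdin.read()).items():
        print(f"{key}: {value}")
```

Running it over each of the three outputs gives one row per test, so a change in tunables can be judged by the spread and the CWND range rather than by eyeballing the columns.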
In my research, I've tried a lot of different sysctl tunables on both Proxmox and FreeNAS, with no better or more stable performance as a result. The tests above, though, were run nearly clean, without any tunables. Out of the box, my results are certainly as bad, if not worse.
It's my belief that, for whatever reason, when FreeNAS is sending data to Proxmox (Debian, really), the two are unable to agree on stable buffer settings, and this causes the congestion window (CWND) to fluctuate greatly. I believe the retransmits come from the CWND going up and down, and perhaps from a "full" window not letting a packet in.
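As a rough way to reason about why a small or unstable CWND matters: a single TCP stream cannot have more than one window of data in flight per round trip, so its rate is capped at roughly window / RTT. The sketch below just does that arithmetic. The function names are made up for illustration, and the 0.2 ms round-trip time is purely an assumed placeholder, not something I measured on this network; the real value would have to come from something like ping between the two hosts. The effective window is also the smaller of the sender's CWND and the receiver's advertised window, which this ignores.

```python
def min_window_bytes(rate_bits_per_sec, rtt_sec):
    """Bandwidth-delay product: bytes that must be in flight to sustain the rate."""
    return rate_bits_per_sec * rtt_sec / 8


def max_rate_bits_per_sec(window_bytes, rtt_sec):
    """Upper bound on a single stream's rate for a given window and RTT."""
    return window_bytes * 8 / rtt_sec


if __name__ == "__main__":
    ASSUMED_RTT_SEC = 0.0002  # hypothetical 0.2 ms LAN round trip, NOT measured

    # Window needed to keep a 10 Gbit/s link full at the assumed RTT.
    needed = min_window_bytes(10e9, ASSUMED_RTT_SEC)
    print(f"window needed for 10 Gbit/s: {needed / 1024:.0f} KBytes")

    # Rate ceiling implied by a few of the CWND values seen in the bad test.
    for cwnd_kbytes in (82.7, 204, 348):
        cap = max_rate_bits_per_sec(cwnd_kbytes * 1024, ASSUMED_RTT_SEC)
        print(f"cwnd {cwnd_kbytes} KBytes -> at most {cap / 1e9:.2f} Gbit/s")
```

With a measured RTT plugged in, this shows whether the CWND values from the bad test are small enough to explain the dips on their own, or whether something else is limiting the stream.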