Slow network connection from VM1 to host2; fast connection from VM1 on host1 to VM2 on host2; fast connection from host2 to VM1.

egulatee

New Member
Feb 26, 2022
Hi All,

I've been digging through the forums and posts...

I have an issue with slow network performance from VM1 to host2, which doesn't host the VM.

Context:
I have two separate 1 Gb switches (one dedicated to corosync) which use VLANs.
I have Intel NUCs & Chromeboxes and am using a USB 1 Gb adapter for corosync. (It's been rock solid.)
I am using the VirtIO network adapter in the VM.
I am using pfSense as my firewall and don't think I have any traffic shaping or limiting configured.

What I've tried
Ensured I am using the VirtIO network adapter on my VMs. (I've tried switching between that and others and back again; no change.)
Tried setting multiqueue on my VMs (see the CLI sketch after this list), but AFAIK pfSense doesn't support it. I didn't see a significant difference anyhow.
I've disabled hardware checksum offload in pfSense. No impact.
I've tried increasing the VM CPU and memory and it didn't make a difference. Given that I get ~1 Gbit/s from VM to VM and on the reverse of the problem path (host2 to VM1), it doesn't seem like a VM CPU or memory setting.
I've tried changing the CPU type. (Doesn't make a difference.)
I've tried enabling NUMA. (Doesn't make a difference.)
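For reference, multiqueue can also be set from the Proxmox CLI. This is only a sketch with a hypothetical VM ID (100), bridge (vmbr0), and MAC address (keep the NIC's existing MAC so the guest interface and any DHCP reservation don't change); the queue count is normally matched to the number of vCPUs:

Code:
# Hypothetical VM ID, bridge, and MAC; queues=4 assumes a 4-vCPU guest
qm set 100 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,queues=4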

Any guidance would be greatly appreciated. I will happily attach any needed files.

IPerf data below.
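(These are classic iperf/iperf2 TCP runs against the default port 5001. Roughly, the client invocations would have been along these lines, with "iperf -s" running on each target; hostnames are taken from the outputs below.)

Code:
# Approximate client commands (iperf2 defaults); each target runs "iperf -s"
iperf -c nuc9034      # VM1 -> host2 (the slow path)
iperf -c k3s-node1    # VM1 -> VM2 on host2
iperf -c networktest  # host2 -> VM1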

VM1 -> Host2 (Not hosting VM1)
------------------------------------------------------------
Client connecting to nuc9034, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 1] local 172.17.50.205 port 48552 connected with 172.17.10.58 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-20.4164 sec 547 KBytes 220 Kbits/sec

VM1 on host1 -> VM2 on host2
------------------------------------------------------------
Client connecting to k3s-node1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 1] local 172.17.50.205 port 50604 connected with 172.17.50.101 port 5001
[ ID] Interval Transfer Bandwidth
[ 1] 0.0000-10.0463 sec 1.10 GBytes 937 Mbits/sec

Host2(Not hosting VM1) -> VM1
------------------------------------------------------------
Client connecting to networktest, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 172.17.50.58 port 52086 connected with 172.17.50.205 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0019 sec 1.08 GBytes 928 Mbits/sec
 
I've found the following in the iperf3 FAQ:
https://software.es.net/iperf/faq.html

TCP throughput drops to (almost) zero during a test, what’s going on?
A drop in throughput to almost zero, except maybe for the first reported interval(s), may be related to problems in NIC TCP Offload, which is used to offload TCP functionality to the NIC (see https://en.wikipedia.org/wiki/TCP_offload_engine). The goal of TCP Offload is to save main CPU performance, mainly in the areas of segmentation and reassembly of large packets and checksum computation.

When TCP packets are sent with the “Don’t Fragment” flag set, which is the recommended setting, segmentation is done by the TCP stack based on the reported next hop MSS in the ICMP Fragmentation Needed message. With TCP Offload, active segmentation is done by the NIC on the sending side, which is known as TCP Segmentation Offload (TSO) or in Windows as Large Send Offload (LSO). It seems that there are TSO/LSO implementations which for some reason ignore the reported MSS and therefore don’t perform segmentation. In these cases, when large packets are sent, e.g. the default iperf3 128KB (131,072 bytes), iperf3 will show that data was sent in the first interval, but since the packets don’t get to the server, no ack is received and therefore no data is sent in the following intervals. It may happen that after a certain timeout the main CPU will re-send the packet by re-segmenting it, and in these cases data will get to the server after a while. However, it seems that segmentation is not automatically continued with the next packet, so the data transfer rate will be very low.

The recommended solution in such a case is to disable TSO/LSO, at least on the relevant port. See for example: https://atomicit.ca/kb/articles/slow-network-speed-windows-10/. If that doesn’t help then the “Don’t Fragment” TCP flag may be disabled. See for example: https://support.microsoft.com/en-us...gs-for-wan-links-with-a-mtu-size-of-less-than. However, note that disabling the “Don’t Fragment” flag may cause other issues.

To test whether TSO/LSO may be the problem, do the following:

  • If different machine configurations are used for the client and server, try the iperf3 reverse mode (-R). If TSO/LSO is only enabled on the client machine, this test should succeed.
  • Reduce the sending length to a small value that should not require segmentation, using the iperf3 -l option, e.g. -l 512. It may also help to reduce the MTU by using the iperf3 -M option, e.g. -M 1460.
  • Using tools like Wireshark, identify the required MSS in the ICMP Fragmentation Needed messages (if reported). Run tests with the -l value set to 2 times the MSS and then 4 times, 6 times, etc. With a TSO/LSO issue, the throughput should drop further with each test. It may help to increase the testing time beyond the default 10 seconds to better see the behavior (iperf3 -t option).



iperf3 from the VM to the other host (the slow direction):
iperf3 -c nuc9034 -l 512 -M 1460


Connecting to host nuc9034, port 5201
[ 5] local 172.17.50.205 port 33646 connected to 172.17.10.58 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 365 KBytes 2.99 Mbits/sec 2 1.41 KBytes
[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
[ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
[ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
[ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
[ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 365 KBytes 299 Kbits/sec 5 sender
[ 5] 0.00-10.04 sec 64.1 KBytes 52.3 Kbits/sec receiver

Iperf3 reverse (Host to VM) is fast.
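A reverse run uses the client-side -R flag mentioned in the FAQ above; something like:

Code:
# Reverse mode: the remote end (nuc9034) sends, the VM receives
iperf3 -c nuc9034 -R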

Connecting to host nuc9034, port 5201
Reverse mode, remote host nuc9034 is sending
[ 5] local 172.17.50.205 port 36306 connected to 172.17.10.58 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 82.2 MBytes 690 Mbits/sec
[ 5] 1.00-2.00 sec 86.1 MBytes 722 Mbits/sec
[ 5] 2.00-3.00 sec 86.5 MBytes 726 Mbits/sec
[ 5] 3.00-4.00 sec 86.1 MBytes 722 Mbits/sec
[ 5] 4.00-5.00 sec 86.4 MBytes 725 Mbits/sec
[ 5] 5.00-6.00 sec 87.6 MBytes 735 Mbits/sec
[ 5] 6.00-7.00 sec 89.6 MBytes 751 Mbits/sec
[ 5] 7.00-8.00 sec 86.1 MBytes 723 Mbits/sec
[ 5] 8.00-9.00 sec 83.4 MBytes 700 Mbits/sec
[ 5] 9.00-10.00 sec 88.1 MBytes 739 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.04 sec 868 MBytes 725 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 862 MBytes 723 Mbits/sec receiver

So it may seem to be a TSO issue.
Which led me to the following threads:
https://forum.proxmox.com/threads/tso-offloading-problem-with-igb-whilst-ixgbe-is-fine.56899/
https://forum.proxmox.com/threads/e1000-driver-hang.58284/
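If TSO on the sending side turns out to be the problem, the FAQ's suggested workaround on a Linux guest would look roughly like the following. This is only a sketch, assuming the VM's NIC is called eth0 (substitute the real VirtIO interface name, e.g. ens18), and ethtool changes do not persist across reboots:

Code:
# Show the current segmentation-offload settings on the VM's NIC (interface name is an assumption)
ethtool -k eth0 | grep segmentation-offload
# Temporarily disable TSO/GSO on the sender, as recommended in the iperf FAQ above
ethtool -K eth0 tso off gso off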
 
Recently, I have discovered a TSO issue on PVE 7.0.
Some tap interfaces do not have the TSO flag set.
To check whether TCP segmentation offload is correctly set, run ethtool on the tap interface:
Code:
ethtool -k tap***i* | grep offload
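
A broader check across all tap interfaces on a node can be scripted. This is only a sketch, assuming Proxmox's tap<VMID>i<N> naming (e.g. tap100i0); the last line is a hypothetical example of turning the flag back on for one affected interface:

Code:
# List the segmentation-offload flags of every tap interface on this node
for tap in /sys/class/net/tap*i*; do
    iface=$(basename "$tap")
    echo "== $iface =="
    ethtool -k "$iface" | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'
done
# Hypothetical: re-enable TSO/GSO on a single affected tap interface
# ethtool -K tap100i0 tso on gso on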
 
