Very Slow iPerf performance from Proxmox VM to VMs on different host

carpenike

New Member
Jul 9, 2019
I started this thread on Reddit as well (can't post the link); hopefully someone has some thoughts!

I'm seeing a strange issue with cross-host VM-to-VM communication. First, my setup:
  • 3x Proxmox hosts (5.4-10)
  • 10 GbE networking
  • Unifi US-16-XG switch
  • QLogic BR-1860 NICs. A single port from each host connected to the switch, no LACP. Configured as NICs (not CNA)
  • Jumbo frames enabled on the Ceph ports, not enabled on the VLAN network
  • Networking configured with Open vSwitch
  • Ceph storage within the Proxmox cluster -- very slow; I suspect it's due to this slow communication between hosts
  • The only network connected to the hosts is the 10 GbE port
  • VMs all use virtio

When using iPerf host-to-host I get good speeds:
(screenshot: Screen Shot 2019-07-08 at 11.50.01 PM.png)

When using iPerf from a VM to its local host, speeds are good too:
(screenshot: Screen Shot 2019-07-08 at 11.54.32 PM.png)

However, going from a VM to a different host in the cluster, results are quite bad:
(screenshot: Screen Shot 2019-07-08 at 11.57.26 PM.png)

VM-to-VM iPerf on the same host is good:
(screenshot: Screen Shot 2019-07-09 at 12.03.06 AM.png)

VM-to-VM iPerf when they're on different hosts is not good:
(screenshot: Screen Shot 2019-07-09 at 12.05.10 AM.png)

Connectivity from outside the network into the VM is good, though (1 Gb line rate):
(screenshot: Screen Shot 2019-07-08 at 11.58.53 PM.png)
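For anyone wanting to reproduce this test matrix, a minimal iperf3 sketch follows. The addresses are hypothetical examples on the vlan20 subnet from the config below; substitute your own endpoints (host↔host, VM↔local host, VM↔remote host, VM↔VM):

```shell
# On the receiving endpoint (host or VM):
iperf3 -s

# From the sending endpoint (example address on the 10.20.0.0/16 vlan):
iperf3 -c 10.20.0.11 -t 10

# Reverse direction, and parallel streams, help separate a single-flow
# limit (offload/MTU issues) from an overall link problem:
iperf3 -c 10.20.0.11 -R
iperf3 -c 10.20.0.11 -P 4
```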

Here's one host's /etc/network/interfaces. They're all the same except for the network interface names:

Code:
# Loopback interface
auto lo
iface lo inet loopback

# Bridge for our bond and vlan interfaces (our VMs will also attach to this bridge)
auto vmbr0
allow-ovs vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports enp129s0f0 vlan20 vlan55
    mtu 9000

allow-vmbr0 vlan20
iface vlan20 inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=20
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
    address 10.20.0.10
    netmask 255.255.0.0
    gateway 10.20.0.1
    mtu 1500

# Physical interface for traffic coming into the system. Retag untagged
# traffic into vlan 1, but pass through other tags.
auto enp129s0f0
allow-vmbr0 enp129s0f0
iface enp129s0f0 inet manual
    ovs_bridge vmbr0
    ovs_type OVSPort
    ovs_options tag=1 vlan_mode=native-untagged
    # Alternatively, to also restrict which vlans are allowed through:
    # ovs_options tag=1 vlan_mode=native-untagged trunks=10,20,30,40
    mtu 9000

# Ceph cluster communication vlan (jumbo frames)
allow-vmbr0 vlan55
iface vlan55 inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=55
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
    address 10.55.0.10
    netmask 255.255.0.0
    mtu 9000
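Since the bridge and physical port run at MTU 9000 while vlan20 sits at 1500, one thing worth ruling out is an MTU mismatch somewhere on the jumbo-frame path; drops of large frames often look exactly like this (small-packet traffic fine, bulk throughput terrible). A diagnostic sketch, assuming 10.55.0.11 / 10.20.0.11 are a peer host's addresses (hypothetical, derived from the config above):

```shell
# Verify jumbo frames actually traverse the Ceph vlan end-to-end.
# Max ICMP payload = MTU - 20 (IP header) - 8 (ICMP header) = 8972 bytes.
ping -M do -s 8972 -c 3 10.55.0.11

# Same check on the standard-MTU vlan (1500 - 28 = 1472 bytes).
ping -M do -s 1472 -c 3 10.20.0.11

# Confirm the kernel and OVS agree on the MTUs the config requests.
ip link show vlan55 | grep -o 'mtu [0-9]*'
ovs-vsctl list interface vlan55 | grep -i mtu
```

If the 8972-byte ping fails with "message too long" or silently drops, some hop (NIC, OVS port, or switch) is not honoring MTU 9000.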
 
It certainly appears to be related to the 10 GbE NIC. I replaced the 10 GbE NIC with one of the hosts' onboard NICs, connected to the same switch through a copper port. Is there anything in particular that could be configured on the NIC itself?
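Not an answer from the thread, but a common culprit for exactly this symptom (host-to-host fast, VM-to-remote-host slow over virtio/OVS) is hardware offload behavior on the physical NIC. A diagnostic sketch, assuming the 10 GbE port is enp129s0f0 as in the config above; toggling offloads here is an experiment, not a recommendation:

```shell
# Inspect current offload settings on the 10 GbE port.
ethtool -k enp129s0f0 | grep -E 'segmentation|offload'

# As a test, disable the offloads most often implicated in poor
# bridged/OVS guest throughput, then re-run the cross-host VM iperf:
ethtool -K enp129s0f0 tso off gso off gro off lro off

# Re-enable afterwards if it makes no difference:
ethtool -K enp129s0f0 tso on gso on gro on lro on
```

Driver and firmware versions (`ethtool -i enp129s0f0`) are worth recording too, since adapter firmware bugs show up the same way.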
 
The best approach is to use two switches and no OpenvSwitch: one switch for the Proxmox cluster network and the other for the Ceph cluster; the two 10 Gb/s NICs must be physically separated.
Please check this guide; with that setup you should get full network speed.

best regards,
roman
 
Thanks!

In this case, though, I've got 2 VMs running in the cluster and abysmal performance on the 10 GbE NICs whenever a guest's traffic leaves the local vSwitch destined for another host. When I switch from the 10 GbE NIC to a 1 GbE onboard NIC with everything else the same, speeds consistently reach the expected 1 Gb line rate.
 
I know this is an old thread and probably dead, but Carpenike, did you resolve your issue?

I'm running into a very similar issue and would greatly benefit from any resolution you found.

I have two VLANs, one for the node cluster and one for Ceph, all on the same switch... the same Unifi US-16-XG. With a 3-node cluster, I am using six interfaces on the switch. Is this switch the issue?

Again, sorry to revive a dead post, but hoping a resolution was found.

Thanks!


Hi there!

Sorry I won’t be much help... I moved to bare metal and am now running everything in Kubernetes, which fixed my 10 Gb performance problems.
 
