2 NICs, how to double migration speed

zezinho

Hi, I have configured a cluster, each node with two Gigabit NICs in LACP using a Linux bond.
Everything seems to be working on each node:
cat /sys/class/net/bond0/speed -> 2000
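
For reference, the bond is defined along these lines in /etc/network/interfaces (a minimal sketch; the interface names, bridge name and addresses here are placeholders, not necessarily my real values):

# LACP bond over the two onboard NICs
auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad

# Bridge for the VMs, running on top of the bond
auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10
        netmask 255.255.255.0
        gateway 192.0.2.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0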

On the H3C 5500 switches I can also see that the 2 Gb link aggregation is up.

Still, VM migration only uses the first NIC, up to 120 MB/s. If I unplug the first cable, the second one becomes active.

Am I missing something to check?

Thanks.
 
Is your storage fast enough to handle the traffic?
Yes, the storage is fast enough; it goes above 500 MB/s.

The network is configured as the documentation indicates, with the 802.3ad (LACP) option. What is hard to find out is whether the Linux kernel is really using both links of the bond. Here is what I can read in /sys:

/sys/class/net/bond0/bonding/active_slave
/sys/class/net/bond0/bonding/ad_actor_key
9
/sys/class/net/bond0/bonding/ad_actor_sys_prio
65535
/sys/class/net/bond0/bonding/ad_actor_system
00:00:00:00:00:00
/sys/class/net/bond0/bonding/ad_aggregator
2
/sys/class/net/bond0/bonding/ad_num_ports
2
/sys/class/net/bond0/bonding/ad_partner_key
5
/sys/class/net/bond0/bonding/ad_partner_mac
bc:ea:fa:3d:ac:f6
/sys/class/net/bond0/bonding/ad_select
stable 0
/sys/class/net/bond0/bonding/ad_user_port_key
0
/sys/class/net/bond0/bonding/all_slaves_active
0
/sys/class/net/bond0/bonding/arp_all_targets
any 0
/sys/class/net/bond0/bonding/arp_interval
0
/sys/class/net/bond0/bonding/arp_ip_target
/sys/class/net/bond0/bonding/arp_validate
none 0
/sys/class/net/bond0/bonding/downdelay
0
/sys/class/net/bond0/bonding/fail_over_mac
none 0
/sys/class/net/bond0/bonding/lacp_rate
slow 0
/sys/class/net/bond0/bonding/lp_interval
1
/sys/class/net/bond0/bonding/miimon
100
/sys/class/net/bond0/bonding/mii_status
up
/sys/class/net/bond0/bonding/min_links
0
/sys/class/net/bond0/bonding/mode
802.3ad 4
/sys/class/net/bond0/bonding/num_grat_arp
1
/sys/class/net/bond0/bonding/num_unsol_na
1
/sys/class/net/bond0/bonding/packets_per_slave
1
/sys/class/net/bond0/bonding/primary
/sys/class/net/bond0/bonding/primary_reselect
always 0
/sys/class/net/bond0/bonding/queue_id
eno1:0 eno2:0
/sys/class/net/bond0/bonding/resend_igmp
1
/sys/class/net/bond0/bonding/slaves
eno1 eno2
/sys/class/net/bond0/bonding/tlb_dynamic_lb
1
/sys/class/net/bond0/bonding/updelay
0
/sys/class/net/bond0/bonding/use_carrier
1
/sys/class/net/bond0/bonding/xmit_hash_policy
layer2 0

/sys/class/net/bond0/carrier
1
/sys/class/net/bond0/carrier_changes
2
/sys/class/net/bond0/carrier_down_count
1
/sys/class/net/bond0/carrier_up_count
1

Maybe someone can compare this with their working system, so that we can find the culprit?
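
A quick way to check whether both slaves actually carry traffic during a migration (a sketch, assuming the eno1/eno2 slave names shown above):

# Per-slave LACP state, aggregator IDs and link counters
cat /proc/net/bonding/bond0

# Watch the per-NIC byte counters while a migration is running;
# both should grow if the bond really spreads the traffic
watch -n1 'ip -s link show eno1; ip -s link show eno2'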
 
AFAIK: LACP only aggregates bandwidth to different targets, so it won't help with the same target.
Thanks for both answers. Using the hash policy "layer3+4" (value 1), two migrations to two different hosts now run at double the combined speed; the change itself is a single option on the bond, see the sketch below. That's better. And as LnxBil said, I can't expect more from an LACP bond. Is there a bonding mode that allows double speed for a single copy?
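
For anyone following along, a sketch of the extra option, based on the bond stanza above:

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        # hash on IP addresses and TCP/UDP ports, so different
        # connections can be sent over different slaves
        bond-xmit-hash-policy layer3+4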
 
The problem with round-robin is packet ordering: packets can arrive out of order on the switch side and end up dropped/retransmitted.
That's why LACP only uses one link for one TCP/UDP connection, but load-balances multiple connections across multiple links.
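
One way to see this per-flow behaviour without a migration is to run parallel iperf3 tests (a sketch; the host names are placeholders and each target needs "iperf3 -s" running):

# A single stream stays on one slave, ~1 Gb/s at best
iperf3 -c node2

# Two streams to two different hosts in parallel: with a layer3+4
# hash they can land on different slaves and add up to ~2 Gb/s
iperf3 -c node2 &
iperf3 -c node3 &
wait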

I don't think that QEMU migration has an option to use multiple TCP connections currently.

Edit:
It seems that QEMU has recently gained support for this; it's called the "multifd" feature:
https://wiki.qemu.org/Features/Migration-Multiple-fds
libvirt also implements it:
https://www.spinics.net/linux/fedora/libvir/msg179290.html

Nothing in Proxmox handles it currently, but it could be added easily, I think.
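
For reference, on a recent enough QEMU the feature can be toggled from the monitor before migrating; a minimal sketch (capability and parameter names may differ between QEMU versions, and older releases used the experimental x-multifd name):

# In the HMP monitor of the source VM:
migrate_set_capability multifd on
migrate_set_parameter multifd-channels 4

# The destination must be started with a matching -incoming address
# and the same capability enabled, then:
migrate -d tcp:target-host:60000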