Poor performance of live migration

m4rek11

Well-Known Member
Jan 3, 2020
Hello!

I have a Proxmox cluster of 5 nodes. Each of them has a Mellanox ConnectX-4 or ConnectX-5 network card with 100 Gb/s ports, and the servers are configured identically in terms of network interfaces. On these cards I have LACP set up, then bridges, and VLANs on top of them (I'm attaching a screenshot).

prox_network.png
For live migration I use a dedicated VLAN (bond0.107); the migration network is set to this VLAN's subnet and the migration type is set to insecure. Nothing else runs over this VLAN. Additionally, all Proxmox nodes are connected to a 100 Gb/s switch.
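
For reference, the migration settings live in the datacenter options; in /etc/pve/datacenter.cfg mine look roughly like this (the /24 subnet below is inferred from my addressing, not copied verbatim):

Code:
# /etc/pve/datacenter.cfg (sketch - subnet inferred from my migration VLAN)
migration: type=insecure,network=172.16.107.0/24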

Of course, jumbo frames are enabled on that VLAN (MTU 9000).

Code:
ping -M do -s 8000 172.16.107.16
PING 172.16.107.16 (172.16.107.16) 8000(8028) bytes of data.
8008 bytes from 172.16.107.16: icmp_seq=1 ttl=64 time=0.265 ms
8008 bytes from 172.16.107.16: icmp_seq=2 ttl=64 time=0.233 ms
8008 bytes from 172.16.107.16: icmp_seq=3 ttl=64 time=0.226 ms
8008 bytes from 172.16.107.16: icmp_seq=4 ttl=64 time=0.215 ms
^C
--- 172.16.107.16 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3075ms
rtt min/avg/max/mdev = 0.215/0.234/0.265/0.018 ms
root@NODE1:~# ping -M do -s 8000 172.16.107.15
PING 172.16.107.15 (172.16.107.15) 8000(8028) bytes of data.
8008 bytes from 172.16.107.15: icmp_seq=1 ttl=64 time=0.077 ms
8008 bytes from 172.16.107.15: icmp_seq=2 ttl=64 time=0.055 ms
8008 bytes from 172.16.107.15: icmp_seq=3 ttl=64 time=0.051 ms

I have an issue where the live migration speed is quite poor (in my opinion): I can't achieve more than ~1.8 GiB/s (~15 Gb/s), and often it is much less, dropping into the MiB/s range. Is there anything else I can do or check?

One thing that comes to mind is changing the LACP hash policy, as I currently have only layer2 (I would switch to layer2+3); a sketch of that change is below.

I will be grateful for any tips and suggestions, thank you!
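
For context, the change I have in mind would look roughly like this in /etc/network/interfaces (only the bond stanza is shown; the second slave name is a guess, and the other options just mirror a typical 802.3ad bond):

Code:
auto bond0
iface bond0 inet manual
        bond-slaves ens1f0np0 ens1f1np1
        bond-mode 802.3ad
        bond-miimon 100
        # the only line that would change: layer2 -> layer2+3
        bond-xmit-hash-policy layer2+3
        mtu 9000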

P.S. I ran iperf3 tests on this VLAN: from a single server I achieved around 30 Gb/s, while with multiple servers sending at the same time it reached around 50 Gb/s in total.
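
Roughly the kind of test I ran (target address, stream count and duration below are only examples):

Code:
# from one node to another over the migration VLAN, several parallel streams
iperf3 -c 172.16.107.16 -P 8 -t 30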

Example of a live migration log (Proxmox VE 8.3.3):



Code:
task started by HA resource agent
2025-02-12 12:22:06 use dedicated network address for sending migration traffic (172.16.107.15)
2025-02-12 12:22:06 starting migration of VM 110 to node 'NODE1' (172.16.107.15)
2025-02-12 12:22:06 starting VM 110 on remote node 'NODE1'
2025-02-12 12:22:11 start remote tunnel
2025-02-12 12:22:12 ssh tunnel ver 1
2025-02-12 12:22:12 starting online/live migration on tcp:172.16.107.15:60000
2025-02-12 12:22:12 set migration capabilities
2025-02-12 12:22:12 migration downtime limit: 100 ms
2025-02-12 12:22:12 migration cachesize: 4.0 GiB
2025-02-12 12:22:12 set migration parameters
2025-02-12 12:22:12 start migrate command to tcp:172.16.107.15:60000
2025-02-12 12:22:13 migration active, transferred 1.3 GiB of 32.0 GiB VM-state, 1.8 GiB/s
2025-02-12 12:22:14 migration active, transferred 2.8 GiB of 32.0 GiB VM-state, 1.6 GiB/s
2025-02-12 12:22:15 migration active, transferred 4.3 GiB of 32.0 GiB VM-state, 1.5 GiB/s
2025-02-12 12:22:16 migration active, transferred 5.8 GiB of 32.0 GiB VM-state, 1.5 GiB/s
2025-02-12 12:22:17 migration active, transferred 7.3 GiB of 32.0 GiB VM-state, 1.5 GiB/s
2025-02-12 12:22:18 migration active, transferred 9.0 GiB of 32.0 GiB VM-state, 1.6 GiB/s
2025-02-12 12:22:19 migration active, transferred 10.7 GiB of 32.0 GiB VM-state, 1.8 GiB/s
2025-02-12 12:22:20 migration active, transferred 12.4 GiB of 32.0 GiB VM-state, 1.7 GiB/s
2025-02-12 12:22:21 migration active, transferred 14.1 GiB of 32.0 GiB VM-state, 1.4 GiB/s
2025-02-12 12:22:22 migration active, transferred 15.6 GiB of 32.0 GiB VM-state, 1.5 GiB/s
2025-02-12 12:22:23 migration active, transferred 17.3 GiB of 32.0 GiB VM-state, 1.8 GiB/s
2025-02-12 12:22:24 migration active, transferred 19.0 GiB of 32.0 GiB VM-state, 1.8 GiB/s
2025-02-12 12:22:25 migration active, transferred 20.7 GiB of 32.0 GiB VM-state, 1.8 GiB/s
2025-02-12 12:22:26 migration active, transferred 22.4 GiB of 32.0 GiB VM-state, 1.7 GiB/s
2025-02-12 12:22:27 migration active, transferred 24.2 GiB of 32.0 GiB VM-state, 1.8 GiB/s
2025-02-12 12:22:28 migration active, transferred 25.9 GiB of 32.0 GiB VM-state, 1.7 GiB/s
2025-02-12 12:22:29 migration active, transferred 27.6 GiB of 32.0 GiB VM-state, 1.7 GiB/s
2025-02-12 12:22:30 migration active, transferred 29.3 GiB of 32.0 GiB VM-state, 1.8 GiB/s
2025-02-12 12:22:31 migration active, transferred 30.9 GiB of 32.0 GiB VM-state, 427.8 MiB/s
2025-02-12 12:22:33 migration active, transferred 31.0 GiB of 32.0 GiB VM-state, 16.5 MiB/s
2025-02-12 12:22:34 migration active, transferred 31.1 GiB of 32.0 GiB VM-state, 225.6 MiB/s
2025-02-12 12:22:35 auto-increased downtime to continue migration: 200 ms
2025-02-12 12:22:35 migration active, transferred 31.2 GiB of 32.0 GiB VM-state, 101.1 MiB/s
2025-02-12 12:22:35 xbzrle: send updates to 33617 pages in 5.1 MiB encoded memory, cache-miss 41.07%, overflow 62
2025-02-12 12:22:36 auto-increased downtime to continue migration: 400 ms
2025-02-12 12:22:37 migration active, transferred 31.2 GiB of 32.0 GiB VM-state, 107.6 MiB/s
2025-02-12 12:22:37 xbzrle: send updates to 69631 pages in 7.2 MiB encoded memory, cache-miss 6.98%, overflow 76
2025-02-12 12:22:38 migration active, transferred 31.2 GiB of 32.0 GiB VM-state, 86.3 MiB/s, VM dirties lots of memory: 114.3 MiB/s
2025-02-12 12:22:38 xbzrle: send updates to 99985 pages in 8.9 MiB encoded memory, cache-miss 25.69%, overflow 79
2025-02-12 12:22:38 auto-increased downtime to continue migration: 800 ms
2025-02-12 12:22:39 average migration speed: 1.2 GiB/s - downtime 252 ms
2025-02-12 12:22:39 migration status: completed
2025-02-12 12:22:43 migration finished successfully (duration 00:00:38)
TASK OK

A second migration log, this time for a smaller VM:

Code:
2025-02-12 12:23:01 use dedicated network address for sending migration traffic (172.16.107.18)
2025-02-12 12:23:01 starting migration of VM 105 to node 'NODE3' (172.16.107.18)
2025-02-12 12:23:01 starting VM 105 on remote node 'NODE3'
2025-02-12 12:23:06 start remote tunnel
2025-02-12 12:23:08 ssh tunnel ver 1
2025-02-12 12:23:08 starting online/live migration on tcp:172.16.107.18:60000
2025-02-12 12:23:08 set migration capabilities
2025-02-12 12:23:08 migration downtime limit: 100 ms
2025-02-12 12:23:08 migration cachesize: 256.0 MiB
2025-02-12 12:23:08 set migration parameters
2025-02-12 12:23:08 start migrate command to tcp:172.16.107.18:60000
2025-02-12 12:23:09 migration active, transferred 368.5 MiB of 2.0 GiB VM-state, 403.9 MiB/s
2025-02-12 12:23:10 migration active, transferred 729.2 MiB of 2.0 GiB VM-state, 434.7 MiB/s
2025-02-12 12:23:11 migration active, transferred 1.1 GiB of 2.0 GiB VM-state, 348.2 MiB/s
2025-02-12 12:23:12 migration active, transferred 1.4 GiB of 2.0 GiB VM-state, 369.1 MiB/s
2025-02-12 12:23:13 average migration speed: 413.0 MiB/s - downtime 31 ms
2025-02-12 12:23:13 migration status: completed
2025-02-12 12:23:17 migration finished successfully (duration 00:00:17)
TASK OK

"ethtool" command results:


Code:
ethtool bond0
Settings for bond0:
        Supported ports: [  ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 200000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Link detected: yes
root@NODE1:~# ethtool bond0.107
Settings for bond0.107:
        Supported ports: [  ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 200000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Link detected: yes
root@NODE1:~# ethtool ens1f0np0
Settings for ens1f0np0:
        Supported ports: [ Backplane ]
        Supported link modes:   1000baseKX/Full
                                10000baseKR/Full
                                40000baseKR4/Full
                                40000baseCR4/Full
                                40000baseSR4/Full
                                40000baseLR4/Full
                                56000baseKR4/Full
                                25000baseCR/Full
                                25000baseKR/Full
                                25000baseSR/Full
                                50000baseCR2/Full
                                50000baseKR2/Full
                                100000baseKR4/Full
                                100000baseSR4/Full
                                100000baseCR4/Full
                                100000baseLR4_ER4/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: None        RS      BASER
        Advertised link modes:  1000baseKX/Full
                                10000baseKR/Full
                                40000baseKR4/Full
                                40000baseCR4/Full
                                40000baseSR4/Full
                                40000baseLR4/Full
                                56000baseKR4/Full
                                25000baseCR/Full
                                25000baseKR/Full
                                25000baseSR/Full
                                50000baseCR2/Full
                                50000baseKR2/Full
                                100000baseKR4/Full
                                100000baseSR4/Full
                                100000baseCR4/Full
                                100000baseLR4_ER4/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: RS
        Link partner advertised link modes:  Not reported
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 100000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: Direct Attach Copper
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: d
        Wake-on: d
        Link detected: yes
---
Yours faithfully,
Marek
 
Another live migration log - this is my "largest" VM.

Code:
2025-02-12 12:26:28 use dedicated network address for sending migration traffic (172.16.107.15)
2025-02-12 12:26:29 starting migration of VM 119 to node 'NODE1' (172.16.107.15)
2025-02-12 12:26:29 starting VM 119 on remote node NODE1
2025-02-12 12:26:40 start remote tunnel
2025-02-12 12:26:41 ssh tunnel ver 1
2025-02-12 12:26:41 starting online/live migration on tcp:172.16.107.15:60000
2025-02-12 12:26:41 set migration capabilities
2025-02-12 12:26:41 migration downtime limit: 100 ms
2025-02-12 12:26:41 migration cachesize: 16.0 GiB
2025-02-12 12:26:41 set migration parameters
2025-02-12 12:26:41 start migrate command to tcp:172.16.107.15:60000
2025-02-12 12:26:42 migration active, transferred 331.0 MiB of 81.0 GiB VM-state, 704.4 MiB/s
2025-02-12 12:26:43 migration active, transferred 972.7 MiB of 81.0 GiB VM-state, 650.6 MiB/s
2025-02-12 12:26:44 migration active, transferred 1.5 GiB of 81.0 GiB VM-state, 583.4 MiB/s
2025-02-12 12:26:45 migration active, transferred 2.1 GiB of 81.0 GiB VM-state, 591.4 MiB/s
2025-02-12 12:26:46 migration active, transferred 2.7 GiB of 81.0 GiB VM-state, 687.6 MiB/s
2025-02-12 12:26:47 migration active, transferred 3.4 GiB of 81.0 GiB VM-state, 875.1 MiB/s
2025-02-12 12:26:48 migration active, transferred 4.2 GiB of 81.0 GiB VM-state, 692.4 MiB/s
2025-02-12 12:26:49 migration active, transferred 4.9 GiB of 81.0 GiB VM-state, 569.8 MiB/s
2025-02-12 12:26:50 migration active, transferred 5.7 GiB of 81.0 GiB VM-state, 876.3 MiB/s
2025-02-12 12:26:51 migration active, transferred 6.7 GiB of 81.0 GiB VM-state, 876.3 MiB/s
2025-02-12 12:26:52 migration active, transferred 7.4 GiB of 81.0 GiB VM-state, 711.2 MiB/s
2025-02-12 12:26:53 migration active, transferred 8.2 GiB of 81.0 GiB VM-state, 992.8 MiB/s
2025-02-12 12:26:54 migration active, transferred 8.9 GiB of 81.0 GiB VM-state, 860.6 MiB/s
2025-02-12 12:26:55 migration active, transferred 9.7 GiB of 81.0 GiB VM-state, 572.9 MiB/s
2025-02-12 12:26:56 migration active, transferred 10.5 GiB of 81.0 GiB VM-state, 907.8 MiB/s
2025-02-12 12:26:57 migration active, transferred 11.3 GiB of 81.0 GiB VM-state, 691.8 MiB/s
2025-02-12 12:26:58 migration active, transferred 12.3 GiB of 81.0 GiB VM-state, 859.3 MiB/s
2025-02-12 12:26:59 migration active, transferred 12.9 GiB of 81.0 GiB VM-state, 625.1 MiB/s
2025-02-12 12:27:00 migration active, transferred 13.6 GiB of 81.0 GiB VM-state, 675.6 MiB/s
2025-02-12 12:27:01 migration active, transferred 14.5 GiB of 81.0 GiB VM-state, 851.0 MiB/s
2025-02-12 12:27:02 migration active, transferred 15.5 GiB of 81.0 GiB VM-state, 639.5 MiB/s
2025-02-12 12:27:03 migration active, transferred 16.3 GiB of 81.0 GiB VM-state, 1.0 GiB/s
2025-02-12 12:27:04 migration active, transferred 17.0 GiB of 81.0 GiB VM-state, 852.0 MiB/s
2025-02-12 12:27:05 migration active, transferred 18.2 GiB of 81.0 GiB VM-state, 1.4 GiB/s


...


2025-02-12 12:28:43 migration active, transferred 79.4 GiB of 81.0 GiB VM-state, 175.6 MiB/s
2025-02-12 12:28:45 migration active, transferred 79.6 GiB of 81.0 GiB VM-state, 209.2 MiB/s
2025-02-12 12:28:46 migration active, transferred 79.8 GiB of 81.0 GiB VM-state, 204.8 MiB/s
2025-02-12 12:28:47 migration active, transferred 80.1 GiB of 81.0 GiB VM-state, 202.0 MiB/s
2025-02-12 12:28:48 migration active, transferred 80.3 GiB of 81.0 GiB VM-state, 204.4 MiB/s
2025-02-12 12:28:49 migration active, transferred 80.5 GiB of 81.0 GiB VM-state, 262.3 MiB/s
2025-02-12 12:28:50 migration active, transferred 80.7 GiB of 81.0 GiB VM-state, 182.1 MiB/s
2025-02-12 12:28:52 migration active, transferred 81.0 GiB of 81.0 GiB VM-state, 233.1 MiB/s
2025-02-12 12:28:53 migration active, transferred 81.2 GiB of 81.0 GiB VM-state, 223.9 MiB/s
2025-02-12 12:28:54 migration active, transferred 81.4 GiB of 81.0 GiB VM-state, 216.7 MiB/s
2025-02-12 12:28:55 migration active, transferred 81.7 GiB of 81.0 GiB VM-state, 608.1 MiB/s
2025-02-12 12:28:55 xbzrle: send updates to 13495 pages in 3.2 MiB encoded memory, cache-miss 93.49%, overflow 96
2025-02-12 12:28:56 migration active, transferred 81.9 GiB of 81.0 GiB VM-state, 342.5 MiB/s
2025-02-12 12:28:56 xbzrle: send updates to 79572 pages in 132.3 MiB encoded memory, cache-miss 93.49%, overflow 5384
2025-02-12 12:28:57 migration active, transferred 82.0 GiB of 81.0 GiB VM-state, 180.6 MiB/s
2025-02-12 12:28:57 xbzrle: send updates to 130657 pages in 221.0 MiB encoded memory, cache-miss 93.49%, overflow 8996
2025-02-12 12:28:59 migration active, transferred 82.1 GiB of 81.0 GiB VM-state, 318.4 MiB/s
2025-02-12 12:28:59 xbzrle: send updates to 194772 pages in 256.6 MiB encoded memory, cache-miss 93.49%, overflow 10536
2025-02-12 12:29:00 migration active, transferred 82.1 GiB of 81.0 GiB VM-state, 168.6 MiB/s
2025-02-12 12:29:00 xbzrle: send updates to 263738 pages in 276.2 MiB encoded memory, cache-miss 93.49%, overflow 11170
2025-02-12 12:29:01 migration active, transferred 82.3 GiB of 81.0 GiB VM-state, 259.7 MiB/s
2025-02-12 12:29:01 xbzrle: send updates to 298968 pages in 326.5 MiB encoded memory, cache-miss 93.49%, overflow 12868
2025-02-12 12:29:02 migration active, transferred 82.4 GiB of 81.0 GiB VM-state, 220.0 MiB/s
2025-02-12 12:29:02 xbzrle: send updates to 331815 pages in 385.9 MiB encoded memory, cache-miss 93.49%, overflow 14970
2025-02-12 12:29:03 migration active, transferred 82.6 GiB of 81.0 GiB VM-state, 219.8 MiB/s
2025-02-12 12:29:03 xbzrle: send updates to 343751 pages in 409.4 MiB encoded memory, cache-miss 93.49%, overflow 15640
2025-02-12 12:29:04 migration active, transferred 82.9 GiB of 81.0 GiB VM-state, 316.6 MiB/s


...


2025-02-12 12:34:01 migration active, transferred 107.5 GiB of 81.0 GiB VM-state, 114.1 MiB/s, VM dirties lots of memory: 186.1 MiB/s
2025-02-12 12:34:01 xbzrle: send updates to 13185406 pages in 12.9 GiB encoded memory, cache-miss 25.15%, overflow 559790
2025-02-12 12:34:02 migration active, transferred 107.5 GiB of 81.0 GiB VM-state, 212.6 MiB/s
2025-02-12 12:34:02 xbzrle: send updates to 13261950 pages in 12.9 GiB encoded memory, cache-miss 25.15%, overflow 559805
2025-02-12 12:34:03 migration active, transferred 107.6 GiB of 81.0 GiB VM-state, 155.4 MiB/s, VM dirties lots of memory: 186.1 MiB/s
2025-02-12 12:34:03 xbzrle: send updates to 13301490 pages in 12.9 GiB encoded memory, cache-miss 25.15%, overflow 561235
2025-02-12 12:34:04 migration active, transferred 107.6 GiB of 81.0 GiB VM-state, 159.6 MiB/s, VM dirties lots of memory: 186.1 MiB/s
2025-02-12 12:34:04 xbzrle: send updates to 13333273 pages in 13.0 GiB encoded memory, cache-miss 25.15%, overflow 562253
2025-02-12 12:34:05 migration active, transferred 107.7 GiB of 81.0 GiB VM-state, 159.1 MiB/s, VM dirties lots of memory: 186.1 MiB/s
2025-02-12 12:34:05 xbzrle: send updates to 13362163 pages in 13.0 GiB encoded memory, cache-miss 25.15%, overflow 563297
2025-02-12 12:34:06 migration active, transferred 107.7 GiB of 81.0 GiB VM-state, 136.0 MiB/s, VM dirties lots of memory: 186.1 MiB/s
2025-02-12 12:34:06 xbzrle: send updates to 13382848 pages in 13.0 GiB encoded memory, cache-miss 25.15%, overflow 564054
2025-02-12 12:34:08 migration active, transferred 107.8 GiB of 81.0 GiB VM-state, 119.4 MiB/s, VM dirties lots of memory: 186.1 MiB/s
2025-02-12 12:34:08 xbzrle: send updates to 13407859 pages in 13.0 GiB encoded memory, cache-miss 25.15%, overflow 565242
2025-02-12 12:34:09 migration active, transferred 107.9 GiB of 81.0 GiB VM-state, 211.9 MiB/s
2025-02-12 12:34:09 xbzrle: send updates to 13451751 pages in 13.1 GiB encoded memory, cache-miss 25.15%, overflow 565625
2025-02-12 12:34:10 migration active, transferred 107.9 GiB of 81.0 GiB VM-state, 237.3 MiB/s
2025-02-12 12:34:10 xbzrle: send updates to 13502528 pages in 13.1 GiB encoded memory, cache-miss 21.22%, overflow 565625
2025-02-12 12:34:11 migration active, transferred 108.0 GiB of 81.0 GiB VM-state, 168.5 MiB/s, VM dirties lots of memory: 217.7 MiB/s
2025-02-12 12:34:11 xbzrle: send updates to 13540816 pages in 13.1 GiB encoded memory, cache-miss 21.22%, overflow 568868
2025-02-12 12:34:12 migration active, transferred 108.1 GiB of 81.0 GiB VM-state, 179.3 MiB/s, VM dirties lots of memory: 217.7 MiB/s
2025-02-12 12:34:12 xbzrle: send updates to 13573835 pages in 13.2 GiB encoded memory, cache-miss 21.22%, overflow 571139
2025-02-12 12:34:13 migration active, transferred 108.2 GiB of 81.0 GiB VM-state, 133.4 MiB/s, VM dirties lots of memory: 217.7 MiB/s
2025-02-12 12:34:13 xbzrle: send updates to 13603808 pages in 13.2 GiB encoded memory, cache-miss 21.22%, overflow 572469
2025-02-12 12:34:14 migration active, transferred 108.3 GiB of 81.0 GiB VM-state, 158.6 MiB/s, VM dirties lots of memory: 217.7 MiB/s
2025-02-12 12:34:14 xbzrle: send updates to 13627586 pages in 13.2 GiB encoded memory, cache-miss 21.22%, overflow 572946
2025-02-12 12:34:15 migration active, transferred 108.4 GiB of 81.0 GiB VM-state, 175.9 MiB/s, VM dirties lots of memory: 217.7 MiB/s
2025-02-12 12:34:15 xbzrle: send updates to 13648472 pages in 13.2 GiB encoded memory, cache-miss 21.22%, overflow 573798
2025-02-12 12:34:17 migration active, transferred 108.4 GiB of 81.0 GiB VM-state, 160.9 MiB/s, VM dirties lots of memory: 217.7 MiB/s
2025-02-12 12:34:17 xbzrle: send updates to 13668076 pages in 13.2 GiB encoded memory, cache-miss 21.22%, overflow 574702
2025-02-12 12:34:18 migration active, transferred 108.5 GiB of 81.0 GiB VM-state, 185.3 MiB/s, VM dirties lots of memory: 217.7 MiB/s
2025-02-12 12:34:18 xbzrle: send updates to 13704028 pages in 13.2 GiB encoded memory, cache-miss 21.22%, overflow 575212
2025-02-12 12:34:19 migration active, transferred 108.5 GiB of 81.0 GiB VM-state, 125.2 MiB/s, VM dirties lots of memory: 217.7 MiB/s
2025-02-12 12:34:19 xbzrle: send updates to 13751391 pages in 13.3 GiB encoded memory, cache-miss 21.22%, overflow 576029
2025-02-12 12:34:20 migration active, transferred 108.6 GiB of 81.0 GiB VM-state, 157.7 MiB/s, VM dirties lots of memory: 217.7 MiB/s
2025-02-12 12:34:20 xbzrle: send updates to 13785381 pages in 13.3 GiB encoded memory, cache-miss 21.22%, overflow 577906
2025-02-12 12:34:20 auto-increased downtime to continue migration: 12800 ms
2025-02-12 12:34:22 migration active, transferred 108.7 GiB of 81.0 GiB VM-state, 509.5 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:22 xbzrle: send updates to 13826135 pages in 13.3 GiB encoded memory, cache-miss 45.61%, overflow 579380
2025-02-12 12:34:23 migration active, transferred 108.8 GiB of 81.0 GiB VM-state, 184.1 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:23 xbzrle: send updates to 13851056 pages in 13.3 GiB encoded memory, cache-miss 45.61%, overflow 580625
2025-02-12 12:34:24 migration active, transferred 108.9 GiB of 81.0 GiB VM-state, 150.7 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:24 xbzrle: send updates to 13873793 pages in 13.4 GiB encoded memory, cache-miss 45.61%, overflow 581931
2025-02-12 12:34:25 migration active, transferred 109.0 GiB of 81.0 GiB VM-state, 120.5 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:25 xbzrle: send updates to 13894584 pages in 13.4 GiB encoded memory, cache-miss 45.61%, overflow 583378
2025-02-12 12:34:26 migration active, transferred 109.1 GiB of 81.0 GiB VM-state, 114.2 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:26 xbzrle: send updates to 13913584 pages in 13.4 GiB encoded memory, cache-miss 45.61%, overflow 584744
2025-02-12 12:34:27 migration active, transferred 109.2 GiB of 81.0 GiB VM-state, 219.8 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:27 xbzrle: send updates to 13962696 pages in 13.4 GiB encoded memory, cache-miss 45.61%, overflow 585158
2025-02-12 12:34:28 migration active, transferred 109.2 GiB of 81.0 GiB VM-state, 165.0 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:28 xbzrle: send updates to 14024775 pages in 13.4 GiB encoded memory, cache-miss 45.61%, overflow 585160
2025-02-12 12:34:29 migration active, transferred 109.3 GiB of 81.0 GiB VM-state, 180.3 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:29 xbzrle: send updates to 14057768 pages in 13.5 GiB encoded memory, cache-miss 45.61%, overflow 586225
2025-02-12 12:34:30 migration active, transferred 109.4 GiB of 81.0 GiB VM-state, 168.8 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:30 xbzrle: send updates to 14085307 pages in 13.5 GiB encoded memory, cache-miss 45.61%, overflow 587639
2025-02-12 12:34:32 migration active, transferred 109.5 GiB of 81.0 GiB VM-state, 165.2 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:32 xbzrle: send updates to 14110855 pages in 13.5 GiB encoded memory, cache-miss 45.61%, overflow 588567
2025-02-12 12:34:33 migration active, transferred 109.6 GiB of 81.0 GiB VM-state, 174.0 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:33 xbzrle: send updates to 14135332 pages in 13.5 GiB encoded memory, cache-miss 45.61%, overflow 589259
2025-02-12 12:34:34 migration active, transferred 109.7 GiB of 81.0 GiB VM-state, 163.5 MiB/s, VM dirties lots of memory: 2.6 GiB/s
2025-02-12 12:34:34 xbzrle: send updates to 14159474 pages in 13.6 GiB encoded memory, cache-miss 45.61%, overflow 589957
2025-02-12 12:34:35 migration active, transferred 109.8 GiB of 81.0 GiB VM-state, 176.7 MiB/s
2025-02-12 12:34:35 xbzrle: send updates to 14179000 pages in 13.6 GiB encoded memory, cache-miss 38.88%, overflow 590759
2025-02-12 12:34:36 migration active, transferred 109.9 GiB of 81.0 GiB VM-state, 138.2 MiB/s
2025-02-12 12:34:36 xbzrle: send updates to 14194923 pages in 13.6 GiB encoded memory, cache-miss 38.88%, overflow 591540
2025-02-12 12:34:37 migration active, transferred 110.0 GiB of 81.0 GiB VM-state, 138.7 MiB/s
2025-02-12 12:34:37 xbzrle: send updates to 14213834 pages in 13.6 GiB encoded memory, cache-miss 38.88%, overflow 592280
2025-02-12 12:34:38 migration active, transferred 110.1 GiB of 81.0 GiB VM-state, 109.7 MiB/s
2025-02-12 12:34:38 xbzrle: send updates to 14234464 pages in 13.6 GiB encoded memory, cache-miss 38.88%, overflow 593253
2025-02-12 12:34:40 migration active, transferred 110.2 GiB of 81.0 GiB VM-state, 126.8 MiB/s
2025-02-12 12:34:40 xbzrle: send updates to 14253394 pages in 13.6 GiB encoded memory, cache-miss 38.88%, overflow 594169
2025-02-12 12:34:41 migration active, transferred 110.3 GiB of 81.0 GiB VM-state, 269.1 MiB/s
2025-02-12 12:34:41 xbzrle: send updates to 14277141 pages in 13.7 GiB encoded memory, cache-miss 38.88%, overflow 595001
2025-02-12 12:34:42 migration active, transferred 110.4 GiB of 81.0 GiB VM-state, 250.9 MiB/s
2025-02-12 12:34:42 xbzrle: send updates to 14331417 pages in 13.7 GiB encoded memory, cache-miss 38.88%, overflow 595001
2025-02-12 12:34:43 auto-increased downtime to continue migration: 25600 ms
2025-02-12 12:34:43 migration active, transferred 110.5 GiB of 81.0 GiB VM-state, 272.9 MiB/s
2025-02-12 12:34:43 xbzrle: send updates to 14400058 pages in 13.7 GiB encoded memory, cache-miss 43.45%, overflow 596606
2025-02-12 12:35:13 average migration speed: 162.0 MiB/s - downtime 24583 ms
2025-02-12 12:35:13 migration status: completed
VM quit/powerdown failed - terminating now with SIGTERM
2025-02-12 12:35:30 migration finished successfully (duration 00:09:02)
TASK OK
 
When you do the live migration, do you also move the storage? Maybe you have a problem with the read/write speed of the disks?
 
When you do the live migration, do you also move the storage? Maybe you have a problem with the read/write speed of the disks?

It is only a live migration; the disks are not transferred. The VM disk storage in my environment (NFS shares from TrueNAS) is NVMe-backed, and read/write speeds are easily above 2000 MB/s, so I don't think it is a matter of disk limitation.
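
For reference, a quick way to sanity-check this from a node (the mount path below is just an example, not my actual storage name):

Code:
# sequential mixed read/write against the NFS-mounted VM storage
fio --name=seqrw --filename=/mnt/pve/truenas-nvme/testfile \
    --rw=rw --bs=1M --size=8G --direct=1 --runtime=30 --group_reporting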
 

Attachments: iperf.png

1. migration currently uses a single connection, so you need to compare apples to apples when looking at iperf3 (https://bugzilla.proxmox.com/show_bug.cgi?id=5766 / https://bugzilla.proxmox.com/show_bug.cgi?id=4152)
2. your VM seems to be quite busy - your big example shows that it overshoots by about 50% (i.e., it transfers state, but that state gets invalidated before the migration is done - see the log where Qemu complains that the VM dirties 2.6 GiB of memory per second!). This will make the migration take longer for obvious reasons, but it should get better once multi-FD migration is implemented.

I suspect you only manage to finish the migration of the "big" VM at all because Qemu starts to throttle the CPU to slow down the VM activity and increases the "remaining allowed delta before freezing the VM".
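
If you want to watch this happening, you can query the migration status from the QEMU monitor on the source node while the migration is running (a rough sketch - the exact fields shown depend on the QEMU version):

Code:
# open the monitor of the migrating VM (VM 119 used here as an example)
qm monitor 119
# then, at the monitor prompt:
info migrate    # shows transfer rate, dirty pages rate and, when active, CPU throttling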
 
1. migration currently uses a single connection, so you need to compare apples to apples when looking at iperf3 (https://bugzilla.proxmox.com/show_bug.cgi?id=5766 / https://bugzilla.proxmox.com/show_bug.cgi?id=4152)
2. your VM seems to be quite busy - your big example shows that it overshoots by about 50% (i.e., it transfers state, but that state gets invalidated before the migration is done - see the log where Qemu complains that the VM dirties 2.6 GiB of memory per second!). This will make the migration take longer for obvious reasons, but it should get better once multi-FD migration is implemented.

I suspect you only manage to finish the migration of the "big" VM at all because Qemu starts to throttle the CPU to slow down the VM activity and increases the "remaining allowed delta before freezing the VM".
Thank you for your reply.

Using iperf3 with a single connection (-P 1), I am getting speeds of ~5-10 Gb/s, which is not a satisfactory result. As I understand it, implementing multi-FD in the future will somewhat improve this performance.

Yes, it is a large machine, but I intentionally chose it for testing purposes.

To summarize, there is probably nothing I can do in Proxmox to increase live migration performance, right?
 
Using iperf3 with a single connection (-P 1), I am getting speeds of ~5-10 Gb/s, which is not a satisfactory result.

That is the performance you get over a single connection with your setup.

As I understand it, implementing multi-FD in the future will somewhat improve this performance.

Yes, multi-FD will allow multiple migration streams in parallel, similar to what iperf3 with -P does.

To summarize, there is probably nothing I can do in Proxmox to increase live migration performance, right?

You can try reducing the memory load inside the VM for the duration of the migration, if that is possible.
 