[SOLVED] Slow live migration after upgrading Proxmox to 6.4

sidereus

Member
Jul 25, 2019
Since I upgraded the whole cluster to 6.4, live migration has been much slower. Migration traffic uses a dedicated 10 Gb network:
Bash:
root@asr5:~# cat /etc/pve/datacenter.cfg
keyboard: en-us
migration: network=192.168.122.1/24,type=insecure
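For reference, `network=` takes a CIDR, and Proxmox picks the target node's address inside it (192.168.122.2 in the log below). This config evidently works with `192.168.122.1/24`. A /24 mask drops the host bits, so it covers the same range as `192.168.122.0/24`. Here is a minimal pure-Bash sketch to check an address against such a CIDR. `ip_to_int` and `in_cidr` are made-up helper names, not Proxmox commands:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: check whether an IPv4 address lies inside a CIDR.
# Host bits in the CIDR are masked off, so 192.168.122.1/24 matches
# the same addresses as 192.168.122.0/24.

# Convert a dotted-quad IPv4 address to a 32-bit integer
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  # 10# forces base 10 so an octet like "08" is not read as octal
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Exit status 0 if address $1 is inside CIDR $2
in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

in_cidr 192.168.122.2 192.168.122.1/24 && echo "192.168.122.2: inside"
in_cidr 10.42.15.1 192.168.122.1/24 || echo "10.42.15.1: outside"
```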
Now it looks like this:
Code:
2021-04-30 00:33:14 use dedicated network address for sending migration traffic (192.168.122.2)
2021-04-30 00:33:15 starting migration of VM 100 to node 'asr2' (192.168.122.2)
2021-04-30 00:33:15 starting VM 100 on remote node 'asr2'
2021-04-30 00:33:16 start remote tunnel
2021-04-30 00:33:17 ssh tunnel ver 1
2021-04-30 00:33:17 starting online/live migration on tcp:192.168.122.2:60000
2021-04-30 00:33:17 set migration capabilities
2021-04-30 00:33:17 migration downtime limit: 100 ms
2021-04-30 00:33:17 migration cachesize: 2.0 GiB
2021-04-30 00:33:17 set migration parameters
2021-04-30 00:33:17 start migrate command to tcp:192.168.122.2:60000
2021-04-30 00:33:18 migration active, transferred 78.6 MiB of 16.0 GiB VM-state, 3.2 GiB/s
2021-04-30 00:33:19 migration active, transferred 202.1 MiB of 16.0 GiB VM-state, 135.7 MiB/s
2021-04-30 00:33:20 migration active, transferred 333.1 MiB of 16.0 GiB VM-state, 138.8 MiB/s
2021-04-30 00:33:21 migration active, transferred 461.6 MiB of 16.0 GiB VM-state, 129.7 MiB/s
2021-04-30 00:33:22 migration active, transferred 592.5 MiB of 16.0 GiB VM-state, 133.3 MiB/s
2021-04-30 00:33:23 migration active, transferred 659.3 MiB of 16.0 GiB VM-state, 3.7 GiB/s
2021-04-30 00:33:24 migration active, transferred 727.9 MiB of 16.0 GiB VM-state, 138.8 MiB/s
2021-04-30 00:33:25 migration active, transferred 852.3 MiB of 16.0 GiB VM-state, 632.7 MiB/s
2021-04-30 00:33:26 migration active, transferred 984.2 MiB of 16.0 GiB VM-state, 137.6 MiB/s
2021-04-30 00:33:27 migration active, transferred 1.1 GiB of 16.0 GiB VM-state, 130.1 MiB/s
2021-04-30 00:33:28 migration active, transferred 1.2 GiB of 16.0 GiB VM-state, 127.8 MiB/s
2021-04-30 00:33:29 migration active, transferred 1.3 GiB of 16.0 GiB VM-state, 127.8 MiB/s
2021-04-30 00:33:30 migration active, transferred 1.4 GiB of 16.0 GiB VM-state, 129.1 MiB/s
2021-04-30 00:33:31 migration active, transferred 1.6 GiB of 16.0 GiB VM-state, 128.5 MiB/s
2021-04-30 00:33:32 migration active, transferred 1.7 GiB of 16.0 GiB VM-state, 128.2 MiB/s
2021-04-30 00:33:33 migration active, transferred 1.8 GiB of 16.0 GiB VM-state, 128.8 MiB/s
2021-04-30 00:33:34 migration active, transferred 1.9 GiB of 16.0 GiB VM-state, 131.3 MiB/s
2021-04-30 00:33:35 migration active, transferred 2.1 GiB of 16.0 GiB VM-state, 129.8 MiB/s
2021-04-30 00:33:36 migration active, transferred 2.2 GiB of 16.0 GiB VM-state, 204.2 MiB/s
2021-04-30 00:33:37 migration active, transferred 2.2 GiB of 16.0 GiB VM-state, 3.4 GiB/s
2021-04-30 00:33:38 migration active, transferred 2.3 GiB of 16.0 GiB VM-state, 343.4 MiB/s
2021-04-30 00:33:39 migration active, transferred 2.4 GiB of 16.0 GiB VM-state, 129.6 MiB/s
2021-04-30 00:33:40 migration active, transferred 2.5 GiB of 16.0 GiB VM-state, 133.0 MiB/s
2021-04-30 00:33:41 migration active, transferred 2.7 GiB of 16.0 GiB VM-state, 148.2 MiB/s
2021-04-30 00:33:42 migration active, transferred 2.8 GiB of 16.0 GiB VM-state, 148.6 MiB/s
2021-04-30 00:33:43 migration active, transferred 2.9 GiB of 16.0 GiB VM-state, 139.9 MiB/s
2021-04-30 00:33:44 migration active, transferred 3.0 GiB of 16.0 GiB VM-state, 131.2 MiB/s
2021-04-30 00:33:46 migration active, transferred 3.2 GiB of 16.0 GiB VM-state, 126.5 MiB/s
2021-04-30 00:33:47 migration active, transferred 3.3 GiB of 16.0 GiB VM-state, 133.9 MiB/s
2021-04-30 00:33:48 migration active, transferred 3.4 GiB of 16.0 GiB VM-state, 127.8 MiB/s
2021-04-30 00:33:49 migration active, transferred 3.6 GiB of 16.0 GiB VM-state, 130.5 MiB/s
2021-04-30 00:33:50 average migration speed: 497.0 MiB/s - downtime 150 ms
2021-04-30 00:33:50 migration status: completed
2021-04-30 00:33:53 migration finished successfully (duration 00:00:39)
TASK OK
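Apart from a few GiB/s spikes, the per-second rate in this log stays flat at roughly 130 MiB/s. To get the typical rate without eyeballing it, the sketch below normalises each rate to MiB/s and takes the median. `rates_mib`, `median` and `migration.log` are hypothetical names, and the parsing assumes exactly the 6.4 line format above:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the PVE 6.4 log format:
#   "... migration active, transferred X MiB of Y GiB VM-state, <rate> <unit>"

# Print each per-second rate, normalised to MiB/s
rates_mib() {
  awk '/migration active/ {
    rate = $(NF - 1)
    if ($NF == "GiB/s") rate *= 1024
    printf "%.1f\n", rate
  }'
}

# Print the median of numbers read one per line from stdin
median() {
  LC_ALL=C sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) printf "%.1f\n", v[(NR + 1) / 2]
      else printf "%.1f\n", (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Usage: save the migration task log to a file, then
#   rates_mib < migration.log | median
```

For the full log above, this prints 133.0, far below the 497.0 MiB/s "average migration speed" that the task reports.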
Before the upgrade it was:
Code:
2021-04-29 23:53:59 use dedicated network address for sending migration traffic (192.168.122.5)
2021-04-29 23:53:59 starting migration of VM 106 to node 'asr5' (192.168.122.5)
2021-04-29 23:53:59 starting VM 106 on remote node 'asr5'
2021-04-29 23:54:02 start remote tunnel
2021-04-29 23:54:03 ssh tunnel ver 1
2021-04-29 23:54:03 starting online/live migration on tcp:192.168.122.5:60000
2021-04-29 23:54:03 set migration_caps
2021-04-29 23:54:03 migration speed limit: 8589934592 B/s
2021-04-29 23:54:03 migration downtime limit: 100 ms
2021-04-29 23:54:03 migration cachesize: 4294967296 B
2021-04-29 23:54:03 set migration parameters
2021-04-29 23:54:03 start migrate command to tcp:192.168.122.5:60000
2021-04-29 23:54:04 migration status: active (transferred 183037313, remaining 26159312896), total 34377637888)
2021-04-29 23:54:04 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:05 migration status: active (transferred 823898871, remaining 20425682944), total 34377637888)
2021-04-29 23:54:05 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:06 migration status: active (transferred 2004781823, remaining 19231305728), total 34377637888)
2021-04-29 23:54:06 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:07 migration status: active (transferred 3196984154, remaining 18017955840), total 34377637888)
2021-04-29 23:54:07 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:08 migration status: active (transferred 4000011729, remaining 13603848192), total 34377637888)
2021-04-29 23:54:08 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:09 migration status: active (transferred 4234999674, remaining 4838821888), total 34377637888)
2021-04-29 23:54:09 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:10 migration status: active (transferred 5453903404, remaining 3522039808), total 34377637888)
2021-04-29 23:54:10 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:11 migration status: active (transferred 6693572106, remaining 2214924288), total 34377637888)
2021-04-29 23:54:11 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:12 migration status: active (transferred 7902592541, remaining 938213376), total 34377637888)
2021-04-29 23:54:12 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:13 migration status: active (transferred 8942329553, remaining 262746112), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 42236 overflow 0
2021-04-29 23:54:13 migration status: active (transferred 9012998578, remaining 191201280), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 59456 overflow 0
2021-04-29 23:54:13 migration status: active (transferred 9091655245, remaining 111120384), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 78620 overflow 0
2021-04-29 23:54:13 migration status: active (transferred 9159325281, remaining 114331648), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 95108 overflow 0
2021-04-29 23:54:13 migration status: active (transferred 9238625305, remaining 116924416), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 562402 pages 959 cachemiss 114292 overflow 41
2021-04-29 23:54:13 migration status: active (transferred 9292475720, remaining 208162816), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 5359714 pages 6748 cachemiss 126236 overflow 87
2021-04-29 23:54:13 migration status: active (transferred 9350766383, remaining 138899456), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 9540718 pages 10479 cachemiss 139413 overflow 118
2021-04-29 23:54:13 migration status: active (transferred 9414494851, remaining 160677888), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 10916858 pages 17346 cachemiss 154595 overflow 143
2021-04-29 23:54:13 migration status: active (transferred 9463164789, remaining 76521472), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 16693053 pages 27363 cachemiss 165029 overflow 256
2021-04-29 23:54:14 migration speed: 2978.91 MB/s - downtime 218 ms
2021-04-29 23:54:14 migration status: completed
2021-04-29 23:54:17 migration finished successfully (duration 00:00:19)
TASK OK
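The older log reports raw byte counters instead of a rate, so the wire speed has to be derived. Divide the bytes transferred between two samples by the seconds between them. Here is a sketch that assumes the pre-6.4 `migration status: active (transferred N, ...)` format shown above. `counter_rate_mib` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper for the pre-6.4 log format:
#   "DATE HH:MM:SS migration status: active (transferred N, remaining M), total T)"
# Prints the average wire rate in MiB/s between the first and last sample.
counter_rate_mib() {
  awk '/migration status: active/ {
    split($2, t, ":")
    secs = t[1] * 3600 + t[2] * 60 + t[3]
    bytes = $7 + 0            # "183037313," -> 183037313
    if (!seen) { t0 = secs; b0 = bytes; seen = 1 }
    t1 = secs; b1 = bytes
  }
  END {
    if (!seen || t1 == t0) exit 1
    printf "%.1f\n", (b1 - b0) / (t1 - t0) / 1048576
  }'
}

# Usage: counter_rate_mib < old-migration.log
```

For the whole pre-upgrade log it prints 983.4 (MiB/s). After the upgrade, the rate is about 130 MiB/s.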
A current iperf3 run on this network shows that the link itself is fine:
Bash:
root@asr4:~# date && iperf3 -c 192.168.122.2
Fri 30 Apr 2021 12:50:14 AM MSK
Connecting to host 192.168.122.2, port 5201
[  5] local 192.168.122.4 port 32812 connected to 192.168.122.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.16 GBytes  9.92 Gbits/sec   51   1.70 MBytes       
[  5]   1.00-2.00   sec  1.15 GBytes  9.90 Gbits/sec    3   1.70 MBytes       
[  5]   2.00-3.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.70 MBytes       
[  5]   3.00-4.00   sec  1.15 GBytes  9.90 Gbits/sec    2   1.70 MBytes       
[  5]   4.00-5.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.70 MBytes       
[  5]   5.00-6.00   sec  1.15 GBytes  9.90 Gbits/sec    1   1.70 MBytes       
[  5]   6.00-7.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.70 MBytes       
[  5]   7.00-8.00   sec  1.15 GBytes  9.91 Gbits/sec    0   1.70 MBytes       
[  5]   8.00-9.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.70 MBytes       
[  5]   9.00-10.00  sec  1.15 GBytes  9.90 Gbits/sec    0   1.70 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec   57             sender
[  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec                  receiver

iperf Done.
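iperf3 reports decimal Gbit/s, while the migration log uses binary MiB/s. Converting between them makes the two comparable. Here is a small sketch. The function names are made up:

```bash
#!/usr/bin/env bash
# Convert between iperf3's decimal Gbit/s (10^9 bits) and the binary
# MiB/s (2^20 bytes) used in the Proxmox migration log.
gbit_to_mib() { awk -v g="$1" 'BEGIN { printf "%.1f\n", g * 1e9 / 8 / 1048576 }'; }
mib_to_gbit() { awk -v m="$1" 'BEGIN { printf "%.2f\n", m * 1048576 * 8 / 1e9 }'; }

gbit_to_mib 9.90   # link capacity measured by iperf3 above
mib_to_gbit 128    # rate the 6.4 migration settles at
```

This prints 1180.2 and 1.07. The 6.4 migration therefore uses only about 11% of what the link can carry.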
 
I am also seeing a regression. Insecure live migration of the VM state (no local disk migration) runs at about 128 MiB/s over my dedicated 40 Gbps link.

Code:
2021-04-29 20:34:01 use dedicated network address for sending migration traffic (10.42.15.1)
2021-04-29 20:34:01 starting migration of VM 101 to node 'pve01' (10.42.15.1)
2021-04-29 20:34:02 starting VM 101 on remote node 'pve01'
2021-04-29 20:34:03 [pve01] trying to acquire lock...
2021-04-29 20:34:03 [pve01]  OK
2021-04-29 20:34:03 start remote tunnel
2021-04-29 20:34:04 ssh tunnel ver 1
2021-04-29 20:34:04 starting online/live migration on tcp:10.42.15.1:60004
2021-04-29 20:34:04 set migration capabilities
2021-04-29 20:34:04 migration downtime limit: 100 ms
2021-04-29 20:34:04 migration cachesize: 1.0 GiB
2021-04-29 20:34:04 set migration parameters
2021-04-29 20:34:04 start migrate command to tcp:10.42.15.1:60004
2021-04-29 20:34:05 migration active, transferred 132.5 MiB of 8.0 GiB VM-state, 128.1 MiB/s
2021-04-29 20:34:06 migration active, transferred 263.6 MiB of 8.0 GiB VM-state, 127.9 MiB/s
2021-04-29 20:34:07 migration active, transferred 395.6 MiB of 8.0 GiB VM-state, 128.2 MiB/s
2021-04-29 20:34:08 migration active, transferred 524.9 MiB of 8.0 GiB VM-state, 128.8 MiB/s
2021-04-29 20:34:09 migration active, transferred 636.3 MiB of 8.0 GiB VM-state, 128.0 MiB/s
...
2021-04-29 20:35:02 migration active, transferred 7.2 GiB of 8.0 GiB VM-state, 128.0 MiB/s
2021-04-29 20:35:03 migration active, transferred 7.3 GiB of 8.0 GiB VM-state, 126.9 MiB/s
2021-04-29 20:35:04 migration active, transferred 7.4 GiB of 8.0 GiB VM-state, 134.5 MiB/s
2021-04-29 20:35:05 migration active, transferred 7.6 GiB of 8.0 GiB VM-state, 130.5 MiB/s
2021-04-29 20:35:06 migration active, transferred 7.7 GiB of 8.0 GiB VM-state, 130.1 MiB/s
2021-04-29 20:35:08 migration active, transferred 7.9 GiB of 8.0 GiB VM-state, 135.1 MiB/s
2021-04-29 20:35:08 average migration speed: 128.3 MiB/s - downtime 38 ms
2021-04-29 20:35:08 migration status: completed
2021-04-29 20:35:10 migration finished successfully (duration 00:01:09)
TASK OK
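From the numbers, the "average migration speed" in these logs appears to be the VM-state size divided by the time between `start migrate command` and the `average migration speed` line. It does not seem to be based on the bytes that actually crossed the wire. That is my inference from the logs, not documented behaviour. Here is a sketch to check it. `to_secs` and `implied_avg_mib` are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Hypothetical helpers (same-day timestamps only).

# Seconds since midnight for an HH:MM:SS timestamp; 10# avoids octal "08"
to_secs() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# VM-state size in MiB divided by elapsed seconds, printed as MiB/s
implied_avg_mib() {
  local state_mib=$1
  local elapsed=$(( $(to_secs "$3") - $(to_secs "$2") ))
  (( elapsed > 0 )) || return 1
  awk -v s="$state_mib" -v e="$elapsed" 'BEGIN { printf "%.1f\n", s / e }'
}

implied_avg_mib 8192 20:34:04 20:35:08    # this log: 8.0 GiB VM state
implied_avg_mib 16384 00:33:17 00:33:50   # first 6.4 log: 16.0 GiB VM state
```

The script prints 128.0 for this log (reported: 128.3) and 496.5 for the first 6.4 log (reported: 497.0). The first log only transferred about 3.6 GiB, so its 497 MiB/s figure overstates the real wire rate.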

Code:
# cat /etc/pve/datacenter.cfg
console: html5
keyboard: en-us
migration: insecure,network=10.42.15.0/24

Code:
# iperf3 -c 10.42.15.2
Connecting to host 10.42.15.2, port 5201
[  5] local 10.42.15.1 port 60998 connected to 10.42.15.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.24 GBytes  36.4 Gbits/sec   16   1.63 MBytes       
[  5]   1.00-2.00   sec  4.31 GBytes  37.1 Gbits/sec    0   1.67 MBytes       
[  5]   2.00-3.00   sec  3.42 GBytes  29.3 Gbits/sec    0   1.75 MBytes       
[  5]   3.00-4.00   sec  4.34 GBytes  37.3 Gbits/sec    0   1.77 MBytes       
[  5]   4.00-5.00   sec  4.43 GBytes  38.0 Gbits/sec    0   1.82 MBytes       
[  5]   5.00-6.00   sec  4.43 GBytes  38.0 Gbits/sec    0   1.83 MBytes       
[  5]   6.00-7.00   sec  4.37 GBytes  37.5 Gbits/sec    0   1.83 MBytes       
[  5]   7.00-8.00   sec  4.48 GBytes  38.5 Gbits/sec    0   1.85 MBytes       
[  5]   8.00-9.00   sec  4.48 GBytes  38.5 Gbits/sec    0   1.85 MBytes       
[  5]   9.00-10.00  sec  4.38 GBytes  37.7 Gbits/sec    0   1.85 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  42.9 GBytes  36.8 Gbits/sec   16             sender
[  5]   0.00-10.00  sec  42.9 GBytes  36.8 Gbits/sec                  receiver
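A 10 Gbps link and a 40 Gbps link both end up on the same ~128 MiB/s plateau. That points at a fixed cap rather than a network bottleneck. Here is the same comparison expressed as link utilisation. This is a sketch: `link_util_pct` is a made-up helper, and 130 MiB/s is a rough reading of the first 6.4 log:

```bash
#!/usr/bin/env bash
# Share of the iperf3-measured link that a migration rate uses, in percent
# (MiB/s from the migration log vs. decimal Gbit/s from iperf3).
link_util_pct() {
  awk -v m="$1" -v g="$2" 'BEGIN { printf "%.1f\n", m * 1048576 * 8 / (g * 1e9) * 100 }'
}

link_util_pct 128.3 36.8   # 40 Gbps link: reported average vs. iperf3
link_util_pct 130 9.90     # 10 Gbps link: typical 6.4 rate vs. iperf3
```

This prints 2.9 and 11.0 (percent).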
 
After setting the migration speed limit to 8 GiB/s it's much better:
Code:
2021-05-01 14:08:47 use dedicated network address for sending migration traffic (192.168.122.5)
2021-05-01 14:08:47 starting migration of VM 117 to node 'asr5' (192.168.122.5)
2021-05-01 14:08:48 starting VM 117 on remote node 'asr5'
2021-05-01 14:08:50 start remote tunnel
2021-05-01 14:08:51 ssh tunnel ver 1
2021-05-01 14:08:51 starting online/live migration on tcp:192.168.122.5:60000
2021-05-01 14:08:51 set migration capabilities
2021-05-01 14:08:51 migration speed limit: 8.0 GiB/s
2021-05-01 14:08:51 migration downtime limit: 100 ms
2021-05-01 14:08:51 migration cachesize: 1.0 GiB
2021-05-01 14:08:51 set migration parameters
2021-05-01 14:08:51 start migrate command to tcp:192.168.122.5:60000
2021-05-01 14:08:52 migration active, transferred 1.1 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:53 migration active, transferred 2.2 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:54 migration active, transferred 3.4 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:55 migration active, transferred 4.5 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:56 migration active, transferred 5.7 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:57 migration active, transferred 7.3 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:58 average migration speed: 1.1 GiB/s - downtime 141 ms
2021-05-01 14:08:58 migration status: completed
2021-05-01 14:09:01 migration finished successfully (duration 00:00:14)
TASK OK
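The pre-upgrade log prints `migration speed limit: 8589934592 B/s`, a line that is absent from both slow 6.4 logs. That number is exactly 8 GiB/s, the limit used here, as the conversions below show. If you set the limit in datacenter.cfg, my understanding is that the `migration=` value of the `bwlimit` option is in KiB/s. Treat that syntax as an assumption and check the datacenter.cfg documentation for your PVE version:

```bash
#!/usr/bin/env bash
# 8 GiB/s expressed in the units that appear in this thread.
limit_gib=8
limit_bytes=$(( limit_gib * 1024 ** 3 ))   # B/s, as printed by the pre-6.4 log
limit_kib=$(( limit_gib * 1024 ** 2 ))     # KiB/s, the unit bwlimit is assumed to use

echo "B/s:   $limit_bytes"
echo "KiB/s: $limit_kib"
# Assumed datacenter.cfg line (verify against your PVE version):
echo "bwlimit: migration=$limit_kib"
```

This prints 8589934592 for B/s, the same value as the pre-upgrade log, and 8388608 for KiB/s.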
 
