[SOLVED] Slow live migration after upgrading Proxmox to 6.4

sidereus

Jul 25, 2019
Live migration has become much slower since I upgraded the whole cluster to 6.4. Migration traffic goes over a dedicated 10 Gbit/s network:
Bash:
root@asr5:~# cat /etc/pve/datacenter.cfg
keyboard: en-us
migration: network=192.168.122.1/24,type=insecure
After the upgrade, the migration log looks like this:
Code:
2021-04-30 00:33:14 use dedicated network address for sending migration traffic (192.168.122.2)
2021-04-30 00:33:15 starting migration of VM 100 to node 'asr2' (192.168.122.2)
2021-04-30 00:33:15 starting VM 100 on remote node 'asr2'
2021-04-30 00:33:16 start remote tunnel
2021-04-30 00:33:17 ssh tunnel ver 1
2021-04-30 00:33:17 starting online/live migration on tcp:192.168.122.2:60000
2021-04-30 00:33:17 set migration capabilities
2021-04-30 00:33:17 migration downtime limit: 100 ms
2021-04-30 00:33:17 migration cachesize: 2.0 GiB
2021-04-30 00:33:17 set migration parameters
2021-04-30 00:33:17 start migrate command to tcp:192.168.122.2:60000
2021-04-30 00:33:18 migration active, transferred 78.6 MiB of 16.0 GiB VM-state, 3.2 GiB/s
2021-04-30 00:33:19 migration active, transferred 202.1 MiB of 16.0 GiB VM-state, 135.7 MiB/s
2021-04-30 00:33:20 migration active, transferred 333.1 MiB of 16.0 GiB VM-state, 138.8 MiB/s
2021-04-30 00:33:21 migration active, transferred 461.6 MiB of 16.0 GiB VM-state, 129.7 MiB/s
2021-04-30 00:33:22 migration active, transferred 592.5 MiB of 16.0 GiB VM-state, 133.3 MiB/s
2021-04-30 00:33:23 migration active, transferred 659.3 MiB of 16.0 GiB VM-state, 3.7 GiB/s
2021-04-30 00:33:24 migration active, transferred 727.9 MiB of 16.0 GiB VM-state, 138.8 MiB/s
2021-04-30 00:33:25 migration active, transferred 852.3 MiB of 16.0 GiB VM-state, 632.7 MiB/s
2021-04-30 00:33:26 migration active, transferred 984.2 MiB of 16.0 GiB VM-state, 137.6 MiB/s
2021-04-30 00:33:27 migration active, transferred 1.1 GiB of 16.0 GiB VM-state, 130.1 MiB/s
2021-04-30 00:33:28 migration active, transferred 1.2 GiB of 16.0 GiB VM-state, 127.8 MiB/s
2021-04-30 00:33:29 migration active, transferred 1.3 GiB of 16.0 GiB VM-state, 127.8 MiB/s
2021-04-30 00:33:30 migration active, transferred 1.4 GiB of 16.0 GiB VM-state, 129.1 MiB/s
2021-04-30 00:33:31 migration active, transferred 1.6 GiB of 16.0 GiB VM-state, 128.5 MiB/s
2021-04-30 00:33:32 migration active, transferred 1.7 GiB of 16.0 GiB VM-state, 128.2 MiB/s
2021-04-30 00:33:33 migration active, transferred 1.8 GiB of 16.0 GiB VM-state, 128.8 MiB/s
2021-04-30 00:33:34 migration active, transferred 1.9 GiB of 16.0 GiB VM-state, 131.3 MiB/s
2021-04-30 00:33:35 migration active, transferred 2.1 GiB of 16.0 GiB VM-state, 129.8 MiB/s
2021-04-30 00:33:36 migration active, transferred 2.2 GiB of 16.0 GiB VM-state, 204.2 MiB/s
2021-04-30 00:33:37 migration active, transferred 2.2 GiB of 16.0 GiB VM-state, 3.4 GiB/s
2021-04-30 00:33:38 migration active, transferred 2.3 GiB of 16.0 GiB VM-state, 343.4 MiB/s
2021-04-30 00:33:39 migration active, transferred 2.4 GiB of 16.0 GiB VM-state, 129.6 MiB/s
2021-04-30 00:33:40 migration active, transferred 2.5 GiB of 16.0 GiB VM-state, 133.0 MiB/s
2021-04-30 00:33:41 migration active, transferred 2.7 GiB of 16.0 GiB VM-state, 148.2 MiB/s
2021-04-30 00:33:42 migration active, transferred 2.8 GiB of 16.0 GiB VM-state, 148.6 MiB/s
2021-04-30 00:33:43 migration active, transferred 2.9 GiB of 16.0 GiB VM-state, 139.9 MiB/s
2021-04-30 00:33:44 migration active, transferred 3.0 GiB of 16.0 GiB VM-state, 131.2 MiB/s
2021-04-30 00:33:46 migration active, transferred 3.2 GiB of 16.0 GiB VM-state, 126.5 MiB/s
2021-04-30 00:33:47 migration active, transferred 3.3 GiB of 16.0 GiB VM-state, 133.9 MiB/s
2021-04-30 00:33:48 migration active, transferred 3.4 GiB of 16.0 GiB VM-state, 127.8 MiB/s
2021-04-30 00:33:49 migration active, transferred 3.6 GiB of 16.0 GiB VM-state, 130.5 MiB/s
2021-04-30 00:33:50 average migration speed: 497.0 MiB/s - downtime 150 ms
2021-04-30 00:33:50 migration status: completed
2021-04-30 00:33:53 migration finished successfully (duration 00:00:39)
TASK OK
Before the upgrade, it looked like this:
Code:
2021-04-29 23:53:59 use dedicated network address for sending migration traffic (192.168.122.5)
2021-04-29 23:53:59 starting migration of VM 106 to node 'asr5' (192.168.122.5)
2021-04-29 23:53:59 starting VM 106 on remote node 'asr5'
2021-04-29 23:54:02 start remote tunnel
2021-04-29 23:54:03 ssh tunnel ver 1
2021-04-29 23:54:03 starting online/live migration on tcp:192.168.122.5:60000
2021-04-29 23:54:03 set migration_caps
2021-04-29 23:54:03 migration speed limit: 8589934592 B/s
2021-04-29 23:54:03 migration downtime limit: 100 ms
2021-04-29 23:54:03 migration cachesize: 4294967296 B
2021-04-29 23:54:03 set migration parameters
2021-04-29 23:54:03 start migrate command to tcp:192.168.122.5:60000
2021-04-29 23:54:04 migration status: active (transferred 183037313, remaining 26159312896), total 34377637888)
2021-04-29 23:54:04 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:05 migration status: active (transferred 823898871, remaining 20425682944), total 34377637888)
2021-04-29 23:54:05 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:06 migration status: active (transferred 2004781823, remaining 19231305728), total 34377637888)
2021-04-29 23:54:06 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:07 migration status: active (transferred 3196984154, remaining 18017955840), total 34377637888)
2021-04-29 23:54:07 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:08 migration status: active (transferred 4000011729, remaining 13603848192), total 34377637888)
2021-04-29 23:54:08 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:09 migration status: active (transferred 4234999674, remaining 4838821888), total 34377637888)
2021-04-29 23:54:09 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:10 migration status: active (transferred 5453903404, remaining 3522039808), total 34377637888)
2021-04-29 23:54:10 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:11 migration status: active (transferred 6693572106, remaining 2214924288), total 34377637888)
2021-04-29 23:54:11 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:12 migration status: active (transferred 7902592541, remaining 938213376), total 34377637888)
2021-04-29 23:54:12 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 0 overflow 0
2021-04-29 23:54:13 migration status: active (transferred 8942329553, remaining 262746112), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 42236 overflow 0
2021-04-29 23:54:13 migration status: active (transferred 9012998578, remaining 191201280), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 59456 overflow 0
2021-04-29 23:54:13 migration status: active (transferred 9091655245, remaining 111120384), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 78620 overflow 0
2021-04-29 23:54:13 migration status: active (transferred 9159325281, remaining 114331648), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 0 pages 0 cachemiss 95108 overflow 0
2021-04-29 23:54:13 migration status: active (transferred 9238625305, remaining 116924416), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 562402 pages 959 cachemiss 114292 overflow 41
2021-04-29 23:54:13 migration status: active (transferred 9292475720, remaining 208162816), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 5359714 pages 6748 cachemiss 126236 overflow 87
2021-04-29 23:54:13 migration status: active (transferred 9350766383, remaining 138899456), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 9540718 pages 10479 cachemiss 139413 overflow 118
2021-04-29 23:54:13 migration status: active (transferred 9414494851, remaining 160677888), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 10916858 pages 17346 cachemiss 154595 overflow 143
2021-04-29 23:54:13 migration status: active (transferred 9463164789, remaining 76521472), total 34377637888)
2021-04-29 23:54:13 migration xbzrle cachesize: 4294967296 transferred 16693053 pages 27363 cachemiss 165029 overflow 256
2021-04-29 23:54:14 migration speed: 2978.91 MB/s - downtime 218 ms
2021-04-29 23:54:14 migration status: completed
2021-04-29 23:54:17 migration finished successfully (duration 00:00:19)
TASK OK
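The pre-upgrade log reports sizes in raw bytes, which makes it hard to compare with the new log format. Converting them shows that the old run applied an explicit `migration speed limit` of 8 GiB/s, a line that does not appear in the post-upgrade log at all. A minimal Bash/awk sketch of the conversion; the helper name `bytes_to_gib` is hypothetical and only used for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a raw byte count from the old-style log to GiB,
# printed with two decimals (1 GiB = 1024^3 bytes).
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 ^ 3) }'
}

bytes_to_gib 8589934592    # migration speed limit (B/s) -> 8.00 (GiB/s)
bytes_to_gib 4294967296    # migration cachesize (B)     -> 4.00 (GiB)
bytes_to_gib 34377637888   # total VM state (B)          -> 32.02 (GiB)
```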
A current iperf3 test over the same network shows that the link itself is fine:
Bash:
root@asr4:~# date && iperf3 -c 192.168.122.2
Fri 30 Apr 2021 12:50:14 AM MSK
Connecting to host 192.168.122.2, port 5201
[  5] local 192.168.122.4 port 32812 connected to 192.168.122.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.16 GBytes  9.92 Gbits/sec   51   1.70 MBytes       
[  5]   1.00-2.00   sec  1.15 GBytes  9.90 Gbits/sec    3   1.70 MBytes       
[  5]   2.00-3.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.70 MBytes       
[  5]   3.00-4.00   sec  1.15 GBytes  9.90 Gbits/sec    2   1.70 MBytes       
[  5]   4.00-5.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.70 MBytes       
[  5]   5.00-6.00   sec  1.15 GBytes  9.90 Gbits/sec    1   1.70 MBytes       
[  5]   6.00-7.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.70 MBytes       
[  5]   7.00-8.00   sec  1.15 GBytes  9.91 Gbits/sec    0   1.70 MBytes       
[  5]   8.00-9.00   sec  1.15 GBytes  9.90 Gbits/sec    0   1.70 MBytes       
[  5]   9.00-10.00  sec  1.15 GBytes  9.90 Gbits/sec    0   1.70 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec   57             sender
[  5]   0.00-10.00  sec  11.5 GBytes  9.90 Gbits/sec                  receiver

iperf Done.
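To put the two numbers side by side: iperf3 reports decimal bits per second (1 Gbit = 10^9 bit), while the migration log reports binary MiB/s. A small sketch of the conversion, using the ~130 MiB/s plateau from the post-upgrade log; the helper name `gbit_to_mib` is made up for this example:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a decimal Gbit/s figure (as printed by iperf3)
# to MiB/s (1 MiB = 1024^2 bytes), rounded to the nearest integer.
gbit_to_mib() {
    awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1e9 / 8 / (1024 ^ 2) }'
}

link=$(gbit_to_mib 9.90)          # iperf3 result on the 10 Gbit/s network
echo "link: ${link} MiB/s"        # link: 1180 MiB/s
# Share of the link used by the ~130 MiB/s seen in the post-upgrade log
awk -v l="$link" 'BEGIN { printf "migration uses %.0f%% of the link\n", 130 / l * 100 }'
```

So the post-upgrade migration uses only about a ninth of the bandwidth iperf3 can reach on the same path.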
 
I am seeing the same regression. Insecure live migration of the VM state (no local disk migration) runs at about 128 MiB/s over my dedicated 40 Gbit/s link.

Code:
2021-04-29 20:34:01 use dedicated network address for sending migration traffic (10.42.15.1)
2021-04-29 20:34:01 starting migration of VM 101 to node 'pve01' (10.42.15.1)
2021-04-29 20:34:02 starting VM 101 on remote node 'pve01'
2021-04-29 20:34:03 [pve01] trying to acquire lock...
2021-04-29 20:34:03 [pve01]  OK
2021-04-29 20:34:03 start remote tunnel
2021-04-29 20:34:04 ssh tunnel ver 1
2021-04-29 20:34:04 starting online/live migration on tcp:10.42.15.1:60004
2021-04-29 20:34:04 set migration capabilities
2021-04-29 20:34:04 migration downtime limit: 100 ms
2021-04-29 20:34:04 migration cachesize: 1.0 GiB
2021-04-29 20:34:04 set migration parameters
2021-04-29 20:34:04 start migrate command to tcp:10.42.15.1:60004
2021-04-29 20:34:05 migration active, transferred 132.5 MiB of 8.0 GiB VM-state, 128.1 MiB/s
2021-04-29 20:34:06 migration active, transferred 263.6 MiB of 8.0 GiB VM-state, 127.9 MiB/s
2021-04-29 20:34:07 migration active, transferred 395.6 MiB of 8.0 GiB VM-state, 128.2 MiB/s
2021-04-29 20:34:08 migration active, transferred 524.9 MiB of 8.0 GiB VM-state, 128.8 MiB/s
2021-04-29 20:34:09 migration active, transferred 636.3 MiB of 8.0 GiB VM-state, 128.0 MiB/s
...
2021-04-29 20:35:02 migration active, transferred 7.2 GiB of 8.0 GiB VM-state, 128.0 MiB/s
2021-04-29 20:35:03 migration active, transferred 7.3 GiB of 8.0 GiB VM-state, 126.9 MiB/s
2021-04-29 20:35:04 migration active, transferred 7.4 GiB of 8.0 GiB VM-state, 134.5 MiB/s
2021-04-29 20:35:05 migration active, transferred 7.6 GiB of 8.0 GiB VM-state, 130.5 MiB/s
2021-04-29 20:35:06 migration active, transferred 7.7 GiB of 8.0 GiB VM-state, 130.1 MiB/s
2021-04-29 20:35:08 migration active, transferred 7.9 GiB of 8.0 GiB VM-state, 135.1 MiB/s
2021-04-29 20:35:08 average migration speed: 128.3 MiB/s - downtime 38 ms
2021-04-29 20:35:08 migration status: completed
2021-04-29 20:35:10 migration finished successfully (duration 00:01:09)
TASK OK

Code:
# cat /etc/pve/datacenter.cfg
console: html5
keyboard: en-us
migration: insecure,network=10.42.15.0/24

Code:
# iperf3 -c 10.42.15.2
Connecting to host 10.42.15.2, port 5201
[  5] local 10.42.15.1 port 60998 connected to 10.42.15.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.24 GBytes  36.4 Gbits/sec   16   1.63 MBytes       
[  5]   1.00-2.00   sec  4.31 GBytes  37.1 Gbits/sec    0   1.67 MBytes       
[  5]   2.00-3.00   sec  3.42 GBytes  29.3 Gbits/sec    0   1.75 MBytes       
[  5]   3.00-4.00   sec  4.34 GBytes  37.3 Gbits/sec    0   1.77 MBytes       
[  5]   4.00-5.00   sec  4.43 GBytes  38.0 Gbits/sec    0   1.82 MBytes       
[  5]   5.00-6.00   sec  4.43 GBytes  38.0 Gbits/sec    0   1.83 MBytes       
[  5]   6.00-7.00   sec  4.37 GBytes  37.5 Gbits/sec    0   1.83 MBytes       
[  5]   7.00-8.00   sec  4.48 GBytes  38.5 Gbits/sec    0   1.85 MBytes       
[  5]   8.00-9.00   sec  4.48 GBytes  38.5 Gbits/sec    0   1.85 MBytes       
[  5]   9.00-10.00  sec  4.38 GBytes  37.7 Gbits/sec    0   1.85 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  42.9 GBytes  36.8 Gbits/sec   16             sender
[  5]   0.00-10.00  sec  42.9 GBytes  36.8 Gbits/sec                  receiver
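Both reports plateau at almost exactly 128 MiB/s, even though one link is 10 Gbit/s and the other 40 Gbit/s, which points to a fixed cap rather than a network bottleneck. Expressed in link units, that cap is only about 1.07 Gbit/s. As far as I know, 128 MiB/s is also QEMU's built-in default for the migration `max-bandwidth` parameter, but that is an assumption worth verifying, not something shown in these logs. A quick conversion sketch (helper name hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a MiB/s rate (as printed by the migration log)
# to decimal Gbit/s, with two decimals.
mib_to_gbit() {
    awk -v m="$1" 'BEGIN { printf "%.2f\n", m * 1024 * 1024 * 8 / 1e9 }'
}

mib_to_gbit 128.3   # average on the 40 Gbit/s link -> 1.08
mib_to_gbit 128     # the apparent cap              -> 1.07
# Fraction of the measured 36.8 Gbit/s that the migration actually used
awk 'BEGIN { printf "%.1f%% of 36.8 Gbit/s\n", 128.3 * 1024 * 1024 * 8 / 1e9 / 36.8 * 100 }'
# Assumption (not from the logs): 128 MiB/s may be QEMU's default max-bandwidth.
```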
 
After setting the migration speed limit to 8 GiB/s, it is much better:
Code:
2021-05-01 14:08:47 use dedicated network address for sending migration traffic (192.168.122.5)
2021-05-01 14:08:47 starting migration of VM 117 to node 'asr5' (192.168.122.5)
2021-05-01 14:08:48 starting VM 117 on remote node 'asr5'
2021-05-01 14:08:50 start remote tunnel
2021-05-01 14:08:51 ssh tunnel ver 1
2021-05-01 14:08:51 starting online/live migration on tcp:192.168.122.5:60000
2021-05-01 14:08:51 set migration capabilities
2021-05-01 14:08:51 migration speed limit: 8.0 GiB/s
2021-05-01 14:08:51 migration downtime limit: 100 ms
2021-05-01 14:08:51 migration cachesize: 1.0 GiB
2021-05-01 14:08:51 set migration parameters
2021-05-01 14:08:51 start migrate command to tcp:192.168.122.5:60000
2021-05-01 14:08:52 migration active, transferred 1.1 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:53 migration active, transferred 2.2 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:54 migration active, transferred 3.4 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:55 migration active, transferred 4.5 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:56 migration active, transferred 5.7 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:57 migration active, transferred 7.3 GiB of 8.0 GiB VM-state, 1.2 GiB/s
2021-05-01 14:08:58 average migration speed: 1.1 GiB/s - downtime 141 ms
2021-05-01 14:08:58 migration status: completed
2021-05-01 14:09:01 migration finished successfully (duration 00:00:14)
TASK OK
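The post does not say how the limit was set. One way that should produce the `migration speed limit: 8.0 GiB/s` line is the `bwlimit` option in `/etc/pve/datacenter.cfg`, or `--bwlimit` on `qm migrate`; as far as I know both take a value in KiB/s, so treat the unit as an assumption to check against the Proxmox documentation. The sketch below only computes and prints the config line; it does not modify the cluster configuration, and the helper name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a datacenter.cfg bwlimit line for a migration
# limit given in whole GiB/s. Assumption: bwlimit values are in KiB/s.
bwlimit_line() {
    local gib_per_s=$1
    local kib_per_s=$(( gib_per_s * 1024 * 1024 ))
    echo "bwlimit: migration=${kib_per_s}"
}

bwlimit_line 8   # bwlimit: migration=8388608
# To apply, add the printed line to /etc/pve/datacenter.cfg by hand, or pass the
# value for a single migration, e.g.: qm migrate 117 asr5 --online --bwlimit 8388608
```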