Slow HA migration of VM with shared storage

dmitrynovice

Active Member
Sep 18, 2018
Hello.
HA migration of a VM between two nodes was really slow (speed: 9.19 MB/s).
Total RAM of the VM is 8 GB.
Storage is shared (external Ceph).
Network speed between the nodes is 10 Gbit/s.

Usually the migration speed is more like in this example: 1365.33 MB/s - downtime 90 ms.
Can you please explain what caused that huge speed drop?
The nodes, storage, and network devices are not overloaded.

datacenter.cfg:
migration: type=insecure,network=10.10.36.0/23

task started by HA resource agent
2019-04-03 16:10:12 use dedicated network address for sending migration traffic (10.10.36.44)
2019-04-03 16:10:12 starting migration of VM 139 to node 'prox4' (10.10.36.44)
2019-04-03 16:10:12 copying disk images
2019-04-03 16:10:12 starting VM 139 on remote node 'prox4'
2019-04-03 16:10:14 start remote tunnel
2019-04-03 16:10:14 ssh tunnel ver 1
2019-04-03 16:10:14 starting online/live migration on tcp:10.10.36.44:60000
2019-04-03 16:10:14 migrate_set_speed: 8589934592
2019-04-03 16:10:14 migrate_set_downtime: 0.1
2019-04-03 16:10:14 set migration_caps
2019-04-03 16:10:14 set cachesize: 1073741824
2019-04-03 16:10:14 start migrate command to tcp:10.10.36.44:60000
2019-04-03 16:10:15 migration status: active (transferred 7317239, remaining 8598532096), total 8607571968)
2019-04-03 16:10:15 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-04-03 16:10:16 migration status: active (transferred 24430966, remaining 8579624960), total 8607571968)
2019-04-03 16:10:16 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-04-03 16:10:17 migration status: active (transferred 41115421, remaining 8562257920), total 8607571968)
...
2019-04-03 16:25:05 migration speed: 9.19 MB/s - downtime 414 ms
2019-04-03 16:25:05 migration status: completed
2019-04-03 16:25:08 migration finished successfully (duration 00:14:56)
TASK OK

root@prox2:~# iperf -c 10.10.36.44
------------------------------------------------------------
Client connecting to 10.10.36.44, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 3] local 10.10.36.42 port 37470 connected with 10.10.36.44 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 11.5 GBytes 9.89 Gbits/sec


root@prox4:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.10.36.44 port 5001 connected with 10.10.36.42 port 37470
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 11.5 GBytes 9.89 Gbits/sec

proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-6 (running version: 5.3-6/37b3c8df)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-34
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 
Migration of another VM between the same nodes:

2019-04-03 17:51:48 migration speed: 1170.29 MB/s - downtime 140 ms
2019-04-03 17:51:48 migration status: completed
2019-04-03 17:51:51 migration finished successfully (duration 00:00:20)
TASK OK

That VM has 16 GB of RAM.
Weird...
 
Hi
Can you please explain what caused that huge speed drop?
Yes.

The calculation is simple: memory size / migration time.
The memory size also includes unused memory.
If a memory block is changed during the migration, it has to be retransmitted, but the retransmission is not counted in the transfer speed.

2019-04-03 16:10:16 migration status: active (transferred 24430966, remaining 8579624960), total 8607571968)
2019-04-03 16:10:16 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-04-03 16:10:17 migration status: active (transferred 41115421, remaining 8562257920), total 8607571968)

If you look at the transferred and remaining values in the brackets, you can see what I mean.
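The reported figure can be reproduced from the log above. A minimal sketch, assuming the speed is simply total guest memory divided by the wall-clock migration time (start 16:10:14, speed line at 16:25:05), with dirty-page retransmissions ignored:

```python
# Rough sketch of how the reported migration speed comes about:
# total guest memory / total migration time, regardless of how many
# bytes were retransmitted for pages dirtied during the migration.
# Figures are taken from the migration log above.

total_memory_bytes = 8_607_571_968  # "total" from the migration status lines

# 16:10:14 -> 16:25:05 as seconds since midnight
start = 16 * 3600 + 10 * 60 + 14
end = 16 * 3600 + 25 * 60 + 5
migration_seconds = end - start  # 891 s

speed_mib_s = total_memory_bytes / migration_seconds / 1024**2
print(f"{speed_mib_s:.2f} MiB/s")  # ~9.2 MiB/s, in line with the logged 9.19 MB/s
```

So a slow migration with shared storage and a fast network usually means the guest was dirtying memory quickly, forcing repeated retransmissions that stretch the migration time while the numerator (total memory) stays fixed.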
 
