Hi colleagues,
I would like to ask about live-migration speed between PVE cluster nodes.
I have a 3-node PVE 8 cluster with 2x40G network links: one for the Ceph cluster network (1) and one for the PVE cluster / Ceph public network (2).
The Ceph OSDs are all NVMe.
In the cluster options I have also set one of these 40G networks (2) as the migration network (see the snippet below).
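Concretely, that ends up as a migration line in /etc/pve/datacenter.cfg; before the change described further down it looked roughly like this (a sketch: type=secure is the default, the network value is the one from the logs):
Code:
# /etc/pve/datacenter.cfg (sketch; type=secure is the default)
migration: network=10.100.41.10/24,type=secure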
When I migrate a VM with 32 GB of RAM from one node to another, I get these results:
Code:
2023-09-18 11:01:18 starting migration of VM 100 to node 'pve-up-1' (10.100.41.30)
2023-09-18 11:01:18 starting VM 100 on remote node 'pve-up-1'
2023-09-18 11:01:20 start remote tunnel
2023-09-18 11:01:20 ssh tunnel ver 1
2023-09-18 11:01:20 starting online/live migration on unix:/run/qemu-server/100.migrate
2023-09-18 11:01:20 set migration capabilities
2023-09-18 11:01:20 migration downtime limit: 100 ms
2023-09-18 11:01:20 migration cachesize: 4.0 GiB
...
2023-09-18 11:02:20 xbzrle: send updates to 65487 pages in 7.9 MiB encoded memory, cache-miss 84.62%, overflow 604
2023-09-18 11:02:21 auto-increased downtime to continue migration: 200 ms
2023-09-18 11:02:22 migration active, transferred 28.6 GiB of 32.1 GiB VM-state, 287.9 MiB/s
2023-09-18 11:02:22 xbzrle: send updates to 190269 pages in 50.6 MiB encoded memory, cache-miss 34.56%, overflow 861
2023-09-18 11:02:24 migration active, transferred 28.6 GiB of 32.1 GiB VM-state, 151.9 MiB/s, VM dirties lots of memory: 299.1 MiB/s
2023-09-18 11:02:24 xbzrle: send updates to 306352 pages in 60.5 MiB encoded memory, cache-miss 27.66%, overflow 1180
2023-09-18 11:02:25 auto-increased downtime to continue migration: 400 ms
2023-09-18 11:02:27 average migration speed: 491.1 MiB/s - downtime 531 ms
2023-09-18 11:02:27 migration status: completed
2023-09-18 11:02:28 Waiting for spice server migration
2023-09-18 11:02:30 migration finished successfully (duration 00:01:13)
After some googling I set the migration type to "insecure" (which sends the memory stream over plain TCP instead of the SSH tunnel):
Code:
/etc/pve/datacenter.cfg
migration: network=10.100.41.10/24,type=insecure
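Since datacenter.cfg lives on the shared pmxcfs, the change applies cluster-wide to the next migration without restarting anything; to double-check what the cluster actually sees, something like this should work:
Code:
# query the datacenter options via the API; should echo the migration setting
pvesh get /cluster/options --output-format json | grep -i migration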
After this change, the result is:
Code:
2023-09-18 11:18:03 use dedicated network address for sending migration traffic (10.100.41.20)
2023-09-18 11:18:03 starting migration of VM 100 to node 'pve-down-2' (10.100.41.20)
2023-09-18 11:18:03 starting VM 100 on remote node 'pve-down-2'
2023-09-18 11:18:05 start remote tunnel
2023-09-18 11:18:06 ssh tunnel ver 1
2023-09-18 11:18:06 starting online/live migration on tcp:10.100.41.20:60000
2023-09-18 11:18:06 set migration capabilities
2023-09-18 11:18:06 migration downtime limit: 100 ms
2023-09-18 11:18:06 migration cachesize: 4.0 GiB
...
2023-09-18 11:18:37 xbzrle: send updates to 47084 pages in 4.1 MiB encoded memory, cache-miss 78.66%, overflow 294
2023-09-18 11:18:38 auto-increased downtime to continue migration: 200 ms
2023-09-18 11:18:39 migration active, transferred 28.2 GiB of 32.1 GiB VM-state, 179.1 MiB/s
2023-09-18 11:18:39 xbzrle: send updates to 129779 pages in 20.8 MiB encoded memory, cache-miss 16.74%, overflow 357
2023-09-18 11:18:40 auto-increased downtime to continue migration: 400 ms
2023-09-18 11:18:41 migration active, transferred 28.2 GiB of 32.1 GiB VM-state, 104.6 MiB/s, VM dirties lots of memory: 177.4 MiB/s
2023-09-18 11:18:41 xbzrle: send updates to 220757 pages in 42.1 MiB encoded memory, cache-miss 20.16%, overflow 424
2023-09-18 11:18:42 auto-increased downtime to continue migration: 800 ms
2023-09-18 11:18:44 migration active, transferred 28.3 GiB of 32.1 GiB VM-state, 339.8 MiB/s
2023-09-18 11:18:44 xbzrle: send updates to 339327 pages in 50.7 MiB encoded memory, cache-miss 21.37%, overflow 620
2023-09-18 11:18:46 average migration speed: 822.5 MiB/s - downtime 472 ms
2023-09-18 11:18:46 migration status: completed
2023-09-18 11:18:47 Waiting for spice server migration
2023-09-18 11:18:49 migration finished successfully (duration 00:00:47)
The speed has increased, but 822.5 MiB/s is only about 6.9 Gbit/s, still far below what the 40G network and the disks are capable of:
Code:
root@pve-down-1:/tmp# rados bench -p scbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pve-down-1_1588020
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16       615       599   2395.85      2396    0.0213653   0.0262351
    2      16      1238      1222    2443.8      2492     0.164208   0.0239879
    3      16      1605      1589   2118.48      1468    0.0379306   0.0300604
    4      16      2261      2245    2244.8      2624    0.0292071   0.0283965
    5      16      2949      2933   2346.19      2752     0.015009   0.0272138
    6      16      3687      3671   2447.11      2952    0.0131768   0.0261067
    7      16      4417      4401   2514.63      2920    0.0262535     0.02521
    8      16      5089      5073   2536.27      2688     0.010768   0.0251883
    9      16      5756      5740   2550.88      2668    0.0152786   0.0250424
   10      15      6397      6382   2552.56      2568   0.00827939   0.0247853
   11       6      6397      6391   2323.79        36     0.218062   0.0248901
   12       6      6397      6391   2130.14         0            -   0.0248901
Total time run: 12.5638
Total writes made: 6397
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2036.65
Stddev Bandwidth: 1056.8
Max bandwidth (MB/sec): 2952
Min bandwidth (MB/sec): 0
Average IOPS: 509
Stddev IOPS: 264.2
Max IOPS: 738
Min IOPS: 0
Average Latency(s): 0.0274665
Stddev Latency(s): 0.0937761
Max latency(s): 3.00742
Min latency(s): 0.00576705
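To rule out the links themselves, raw node-to-node throughput can also be compared with iperf3, e.g. (IPs taken from the logs above):
Code:
# on the target node (pve-down-2)
iperf3 -s
# on the source node: 4 parallel TCP streams for 10 seconds
iperf3 -c 10.100.41.20 -P 4 -t 10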
Maybe there are more parameters I should use to speed up the migration process?
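For example, I have found these per-VM migration knobs in qm.conf, but I am not sure they are the right lever here (a sketch; the values are illustrative, not a recommendation):
Code:
# per-VM migration options (defaults: migrate_speed=0 i.e. unlimited, migrate_downtime=0.1)
qm set 100 --migrate_speed 0        # bandwidth cap in MB/s; 0 = unlimited
qm set 100 --migrate_downtime 0.5   # max allowed downtime in seconds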