Migration of VM with replication job not possible, why?

carsten2

Currently, migration of a VM with a replication job is not possible, but with an LXC container it is (although only as a cold migration). Why? Proxmox should do the same thing with VMs as with LXCs: shut down on the local server, replicate, start up on the remote server.

Improvement: To minimize downtime, also for the LXC migration that already works, the sequence should actually be (see the sketch below):
replicate, shutdown, replicate, startup
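
For illustration only, a minimal Python sketch of that ordering. The container ID, replication job ID, and target address are made up, `pvesr run --id` is assumed to be usable for triggering the replication job on demand, and the guest-config transfer that a real migration performs is left out:

```python
#!/usr/bin/env python3
"""Conceptual sketch of the proposed 'replicate, shutdown, replicate,
startup' order. This is NOT how pct migrate is implemented; the IDs
and the target address are placeholders."""

import subprocess

CTID = "110"                    # example container ID (made up)
JOB_ID = f"{CTID}-0"            # replication job ID (made up)
TARGET = "root@192.168.42.34"   # target node (made up)


def run(cmd):
    """Run a command and abort on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# 1) First replication pass while the container is still running:
#    the bulk of the data is transferred with zero downtime.
run(["pvesr", "run", "--id", JOB_ID, "--verbose"])

# 2) Shut the container down; downtime starts here.
run(["pct", "shutdown", CTID])

# 3) Second replication pass: only the delta written since step 1
#    has to be sent, so the offline window stays short.
run(["pvesr", "run", "--id", JOB_ID, "--verbose"])

# 4) Start the container on the target node (the same ssh call that
#    the migration log further down in this thread shows).
run(["ssh", TARGET, "pct", "start", CTID])
```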

Also, why is it not possible to do a real live migration with replication?

Another near-live-migration method would be to snapshot the VM including RAM, replicate the VM, and restart it on the remote node (a rough sketch follows).
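
A rough sketch of that idea, for illustration only: `qm snapshot --vmstate` is an existing command, but shipping the saved RAM state to the other node and resuming there are exactly the missing pieces, so they appear below as hypothetical placeholders:

```python
#!/usr/bin/env python3
"""Illustration of the 'snapshot with RAM, replicate, resume' idea.
Only the qm snapshot call exists today; the two helpers are
hypothetical and raise NotImplementedError on purpose."""

import subprocess

VMID = "100"          # example VM ID (made up)
SNAP = "premigrate"   # snapshot name (made up)


def replicate_snapshot(vmid: str) -> None:
    """Hypothetical: ship the disks plus the saved RAM state volume
    to the target node, e.g. via the existing storage replication."""
    raise NotImplementedError("not available in Proxmox VE today")


def resume_on_target(vmid: str, snap: str) -> None:
    """Hypothetical: roll back to the snapshot on the target node and
    start the VM so it resumes from the saved RAM state."""
    raise NotImplementedError("not available in Proxmox VE today")


# 1) Snapshot disks *and* RAM; the guest keeps running afterwards.
subprocess.run(["qm", "snapshot", VMID, SNAP, "--vmstate", "1"], check=True)

# 2) + 3) The parts that would have to be implemented:
replicate_snapshot(VMID)
resume_on_target(VMID, SNAP)
```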
 
Hi
Also, why is it not possible to do a real live migration with replication?
Because it is not implemented.
QEMU contains no dirty-bitmap function for a delta sync.
 
What about the other questions:

1) Why is it possible to warm-migrate an LXC but not a VM? The process should be the same.
2) What about the improvement "replicate-shutdown-replicate-startup" to minimize downtime?
3) What about the possibility to migrate a paused or suspended VM/LXC?
 
Q: Why is it possible to warm-migrate an LXC but not a VM? The process should be the same.
A: QEMU contains no dirty-bitmap function for a delta sync?

I am not talking about live migration, where some feature in QEMU might be missing, but about migrating VMs the same way as running LXC containers, i.e. by automatically stopping, replicating, and restarting the guest. It works for LXC and would work for VMs as well, but Proxmox doesn't allow it.

2) If you set up replication beforehand, this is exactly what happens on offline migration.

I don't think so; see the example migration log below. It starts with "shutdown" immediately, instead of first replicating while the LXC is still running, then shutting down, then replicating the remaining delta, and only then starting it on the remote server:

2019-11-20 16:32:47 shutdown CT 110
2019-11-20 16:32:49 starting migration of CT 110 to node 'hal9034' (192.168.42.34)
2019-11-20 16:32:49 found local volume 'vmdata1zfs:subvol-110-disk-0' (in current VM config)
2019-11-20 16:32:49 start replication job
2019-11-20 16:32:49 guest => CT 110, running => 0
2019-11-20 16:32:49 volumes => vmdata1zfs:subvol-110-disk-0
2019-11-20 16:32:50 create snapshot '__replicate_110-0_1574263969__' on vmdata1zfs:subvol-110-disk-0
2019-11-20 16:32:50 incremental sync 'vmdata1zfs:subvol-110-disk-0' (__replicate_110-0_1574263838__ => __replicate_110-0_1574263969__)
2019-11-20 16:32:51 data1/vmdata1/subvol-110-disk-0@__replicate_110-0_1574263838__ name data1/vmdata1/subvol-110-disk-0@__replicate_110-0_1574263838__ -
2019-11-20 16:32:51 send from @__replicate_110-0_1574263838__ to data1/vmdata1/subvol-110-disk-0@__replicate_110-0_1574263969__ estimated size is 6.96M
2019-11-20 16:32:51 total estimated size is 6.96M
2019-11-20 16:32:51 TIME SENT SNAPSHOT data1/vmdata1/subvol-110-disk-0@__replicate_110-0_1574263969__
2019-11-20 16:32:52 delete previous replication snapshot '__replicate_110-0_1574263838__' on vmdata1zfs:subvol-110-disk-0
2019-11-20 16:32:53 (remote_finalize_local_job) delete stale replication snapshot '__replicate_110-0_1574263838__' on vmdata1zfs:subvol-110-disk-0
2019-11-20 16:32:53 end replication job
2019-11-20 16:32:53 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=hal9034' root@192.168.42.34 pvesr set-state 110 \''{"local/hal9030":{"last_node":"hal9030","fail_count":0,"last_try":1574263969,"last_sync":1574263969,"duration":4.037778,"last_iteration":1574263969,"storeid_list":["vmdata1zfs"]}}'\'
2019-11-20 16:32:54 start final cleanup
2019-11-20 16:32:54 start container on target node
2019-11-20 16:32:54 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=hal9034' root@192.168.42.34 pct start 110
2019-11-20 16:32:56 migration finished successfully (duration 00:00:09)
TASK OK
 
If you set up a one-minute replication job, the maximum amount to sync would be the data written in the last 60 seconds.
This is pretty much the same as a sync before send.
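
For illustration, a back-of-the-envelope estimate of that bound; the guest write rate and replication throughput below are made-up numbers:

```python
# Worst-case final sync after shutdown with a 60-second replication
# interval. Both rates are assumptions, purely for illustration.
interval_s = 60        # replication job runs every minute
write_rate_mb_s = 10   # assumed sustained write rate inside the guest
link_mb_s = 100        # assumed replication link throughput

worst_case_delta_mb = interval_s * write_rate_mb_s   # 600 MB
final_sync_s = worst_case_delta_mb / link_mb_s       # about 6 s extra downtime

print(f"worst-case delta: {worst_case_delta_mb} MB, "
      f"final sync: about {final_sync_s:.0f} s")
```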
 
If you set up a one-minute replication job, the maximum amount to sync would be the data written in the last 60 seconds.
This is pretty much the same as a sync before send.

There is always a workaround for things the software should do automatically. The change is trivial and would avoid having to manually reconfigure things.
 
