Migration of VM with replication job not possible, why?

carsten2

Currently, migration of a VM with a replication job is not possible, but with an LXC container it is (although only as a cold migration). Why? Proxmox should do the same thing with VMs as with LXCs: shut down on the local server, replicate, start up on the remote server.

Improvement: To minimize downtime, also for the LXC migration that already works, the sequence should actually be (see the sketch below):
replicate, shutdown, replicate, startup
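
For illustration only, a minimal Python sketch of that ordering. The container ID, replication job ID, and target address are made up, `pvesr run --id` is assumed to be usable for triggering the replication job on demand, and the guest-config transfer that a real migration performs is left out:

```python
#!/usr/bin/env python3
"""Conceptual sketch of the proposed 'replicate, shutdown, replicate,
startup' order. This is NOT how pct migrate is implemented; the IDs
and the target address are placeholders."""

import subprocess

CTID = "110"                    # example container ID (made up)
JOB_ID = f"{CTID}-0"            # replication job ID (made up)
TARGET = "root@192.168.42.34"   # target node (made up)


def run(cmd):
    """Run a command and abort on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# 1) First replication pass while the container is still running:
#    the bulk of the data is transferred with zero downtime.
run(["pvesr", "run", "--id", JOB_ID, "--verbose"])

# 2) Shut the container down; downtime starts here.
run(["pct", "shutdown", CTID])

# 3) Second replication pass: only the delta written since step 1
#    has to be sent, so the offline window stays short.
run(["pvesr", "run", "--id", JOB_ID, "--verbose"])

# 4) Start the container on the target node (the same ssh call that
#    the migration log further down in this thread shows).
run(["ssh", TARGET, "pct", "start", CTID])
```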

Also, why is it not possible to do a real live migration with replication?

Another near-live-migration method would be to snapshot the VM including RAM, replicate the VM, and restart it on the remote node (a rough sketch follows).
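
A rough sketch of that idea, for illustration only: `qm snapshot --vmstate` is an existing command, but shipping the saved RAM state to the other node and resuming there are exactly the missing pieces, so they appear below as hypothetical placeholders:

```python
#!/usr/bin/env python3
"""Illustration of the 'snapshot with RAM, replicate, resume' idea.
Only the qm snapshot call exists today; the two helpers are
hypothetical and raise NotImplementedError on purpose."""

import subprocess

VMID = "100"          # example VM ID (made up)
SNAP = "premigrate"   # snapshot name (made up)


def replicate_snapshot(vmid: str) -> None:
    """Hypothetical: ship the disks plus the saved RAM state volume
    to the target node, e.g. via the existing storage replication."""
    raise NotImplementedError("not available in Proxmox VE today")


def resume_on_target(vmid: str, snap: str) -> None:
    """Hypothetical: roll back to the snapshot on the target node and
    start the VM so it resumes from the saved RAM state."""
    raise NotImplementedError("not available in Proxmox VE today")


# 1) Snapshot disks *and* RAM; the guest keeps running afterwards.
subprocess.run(["qm", "snapshot", VMID, SNAP, "--vmstate", "1"], check=True)

# 2) + 3) The parts that would have to be implemented:
replicate_snapshot(VMID)
resume_on_target(VMID, SNAP)
```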
 
Hi
Also, why is it not possible to do a real live migration with replication?
Because it is not implemented.
QEMU contains no dirty-bitmap function for a delta sync.
 
What about the other questions:

1) Why is it possible to warm-migrate an LXC but not a VM? The process should be the same.
2) What about the improvement "replicate-shutdown-replicate-startup" to minimize downtime?
3) What about the possibility to migrate a paused or suspended VM/LXC?
 
Q: Why is it possible to warm-migrate an LXC but not a VM? The process should be the same.
A: QEMU contains no dirty-bitmap function for a delta sync?

I am not talking about live migration, where some feature in QEMU might be missing, but about migrating VMs the same way as running LXC containers, i.e. by automatically stopping, replicating, and restarting the guest. It works for LXC and would work for VMs as well, but Proxmox doesn't allow it.

2) If you set up replication beforehand, this is exactly what happens on offline migration.

I don't think so; see the example migration log below. It starts with "shutdown" immediately, instead of first replicating while the LXC is still running, then shutting down, then replicating the remaining delta, and only then starting it on the remote server:

2019-11-20 16:32:47 shutdown CT 110
2019-11-20 16:32:49 starting migration of CT 110 to node 'hal9034' (192.168.42.34)
2019-11-20 16:32:49 found local volume 'vmdata1zfs:subvol-110-disk-0' (in current VM config)
2019-11-20 16:32:49 start replication job
2019-11-20 16:32:49 guest => CT 110, running => 0
2019-11-20 16:32:49 volumes => vmdata1zfs:subvol-110-disk-0
2019-11-20 16:32:50 create snapshot '__replicate_110-0_1574263969__' on vmdata1zfs:subvol-110-disk-0
2019-11-20 16:32:50 incremental sync 'vmdata1zfs:subvol-110-disk-0' (__replicate_110-0_1574263838__ => __replicate_110-0_1574263969__)
2019-11-20 16:32:51 data1/vmdata1/subvol-110-disk-0@__replicate_110-0_1574263838__ name data1/vmdata1/subvol-110-disk-0@__replicate_110-0_1574263838__ -
2019-11-20 16:32:51 send from @__replicate_110-0_1574263838__ to data1/vmdata1/subvol-110-disk-0@__replicate_110-0_1574263969__ estimated size is 6.96M
2019-11-20 16:32:51 total estimated size is 6.96M
2019-11-20 16:32:51 TIME SENT SNAPSHOT data1/vmdata1/subvol-110-disk-0@__replicate_110-0_1574263969__
2019-11-20 16:32:52 delete previous replication snapshot '__replicate_110-0_1574263838__' on vmdata1zfs:subvol-110-disk-0
2019-11-20 16:32:53 (remote_finalize_local_job) delete stale replication snapshot '__replicate_110-0_1574263838__' on vmdata1zfs:subvol-110-disk-0
2019-11-20 16:32:53 end replication job
2019-11-20 16:32:53 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=hal9034' root@192.168.42.34 pvesr set-state 110 \''{"local/hal9030":{"last_node":"hal9030","fail_count":0,"last_try":1574263969,"last_sync":1574263969,"duration":4.037778,"last_iteration":1574263969,"storeid_list":["vmdata1zfs"]}}'\'
2019-11-20 16:32:54 start final cleanup
2019-11-20 16:32:54 start container on target node
2019-11-20 16:32:54 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=hal9034' root@192.168.42.34 pct start 110
2019-11-20 16:32:56 migration finished successfully (duration 00:00:09)
TASK OK
 
If you set up a one-minute replication job, the maximum amount to sync would be the data written in the last 60 seconds.
This is pretty much the same as a sync before send.
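
For illustration, a back-of-the-envelope estimate of that bound; the guest write rate and replication throughput below are made-up numbers:

```python
# Worst-case final sync after shutdown with a 60-second replication
# interval. Both rates are assumptions, purely for illustration.
interval_s = 60        # replication job runs every minute
write_rate_mb_s = 10   # assumed sustained write rate inside the guest
link_mb_s = 100        # assumed replication link throughput

worst_case_delta_mb = interval_s * write_rate_mb_s   # 600 MB
final_sync_s = worst_case_delta_mb / link_mb_s       # about 6 s extra downtime

print(f"worst-case delta: {worst_case_delta_mb} MB, "
      f"final sync: about {final_sync_s:.0f} s")
```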
 
If you set up a one-minute replication job, the maximum amount to sync would be the data written in the last 60 seconds.
This is pretty much the same as a sync before send.

There is always a workaround for things the software should do automatically. The change is trivial and would avoid having to manually reconfigure things.
 
