Feature request: "2-step" cold migration with LXC on ZFS

suisse

Renowned Member
Sep 11, 2015
Hello,

Would you consider implementing a cold migration for LXC containers on ZFS that works in two steps, like this:

- snapshot the filesystems, transfer the snapshot to the new host
- stop the container, transfer the snapshot delta
- start the container on the new node

... sort of like the double-rsync strategy that you had with OpenVZ?
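Done by hand, I imagine it would look roughly like the sketch below (the dataset, CT ID, snapshot and host names are only placeholders, and the config move assumes the usual /etc/pve layout):

Code:
# step 1: while the container is still running, send a full snapshot to the new host
zfs snapshot rpool/data/subvol-100-disk-1@mig-base
zfs send rpool/data/subvol-100-disk-1@mig-base | ssh root@new-host zfs receive rpool/data/subvol-100-disk-1

# step 2: stop the container and send only the changes made since the first snapshot
pct stop 100
zfs snapshot rpool/data/subvol-100-disk-1@mig-final
zfs send -i @mig-base rpool/data/subvol-100-disk-1@mig-final | ssh root@new-host zfs receive -F rpool/data/subvol-100-disk-1

# step 3: move the CT config to the new node (cluster filesystem) and start it there
mv /etc/pve/nodes/old-node/lxc/100.conf /etc/pve/nodes/new-host/lxc/100.conf
ssh root@new-host pct start 100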

Since we don't have live migration anymore with LXC, I feel that this would be a great way to reduce the downtime of cold migrations, especially for large containers.

Best regards,
Arthur
 
We already have this.

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pvesr

Here is a detailed log of a restart migration (CT running on ZFS, with replication enabled).

Code:
2018-10-08 17:54:36 shutdown CT 100
2018-10-08 17:54:38 starting migration of CT 100 to node 'pve-6-34' (192.168.6.34)
2018-10-08 17:54:38 found local volume 'local-zfs:subvol-100-disk-1' (in current VM config)
2018-10-08 17:54:38 start replication job
2018-10-08 17:54:38 guest => CT 100, running => 0
2018-10-08 17:54:38 volumes => local-zfs:subvol-100-disk-1
2018-10-08 17:54:39 create snapshot '__replicate_100-0_1539014078__' on local-zfs:subvol-100-disk-1
2018-10-08 17:54:39 incremental sync 'local-zfs:subvol-100-disk-1' (__replicate_100-0_1539014040__ => __replicate_100-0_1539014078__)
2018-10-08 17:54:39 send from @__replicate_100-0_1539014040__ to rpool/data/subvol-100-disk-1@__replicate_100-0_1539014078__ estimated size is 875K
2018-10-08 17:54:39 total estimated size is 875K
2018-10-08 17:54:39 TIME        SENT   SNAPSHOT
2018-10-08 17:54:39 rpool/data/subvol-100-disk-1@__replicate_100-0_1539014040__    name    rpool/data/subvol-100-disk-1@__replicate_100-0_1539014040__    -
2018-10-08 17:54:39 delete previous replication snapshot '__replicate_100-0_1539014040__' on local-zfs:subvol-100-disk-1
2018-10-08 17:54:40 (remote_finalize_local_job) delete stale replication snapshot '__replicate_100-0_1539014040__' on local-zfs:subvol-100-disk-1
2018-10-08 17:54:40 end replication job
2018-10-08 17:54:40 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve-6-34' root@192.168.6.34 pvesr set-state 100 \''{"local/pve-6-33":{"last_try":1539014078,"last_sync":1539014078,"duration":1.510501,"last_node":"pve-6-33","fail_count":0,"storeid_list":["local-zfs"],"last_iteration":1539014078}}'\'
2018-10-08 17:54:40 start final cleanup
2018-10-08 17:54:41 start container on target node
2018-10-08 17:54:41 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve-6-34' root@192.168.6.34 pct start 100
2018-10-08 17:54:42 migration finished successfully (duration 00:00:06)
TASK OK
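For reference, the setup boils down to something like this (the job ID, schedule and node name are just examples):

Code:
# create a replication job for CT 100 to node pve-6-34 (sync every 15 minutes)
pvesr create-local-job 100-0 pve-6-34 --schedule '*/15'

# ask the scheduler to run the job as soon as possible instead of waiting
pvesr schedule-now 100-0

# restart migration: shut the CT down, send only the delta, start it on the target
pct migrate 100 pve-6-34 --restart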
 
Thanks; I'd read about Storage Replication but missed the fact that it affected migrations as well. That's really cool :-)

Does that mean that I need replication enabled on a schedule for my containers, though? That would be a bit inconvenient; I don't want my containers to be replicated all the time. For my use case, it would be awesome to be able to one-shot the replication+migration without creating a schedule :-)

Best,
Arthur
 
suisse said:
> Thanks; I'd read about Storage Replication but missed the fact that it affected migrations as well. That's really cool :)

Yes!

suisse said:
> Does that mean that I need replication enabled on a schedule for my containers, though? That would be a bit inconvenient; I don't want my containers to be replicated all the time. For my use case, it would be awesome to be able to one-shot the replication+migration without creating a schedule :)

If there is no snapshot on the target node, the full dataset has to be sent, so you will see no speed improvement.

Please file a feature request (enhancement request) on https://bugzilla.proxmox.com
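To check whether an incremental sync is possible, look for a common replication snapshot on the target's dataset (dataset name taken from the log above):

Code:
# on the target node: list the snapshots of the container's dataset
zfs list -t snapshot -r -o name rpool/data/subvol-100-disk-1
# a shared __replicate_100-0_*__ snapshot means only the delta is sent;
# without it, the first sync transfers the whole dataset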
 