Live migration is now possible with local ZFS

mram

I want to thank the Proxmox team for yet another amazing update with v5. This post is just for information, for anyone out there running a ZFS cluster with no shared storage: it is possible to live migrate QEMU VMs, but you cannot use the GUI (yet). If you attempt to migrate a running VM on local ZFS storage to another node with local ZFS storage, you will be presented with this error:

2017-07-28 12:42:51 can't migrate local disk 'local-zfs:vm-100-disk-1': can't live migrate attached local disks without with-local-disks option

The fix is simply to start the migration from the shell, like this:
Code:
qm migrate 100 otherhostname --online --with-local-disks

Note: you cannot already have a replication copy of the data on the destination node. If you already have replication set up, it might be faster to shut the VM down and do an offline migration. If you still want to do a live migration, remove the replication job, wait until it has been deleted from the destination node, then run qm migrate from the shell.
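For reference, a minimal sketch of that sequence (the VM ID 100, job ID 100-0 and hostname are only examples; check pvesr list for your actual job ID):
Code:
# list replication jobs and check their state
pvesr list
pvesr status --guest 100

# remove the replication job, then wait until the replicated dataset is gone from the destination
pvesr delete 100-0

# once it is gone, start the live migration
qm migrate 100 otherhostname --online --with-local-disks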

Anecdotally, my initial testing on my homelab shows ~240MB/s for replication over 10GbE, but only ~50MB/s for online migration over the same link. YMMV.
 
I've tested it here on two of our servers, and it didn't work successfully. Every time the same error, after syncing all the disks.
Code:
2017-12-25 15:50:12 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 15543 overflow 0
2017-12-25 15:50:12 migration status: active (transferred 2166758229, remaining 17178624), total 2282569728)
2017-12-25 15:50:12 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 18506 overflow 0
2017-12-25 15:50:12 migration status: active (transferred 2178674444, remaining 6664192), total 2282569728)
2017-12-25 15:50:12 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 21409 overflow 0
2017-12-25 15:50:12 migration status error: failed
2017-12-25 15:50:12 ERROR: online migrate failure - aborting
2017-12-25 15:50:12 aborting phase 2 - cleanup resources
2017-12-25 15:50:12 migrate_cancel
drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job
drive-scsi1: Done.
2017/12/25 15:50:12 socat[20276] E write(5, 0x55bc57a350d0, 28): Broken pipe
Here is the vmconfig:
Code:
agent: 1
boot: dc
bootdisk: scsi0
cores: 8
cpuunits: 1000
hotplug: disk,network,usb,memory
memory: 2048
name: utemplatetest
net0: virtio=66:FA:EF:1C:1B:4E,bridge=vmbr0,tag=40
numa: 1
ostype: l26
sata0: none,media=cdrom
scsi0: HDD-vmdata-KVM:base-112-disk-2,discard=on,size=32G
scsi1: HDD-vmdata-KVM:base-112-disk-1,discard=on,size=4G
scsihw: virtio-scsi-pci
smbios1: uuid=ea405629-b91f-4a80-833c-898b2093aa25
sockets: 1
vga: qxl
Does anyone know why that job breaks every time?
 

I didn't try it yet (the command line), but maybe that's why it isn't in the GUI yet ;)
 
@fireon What storage are you using? Is there enough disk space for both disks (full size, no thin provisioning)?
 
Hi,
IMHO this has nothing to do with disk space. One disk works without trouble; with two disks the migration fails often (always?).


For a long time now I have only used live migration with a single local disk. You can also use live migration if the other disks are moved to shared/distributed storage, so that only one disk on local storage is left.
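A rough sketch of that approach, assuming a VM 100 with a second disk on scsi1 and a shared storage called shared-nfs (both names are placeholders for your setup):
Code:
# move the second disk to shared storage while the VM is running, so only scsi0 stays on local ZFS
qm move_disk 100 scsi1 shared-nfs --delete

# then live migrate with the single remaining local disk
qm migrate 100 otherhostname --online --with-local-disks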

Udo
 
@udo: does it always fail with the same/similar error message?
Code:
2018-08-01 xx:xx:xx ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve1' root@xx.xx.xx.xx pvesm free local-lvm:vm-101-disk-2' failed: exit code 5

Referring to https://bugzilla.proxmox.com/show_bug.cgi?id=1852 - we could reproduce this with a thin-provisioned disk where the target storage didn't have enough free space for the whole disk.

Otherwise it could be a different bug you're running into...
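A quick way to check the free-space case up front (just a sketch; the VM ID 101, storage local-lvm and target node pve1 are placeholders):
Code:
# full provisioned sizes of the VM's disks on the source node
qm config 101 | grep -E '^(scsi|virtio|sata|ide)[0-9]+:'

# free space on the target node's storage - it must fit the full disk sizes, not just the space actually used
ssh root@pve1 pvesm status --storage local-lvm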
 
I've been able to do a live migration successfully with VMs that only have one disk using the CLI.

But it doesn't work if the VM has a snapshot. I get this message: "online storage migration not possible if snapshot exists".

Why is the snapshot a problem? With ZFS it should be quite easy to do a full send of the volume with all its snapshots.
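Offline that is indeed straightforward; a rough sketch of a manual full send, assuming the volume lives under rpool/data and the other node is reachable as pve2 (both placeholders):
Code:
# snapshot the volume; -R sends it together with all of its existing snapshots
zfs snapshot rpool/data/vm-100-disk-1@move
zfs send -R rpool/data/vm-100-disk-1@move | ssh root@pve2 zfs receive rpool/data/vm-100-disk-1
That only works with the VM shut down, though; the built-in online migration apparently mirrors the running disk through QEMU instead, which seems to be where the snapshots get in the way.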
 
I suppose that live migration is a qemu-kvm feature, not a storage feature.
 
