Live migration is now possible with local ZFS

mram

I want to thank the Proxmox team for yet another amazing update with v5. This post is just for information, for anyone out there running a ZFS cluster with no shared storage: it is possible to live migrate QEMU VMs, but you cannot use the GUI (yet). If you attempt to migrate a running VM on local ZFS storage to another node with local ZFS storage, you will be presented with this error:

2017-07-28 12:42:51 can't migrate local disk 'local-zfs:vm-100-disk-1': can't live migrate attached local disks without with-local-disks option

The fix is simply to start the migration from the shell, like this:
Code:
qm migrate 100 otherhostname --online --with-local-disks

Note: you cannot already have a replication copy of the data on the destination node. If you already have replication set up, it might be faster to shut the VM down and do an offline migration. If you still want to do a live migration, remove the replication job, wait until it has been deleted from the destination node, then run qm migrate from the shell.
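For reference, a minimal sketch of that sequence (the VM ID 100, job ID 100-0 and hostname are only examples; check pvesr list for your actual job ID):
Code:
# list replication jobs and check their state
pvesr list
pvesr status --guest 100

# remove the replication job, then wait until the replicated dataset is gone from the destination
pvesr delete 100-0

# once it is gone, start the live migration
qm migrate 100 otherhostname --online --with-local-disks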

Anecdotally, my initial testing on my homelab shows ~240MB/s for replication over 10GbE, but only ~50MB/s for online migration over the same link. YMMV.
 
I've tested it here on two of our servers, and it didn't work successfully. Every time the same error, after syncing all the disks.
Code:
2017-12-25 15:50:12 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 15543 overflow 0
2017-12-25 15:50:12 migration status: active (transferred 2166758229, remaining 17178624), total 2282569728)
2017-12-25 15:50:12 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 18506 overflow 0
2017-12-25 15:50:12 migration status: active (transferred 2178674444, remaining 6664192), total 2282569728)
2017-12-25 15:50:12 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 21409 overflow 0
2017-12-25 15:50:12 migration status error: failed
2017-12-25 15:50:12 ERROR: online migrate failure - aborting
2017-12-25 15:50:12 aborting phase 2 - cleanup resources
2017-12-25 15:50:12 migrate_cancel
drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job
drive-scsi1: Done.
2017/12/25 15:50:12 socat[20276] E write(5, 0x55bc57a350d0, 28): Broken pipe
Here is the vmconfig:
Code:
agent: 1
boot: dc
bootdisk: scsi0
cores: 8
cpuunits: 1000
hotplug: disk,network,usb,memory
memory: 2048
name: utemplatetest
net0: virtio=66:FA:EF:1C:1B:4E,bridge=vmbr0,tag=40
numa: 1
ostype: l26
sata0: none,media=cdrom
scsi0: HDD-vmdata-KVM:base-112-disk-2,discard=on,size=32G
scsi1: HDD-vmdata-KVM:base-112-disk-1,discard=on,size=4G
scsihw: virtio-scsi-pci
smbios1: uuid=ea405629-b91f-4a80-833c-898b2093aa25
sockets: 1
vga: qxl
Does anyone know why that job breaks every time?
 

I didn't try it yet (the command line), but maybe that's why it isn't in the GUI yet ;)
 
@fireon What storage are you using? Is there enough disk space for both disks (full size, no thin provisioning)?
 
Hi,
IMHO this has nothing to do with disk space. One disk works without trouble; with two disks the migration fails often (always?).


For a long time now I have only used live migration with a single local disk. You can also use live migration if the other disks are moved to shared/distributed storage, so that only one disk on local storage is left.
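A rough sketch of that approach, assuming a VM 100 with a second disk on scsi1 and a shared storage called shared-nfs (both names are placeholders for your setup):
Code:
# move the second disk to shared storage while the VM is running, so only scsi0 stays on local ZFS
qm move_disk 100 scsi1 shared-nfs --delete

# then live migrate with the single remaining local disk
qm migrate 100 otherhostname --online --with-local-disks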

Udo
 
@udo: does it always fail with the same/similar error message?
Code:
2018-08-01 xx:xx:xx ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve1' root@xx.xx.xx.xx pvesm free local-lvm:vm-101-disk-2' failed: exit code 5

Referring to https://bugzilla.proxmox.com/show_bug.cgi?id=1852 - we could reproduce this with a thin-provisioned disk where the target storage didn't have enough free space for the whole disk.

Otherwise it could be a different bug you're running into...
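A quick way to check the free-space case up front (just a sketch; the VM ID 101, storage local-lvm and target node pve1 are placeholders):
Code:
# full provisioned sizes of the VM's disks on the source node
qm config 101 | grep -E '^(scsi|virtio|sata|ide)[0-9]+:'

# free space on the target node's storage - it must fit the full disk sizes, not just the space actually used
ssh root@pve1 pvesm status --storage local-lvm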
 
I've been able to do a live migration successfully with VMs that only have one disk using the CLI.

But it doesn't work if the VM has a snapshot. I get this message: "online storage migration not possible if snapshot exists".

Why is the snapshot a problem? With ZFS it should be quite easy to do a full send of the volume with all its snapshots.
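Offline that is indeed straightforward; a rough sketch of a manual full send, assuming the volume lives under rpool/data and the other node is reachable as pve2 (both placeholders):
Code:
# snapshot the volume; -R sends it together with all of its existing snapshots
zfs snapshot rpool/data/vm-100-disk-1@move
zfs send -R rpool/data/vm-100-disk-1@move | ssh root@pve2 zfs receive rpool/data/vm-100-disk-1
That only works with the VM shut down, though; the built-in online migration apparently mirrors the running disk through QEMU instead, which seems to be where the snapshots get in the way.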
 
I suppose that live migration is a qemu-kvm feature, not a storage feature.
 
