Live migration is now possible with local ZFS

vetman

Active Member
Mar 16, 2014
I want to thank the Proxmox team for yet another amazing update with v5. This post is just for the information of anyone out there running a ZFS cluster with no shared storage: it is possible to live migrate QEMU VMs, but you cannot use the GUI (yet). If you attempt to migrate a running VM on local ZFS storage to another node with local ZFS storage, you will be presented with this error:

2017-07-28 12:42:51 can't migrate local disk 'local-zfs:vm-100-disk-1': can't live migrate attached local disks without with-local-disks option

The fix is simply to start the migration from the shell, like this:

qm migrate 100 otherhostname --online --with-local-disks

Note: you cannot already have a replication copy of the data on the destination node. If you already have a replication, it might be faster to shut down and do an offline migration. If you still want to do a live migration, remove the replication, wait until it has been deleted from the destination node, then run qm migrate from the shell.
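For reference, that replication-removal workflow would look roughly like this from the shell (the job ID 100-0 is only an example; check pvesr list for the real one):
Code:
# list the replication jobs configured on this node
pvesr list
# remove the job and wait until the replicated volume is gone on the destination
pvesr delete 100-0
# then start the online migration with local disks
qm migrate 100 otherhostname --online --with-local-disks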

Anecdotally, initial testing in my homelab shows ~240 MB/s for replication over 10GbE, but only ~50 MB/s for online migration over the same link. YMMV.
 

fireon

Famous Member
Oct 25, 2010
Austria/Graz
iteas.at
I've tested it here on two of our servers, and it didn't work successfully. Every time the same error, after syncing all the disks.
Code:
2017-12-25 15:50:12 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 15543 overflow 0
2017-12-25 15:50:12 migration status: active (transferred 2166758229, remaining 17178624), total 2282569728)
2017-12-25 15:50:12 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 18506 overflow 0
2017-12-25 15:50:12 migration status: active (transferred 2178674444, remaining 6664192), total 2282569728)
2017-12-25 15:50:12 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 21409 overflow 0
2017-12-25 15:50:12 migration status error: failed
2017-12-25 15:50:12 ERROR: online migrate failure - aborting
2017-12-25 15:50:12 aborting phase 2 - cleanup resources
2017-12-25 15:50:12 migrate_cancel
drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job
drive-scsi1: Done.
2017/12/25 15:50:12 socat[20276] E write(5, 0x55bc57a350d0, 28): Broken pipe
Here is the vmconfig:
Code:
agent: 1
boot: dc
bootdisk: scsi0
cores: 8
cpuunits: 1000
hotplug: disk,network,usb,memory
memory: 2048
name: utemplatetest
net0: virtio=66:FA:EF:1C:1B:4E,bridge=vmbr0,tag=40
numa: 1
ostype: l26
sata0: none,media=cdrom
scsi0: HDD-vmdata-KVM:base-112-disk-2,discard=on,size=32G
scsi1: HDD-vmdata-KVM:base-112-disk-1,discard=on,size=4G
scsihw: virtio-scsi-pci
smbios1: uuid=ea405629-b91f-4a80-833c-898b2093aa25
sockets: 1
vga: qxl
Does anyone know why this job breaks every time?
 
@fireon I didn't try it yet (the command prompt), but maybe that's why it isn't in the GUI yet ;)
 

mira

Proxmox Staff Member
Staff member
Aug 1, 2018
@fireon What storage are you using? Is there enough disk space for both disks (full size, no thin provisioning)?
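For example, a quick way to compare the available space on the target with the configured disk sizes (storage, pool, and VM ID below are placeholders):
Code:
# free space of the target storage as Proxmox sees it
pvesm status --storage local-zfs
# free space of the underlying ZFS pool
zfs list -o name,used,avail rpool
# the disk sizes that have to fit
qm config 112 | grep -E '^(scsi|sata|virtio|ide)[0-9]'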
 

udo

Famous Member
Apr 22, 2009
Ahrensburg; Germany
@fireon What storage are you using? Is there enough disk space for both disks (full size, no thin provisioning)?
Hi,
IMHO this has nothing to do with disk space. One disk works without trouble; with two disks the migration often (always?) fails.


For a long time now I have only used live migration with a single local disk. You can also use live migration if the other disks are moved to shared/distributed storage, so that only one disk is left on local storage (roughly as sketched below).
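A sketch of that approach, assuming a shared storage named shared-nfs exists in the cluster (VM ID, disk, and storage name are placeholders):
Code:
# live-move the second disk to shared storage, leaving only scsi0 on local ZFS
qm move_disk 112 scsi1 shared-nfs --delete 1
# then live migrate with the single remaining local disk
qm migrate 112 otherhostname --online --with-local-disks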

Udo
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
@udo: does it always fail with the same or a similar error message?
Code:
2018-08-01 xx:xx:xx ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve1' root@xx.xx.xx.xx pvesm free local-lvm:vm-101-disk-2' failed: exit code 5

This refers to https://bugzilla.proxmox.com/show_bug.cgi?id=1852, which we could reproduce with a thin-provisioned disk where the target storage didn't have enough free space for the whole disk.

Otherwise it could be a different bug you're running into...
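For the ZFS setups discussed in this thread, you can check whether a volume is thin-provisioned and what its full size on the target would be, for example (the dataset name is a placeholder):
Code:
# a thin-provisioned zvol has refreservation=none; volsize is the space
# it can grow to and must fit on the target storage
zfs get volsize,refreservation,used rpool/data/vm-101-disk-2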
 

pmontepagano

New Member
Jul 17, 2013
Using the CLI, I've been able to do a live migration successfully with VMs that only have one disk.

But it doesn't work if the VM has a snapshot. I get this message: "online storage migration not possible if snapshot exists".

Why is the snapshot a problem? With ZFS it should be quite easy to do a full send of the volume with all its snapshots.
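For illustration, the kind of ZFS-level transfer described would look roughly like this when done by hand, offline (dataset, snapshot, and host names are placeholders):
Code:
# recursive send of the zvol including all of its snapshots
zfs snapshot rpool/data/vm-100-disk-1@move
zfs send -R rpool/data/vm-100-disk-1@move | ssh targetnode zfs recv rpool/data/vm-100-disk-1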
 

VGusev2007

Active Member
May 24, 2010
Russia
I suppose that live migration is a qemu-kvm feature, not a storage feature.
 
