Storage Migration

Machi

New Member
Aug 2, 2018
Hello,

I would like to know: does Proxmox support storage migration, including live migration? For example:

- Ceph <--> Ceph
- Ceph <--> local LVM
- local LVM <--> local LVM

Thanks!
 
I haven't checked all the migration types you asked about, but in general it works flawlessly and online, though only for KVM-based VMs.

I did an online SAN switch from an old to a new system without any downtime.
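For reference, a single disk can also be moved between storages on the same node (e.g. Ceph <--> local LVM) with qm move_disk. A rough sketch, assuming VMID 100 and a target storage named "ceph-pool" (check qm help move_disk for the exact options on your version):
Code:
# move disk scsi0 of VM 100 to the storage "ceph-pool" (works online for KVM VMs);
# --delete removes the old disk image from the source storage afterwards
qm move_disk 100 scsi0 ceph-pool --delete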
 

VM live migration + storage migration at the same time is available, but only from the command line (and only with local storage as the source):

qm migrate <vmid> <target> --with-local-disks --targetstorage yourtargetstorageonremotehost
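A concrete (hypothetical) example, assuming VM 100 should be migrated to node "node2" with its local disks placed on a storage named "local-lvm" on the target:
Code:
qm migrate 100 node2 --online --with-local-disks --targetstorage local-lvm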
 
Hi,
I assume that even with a different target storage, the online migration only works reliably with one disk?
Or has the "multiple disk migration" bug been resolved?

Udo
 
Hi,
but sparse files aren't sparse anymore after migration...

Udo

With live migration + storage migration, yes (because of the NBD protocol; maybe it will be fixed in QEMU 3.0).
For classic storage migration, it depends on the storage (Ceph, for example, isn't sparse after migration).

As a workaround, the latest Proxmox update has a new guest agent feature (agent: 1,fstrim_cloned_disks=1) to run an fstrim through the QEMU agent after the migration (if you have virtio-scsi + discard).
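A rough sketch of enabling that from the CLI, assuming VMID 100 and a running guest agent inside the VM (exact syntax may differ by version):
Code:
# enable the guest agent plus fstrim after clone/migration for VM 100
qm set 100 --agent enabled=1,fstrim_cloned_disks=1
# trigger an fstrim inside the guest manually via the agent
qm agent 100 fstrim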
 

It's working with multiple disks, but without iothreads (I have tested it on more than 100 VM migrations).
I need to test it again with QEMU 3.0 for iothreads (it should be available soon in Proxmox).
 
If you run fstrim there is no need for discard. fstrim does not rely on discard, nor does it use it; discard is only used in combination with a filesystem delete.
 

That's not true.

fstrim uses the FITRIM ioctl, and you need discard support on the device (I'm not talking about the discard option in /etc/fstab).

For ext4, for example:

ext4: check if device support discard in FITRIM ioctl
http://patchwork.ozlabs.org/patch/83271/

If your device doesn't have discard support, fstrim will stop with:
FITRIM ioctl failed: Operation not supported
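A quick sketch of how to check this inside the guest (a non-zero DISC-GRAN/DISC-MAX means the device advertises discard support):
Code:
# show the discard capabilities of all block devices
lsblk --discard
# trim the root filesystem and report how much was trimmed;
# fails if the underlying device does not support discard
fstrim -v /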
 
Hi Spirit,
sorry, it doesn't work.

I just tried an online migration (1 Gb connection), which failed.
Code:
root@pve01:~# cat /etc/pve/qemu-server/210.conf
boot: cd
bootdisk: scsi0
cores: 2
cpu: kvm64,flags=+pcid
hotplug: 1
lock: migrate
memory: 2048
name: vdb02
net0: virtio=6A:C6:68:EA:F8:8F,bridge=vmbr0,tag=2
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-210-disk-1,format=raw,size=25G
scsi1: local-zfs:vm-210-disk-2,format=raw,size=41G
scsihw: virtio-scsi-pci
serial0: socket
sockets: 1
Code:
root@pve01:~# qm migrate 210 pve02 --online --with-local-disks
2018-09-04 11:37:01 starting migration of VM 210 to node 'pve02' (10.x.x.12)
2018-09-04 11:37:01 found local disk 'local-zfs:vm-210-disk-1' (in current VM config)
2018-09-04 11:37:01 found local disk 'local-zfs:vm-210-disk-2' (in current VM config)
2018-09-04 11:37:01 copying disk images
2018-09-04 11:37:01 starting VM 210 on remote node 'pve02'
2018-09-04 11:37:05 start remote tunnel
2018-09-04 11:37:06 ssh tunnel ver 1
2018-09-04 11:37:06 starting storage migration
2018-09-04 11:37:06 scsi1: start migration to nbd:10.x.x.12:60001:exportname=drive-scsi1
drive mirror is starting for drive-scsi1
drive-scsi1: transferred: 0 bytes remaining: 44023414784 bytes total: 44023414784 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi1: transferred: 111149056 bytes remaining: 43912265728 bytes total: 44023414784 bytes progression: 0.25 % busy: 1 ready: 0
drive-scsi1: transferred: 228589568 bytes remaining: 43794825216 bytes total: 44023414784 bytes progression: 0.52 % busy: 1 ready: 0
drive-scsi1: transferred: 348127232 bytes remaining: 43675287552 bytes total: 44023414784 bytes progression: 0.79 % busy: 1 ready: 0
...
drive-scsi1: transferred: 43708841984 bytes remaining: 314572800 bytes total: 44023414784 bytes progression: 99.29 % busy: 1 ready: 0
drive-scsi1: transferred: 43825233920 bytes remaining: 198180864 bytes total: 44023414784 bytes progression: 99.55 % busy: 1 ready: 0
drive-scsi1: transferred: 43942674432 bytes remaining: 80740352 bytes total: 44023414784 bytes progression: 99.82 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2018-09-04 11:45:35 scsi0: start migration to nbd:10.x.x.12:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 0 bytes remaining: 26843545600 bytes total: 26843545600 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 61865984 bytes remaining: 26781679616 bytes total: 26843545600 bytes progression: 0.23 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 166723584 bytes remaining: 26676822016 bytes total: 26843545600 bytes progression: 0.62 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1 
...
drive-scsi0: transferred: 26734493696 bytes remaining: 111017984 bytes total: 26845511680 bytes progression: 99.59 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 26845380608 bytes remaining: 131072 bytes total: 26845511680 bytes progression: 100.00 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 26845642752 bytes remaining: 0 bytes total: 26845642752 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2018-09-04 11:50:33 starting online/live migration on tcp:10.x.x.12:60000
2018-09-04 11:50:33 migrate_set_speed: 8589934592
2018-09-04 11:50:33 migrate_set_downtime: 0.1
2018-09-04 11:50:33 set migration_caps
2018-09-04 11:50:33 set cachesize: 268435456
2018-09-04 11:50:33 start migrate command to tcp:10.x.x.12:60000
2018-09-04 11:50:34 migration status: active (transferred 96142201, remaining 2062778368), total 2165121024)
2018-09-04 11:50:34 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:35 migration status: active (transferred 166236744, remaining 1989947392), total 2165121024)
2018-09-04 11:50:35 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:36 migration status: active (transferred 249148312, remaining 1900494848), total 2165121024)
2018-09-04 11:50:36 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:37 migration status: active (transferred 342631354, remaining 1802543104), total 2165121024)
2018-09-04 11:50:37 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:38 migration status: active (transferred 423618924, remaining 1701597184), total 2165121024)
2018-09-04 11:50:38 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:39 migration status: active (transferred 514885572, remaining 1596645376), total 2165121024)
2018-09-04 11:50:39 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:40 migration status: active (transferred 609658356, remaining 1491333120), total 2165121024)
2018-09-04 11:50:40 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
...
2018-09-04 11:50:54 migration status: active (transferred 1963964198, remaining 18128896), total 2165121024)
2018-09-04 11:50:54 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:54 migration speed: 2.47 MB/s - downtime 111 ms
2018-09-04 11:50:54 migration status: completed
drive-scsi0: transferred: 26845904896 bytes remaining: 0 bytes total: 26845904896 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi1: Completing block job...
drive-scsi1: Completed successfully.
drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job
drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job
2018-09-04 11:51:03 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve02' root@10.x.x.12 pvesm free local-zfs:vm-210-disk-2' failed: exit code 1
2018-09-04 11:51:10 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve02' root@10.x.x.12 pvesm free local-zfs:vm-210-disk-1' failed: exit code 1
2018-09-04 11:51:10 ERROR: Failed to completed storage migration
2018-09-04 11:51:10 ERROR: migration finished with problems (duration 00:14:10)
migration problems
root@pve01:~#
Code:
pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-2-pve)
pve-manager: 5.2-7 (running version: 5.2-7/8d88e66a)
pve-kernel-4.15: 5.2-5
pve-kernel-4.15.18-2-pve: 4.15.18-20
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph: 12.2.7-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-38
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-10
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-1
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-29
pve-container: 2.0-25
pve-docs: 5.2-8
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-32
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
Udo
 
We have an open bug report (https://bugzilla.proxmox.com/show_bug.cgi?id=1852) for the migration with multiple local disks, but we weren't able to conclusively reproduce the issue locally.
@udo, could you upgrade to the latest packages in pve-no-subscription and try to reproduce the error you're getting?

If it's reproducible, we'd be grateful for some pointers to your particular configuration (e.g. VM configs, storage configuration, whether the VMs are idle during the migration).
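A sketch of the commands that would collect that information (using VMID 210 from the log above):
Code:
qm config 210               # VM configuration
cat /etc/pve/storage.cfg    # storage configuration
pveversion -v               # installed package versions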

Thank you!
 
@udo

That's strange. The code is in /usr/share/perl5/PVE/QemuMigrate.pm, phase3_cleanup():

eval { PVE::QemuServer::qemu_drive_mirror_monitor($vmid, undef, $self->{storage_migration_jobs}); };

drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi1: Completing block job...
drive-scsi1: Completed successfully.

then, on error:

if (my $err = $@) {
eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $self->{storage_migration_jobs}) };


drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job
drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job

eval { PVE::QemuMigrate::cleanup_remotedisks($self) };

2018-09-04 11:51:03 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve02' root@10.x.x.12 pvesm free local-zfs:vm-210-disk-2' failed: exit code 1
2018-09-04 11:51:10 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve02' root@10.x.x.12 pvesm free local-zfs:vm-210-disk-1' failed: exit code 1

I don't understand why it goes into error if the block jobs have completed successfully.



Maybe you can try to display the error:

if (my $err = $@) {
$self->log('err', "$err");
eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $self->{storage_migration_jobs}) };
 
Hi Spirit,
I've found the reason for the issue: there are old snapshots!
Code:
root@pve01:~# zfs list -t snapshot
NAME                                                     USED  AVAIL  REFER  MOUNTPOINT
rpool/data/vm-210-disk-1@rep_vdb02_2017-11-28_16:15:01  3.62G      -  6.89G  -
rpool/data/vm-210-disk-2@rep_vdb02_2017-11-28_16:15:01  12.5G      -  33.2G  -
rpool/data/vm-210-disk-2@__migration__                     0B      -  33.2G  -
After removing the snapshots, the live migration works!
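For anyone hitting the same issue: the leftover snapshots from the listing above can be removed with zfs destroy (careful, this is irreversible):
Code:
zfs destroy rpool/data/vm-210-disk-1@rep_vdb02_2017-11-28_16:15:01
zfs destroy rpool/data/vm-210-disk-2@rep_vdb02_2017-11-28_16:15:01
zfs destroy rpool/data/vm-210-disk-2@__migration__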

The migration should be skipped if there are snapshots (or the snapshots should be handled).

Udo
 
@udo

Interesting.
These snapshots seem to be related to the ZFS replication feature (did you enable it on this VM?).
They should only be used in the case of an offline VM migration (without --with-local-disks).

I'm really not sure that QEMU live migration + QEMU storage migration is compatible with the ZFS replication features.
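To see whether storage replication is (still) configured for a VM, the pvesr tool can be queried; a rough sketch:
Code:
pvesr list     # replication jobs configured on this node
pvesr status   # current state of the local replication jobs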
 
To avoid confusion: I was referring to the Proxmox 'discard' checkbox, not whether the filesystem supports discard or not. Many users have 'discard' checked while also running fstrim on a regular basis, which is foolish.
 

Well, if you have the discard checkbox + virtio-scsi, and the storage supports discard, then fstrim will work (with or without discard in /etc/fstab in the guest).

discard in /etc/fstab allows discarding directly when a file is deleted (and with kernel < 4.9 this was slow, because deleting a file waited until the discard finished; since kernel 4.9 it's async, so pretty fast).
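In the guest, the two approaches look roughly like this (a sketch; device and mount point are only examples, and most distributions ship a ready-made fstrim.timer):
Code:
# option 1: discard immediately on delete, via the mount option in /etc/fstab
/dev/sda1  /  ext4  defaults,discard  0  1
# option 2: leave the mount option out and run a periodic batched trim instead
systemctl enable --now fstrim.timer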
 
Hi Spirit,
I used this VM for a ZFS replication test a while ago, and it looks like I didn't clean up afterwards...

Udo
 
