Storage Migration

Machi

New Member
Aug 2, 2018
Hello,

I would like to know: does Proxmox support storage migration, including live migration? For example:

- Ceph <--> Ceph
- Ceph <--> local LVM
- local LVM <--> local LVM

Thanks!
 
I haven't checked all the migration types you asked about, but in general it works flawlessly and online, though only for KVM-based VMs.

I did an online SAN switch from an old to a new system without any downtime.
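For reference, a single disk can also be moved between storages on the same node (e.g. Ceph <--> local LVM) with qm move_disk. A rough sketch, assuming VMID 100 and a target storage named "ceph-pool" (check qm help move_disk for the exact options on your version):
Code:
# move disk scsi0 of VM 100 to the storage "ceph-pool" (works online for KVM VMs);
# --delete removes the old disk image from the source storage afterwards
qm move_disk 100 scsi0 ceph-pool --delete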
 

VM live migration + storage migration at the same time is available, but only from the command line (and only with local storage as the source):

qm migrate <vmid> <target> --with-local-disks --targetstorage yourtargetstorageonremotehost
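A concrete (hypothetical) example, assuming VM 100 should be migrated to node "node2" with its local disks placed on a storage named "local-lvm" on the target:
Code:
qm migrate 100 node2 --online --with-local-disks --targetstorage local-lvm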
 
Hi,
I assume that even with a different target storage, the online migration only works reliably with one disk?
Or has the "multiple disk migration" bug been resolved?

Udo
 
Hi,
but sparse files aren't sparse anymore after migration...

Udo

With live migration + storage migration, yes (because of the NBD protocol; maybe it will be fixed in QEMU 3.0).
For classic storage migration, it depends on the storage (Ceph, for example, isn't sparse after migration).

As a workaround, the latest Proxmox update has a new guest agent feature (agent: 1,fstrim_cloned_disks=1) to run an fstrim through the QEMU agent after the migration (if you have virtio-scsi + discard).
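A rough sketch of enabling that from the CLI, assuming VMID 100 and a running guest agent inside the VM (exact syntax may differ by version):
Code:
# enable the guest agent plus fstrim after clone/migration for VM 100
qm set 100 --agent enabled=1,fstrim_cloned_disks=1
# trigger an fstrim inside the guest manually via the agent
qm agent 100 fstrim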
 

It's working with multiple disks, but without iothreads (I have tested it on more than 100 VM migrations).
I need to test it again with QEMU 3.0 for iothreads (it should be available soon in Proxmox).
 
If you run fstrim there is no need for discard. fstrim does not rely on discard, nor does it use it; discard is only used in combination with a filesystem delete.
 

That's not true.

fstrim uses the FITRIM ioctl, and you need discard support on the device (I'm not talking about the discard option in /etc/fstab).

For ext4, for example:

ext4: check if device support discard in FITRIM ioctl
http://patchwork.ozlabs.org/patch/83271/

If your device doesn't have discard support, fstrim will stop with:
FITRIM ioctl failed: Operation not supported
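A quick sketch of how to check this inside the guest (a non-zero DISC-GRAN/DISC-MAX means the device advertises discard support):
Code:
# show the discard capabilities of all block devices
lsblk --discard
# trim the root filesystem and report how much was trimmed;
# fails if the underlying device does not support discard
fstrim -v /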
 
Hi Spirit,
sorry, it doesn't work.

I just tried an online migration (1 Gb connection), which failed.
Code:
root@pve01:~# cat /etc/pve/qemu-server/210.conf
boot: cd
bootdisk: scsi0
cores: 2
cpu: kvm64,flags=+pcid
hotplug: 1
lock: migrate
memory: 2048
name: vdb02
net0: virtio=6A:C6:68:EA:F8:8F,bridge=vmbr0,tag=2
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-210-disk-1,format=raw,size=25G
scsi1: local-zfs:vm-210-disk-2,format=raw,size=41G
scsihw: virtio-scsi-pci
serial0: socket
sockets: 1
Code:
root@pve01:~# qm migrate 210 pve02 --online --with-local-disks
2018-09-04 11:37:01 starting migration of VM 210 to node 'pve02' (10.x.x.12)
2018-09-04 11:37:01 found local disk 'local-zfs:vm-210-disk-1' (in current VM config)
2018-09-04 11:37:01 found local disk 'local-zfs:vm-210-disk-2' (in current VM config)
2018-09-04 11:37:01 copying disk images
2018-09-04 11:37:01 starting VM 210 on remote node 'pve02'
2018-09-04 11:37:05 start remote tunnel
2018-09-04 11:37:06 ssh tunnel ver 1
2018-09-04 11:37:06 starting storage migration
2018-09-04 11:37:06 scsi1: start migration to nbd:10.x.x.12:60001:exportname=drive-scsi1
drive mirror is starting for drive-scsi1
drive-scsi1: transferred: 0 bytes remaining: 44023414784 bytes total: 44023414784 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi1: transferred: 111149056 bytes remaining: 43912265728 bytes total: 44023414784 bytes progression: 0.25 % busy: 1 ready: 0
drive-scsi1: transferred: 228589568 bytes remaining: 43794825216 bytes total: 44023414784 bytes progression: 0.52 % busy: 1 ready: 0
drive-scsi1: transferred: 348127232 bytes remaining: 43675287552 bytes total: 44023414784 bytes progression: 0.79 % busy: 1 ready: 0
...
drive-scsi1: transferred: 43708841984 bytes remaining: 314572800 bytes total: 44023414784 bytes progression: 99.29 % busy: 1 ready: 0
drive-scsi1: transferred: 43825233920 bytes remaining: 198180864 bytes total: 44023414784 bytes progression: 99.55 % busy: 1 ready: 0
drive-scsi1: transferred: 43942674432 bytes remaining: 80740352 bytes total: 44023414784 bytes progression: 99.82 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2018-09-04 11:45:35 scsi0: start migration to nbd:10.x.x.12:60001:exportname=drive-scsi0
drive mirror is starting for drive-scsi0
drive-scsi0: transferred: 0 bytes remaining: 26843545600 bytes total: 26843545600 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 61865984 bytes remaining: 26781679616 bytes total: 26843545600 bytes progression: 0.23 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 166723584 bytes remaining: 26676822016 bytes total: 26843545600 bytes progression: 0.62 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1 
...
drive-scsi0: transferred: 26734493696 bytes remaining: 111017984 bytes total: 26845511680 bytes progression: 99.59 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 26845380608 bytes remaining: 131072 bytes total: 26845511680 bytes progression: 100.00 % busy: 1 ready: 0
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi0: transferred: 26845642752 bytes remaining: 0 bytes total: 26845642752 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
2018-09-04 11:50:33 starting online/live migration on tcp:10.x.x.12:60000
2018-09-04 11:50:33 migrate_set_speed: 8589934592
2018-09-04 11:50:33 migrate_set_downtime: 0.1
2018-09-04 11:50:33 set migration_caps
2018-09-04 11:50:33 set cachesize: 268435456
2018-09-04 11:50:33 start migrate command to tcp:10.x.x.12:60000
2018-09-04 11:50:34 migration status: active (transferred 96142201, remaining 2062778368), total 2165121024)
2018-09-04 11:50:34 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:35 migration status: active (transferred 166236744, remaining 1989947392), total 2165121024)
2018-09-04 11:50:35 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:36 migration status: active (transferred 249148312, remaining 1900494848), total 2165121024)
2018-09-04 11:50:36 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:37 migration status: active (transferred 342631354, remaining 1802543104), total 2165121024)
2018-09-04 11:50:37 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:38 migration status: active (transferred 423618924, remaining 1701597184), total 2165121024)
2018-09-04 11:50:38 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:39 migration status: active (transferred 514885572, remaining 1596645376), total 2165121024)
2018-09-04 11:50:39 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:40 migration status: active (transferred 609658356, remaining 1491333120), total 2165121024)
2018-09-04 11:50:40 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
...
2018-09-04 11:50:54 migration status: active (transferred 1963964198, remaining 18128896), total 2165121024)
2018-09-04 11:50:54 migration xbzrle cachesize: 268435456 transferred 0 pages 0 cachemiss 0 overflow 0
2018-09-04 11:50:54 migration speed: 2.47 MB/s - downtime 111 ms
2018-09-04 11:50:54 migration status: completed
drive-scsi0: transferred: 26845904896 bytes remaining: 0 bytes total: 26845904896 bytes progression: 100.00 % busy: 0 ready: 1
drive-scsi1: transferred: 44023414784 bytes remaining: 0 bytes total: 44023414784 bytes progression: 100.00 % busy: 0 ready: 1
all mirroring jobs are ready
drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi1: Completing block job...
drive-scsi1: Completed successfully.
drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job
drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job
2018-09-04 11:51:03 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve02' root@10.x.x.12 pvesm free local-zfs:vm-210-disk-2' failed: exit code 1
2018-09-04 11:51:10 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve02' root@10.x.x.12 pvesm free local-zfs:vm-210-disk-1' failed: exit code 1
2018-09-04 11:51:10 ERROR: Failed to completed storage migration
2018-09-04 11:51:10 ERROR: migration finished with problems (duration 00:14:10)
migration problems
root@pve01:~#
Code:
pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-2-pve)
pve-manager: 5.2-7 (running version: 5.2-7/8d88e66a)
pve-kernel-4.15: 5.2-5
pve-kernel-4.15.18-2-pve: 4.15.18-20
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph: 12.2.7-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-38
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-10
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-1
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-29
pve-container: 2.0-25
pve-docs: 5.2-8
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-32
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
Udo
 
We have an open bug report (https://bugzilla.proxmox.com/show_bug.cgi?id=1852) for the migration with multiple local disks, but we weren't able to conclusively reproduce the issue locally.
@udo, could you upgrade to the latest packages in pve-no-subscription and try to reproduce the error you're getting?

If it's reproducible, we'd be grateful for some pointers to your particular configuration (e.g. VM configs, storage configuration, whether the VMs are idle during the migration).
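A sketch of the commands that would collect that information (using VMID 210 from the log above):
Code:
qm config 210               # VM configuration
cat /etc/pve/storage.cfg    # storage configuration
pveversion -v               # installed package versions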

Thank you!
 
@udo

That's strange. The code is in /usr/share/perl5/PVE/QemuMigrate.pm, phase3_cleanup():

eval { PVE::QemuServer::qemu_drive_mirror_monitor($vmid, undef, $self->{storage_migration_jobs}); };

drive-scsi0: Completing block job...
drive-scsi0: Completed successfully.
drive-scsi1: Completing block job...
drive-scsi1: Completed successfully.

then, on error:

if (my $err = $@) {
eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $self->{storage_migration_jobs}) };


drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job
drive-scsi0: Cancelling block job
drive-scsi1: Cancelling block job

eval { PVE::QemuMigrate::cleanup_remotedisks($self) };

2018-09-04 11:51:03 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve02' root@10.x.x.12 pvesm free local-zfs:vm-210-disk-2' failed: exit code 1
2018-09-04 11:51:10 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve02' root@10.x.x.12 pvesm free local-zfs:vm-210-disk-1' failed: exit code 1

I don't understand why it goes into error if the block jobs have completed successfully.



Maybe you can try to display the error:

if (my $err = $@) {
$self->log('err', "$err");
eval { PVE::QemuServer::qemu_blockjobs_cancel($vmid, $self->{storage_migration_jobs}) };
 
Hi Spirit,
I've found the reason for the issue: there are old snapshots!
Code:
root@pve01:~# zfs list -t snapshot
NAME                                                     USED  AVAIL  REFER  MOUNTPOINT
rpool/data/vm-210-disk-1@rep_vdb02_2017-11-28_16:15:01  3.62G      -  6.89G  -
rpool/data/vm-210-disk-2@rep_vdb02_2017-11-28_16:15:01  12.5G      -  33.2G  -
rpool/data/vm-210-disk-2@__migration__                     0B      -  33.2G  -
After removing the snapshots, the live migration works!
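For anyone hitting the same issue: the leftover snapshots from the listing above can be removed with zfs destroy (careful, this is irreversible):
Code:
zfs destroy rpool/data/vm-210-disk-1@rep_vdb02_2017-11-28_16:15:01
zfs destroy rpool/data/vm-210-disk-2@rep_vdb02_2017-11-28_16:15:01
zfs destroy rpool/data/vm-210-disk-2@__migration__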

The migration should be skipped if there are snapshots (or the snapshots should be handled).

Udo
 
@udo

Interesting.
These snapshots seem to be related to the ZFS replication feature (did you enable it on this VM?).
They should only be used in the case of an offline VM migration (without --with-local-disks).

I'm really not sure that QEMU live migration + QEMU storage migration is compatible with the ZFS replication features.
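To see whether storage replication is (still) configured for a VM, the pvesr tool can be queried; a rough sketch:
Code:
pvesr list     # replication jobs configured on this node
pvesr status   # current state of the local replication jobs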
 
To avoid confusion: I was referring to the Proxmox 'discard' checkbox, not whether the filesystem supports discard or not. Many users have 'discard' checked while also running fstrim on a regular basis, which is foolish.
 

Well, if you have the discard checkbox + virtio-scsi, and the storage supports discard, then fstrim will work (with or without discard in /etc/fstab in the guest).

discard in /etc/fstab allows discarding directly when a file is deleted (and with kernel < 4.9 this was slow, because deleting a file waited until the discard finished; since kernel 4.9 it's async, so pretty fast).
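In the guest, the two approaches look roughly like this (a sketch; device and mount point are only examples, and most distributions ship a ready-made fstrim.timer):
Code:
# option 1: discard immediately on delete, via the mount option in /etc/fstab
/dev/sda1  /  ext4  defaults,discard  0  1
# option 2: leave the mount option out and run a periodic batched trim instead
systemctl enable --now fstrim.timer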
 
Hi Spirit,
I used this VM for a ZFS replication test a while ago, and it looks like I didn't clean up afterwards...

Udo
 
