Hi Thomas,
Thanks for the update.
Apologies in advance for going off on a tangent, but please bear with me.
We've been testing live migration with local storage on various kinds of KVM platforms on and off over the past 5-6 years, but to be honest, we have barely come across any KVM platform that does this well (not even qemu-kvm-ev), as opposed to VMware or Hyper-V. At one point we even had to put Linux VMs on Hyper-V (I know there are people who don't like to hear this) in order to get reliable live migration between local storage.
Having performed this kind of migration with VMware and Hyper-V numerous times with no major problems, we're looking forward to seeing KVM/QEMU in general catch up, especially if the Proxmox team can help improve this situation.
Live storage migration used to be available in earlier versions of CentOS 6.x but was then deprecated due to some reliability issues and, as far as I know, has not been re-added in 7.x unless one installs qemu-kvm-ev, which works but is still not great. (We've barely tested other distros, so I can't comment much on those.)
Have a look at this old KB if you happen to have a login:
https://access.redhat.com/solutions/60034
Another example: the latest OnApp seems to use qemu-kvm-ev to achieve live storage migration, but also have a look at the prerequisites in the following guide:
https://docs.onapp.com/ugm/6.0/appliances/virtual-servers/migrate-virtual-server
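For reference, on a plain libvirt/KVM host the kind of migration we keep testing is usually kicked off with something roughly like this (the domain name and destination host below are just placeholders, and the exact flags vary a bit between libvirt versions):

virsh migrate --live --persistent --copy-storage-all --verbose myvm qemu+ssh://destination-host/system

i.e. libvirt/QEMU copies the local disk images across as part of the live migration, which is exactly the part that tends to misbehave in our experience.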
The general feedback has always been along the lines of 'yeah … it can work … but not reliable/robust enough … not recommended'.
People may even say 'why bother, just use shared/distributed storage or do offline migration'. Even when shared storage is in place, there are still times when one may need to phase out an old shared storage platform or, for example, relocate the VMs geographically. There are surely many more use cases for this.
The problems with live storage migration we’ve come across are generally:
- Of course everything has problems, but we often get very inconsistent bugs/results across different KVM platforms.
- Sparse virtual disks often don't get migrated properly: empty blocks get read/transferred, and the output disk ends up thick-provisioned (see the quick check after this list).
- The process is relatively prone to causing high I/O load.
- Live-migrating more than one disk, or even a single large disk (e.g. 1 TB), may not be reliable.
- Live-migrating a whole VM (memory + disk) is less advisable than migrating the disk first and then switching the VM to a different host.
- Sometimes virtual disks can get corrupted or go missing.
- Sometimes it may be necessary to pre-create a blank virtual disk with the same name on the destination.
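As a quick side note on the sparse-disk point above, the way we usually sanity-check the result (just our own habit, and the path below is only an example) is to compare the virtual size with the space actually allocated on both source and destination:

qemu-img info /vmstorage/images/100/vm-100-disk-0.qcow2
du -h --apparent-size /vmstorage/images/100/vm-100-disk-0.qcow2
du -h /vmstorage/images/100/vm-100-disk-0.qcow2

If the 'disk size' (or the plain du output) on the destination is close to the full virtual size while the source image was mostly empty, the sparseness has been lost along the way.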
I think this may be why live migration with local storage can currently only be done via the CLI, as per
https://bugzilla.proxmox.com/show_bug.cgi?id=1979
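For what it's worth, the CLI invocation we've been using is along these lines on the source node (the VM ID, target node and storage ID below are just placeholders from our lab):

qm migrate 100 node2 --online --with-local-disks --targetstorage vmstorage

i.e. an online migration that also copies the local disks, optionally pointing them at a specific storage on the destination.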
In relation to Proxmox, we've experienced quite a few bizarre problems as well; I believe some were probably due to KVM/QEMU and some could be due to Proxmox itself. For example, while testing 5.3.6 or 5.3.7 a few weeks ago, I realised that a small running VM with only one blank qcow2, in a test environment with no other workloads, would turn into multiple qcow2/raw disks on the destination (the number would vary, sometimes 5, 6 or 7) and then become corrupted if the '--targetstorage' flag was not specified, even though the destination had the same storage path (e.g. dir /vmstorage).
The test servers had quite beefy hardware:
- 12 physical cores @ 3.0 GHz
- 96 GB RAM
- Local storage with 4 x 10k RPM SAS spinning drives in RAID10
- 1GbE link for live migration
After upgrading Proxmox to 5.3.8, that bug seems to have gone away, but we ran into another one:
https://bugzilla.proxmox.com/show_bug.cgi?id=2070
If you do a bit of searching on this forum regarding live migration with local storage, the answers are again inconsistent, which suggests a reliability problem in QEMU.
We don't take KVM/QEMU/Proxmox for granted given they're open source; rather, we'd like to see whether the Proxmox team can help improve the implementation, as this is indeed a good feature to have.
Cheers.