Roadmap to implement QEMU 3.x?

virtRoo

Member
Jan 27, 2019
Hi everyone,

Given QEMU 3.1 has already been released, is there a roadmap for implementing QEMU 3.x in Proxmox?

Cheers.
 
Somewhat. We're waiting for a stable bug-fix release to base upon (a 3.x.1 version) to provide a stable base for our users. This will either be built upon an upstream bug-fix release or, sooner or later, a pick of currently known "backport candidates" of fixes from us.

Do you miss any specific stuff from 3.0 or 3.1?
 
Hi Thomas,

Thanks for the update.

I'm under the impression QEMU 3.x now has a better implementation for performing live storage migration. We've been looking forward to seeing a more reliable KVM solution that can do live migration with local storage.
 
While you need to go over the CLI and pass the "--with-local-disks" option to "qm migrate", you can already use live storage migration in PVE, and it works pretty well for me, just FYI.
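
For reference, a minimal invocation looks roughly like this (the VM ID 100 and target node name "node2" are just placeholders):

# live-migrate VM 100 to node2, copying its local disks along
qm migrate 100 node2 --online --with-local-disks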
 
While you need to go over the CLI and pass the "--with-local-disks" option to "qm migrate", you can already use live storage migration in PVE, and it works pretty well for me, just FYI.
Hi Thomas,
sorry, but it works reliably with one VM disk only.
With two or more local disks (which are in use) it often fails - unless the issue has been fixed recently.

Udo
 
Hi Udo,

hmm, currently we queue all disks needing local-storage migration and process them the same way, one after the other, so theoretically more disks should not really change anything, as the same code paths get hit.
As I did not remember when exactly I last tested a VM live migration with multiple local disks, I re-checked it now: a Debian VM with three disks, the root disk and an additional one on LVM-Thin, plus another disk on Ceph (RBD) - just to make it a bit more complex - all containing an FS (ext4 and XFS) and some data (all > 512 MB).
The migration worked here and I had no noticeable interruption in the VM. But, as you said, it may not have worked as well with past versions, and maybe it's a bit storage dependent - though even then, one disk or multiple should not really make a difference.

Anyway, if you have a specific setup where you can reproduce this most or even all of the time with the most current Proxmox VE (PVE 5.3 at the time of writing), it would be great to have the VM config and the backing storage types of both target and source, thanks!
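
Just for illustration, a layout like the one I tested would show up in "qm config <vmid>" roughly like this - the storage IDs, disk names and sizes below are made up:

# hypothetical excerpt of "qm config 100"
scsi0: local-lvm:vm-100-disk-1,size=32G
scsi1: local-lvm:vm-100-disk-2,size=8G
scsi2: ceph-rbd:vm-100-disk-3,size=8G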
 
Hi Thomas,

Thanks for the update.

Apologies in advance for going off on a tangent, but please bear with me.

We've been testing live migration with local storage on different kinds of KVM platforms every now and then over the past 5-6 years, but to be honest, we have barely come across any KVM platform that can do this well (not even qemu-kvm-ev) as opposed to VMware or Hyper-V. At one point we even had to put Linux VMs on Hyper-V (I know there are people who don't like to hear this) in order to have reliable live migration between local storage.

After having performed this kind of migration with VMware and Hyper-V numerous times with no major problems, we're looking forward to seeing KVM/QEMU in general catch up, especially if the Proxmox team can help improve the situation.

Live storage migration used to be available in earlier versions of CentOS 6.x, but then it got deprecated due to some reliability issues and has not been re-added since 7.x, I think, unless one installs qemu-kvm-ev, which works but is still not great. (We've barely tested other distros, so I can't comment too much on that.)

Have a look at this old KB if you happen to have a login: https://access.redhat.com/solutions/60034
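
For context, on plain libvirt/qemu-kvm this is the functionality exposed through virsh's copy-storage flags, along these lines (the domain name and destination host are placeholders):

# live migration that also copies all non-shared disk images to the destination
virsh migrate --live --copy-storage-all myvm qemu+ssh://desthost/system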

Another example: the latest OnApp seems to utilise qemu-kvm-ev to achieve live storage migration, but have a look at the following guide when it comes to the prerequisites:

https://docs.onapp.com/ugm/6.0/appliances/virtual-servers/migrate-virtual-server


The general feedback has always been 'yeah … it can work ... but not reliable/robust enough ... not recommended'

People may even say 'why bother, just use shared/distributed storage or do offline migration'. Even with shared storage in place, there are still times when one may need to phase out an old shared storage platform or, for example, geographically relocate the VMs. Surely there are a lot more use cases for this.

The problems with live storage migration we’ve come across are generally:

  • Surely everything has problems, but we often get very inconsistent bugs/results from different KVM platforms.
  • Sparse virtual disks often don't get migrated properly: empty blocks get read/transferred and the output disk ends up thick-provisioned (see the quick check sketched after this list).
  • The process is relatively prone to high I/O load.
  • Live-migrating more than one disk, or even one large disk (e.g. 1TB), may not be reliable.
  • Live-migrating a whole VM (memory + disk) is less recommended than migrating the disk first and then switching the VM to a different host.
  • Sometimes virtual disks can get corrupted or go missing.
  • Sometimes it may be necessary to pre-create a blank virtual disk with the same name on the destination.
I think maybe this is the reason why live migration with local storage can only be done via the CLI, as per https://bugzilla.proxmox.com/show_bug.cgi?id=1979
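
On the sparse-disk point above, a quick sanity check we do on the destination is comparing the virtual size against the actual allocation of the migrated image (the path below is just an example):

# virtual size vs. actual allocation reported by qemu-img
qemu-img info /vmstorage/images/100/vm-100-disk-0.qcow2
# apparent size vs. real on-disk usage
du -h --apparent-size /vmstorage/images/100/vm-100-disk-0.qcow2
du -h /vmstorage/images/100/vm-100-disk-0.qcow2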


In relation to Proxmox, we've also experienced quite a lot of bizarre problems; I believe some were probably due to KVM/QEMU and some could be due to Proxmox. For example, while testing 5.3.6 or 5.3.7 a few weeks ago, I realised a small running VM with only one blank qcow2, in a test environment with no other workloads, would become multiple qcow2/raw disks on the destination (the number would vary - sometimes 5, 6, 7) and then get corrupted if the '--targetstorage' flag was not specified, even though the destination had the same storage path (e.g. dir /vmstorage).
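
For completeness, the command we were using was along these lines (VM ID, node and storage names are placeholders), with and without the last option:

qm migrate 100 node2 --online --with-local-disks --targetstorage vmstorage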

The test servers had quite beefy hardware:
  • 12 physical cores @ 3.0 GHz
  • 96 GB RAM
  • Local storage with 4 x 10k RPM SAS spinning drives in RAID10
  • 1GbE link for live migration

After upgrading Proxmox to 5.3.8, said bug seems to have gone away but we got another bug: https://bugzilla.proxmox.com/show_bug.cgi?id=2070

If you do a bit of searching on this forum regarding live migration with local storage, the answers have again been inconsistent, which suggests a reliability problem in QEMU.

We don't take KVM/QEMU/Proxmox for granted given they're open source; rather, we'd like to see if the Proxmox team can help improve the implementation, as this is indeed a good feature to have.

Cheers.
 
I have migrated multiple VMs (around 200) with 2 disks each without any problems, between 2 different RBD storages, Debian guests (wheezy, jessie, stretch).

Some bugs have been fixed recently (mainly with ZFS, when ZFS replication was enabled).
 
In relation to Proxmox, we've also experienced quite a lot of bizarre problems; I believe some were probably due to KVM/QEMU and some could be due to Proxmox. For example, while testing 5.3.6 or 5.3.7 a few weeks ago, I realised a small running VM with only one blank qcow2, in a test environment with no other workloads, would become multiple qcow2/raw disks on the destination (the number would vary - sometimes 5, 6, 7) and then get corrupted if the '--targetstorage' flag was not specified, even though the destination had the same storage path (e.g. dir /vmstorage).

The test servers had quite beefy hardware:
  • 12 physical cores @ 3.0 GHz
  • 96 GB RAM
  • Local storage with 4 x 10k RPM SAS spinning drives in RAID10
  • 1GbE link for live migration


After upgrading Proxmox to 5.3.8, said bug seems to have gone away
Yes, this was a regression introduced when trying to fix another bug:
https://git.proxmox.com/?p=qemu-server.git;a=commit;h=c7789f54ad61e4e4658377259bfbecde141a6ee4


But the QEMU part has been pretty stable for some releases now.

Well, this is not really a "bug". It's only that we don't have a targetformat option when targetstorage is specified. (I have responded in the bug tracker with some possible solutions.)
 
Can you add the PCIe speed patch?

The speed is already "unlimited" (as fast as memory copy/move works); QEMU does not artificially limit anything.
The new patch, which advertises all buses as x32 * 16 GT/s, is mostly just cosmetic, but yes, some (stupid) drivers actually do non-ideal stuff if they check the link speed and it seems too low for them...

Do you have an actual, specific, issue like this, or is this a more general request?
 
It's not really an issue, because with the recommended tutorial all works well.

My Windows 10 VM has a dedicated GPU (Radeon PRO WX 3100, PCIe 3.0 x8).

With Q35 and PCIe, LnkSta gets stuck at 2.5 GT/s (lspci -vv | grep LnkSta) ONLY after shutting down and restarting the VM (and GPU-Z shows x8 1.1).
If I freshly boot the host, LnkSta reports 8 GT/s (x8 3.0). If I restart the VM (not shutdown and restart) after a fresh host boot, LnkSta reports 8 GT/s too.
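
For the record, the check referenced above is just the following; the PCI address and the output line are illustrative:

# negotiated PCIe link speed/width of the passed-through GPU
lspci -vv -s 01:00.0 | grep LnkSta
#   LnkSta: Speed 8GT/s, Width x8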

I tried to export the BIOS and add it to the config file, but the issue persists.

Finally, with i440FX and PCI, everything works fine. I can shut down and restart the VM; LnkSta reports 8 GT/s and GPU-Z shows x8 3.0.

Honestly, I tried some benchmarks and nothing changes between x8 3.0 and x8 1.1, but it's a small GPU.
Now it's OK cosmetically.
 
