Live migration from low CPU count host to high CPU count host

devedse

I currently have 2 Proxmox hosts:
1. Has 4 cores
2. Has 40 cores

What I would like to do is be able to live migrate a VM between these hosts while still being able to use all CPU cores. For example, when the sun is down I'd run the VM on the low-power host. When the sun comes up, I'd use the solar power to live migrate it to the 40-core beast of a server and spin it up to full power.

Currently Proxmox has a limitation where it's not possible to assign more CPU cores than the host has available.
Code:
TASK ERROR: MAX 4 vcpus allowed per VM on this node

Would it be possible to allow users to spin up a VM with more cores assigned to it than the host has?
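For reference, this is roughly how the error shows up (sketch only, assuming a VM with ID 100 on the 4-core node; the config change itself is accepted, the check only fires when the VM is started):
Code:
qm set 100 --sockets 1 --cores 40   # the config update is accepted
qm start 100                        # fails with the TASK ERROR shown above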
 
So while it is on the low-power node, what CPU would the 5th+ vCPU be scheduled on? You can use CPU hotplugging, but that only goes up; you can't unschedule threads from a CPU. You could run 10 x 4-CPU VMs, or my suggestion would be to scale up/down with K8s, Swarm or Slurm, leaving the 4 cores for the management plane.
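In case it helps, CPU hotplug in Proxmox looks roughly like this (sketch, assuming VMID 100; cores is the maximum, vcpus is the currently plugged-in count, and the start-time check on the node still applies):
Code:
qm set 100 --hotplug disk,network,usb,cpu   # enable CPU hotplug for this VM
qm set 100 --cores 4 --vcpus 2              # boot with 2 of at most 4 vCPUs
qm set 100 --vcpus 4                        # while running: plug in up to 4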
 
Hi guruevi, the problem of where a virtual CPU is scheduled is something the hypervisor has already figured out. Furthermore, k8s can't live migrate a running workload either (unless you start hosting VMs in k8s, but that's what we have Proxmox for).

Anyways, I found out there's actually a workaround

In the file
Code:
/usr/share/perl5/PVE/QemuServer.pm
comment out the following lines:

Code:
    # my $allowed_vcpus = $cpuinfo->{cpus};

    # die "MAX $allowed_vcpus vcpus allowed per VM on this node\n" if ($allowed_vcpus < $maxcpus);

KVM already warns you anyway, so my request would be to remove this check in Proxmox altogether. I appreciate a warning when something isn't a good idea, but I feel like it should be up to the user to make the final decision here.
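If anyone wants to apply the same edit, something like this should do it (untested sketch; restarting the PVE services should pick up the change instead of a full reboot, and note that a qemu-server package update will put the original file back):
Code:
cp -a /usr/share/perl5/PVE/QemuServer.pm /root/QemuServer.pm.bak   # keep a backup
nano /usr/share/perl5/PVE/QemuServer.pm                            # comment out the two lines shown above
systemctl restart pvedaemon pveproxy pvestatd                      # reload the patched module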
 
Anyways, I found out there's actually a workaround
Interesting!

Did you look into a VM started that way? How many CPUs does the guest see? More than the small node has???

the problem of where a virtual CPU is scheduled is something that the hypervisor has already figured out.
You can possibly specify any number of CPUs for QEMU - the software emulator.

For KVM you need real registers inside the real CPU to be available (Hyperthreading is okay). All of the configured cores need to be available at the very same moment for the KVM process to be scheduled and executed.

At least this was true several years ago...
 
After changing that script and rebooting the node I could start a VM with 40 CPUs on the small Proxmox host.

The VM on the small Proxmox host sees all of the CPUs:
[Screenshot: the guest reports 40 CPUs]
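If you'd rather check from a shell inside the guest than from a screenshot, these generic commands show the same thing:
Code:
nproc                                              # number of CPUs the guest sees
lscpu | grep -E '^(CPU\(s\)|Socket|Core|Thread)'   # sockets / cores / threads layout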

Regarding KVM / QEMU I found this on the Proxmox docs:
You may sometimes encounter the term KVM (Kernel-based Virtual Machine). It means that QEMU is running with the support of the virtualization processor extensions, via the Linux KVM module. In the context of Proxmox VE QEMU and KVM can be used interchangeably, as QEMU in Proxmox VE will always try to load the KVM module.
https://pve.proxmox.com/pve-docs/ch...ntext of Proxmox,access block and PCI devices.
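A quick way to confirm that QEMU really is running with KVM support (generic checks, nothing Proxmox-specific):
Code:
lsmod | grep -w kvm      # the kvm module (plus kvm_intel/kvm_amd) is loaded on the host
ls -l /dev/kvm           # the device node QEMU needs for acceleration
systemd-detect-virt      # run inside the guest, this should report 'kvm'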
 
The small node sees all of the CPUs:

I had to test this myself as it went against my obviously outdated knowledge. It actually works - not only with some emulated guest CPU but also with type "Host".

Now I was able to run a local LLM (w/o GPU) on 60(!) CPU cores on a machine with "only" 12C/24T (an AMD Threadripper). All these 60 virtual cores showed roughly 15% utilization inside the guest while the LLM did its job, as they are obviously still limited by the physical compute power. (From the outside, CPU usage was 50% in this case - I don't know why, and it doesn't matter for this pure "does-it-work?" test.)

Whatever the case, I have now learned that a single VM may successfully use more vCPUs than the host has!

:)

Just to be clear: use cases are limited as there is - of course - no more compute power than before. But for @devedse it solved a problem.
 
Yeah I just wish that this change would be implemented in the Proxmox code base. I hope one of the developers can respond :).
 
Very interesting thread - I learned something. But I wonder what happens when the VM is now live migrated, specifically when it is actually running at full throttle on the 40-CPU beast and then gets live migrated back to the 4-CPU mouse. I'm guessing you have already tested this.
 
The migration works perfectly fine in both directions, resulting in either a speed-up or a slow-down.

From what I read in this thread: https://forum.proxmox.com/threads/override-max-vcpu-allowed-per-vm.39363/, there's some overhead. I did a quick test with some FFmpeg video encoding and went from 1.6 fps with 4 CPUs to 1.2 fps with 40 virtual CPUs on the 4-CPU host. So in my case that's about 25% overhead.

However, when I migrate the machine to the 40-core server it immediately starts producing around 12-15 fps, which is perfect for my use case.
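For completeness, the migration itself is just a normal online migration; from the CLI it would look something like this (sketch, assuming VMID 100 and a target node named pve-big):
Code:
qm migrate 100 pve-big --online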
 
Yeah I just wish that this change would be implemented in the Proxmox code base. I hope one of the developers can respond :).

The recommended place for a feature request is https://bugzilla.proxmox.com/ - link to this thread from that new "bug", and later link to the bug report from here :)
 
The problem is indeed the overhead: basically, every time the host switches which vCPU runs on a physical CPU, that CPU's state needs to be flushed and loaded with the other thread.
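If you want to see that overhead from inside an over-provisioned guest, steal time is the usual indicator (generic diagnostics, not something that was measured in this thread):
Code:
mpstat -P ALL 5 1    # %steal = time a runnable vCPU spent waiting for the host (sysstat package)
vmstat 5 3           # the 'st' column shows the same thing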

I personally don't see any benefit to doing this; it will perform poorly and probably has some edge cases where it doesn't work (you'd have to test how this behaves with e.g. PCIe passthrough or vGPU), and it may even cause some unexpected behavior where you need to map into memory areas of the 'real' underlying hardware.

There are better ways of scaling workloads, as I said before; that's what Docker Swarm, Kubernetes, Slurm and similar systems are intended to do.

@devedse: 1.2fps on 4 cores seems really low for an ffmpeg pipeline, even 12fps on 40 cores seems low. Are you running this on ARM?
 
The problem is indeed the overhead: basically, every time the host switches which vCPU runs on a physical CPU, that CPU's state needs to be flushed and loaded with the other thread.

I personally don't see any benefit to doing this; it will perform poorly and probably has some edge cases where it doesn't work (you'd have to test how this behaves with e.g. PCIe passthrough or vGPU), and it may even cause some unexpected behavior where you need to map into memory areas of the 'real' underlying hardware.

There are better ways of scaling workloads, as I said before; that's what Docker Swarm, Kubernetes, Slurm and similar systems are intended to do.
Hi guruevi, if there's no benefit for you, that's perfectly fine. However, keep in mind that users with different workloads (like me) do in fact benefit from having this solved.

Furthermore, live migration isn't possible for containers running on, for example, Kubernetes.
@devedse: 1.2fps on 4 cores seems really low for an ffmpeg pipeline, even 12fps on 40 cores seems low. Are you running this on ARM?
Let's keep the discussion on topic; I'm not using ARM, and I do know what I'm doing with ffmpeg :)
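That said, for anyone who wants to run a comparable CPU-bound test on both nodes, a throwaway encode like this works (hypothetical command, not the actual pipeline used above):
Code:
ffmpeg -i input.mkv -c:v libx265 -preset slower -f null -   # fps is shown in ffmpeg's status line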