Override max vCPU allowed per VM

jknight (New Member, USA), Dec 21, 2017
My current hardware has 2x 16-core CPUs with hyperthreading enabled, giving me a maximum of 64 vCPUs. However, for a specific testing scenario, I'm trying to override this and give a VM more than 64 vCPUs. When I try to start the VM, it fails with this error:

TASK ERROR: MAX 64 vcpus allowed per VM on this node

Is there a config file or cli command I can use to override this max value?

Thanks
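The 64 in the error is the node's logical CPU count: 2 sockets x 16 cores x 2 hyperthreads = 64, and Linux lists each hyperthread as its own "processor" entry in /proc/cpuinfo. Below is a minimal Perl sketch of counting those entries. The sub names are hypothetical. It assumes, without confirmation in this thread, that the per-node limit is derived the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: count "processor : N" stanzas in /proc/cpuinfo text.
# Each hyperthread gets its own stanza, so this is the logical CPU count.
sub count_logical_cpus {
    my ($cpuinfo_text) = @_;
    my $count = () = $cpuinfo_text =~ /^processor\s*:/mg;
    return $count;
}

# Hypothetical helper: read the live file on a Linux host.
sub host_logical_cpus {
    open(my $fh, '<', '/proc/cpuinfo') or die "cannot read /proc/cpuinfo: $!\n";
    my $text = do { local $/; <$fh> };
    close($fh);
    return count_logical_cpus($text);
}
```

On the host described above, host_logical_cpus() would be expected to return 2 x 16 x 2 = 64.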
 
You only have 64 cores, so it would make no sense to assign more, because it would result in a performance drop.
 
I deleted my previous post, but I'll recap it here for anyone looking for the solution:

You can edit /usr/share/perl5/PVE/QemuServer.pm and apply this diff to bypass the limit:

Code:
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 02a689c..5fbba08 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -2961,11 +2961,6 @@ sub config_to_command {

     my $vcpus = $conf->{vcpus} ? $conf->{vcpus} : $maxcpus;

-    my $allowed_vcpus = $cpuinfo->{cpus};
-
-    die "MAX $allowed_vcpus vcpus allowed per VM on this node\n"
-       if ($allowed_vcpus < $maxcpus);
-
     push @$cmd, '-smp', "$vcpus,sockets=$sockets,cores=$cores,maxcpus=$maxcpus";

     push @$cmd, '-nodefaults';
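The check that the diff removes is a plain comparison between the guest topology (sockets * cores, passed to QEMU as maxcpus) and the node's CPU count. Here is a standalone Perl restatement of it. The sub name is hypothetical, and it assumes that sockets and cores default to 1 when unset. It shows that the check only ever sees one VM at a time.

```perl
use strict;
use warnings;

# Standalone restatement of the removed check (hypothetical sub name).
# host_cpus plays the role of $cpuinfo->{cpus}; the guest topology is
# sockets * cores, which QemuServer.pm passes to QEMU as maxcpus.
sub check_vcpu_limit {
    my (%args) = @_;
    my $sockets   = $args{sockets} // 1;    # assumed default
    my $cores     = $args{cores}   // 1;    # assumed default
    my $host_cpus = $args{host_cpus};
    my $maxcpus   = $sockets * $cores;

    die "MAX $host_cpus vcpus allowed per VM on this node\n"
        if $host_cpus < $maxcpus;

    return $maxcpus;
}

# Two 64-vCPU VMs each pass on a 64-thread node, because each is checked alone...
check_vcpu_limit(sockets => 2, cores => 32, host_cpus => 64) for 1 .. 2;

# ...while a single 128-vCPU VM is rejected.
eval { check_vcpu_limit(sockets => 2, cores => 64, host_cpus => 64) };
print $@;    # MAX 64 vcpus allowed per VM on this node
```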

My concerns with this code are:
  • This check is done per VM only. So if you have 64 cores on your host, it prevents you from starting a VM with 128 cores, but it wouldn't stop you from starting 2 VMs with 64 cores each. These would have similar performance issues, yet only the former case is prevented. From the hypervisor's standpoint, a vCPU is simply a thread of a QEMU process. It shouldn't matter whether these threads belong to the same VM or to different VMs.

  • The sockets * cores math doesn't take into account that the cores are virtual (hyperthreads). From a performance standpoint, this can be important. Proxmox also doesn't let you configure hyperthreads in your guest.

  • Not flexible. There is no config value or UI checkbox to disable this check. There are specific use cases where oversubscribing vCPUs relative to physical CPUs is desired, and Proxmox doesn't allow this without patching its code and rebooting. VMware at least gives you a CLI tool to change it, and libvirt/Ubuntu don't impose any limit.
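The configurable behaviour asked for here could look roughly like the following Perl sketch. The allow_overcommit flag and the sub name are purely hypothetical. No such Proxmox option is mentioned in this thread, and the sketch leaves open where the flag would be read from (VM config, node config or UI).

```perl
use strict;
use warnings;

# Hypothetical variant of the per-VM check with an opt-in override.
# Without the flag it dies like the original code; with it, the VM is
# allowed to start and a warning is emitted instead.
sub check_vcpu_limit_optional {
    my ($maxcpus, $host_cpus, $allow_overcommit) = @_;
    return 1 if $maxcpus <= $host_cpus;

    die "MAX $host_cpus vcpus allowed per VM on this node\n"
        unless $allow_overcommit;

    warn "overcommitting: $maxcpus vCPUs on a node with $host_cpus logical CPUs\n";
    return 1;
}
```

Keeping the die as the default preserves the existing protection. The override only changes behaviour when it is explicitly enabled.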

> All your assumptions are wrong.

If you could provide any additional insight, that'd be appreciated. Providing as much flexibility as possible in CPU/hardware topologies is currently what decides whether I install libvirt/Ubuntu or Proxmox. I'm not making assumptions about this; I'm literally reading the code.

Going back through the mailing list and git history, I found no explanation of why this check was added or what problem it was trying to solve.
 
> This check is done per VM only. So if you have 64 cores on your host, it prevents you from starting a VM with 128 cores, but it wouldn't stop you from starting 2 VMs with 64 cores each. These would have similar performance issues, yet only the former case is prevented. From the hypervisor's standpoint, a vCPU is simply a thread of a QEMU process. It shouldn't matter whether these threads belong to the same VM or to different VMs.

This is not the same. In the end, one VM with 128 cores will be much slower than 2 VMs with 64 cores, because the kernel inside the VM has totally wrong assumptions about the physically available cores.

> The sockets * cores math doesn't take into account that the cores are virtual (hyperthreads). From a performance standpoint, this can be important. Proxmox also doesn't let you configure hyperthreads in your guest.

I am unable to see that this is important. Please provide benchmarks to support that claim.

> Not flexible. There is no config value or UI checkbox to disable this check. There are specific use cases where oversubscribing vCPUs relative to physical CPUs is desired, and Proxmox doesn't allow this without patching its code and rebooting. VMware at least gives you a CLI tool to change it, and libvirt/Ubuntu don't impose any limit.

This protects users from doing bad things. I don't really want to remove that protection.
 
> This is not the same. In the end, one VM with 128 cores will be much slower than 2 VMs with 64 cores, because the kernel inside the VM has totally wrong assumptions about the physically available cores.
>
> I am unable to see that this is important. Please provide benchmarks to support that claim.
>
> This protects users from doing bad things. I don't really want to remove that protection.

While it may protect users from doing bad things performance-wise, it is a pain for power users running different CPU configurations in a cluster.
We have 2 different types of machines in our cluster (2 sockets/8 threads and 2 sockets/16 threads).
A VM that resides on the 16-thread host can have 16 vCPUs assigned. If I migrate it to the 8-thread host, I can't start it, because Proxmox can only assign 8 vCPUs there but the configuration states 16. I have not tested that, though.
That could be a serious issue in combination with HA VMs.
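The migration case above can be checked up front. This Perl sketch applies the same "sockets * cores must not exceed the node's CPU count" rule to each node. The node names, thread counts and sub name are hypothetical stand-ins for the two machine types described in the post.

```perl
use strict;
use warnings;

# Return the nodes whose logical CPU count satisfies the per-VM check
# for a VM with the given sockets * cores (maxcpus) topology.
sub nodes_that_can_start {
    my ($maxcpus, %node_cpus) = @_;
    return grep { $node_cpus{$_} >= $maxcpus } sort keys %node_cpus;
}

# Hypothetical cluster mirroring the two machine types described above.
my %cluster = (node8 => 8, node16 => 16);

# A 16-vCPU VM could only be (re)started on the 16-thread node.
my @targets = nodes_that_can_start(16, %cluster);
print "16 vCPUs: @targets\n";    # 16 vCPUs: node16
```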

Regards,
Daniel
 
I can confirm this is a serious issue. My configuration uses only 16 of the 40 available cores, which is a serious performance downgrade, because otherwise a migrated virtual machine can't start on another node that has only 16 CPUs available. This is a big problem, and I think this value needs to change dynamically to the number of CPUs allowed on the current node, or there should simply be an option to run with more vCPUs than the number of CPUs installed.
 
> This is not the same. In the end, one VM with 128 cores will be much slower than 2 VMs with 64 cores, because the kernel inside the VM has totally wrong assumptions about the physically available cores.

This really depends on the application. Like I said, from the host's perspective, each vCPU is just another thread being scheduled by the kernel. If these threads are relatively idle, there is no performance impact. If the vCPUs are busy, then they absolutely could cause performance issues. There is no magic to this...

> I am unable to see that this is important. Please provide benchmarks to support that claim.

I'm afraid I can't provide general benchmark numbers, but I don't think benchmarks are required to see the issue. Some applications are sensitive to sharing a physical core with another hyperthread. For example, the fd.io VPP project recommends not running on hyperthreads: the two hyperthreads of a core share its execution resources, so running both "at the same time" can cause performance drops (https://wiki.fd.io/view/VPP/How_To_Optimize_Performance_(System_Tuning)#Hyperthreading), for example when two processes are both hard-polling (busy-looping) to pull data off a queue. This gets more complicated with multiple VMs, when you want to ensure that two VMs aren't running on the same physical core.

I'm not specifically using VPP, but I am using an application that is similarly affected by this type of configuration. The issue is that the Proxmox host can schedule two vCPUs of the VM onto the two hyperthreads of the same physical core, and this causes performance drops.
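On Linux, the logical CPUs that are hyperthreads of the same physical core are listed in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The Perl sketch below uses hypothetical helper names. It parses those lists and keeps one logical CPU per physical core, which is the kind of set two polling vCPUs would need to be restricted to so they never share a core. How that set is then applied to the VM's vCPU threads (for example with taskset) is outside this sketch.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Expand a Linux cpulist string such as "0,32" or "0-1" into CPU numbers.
sub parse_cpulist {
    my ($list) = @_;
    my @cpus;
    for my $part (split /,/, $list) {
        if ($part =~ /^(\d+)-(\d+)$/) {
            push @cpus, $1 .. $2;
        } elsif ($part =~ /^(\d+)$/) {
            push @cpus, $1;
        } else {
            die "unexpected cpulist element '$part'\n";
        }
    }
    return @cpus;
}

# Given cpu => thread_siblings_list, keep the lowest-numbered thread of
# each physical core, i.e. one logical CPU per core.
sub one_thread_per_core {
    my (%siblings) = @_;
    my %lead;
    for my $cpu (keys %siblings) {
        $lead{ min(parse_cpulist($siblings{$cpu})) } = 1;
    }
    return sort { $a <=> $b } keys %lead;
}

# Read the sibling lists from sysfs on a live Linux host.
# Usage: my @cpus = one_thread_per_core(read_host_siblings());
sub read_host_siblings {
    my %siblings;
    for my $path (glob '/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list') {
        my ($cpu) = $path =~ m{/cpu(\d+)/topology/};
        open(my $fh, '<', $path) or die "cannot read $path: $!\n";
        chomp(my $line = <$fh>);
        close($fh);
        $siblings{$cpu} = $line;
    }
    return %siblings;
}
```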

> This protects users from doing bad things. I don't really want to remove that protection.

I can appreciate the effort, but I think there should be an option to disable the check. Nobody is suggesting to completely kill the protection, only to make it configurable for more advanced use cases. As other users in this thread have pointed out, there are perfectly valid reasons not to have this check.
 
We really need this check disabled. We need to move VMs with 32 vCPUs to hosts with 4 physical cores during the night, when they are not used, and back to 32-core hosts at 8 AM before our users' work shift starts. I am a Proxmox fan, but this limitation prevents our startup from offering our product (VDI-related) to our customers. Would the QemuServer.pm diff posted earlier in this thread, which removes the "MAX $allowed_vcpus vcpus allowed per VM on this node" check, work for this?

Hi Dietmar. I am a Proxmox fan, but this limitation is a showstopper for us. Could it be fixed so that we can override it?

Thanks and all the best
 
Simply adjust the resources to match the current server, run a backup, and then resize according to the available resources when moving to another server.
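The manual workaround amounts to shrinking the topology until sockets * cores fits the target node. This Perl sketch assumes that the socket count is kept and only the cores per socket are reduced. fit_cores is a hypothetical name and <vmid> is a placeholder. The printed qm set line, with its --sockets and --cores options, shows how such a change would be applied from the CLI.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the socket count and reduce cores per socket
# until sockets * cores fits within the target node's logical CPUs.
sub fit_cores {
    my ($sockets, $cores, $host_cpus) = @_;
    return $cores if $sockets * $cores <= $host_cpus;

    my $fit = int($host_cpus / $sockets);
    die "cannot fit $sockets sockets on $host_cpus logical CPUs\n" if $fit < 1;
    return $fit;
}

# A 2 x 16 VM moving to a 4-thread node would need 2 cores per socket.
my $cores = fit_cores(2, 16, 4);
print "qm set <vmid> --sockets 2 --cores $cores\n";    # qm set <vmid> --sockets 2 --cores 2
```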
 
> Simply adjust the resources to match the current server, run a backup, and then resize according to the available resources when moving to another server.

That would cause downtime for an HA setup.
Bypassing the check is better.
 
This is indeed a real issue for VMs running with high availability on very heterogeneous hardware:
I want to utilize my strongest (the default) node to the fullest, while the other nodes are just for fallback anyway in my case.
If a VM has to run on them, that already means a significant performance drop, so I don't care at all about suboptimal vCPU configurations.
 
