[SOLVED] How important is having all Intel/all AMD CPUs in a cluster?

michaeljd
Aug 30, 2023
I'm looking to set up a PVE cluster, mostly for the benefit of being able to move VMs from one machine to another and for high availability. In reading the PVE 8 admin guide I came across this line:

Online migration of virtual machines is only supported when nodes have CPUs from the same vendor. It might work otherwise, but this is never guaranteed.

This is a real bummer since I have a mixed Epyc and embedded Xeon CPU environment. I have some dual-EPYC systems as well as some embedded Xeon mini-ITX solutions and I was planning on clustering them all together. It's all modern hardware but obviously not all AMD or all Intel.

I can understand if I have a VM with a bunch of passed-through hardware, that might not work bouncing from an AMD to an Intel-based server. But what about basic Windows and Linux VMs doing things like serving as desktop machines, pi-hole servers, open media vault server and so on?

Due to the risk of downtime and work involved restoring backups, I'd rather not just try it and find out before doing some research. How important is it to not mix CPU vendors in a cluster? Thanks in advance.
 
michaeljd said: "But what about basic Windows and Linux VMs doing things like serving as desktop machines, pi-hole servers, open media vault server and so on?"
It really depends on the chosen CPU models, and even if it works perfectly for you now, this can change in the future.
For example, if yet another CPU microarchitecture (security) flaw were discovered that affects only one vendor (or some of its models) and needs special handling or disabling of some features, that could break live migration from another vendor's CPU that previously worked. Also, feature sets can differ widely: e.g., SSE 4.1 was already supported by Intel Penryn (2007), as it was spearheaded by Intel, but AMD only added it with their Bulldozer architecture in 2011 (I'm using that example because I researched SSE support for a recent bug, but there are certainly others).
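To make the feature-set point concrete, here is a minimal Python sketch that computes which CPU flags are common to a set of nodes. It assumes you have collected each node's /proc/cpuinfo text yourself; the node names and the shortened flag lists below are made-up sample data, not output from real EPYC or Xeon hardware, and a shared flag list alone is no guarantee that live migration will work.

```python
def parse_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def common_flags(per_node_flags):
    """Flags present on every node: the only features a guest could rely on everywhere."""
    sets = list(per_node_flags.values())
    return set.intersection(*sets) if sets else set()


def missing_on(per_node_flags, required):
    """Map each node that lacks some required flag to the sorted list of flags it lacks."""
    return {
        node: sorted(required - flags)
        for node, flags in per_node_flags.items()
        if required - flags
    }


# Hypothetical, heavily shortened sample data (assumption, not real hardware output).
SAMPLE = {
    "epyc1": "vendor_id\t: AuthenticAMD\nflags\t\t: fpu sse sse2 sse4_1 sse4a avx2\n",
    "xeon1": "vendor_id\t: GenuineIntel\nflags\t\t: fpu sse sse2 sse4_1 avx2 avx512f\n",
}

node_flags = {node: parse_flags(text) for node, text in SAMPLE.items()}
baseline = common_flags(node_flags)
print(sorted(baseline))
print(missing_on(node_flags, {"avx512f"}))
```

A guest that was started with a flag outside the common set (here sse4a or avx512f) may misbehave after landing on a node that lacks it, which is one reason a mixed-vendor cluster cannot be guaranteed to migrate cleanly.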

Due to such things, and constantly evolving software on all fronts (host kernel, guest kernel, host software, guest software, CPU microcode, firmware, ...), it's already quite a bit of effort to keep live migration working with the exact same CPUs in a forward-compatible manner. But as the HW is the same, it's possible there, since we can work around most things in software.

If you have a few AMD and a few Intel CPUs, you could still cluster them all for the convenience of having a central management UI, but restrict live migrations (e.g., for node maintenance reboots) to compatible nodes only.
michaeljd said: "How important is it to not mix CPU vendors in a cluster? Thanks in advance."

Now, as said, depending on the actual CPU models used, the VM configuration, and the guest OS, it might work, maybe even quite well, but that's not something we can guarantee at all, especially not for the future.

In the end we always recommend using a homogeneous cluster, where all nodes use CPUs from the same vendor, ideally even the exact same models, as that can save you a lot of pain and headache.
 
In addition to Thomas' answer:

michaeljd said: "I can understand if I have a VM with a bunch of passed-through hardware, that might not work bouncing from an AMD to an Intel-based server."
Passed-through hardware is NEVER able to live-migrate, and you should therefore not use it in an HA setup. This is not going to work in ANY hypervisor. Normally you want to virtualize as much as possible so that the VM stays live-migratable.

We had a system similar to yours running for years. We defined three HA groups, each made up of identical machines: one for the HA firewall (ALIX machines), one for the "main" cluster, and one consisting of a single AMD machine with 4 sockets and a TON of RAM for our container environment. VMs were then assigned to those HA groups. This solved the "different architecture" problem for us, although we still did manual live migrations with caution and kept them inside the matching HA group. We also split the shared storage definitions so that machines from one HA group could not migrate to nodes outside it, even though the configured storages were on the same physical backend storage. That prevented "accidental migration" to the wrong platform.
 
Thank you very much, t.lamprecht and LnxBil. Your responses were very educational for me! I have a lot of planning to do and probably purchasing some new hardware, before I can cluster and not have to worry too much about it.

 
michaeljd said: "This is a real bummer since I have a mixed Epyc and embedded Xeon CPU environment. I have some dual-EPYC systems as well as some embedded Xeon mini-ITX solutions and I was planning on clustering them all together. It's all modern hardware but obviously not all AMD or all Intel."
Even if your applications/OSes were not a good fit to run in a generic CPU mode (e.g., kvm64, which would allow cross-vendor migration), it's still possible to run a functional heterogeneous hardware cluster. Simply put your VMs in separate "hardware" HA groups, which will ensure you only migrate/fail over to hosts with a matching ISA.
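As a rough illustration of the "hardware" HA group idea, the Python sketch below builds one group per CPU vendor and prints the commands that would create the groups and assign VMs to them. The node names, VM IDs, and group names are hypothetical, and the ha-manager syntax shown (groupadd with --nodes and --restricted, add with --group) reflects my understanding of the PVE 8 CLI, so check it against `man ha-manager` on your version before using any of it. The script only prints; it does not run anything.

```python
from collections import defaultdict

# Hypothetical inventory (assumption): node name -> CPU vendor string.
NODES = {
    "epyc1": "AuthenticAMD",
    "epyc2": "AuthenticAMD",
    "xeon1": "GenuineIntel",
    "xeon2": "GenuineIntel",
}

# Hypothetical VMs (assumption): VM ID -> node the VM currently runs on.
VMS = {100: "epyc1", 101: "xeon1"}

# Hypothetical group names, one per vendor.
GROUP_NAMES = {"AuthenticAMD": "hw-amd", "GenuineIntel": "hw-intel"}


def groups_by_vendor(nodes):
    """Collect node names per CPU vendor, in sorted node order."""
    groups = defaultdict(list)
    for node, vendor in sorted(nodes.items()):
        groups[vendor].append(node)
    return dict(groups)


def ha_commands(nodes, vms):
    """Build (but do not execute) commands that pin each VM to its vendor's group."""
    commands = []
    for vendor, members in groups_by_vendor(nodes).items():
        commands.append(
            f"ha-manager groupadd {GROUP_NAMES[vendor]} "
            f"--nodes {','.join(members)} --restricted 1"
        )
    for vmid, home in sorted(vms.items()):
        commands.append(f"ha-manager add vm:{vmid} --group {GROUP_NAMES[nodes[home]]}")
    return commands


commands = ha_commands(NODES, VMS)
for command in commands:
    print(command)
```

Marking the groups as restricted is meant to keep a VM from being recovered onto a node outside its group; whether that fits your failover expectations is something to decide per workload.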
 
Thank you, alexskysilk. Really appreciate the education!

I think it will be easier (but not cheaper) for me to just build one more Epyc server and take the embedded Xeons out of the equation. I can virtualize their workloads. I'll just be left with one embedded Xeon at that point, my TrueNAS server, which is running bare metal and is my NAS and provides shared storage for VMs.
 
