Can't use EPYC-ROME CPU after update

jaytee129

I updated Proxmox to the latest version today, and after the system rebooted none of my VMs would start. They all reported:

kvm: warning: host doesn't support requested feature: CPUID.0DH:EAX.xsaves [bit 3]
kvm: Host doesn't support requested features
TASK ERROR: start failed: QEMU exited with code 1
I found a post saying it's because a CPU flag is no longer supported. I have an EPYC Rome CPU, so I was using the EPYC-ROME CPU type and it worked fine. What happened?

I fixed it by changing the CPU type to "host", but I'd like to know why EPYC-ROME doesn't work anymore. I didn't change the physical CPU.
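(For anyone else hitting this: the same change can be made from the shell; 101 below is a placeholder VM ID, not necessarily yours.)

Code:
# switch a VM's CPU type from the CLI
qm set 101 --cpu host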

Any info would be appreciated
 
I had the same issue yesterday.
We are using a 3-server cluster with an AMD EPYC 7282 CPU in each.
I updated from PVE 7.3-? to 7.4-3, and two VMs which had EPYC-ROME configured as the CPU type migrated away from the first host while it rebooted after the update. Those two machines were then unable to migrate anywhere.
Changing the CPU type while the VM was running was not possible; it resulted in "feature affinity not supported" or something similar (I didn't take a screenshot, and today I'm unable to reproduce it).
I had to shut them down, change the CPU type, and boot them again.
 
Hi,
maybe it's because of the following: https://lore.kernel.org/all/20230307174643.1240184-1-andrew.cooper3@citrix.com/
Because of a hardware erratum, the XSAVES feature needed to be disabled for certain models.
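You can quickly check whether your host kernel still advertises the flag, for example:

Code:
# prints "xsaves" once if the host kernel still exposes the feature
grep -o -w xsaves /proc/cpuinfo | sort -u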
Thanks. Is it something that will be like this from now on, or is it something that will be fixed?

Also,
1. what am I missing by using "host" or "EPYC" (which works) instead of "EPYC-ROME"?
2. do VMs usually adjust fine to this kind of change? The CPU was EPYC-ROME for the VM creation and OS installation (Windows, Ubuntu)
3. is this a sign that "host" is safer to use? (I know that kvm is the safest option, but it doesn't support PCIe passthrough, which I need)

Aside from wanting to know for future reference, I'm seeing some other unusual behavior since I upgraded and had to change the CPU type, including sluggishness. I'm trying to figure out if it's related to the CPU selection, the fact that it changed for the VM, or something else.

Thanks again.
 
Thank you @Neti. From your example and from reading the links you provided, it looks to me like this customization adds or removes features of whatever the "phys-bits" CPU is, so one would still get all the other flags/features of that CPU, i.e. I'm appending, not replacing.

So in my case I would want to use "EPYC-ROME" for phys-bits instead of "host" if I wanted all the other built-in flags/features of the EPYC-ROME to still exist.

Am I understanding this right?
 
OK, sorry, I misread. It's not "phys-bits" that matters, it's "reported-model". I'd still like to confirm that the flags are added to or removed from the existing flags, not replacing them all.

Also, I'm still not clear on what I lose by using "host" or "EPYC" instead of "EPYC-ROME", as well as the relative risk of choosing a specific CPU type rather than simply "host". If this kind of problem, which does prevent VMs from even starting, only happens to specific CPUs once in a decade, I won't be too worried about it; but if it can happen even once a year, I might want to use "host" (it depends on the advantages of using my specific CPU).
 
Hi,
Thanks. Is it something that will be like this from now on, or is it something that will be fixed?
it's a hardware erratum. AFAIK, disabling the feature in the kernel is the only thing that can be done to avoid running into the issue. So that is the fix.

Also,
1. what am I missing by using "host" or "EPYC" (which works) instead of "EPYC-ROME"?
With host, QEMU will try to use all features your CPU supports. I'm not entirely sure what the feature differences between EPYC and EPYC-ROME are, but you might miss some of those. I'd guess it may lead to slight performance degradation, but it shouldn't otherwise be a big deal.
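One way to see the actual difference for a given guest is to boot it once with each CPU type and compare the flags it sees. A rough sketch for a Linux guest (the file names are just examples):

Code:
# booted with CPU type EPYC
grep -m1 ^flags /proc/cpuinfo | tr ' ' '\n' | sort > flags-epyc.txt
# switch the VM to EPYC-ROME (where it still starts), reboot, then:
grep -m1 ^flags /proc/cpuinfo | tr ' ' '\n' | sort > flags-rome.txt
diff flags-epyc.txt flags-rome.txt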

2. do VMs usually adjust fine to this kind of change? The CPU was EPYC-ROME for the VM creation and OS installation (Windows, Ubuntu)
In most cases, yes. Although Windows is known to be finicky when it comes to changes to virtual hardware.

3. is this a sign that "host" is safer to use. (I know that kvm is safest option but doesn't support pcie passthrough, which I need)
host is the preferred option if you have the same CPU models in your cluster and don't need to lie to the VM about it. With different CPU models, you need to be a bit careful with live migration when using host. And between CPUs of different vendors, live migration is not generally supported.

Aside from knowing for future reference, I'm having some other unusual behavior since I upgraded and had to change CPU, including sluggishness. Trying to figure out if it's related to the CPU selection, the fact that it changed for the VM, or something else.
Even when using the host CPU type?
 
Thank you @fiona. Regarding the sluggishness and other strange behavior (Chrome crashing repeatedly with no other explanation) soon after changing to "host": I ended up changing the CPU type to just "EPYC" and the problems went away.

Today I implemented the fix that @Neti proposed, and my VMs restarted OK and seem to be fine. I will keep an eye out and report back if it seems to cause any problems, but I'm guessing this is the best way to ensure maximum CPU support and performance.
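For anyone finding this later: as far as I understand Neti's approach, the custom model lives in /etc/pve/virtual-guest/cpu-models.conf and looks roughly like this (my reconstruction, not the exact file):

Code:
cpu-model: EPYC-Rome-fixed
    flags -xsaves
    reported-model EPYC-Rome

The VMs then reference it with the "custom-" prefix, i.e. CPU type "custom-EPYC-Rome-fixed": it keeps all the other EPYC-Rome flags and only masks out xsaves.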

thanks again
 
I have a resolution for this, and it doesn't require changing CPU types.

See the thread: issues with CPU types when migrating to proxmox 8.1
Thanks for pointing to your recent post on this topic, @damo2929.

Have you tested to confirm that the fix for 8.1 will also work with 7.x, with, of course, the correct repositories for 7.x?

Also, I thought I would read up on XSAVES a bit - which you seem to know something about - and the following article says: the discussed instructions are necessary to implement context switching - the mechanism used by the operating system to run multiple threads and processes quasi-simultaneously on the same processor. In order to perform that, the kernel needs to be able to save the values of all registers used by the program, and restore them afterwards.

Proper context switching is key to effective resource sharing. As there are times when I notice some of my VMs seem to stutter, could the missing XSAVES be the cause of that?

I'm trying to figure out the value of fixing this vs. the risk of messing with it, given that things generally work...

Thanks again
 
For Proxmox 7.4 the repo lines are different; everything else is the same. But yes, having context switching working properly speeds things up.

Set your security repo to:

Code:
# security updates
deb http://security.debian.org bullseye-security main contrib non-free
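After that, refreshing the package lists and pulling in the microcode package should be all that's left:

Code:
apt update
apt install amd64-microcode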
 

What I'd like to know is whether the change has been tested and confirmed to work for 7.x (with the correct repos, of course).

Also, as I already have "deb http://security.debian.org bullseye-security main contrib non-free" as my security repo, are you saying I would now just need to run the following?

Code:
apt-get install amd64-microcode

Finally, I don't see the "non-free-firmware" component referred to in the fix for 8.1 - does that mean it's not needed for 7.x?
 
non-free-firmware is new to Debian 12, and it was what I was missing to get it working on 8.1. On 7.x, amd64-microcode will be found with just non-free; we had been using that without issue on 7.4 since the Zenbleed patching.
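For comparison, on Debian 12 / PVE 8.x the security line needs the extra component; it should look something like:

Code:
# Debian 12 (bookworm) / PVE 8.x security updates
deb http://security.debian.org bookworm-security main contrib non-free non-free-firmware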
 
So, bottom line, here are the steps for those using 7.4 with an EPYC Rome CPU and pointing to the appropriate security repo for it, including "non-free" (as shown above):

1. run "apt install amd64-microcode" in a shell
2. shut down the VMs using the "EPYC-Rome-fixed" CPU type
3. change the CPU type for these VMs back to "host" or "EPYC-Rome" (or whatever you used before you had to put in the hacked version)
4. Reboot Proxmox
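To confirm the new microcode actually loaded after the reboot, the kernel log should show it (exact output varies by system):

Code:
dmesg | grep -i microcode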

If I got anything wrong, let me know.

Thanks again for providing this info.
 
I made the change, but selecting the "EPYC-Rome" CPU type, which worked before the need for the hacked version, did NOT work. I got the following message for all affected VMs that auto-start after reboot:

kvm: warning: host doesn't support requested feature: CPUID.0DH:EAX.xsaves [bit 3]
kvm: Host doesn't support requested features
TASK ERROR: start failed: QEMU exited with code 1

Changed to "host" instead and VMs were able to start but:

1) I'd like to understand why the "EPYC-Rome" option doesn't work. Although I can't remember why, I think I chose the "EPYC-Rome" CPU type for a reason, so I'd like to go back to it if possible.

2) I did see the following dmesg reports during the first VM restart (but only the first):

[ 260.887085] SVM: kvm [24626]: vcpu0, guest rIP: 0xfffff81384c2ce09 unimplemented wrmsr: 0xc0010115 data 0x0
[ 262.163306] SVM: kvm [24626]: vcpu1, guest rIP: 0xfffff81384c2ce09 unimplemented wrmsr: 0xc0010115 data 0x0
[ 262.302807] SVM: kvm [24626]: vcpu2, guest rIP: 0xfffff81384c2ce09 unimplemented wrmsr: 0xc0010115 data 0x0
[ 262.442790] SVM: kvm [24626]: vcpu3, guest rIP: 0xfffff81384c2ce09 unimplemented wrmsr: 0xc0010115 data 0x0
[ 262.570498] SVM: kvm [24626]: vcpu4, guest rIP: 0xfffff81384c2ce09 unimplemented wrmsr: 0xc0010115 data 0x0
[ 262.695551] SVM: kvm [24626]: vcpu5, guest rIP: 0xfffff81384c2ce09 unimplemented wrmsr: 0xc0010115 data 0x0
 
So using "host" was a bit of a bust. I mean my VMs booted up but Windows VMs stuttered and things crashed (browsers, outlook) inexplicably. Maybe coincidence. I didn't run this way for long and can't tell for sure if non Windows VMs were struggling too but I went from "host" to "EPYC" CPU and things are definitely much better - at least so far.

I don't know if/how much this has to do with the CPU type that was selected during VM creation (I don't remember which one I selected), but it's strange to me that "host" doesn't work properly. Maybe that's why the EPYC options were invented?

Any insights would be appreciated.
 
@jaytee129

Can you run lscpu and ensure that xsaves is enabled for EPYC?
Also compare the features with and without mitigations enabled, and see if you're also losing other features.
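For the comparison, something like this should do, with a reboot using mitigations=off on the kernel command line in between (file names are just examples):

Code:
# with mitigations at their defaults
lscpu > /tmp/lscpu-default.txt
# reboot with mitigations=off, then:
lscpu > /tmp/lscpu-off.txt
diff /tmp/lscpu-default.txt /tmp/lscpu-off.txt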

The host type should only pass through to Windows what's actually present on the host.

Is your BIOS up to date, and are all the virtualisation features fully enabled in it, including nested virtualisation? That is now required for some Windows features like VBS/HVCI, which might be active in your VM because they were enabled when it was created but are now missing when you run it.
 