AMD Inception fixes cause QEMU/KVM memory leak

liquidrory · Aug 11, 2023

Hi, I have a Proxmox 7 installation that's been going strong until just this morning. From what I can gather, any running QEMU process causes **very** fast memory allocation on the host until eventually, everything is OOM killed. The PVE node is itself a virtual machine. For the purposes of this thread "host" will refer to the hypervisor in which PVE is running, not PVE itself.

Here's what I know:

- On 1 August, the PVE node's packages were updated, and the host rebooted. No problems, and no memory leak.
- This morning, 11 August, there were no packages on PVE to be updated (so, nothing in that regard changed from 1 August to 11 August), and the host was rebooted.
- Upon starting up, *something* begins to allocate ~100-200MB/second of memory until the machine crashes.
- With the fleeting seconds of usability I had with each boot, I manually disabled each container and VM set to start when the PVE node does.
- The leak stopped happening as soon as no QEMU/KVM guests were set to start with the PVE node.
- Starting any QEMU/KVM guest, any at all, causes the memory leak to occur. The leak continues until the guest is stopped.
- The amount of memory consumed is not dependent on the amount of memory assigned to the guest. For example, a guest only assigned 512MB of memory can still consume all 16GB of memory on the PVE node.
- The memory is not returned when the guest stops.
- Nothing in htop or ps is revealing what is using all of this memory.
- Rolling back to an earlier kernel revision on the PVE node does not solve the problem.
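
When neither htop nor ps shows a process holding the memory, the kernel's own counters in `/proc/meminfo` can hint at whether it sits in kernel-side allocations instead. The following is a minimal sketch and not something that was run in this thread: the helper names `parse_meminfo` and `unaccounted_kb` are hypothetical, and the accounting is deliberately rough (some categories overlap), so treat the result as a hint only.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def unaccounted_kb(fields):
    """Rough estimate of memory not explained by the usual categories.

    Assumption: free memory, page cache, anonymous process memory, slab,
    kernel stacks and page tables cover most ordinary usage. A large and
    growing remainder suggests allocations made directly by the kernel.
    """
    accounted = sum(
        fields.get(key, 0)
        for key in ("MemFree", "Buffers", "Cached", "AnonPages",
                    "Slab", "KernelStack", "PageTables")
    )
    return fields.get("MemTotal", 0) - accounted


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        snapshot = parse_meminfo(handle.read())
    print("unaccounted kB:", unaccounted_kb(snapshot))
```

Running it a few seconds apart while a guest is started and comparing the two numbers would show whether the growth is happening outside process memory.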

I'm ready to post logs, version numbers, etc.; just let me know what you need. I did some cursory searching on both Google and this forum, but it looks like I'm the first person to notice this problem.

liquidrory · Aug 11, 2023

I've determined that this is likely a problem with the recent emergency hotfixes for the AMD Inception vulnerability. Taking the host back to the kernel version immediately before those fixes allows everything to work again.

Additionally, the bug affects all nested virtualization, not just on Proxmox, though it only leaks memory on Proxmox. On bare Debian 12, a single CPU core is maxed out and the guest fails to start.

fiona · Aug 14, 2023

Hi,
~~can you try booting the older kernel and see if the problem goes away?~~ EDIT: Sorry, just saw it's already mentioned, but please share the exact kernel versions. Please also share the output of `pveversion -v` and the VM configuration `qm config <ID> --current`. I might have missed something, but I don't think there are any fixes for AMD Inception in the Proxmox kernel yet. I just see one for Zenbleed; do you mean that one?

fiona · Aug 14, 2023

liquidrory said:
Hi, I have a Proxmox 7 installation that's been going strong until just this morning. From what I can gather, any running QEMU process causes **very** fast memory allocation on the host until eventually, everything is OOM killed.

So this time "host" still refers to the Proxmox VE VM? Or do you really mean a QEMU process on the host?

liquidrory said:
The PVE node is itself a virtual machine. For the purposes of this thread "host" will refer to the hypervisor in which PVE is running, not PVE itself.

What hypervisor are you using to run the Proxmox VE VM? What kernel version does the hypervisor have?

liquidrory · Aug 14, 2023

The bug occurs with both Proxmox and libvirt on Debian 12 as the intermediate hypervisors. I was able to narrow it down to kernel 5.10.189 (and .190 released just a few hours later) and another person has confirmed it occurs on the 6.4 branch as well. The kernel version of the intermediate Proxmox hypervisor is 5.15.108-1-pve. I'm running it on QEMU/KVM, through libvirt.

I apologize for any confusion in my original post; the second sentence of the post does not follow the rule where "host" always refers to the outer host. In that sentence, it refers to the memory leak occurring on Proxmox as the intermediate hypervisor.

Here is the output of `pveversion -v` on the intermediate Proxmox hypervisor:

Code:

```
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.107-1-pve: 5.15.107-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
```

fiona · Aug 14, 2023

liquidrory said:
The bug occurs with both Proxmox and libvirt on Debian 12 as the intermediate hypervisors. I was able to narrow it down to kernel 5.10.189 (and .190 released just a few hours later) and another person has confirmed it occurs on the 6.4 branch as well. The kernel version of the intermediate Proxmox hypervisor is 5.15.108-1-pve. I'm running it on QEMU/KVM, through libvirt.

So you applied the AMD Inception fixes on the host kernel and then the issue started happening?

liquidrory said:
Here is the output of `pveversion -v` on the intermediate Proxmox hypervisor:

Code:

```
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
```

And with the previous kernel, i.e. 5.15.107-2-pve it works?

liquidrory · Aug 14, 2023

fiona said:
So you applied the AMD Inception fixes on the host kernel and then the issue started happening?

And with the previous kernel, i.e. 5.15.107-2-pve it works?

Applying the AMD Inception fixes on the outer host breaks virtualization inside of any guests running on it, yes.

Earlier Proxmox kernels do not fix this problem, but earlier outer host kernels do. For example, Proxmox works fine and is able to run nested guests, if the host it is running on is on kernel 5.10.188.
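
The version boundary described here can be written down as a small check on the outer host. This is only a sketch under stated assumptions: the function names are hypothetical, the threshold `5.10.189` is simply what this thread reports for the 5.10 branch, and other branches return `None` because no exact affected version is given for them. Note also that, as far as I know, `uname -r` on Debian reports an ABI name such as `5.10.0-25-amd64` rather than the upstream version, so the version string from `uname -v` or from the kernel package is the better input.

```python
import re

# First 5.10.x release reported as affected in this thread (an assumption
# taken from the report, not from any changelog).
FIRST_REPORTED_BAD_5_10 = (5, 10, 189)


def parse_release(text):
    """Return the first x.y.z version found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(group) for group in match.groups())


def reported_affected_5_10(text):
    """True/False for 5.10.x kernels, None for any other branch."""
    version = parse_release(text)
    if version[:2] != (5, 10):
        return None  # the thread gives no exact boundary for other branches
    return version >= FIRST_REPORTED_BAD_5_10


if __name__ == "__main__":
    for sample in ("5.10.188", "5.10.189-1", "5.15.108-1-pve"):
        print(sample, "->", reported_affected_5_10(sample))
```

Comparing tuples of integers avoids the trap of comparing version strings lexically, where `"5.10.99"` would sort after `"5.10.189"`.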

fiona · Aug 14, 2023

We cannot influence what kind of host kernel you are running, and it sounds like the issue lies with the host kernel if it's not present in older versions. It would be best to report the issue to wherever you got the host kernel from.

liquidrory · Aug 14, 2023

That's the plan, I made this thread before I figured it out. It's also possible that someone on your team has better access to kernel developers than I do; I'm just some random, you guys are maintainers of a rather important project that might frequently be used to perform nested virtualization.

fiona · Aug 14, 2023

liquidrory said:
That's the plan, I made this thread before I figured it out. It's also possible that someone on your team has better access to kernel developers than I do; I'm just some random, you guys are maintainers of a rather important project that might frequently be used to perform nested virtualization.

No worries. Unfortunately, we only have limited time so we need to focus on our own kernels. The AMD Inception fixes will most likely be backported via the Ubuntu tree and we will be sure to debug the issue you reported before releasing the kernel if it affects us too.

kraut.hosting · Aug 18, 2023

@fiona @Stoiko Ivanov Just my five cents, but the SRSO mitigation will be a replacement for the microcode update on CPUs prior to Zen 3.
I didn't understand what "intermediate hypervisors" and "outer hosts" are; I guess this is nested virtualization.
But unless a recent Debian kernel with the SRSO fixes runs directly on the metal, this might be quite an edge case.

Meanwhile it's known there are some issues with the first SRSO patchset released in upstream kernels.
There will be new upstream releases with SRSO fixes and hopefully more awareness and public testing:
https://www.phoronix.com/review/amd-inception-benchmarks

@liquidrory For testing the impact of the SRSO fixes in Proxmox, we need to wait for a new PVE kernel first.
Nested virtualization is quite a complex QA case, and the first VM layer needs to be tested before going deeper.
It would be very nice if you could install a new PVE kernel on bare metal once it has been rebased on a fixed Ubuntu one.

TheMrg · Aug 22, 2023

any news? safe to upgrade ?

fiona · Aug 22, 2023

TheMrg said:
any news? safe to upgrade ?

The Proxmox VE kernel does not yet have the Inception fixes applied; that will still take a bit. A kernel with Downfall fixes (6.2.16-10) is currently available in the pvetest repository. In any case, you should install the microcode updates for your CPU too.

TheMrg · Aug 22, 2023

We are unclear about Inception.
We use Zen 2 AMD CPUs. Are we safe?
https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7005.html is unclear,
and so is this statement:
"These architectures do not require a microcode update since the IBPB feature introduced in 2018 to mitigate Spectre v2 already works fine to flush branch type predictions from the branch predictor."

many thanks.

kraut.hosting · Aug 22, 2023

@TheMrg For Zen 2 and below you need to wait for a new kernel, or swap in Zen 3 CPUs to apply the microcode from Debian.
The upcoming 6.5 kernel will have the latest patchset for the SRSO fixes, which quite likely will be backported to 5.15 LTS.
The upstream 5.15.126 has the initial fixes, but they also first need to land in the Ubuntu kernel before a new PVE kernel can follow.
Security supply chain for Epyc <= Zen2: upstream Linux kernel devs > Canonical kernel team > our PVE kernel heroes
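
To see what a given kernel itself says about Inception (SRSO), kernels that carry the fixes expose a status file under sysfs. The sketch below reads it; the helper names are hypothetical, and the interpretation only looks at the leading word of the status line because the exact wording can differ between kernel versions. A missing file most likely means the running kernel predates the SRSO patches, which is an assumption rather than a guarantee.

```python
from pathlib import Path

# sysfs entry added together with the SRSO (Inception) mitigations.
SRSO_PATH = "/sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow"


def srso_status(path=SRSO_PATH):
    """Return the kernel's SRSO status line, or None if it is not exposed."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def describe(status):
    """Turn the raw status line into a short, cautious summary."""
    if status is None:
        return "not reported: kernel probably lacks the SRSO patches"
    if status.startswith("Not affected"):
        return "kernel considers this CPU not affected"
    if status.startswith("Mitigation"):
        return "mitigation active: " + status
    if status.startswith("Vulnerable"):
        return "kernel reports the CPU as vulnerable: " + status
    return "unrecognised status: " + status


if __name__ == "__main__":
    print(describe(srso_status()))
```

On a nested setup like the one discussed in this thread, it would have to be run on each layer separately, since the outer host and the PVE VM run different kernels.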
