[SOLVED] VMs freeze with 100% CPU

Could you share the output of
Code:
grep '' /proc/$(cat /var/run/qemu-server/123.pid)/task/*/stack
replacing the 123 with the frozen VM's ID, next time you experience a freeze?

EDIT: The output from the following might also be interesting:
Code:
root@pve8a1 ~ # cat query-vcpu.pm
#!/bin/perl

use strict;
use warnings;

use JSON;
use PVE::QemuServer::Monitor qw(mon_cmd);

my $vmid = shift or die "need to specify vmid\n";

# query per-vCPU statistics from QEMU via QMP
my $res = eval {
    mon_cmd($vmid, "query-stats", target => "vcpu");
};
warn $@ if $@;
print to_json($res, { pretty => 1, canonical => 1 });
Run it with
Code:
perl query-vcpu.pm 123 > /tmp/vcpu-stats-1.log && sleep 5 && perl query-vcpu.pm 123 > /tmp/vcpu-stats-2.log
replacing 123 both times with the actual VM ID. The stats will end up in /tmp/vcpu-stats-1.log and /tmp/vcpu-stats-2.log. Maybe those give us a hint why the vCPUs are spinning at 100%.
 
Hi,

so I had another frozen VM and collected the details you need. We are experiencing these freezes on almost all servers with PVE 8. In our environments, only VMs with at least 12 GB RAM were affected, mostly Windows but not exclusively.

Code:
root@pve:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-3
pve-kernel-5.13: 7.1-9
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2: 6.2.16-10
pve-kernel-6.2.16-3-pve: 6.2.16-3
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.60-2-pve: 5.15.60-2
pve-kernel-5.15.39-4-pve: 5.15.39-4
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.4
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.7
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.4
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.2
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-4
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

Code:
root@pve:~# qm config 903
agent: 1
balloon: 0
boot: order=virtio0
cores: 2
cpu: host
ide2: local:iso/omnios-r151040g.iso,media=cdrom,size=282454K
memory: 16384
meta: creation-qemu=6.1.1,ctime=1648454797
name: omnios
net0: virtio=9A:01:5A:56:86:44,bridge=vmbr0,firewall=1,tag=11
numa: 0
onboot: 1
ostype: solaris
scsihw: virtio-scsi-pci
smbios1: uuid=74f305c7-940f-4290-8281-d9582422eb3a
sockets: 1
startup: order=2
vga: qxl
virtio0: sas-vm:vm-903-disk-0,size=25G
virtio1: /dev/disk/by-id/ata-ST4000NE001-2MA101_X,backup=0,size=3907018584K
virtio2: /dev/disk/by-id/ata-ST4000NE001-2MA101_X,backup=0,size=3907018584K
virtio3: /dev/disk/by-id/ata-ST4000NE001-2MA101_X,backup=0,size=3907018584K
virtio4: /dev/disk/by-id/ata-ST4000NE001-2MA101_X,backup=0,size=3907018584K
virtio5: /dev/disk/by-id/nvme-IR-SSDPR-P34B-256-80_X,backup=0,size=250059096K
vmgenid: 8366bb77-9879-4164-9212-9714ef0b5016

Code:
root@pve:~# strace -c -p $(cat /var/run/qemu-server/903.pid)
strace: Process 3573654 attached
^Cstrace: Process 3573654 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
99.49   13.571011         483     28068           ppoll
  0.33    0.045225           9      4704           write
  0.07    0.009380           9      1030           recvmsg
  0.06    0.007896           7      1102           read
  0.05    0.006827           2      2796           ioctl
  0.00    0.000071           3        20           sendmsg
  0.00    0.000013           3         4           accept4
  0.00    0.000010           2         4           close
  0.00    0.000006           0         8           fcntl
  0.00    0.000003           0         4           getsockname
------ ----------- ----------- --------- --------- ----------------
100.00   13.640442         361     37740           total

Code:
root@pve:~# gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/903.pid)
[New LWP 3573655]
[New LWP 3573883]
[New LWP 3573884]
[New LWP 3573886]
[New LWP 3573888]
[New LWP 617547]
[New LWP 649672]

warning: Could not load vsyscall page because no executable was specified
0x00007fc460ec00f6 in ?? ()

Thread 8 (LWP 649672 "iou-wrk-3573654"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 7 (LWP 617547 "iou-wrk-3573654"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 6 (LWP 3573888 "vnc_worker"):
#0  0x00007fc460e49d36 in ?? ()
#1  0x00007fc034f2d010 in ?? ()
#2  0x00007fc000000189 in ?? ()
#3  0x0000557cb53a0f0c in ?? ()
#4  0x0000000000000000 in ?? ()

Thread 5 (LWP 3573886 "SPICE Worker"):
#0  0x00007fc460ebffff in ?? ()
#1  0x0000557cb5c43870 in ?? ()
#2  0x00007fc0300014f0 in ?? ()
#3  0x0000000000000002 in ?? ()
#4  0x7fffffff00000001 in ?? ()
#5  0x0000000000000002 in ?? ()
#6  0x00007fc4627399ae in ?? ()
#7  0x0000557cb4841190 in ?? ()
#8  0x000000016281b4e0 in ?? ()
#9  0x7fffffff7fffffff in ?? ()
#10 0x93006b232bd9dc00 in ?? ()
#11 0x00007fc46281b4e0 in ?? ()
#12 0x00007fc0300014d0 in ?? ()
#13 0x00007fc0300014d8 in ?? ()
#14 0xffffffffffffdab8 in ?? ()
#15 0x0000000000000000 in ?? ()

Thread 4 (LWP 3573884 "CPU 1/KVM"):
#0  0x00007fc460ec1afb in ?? ()
#1  0x00007fc400000010 in ?? ()
#2  0x00007fc44f0ad1d0 in ?? ()
#3  0x00007fc44f0ad190 in ?? ()
#4  0x93006b232bd9dc00 in ?? ()
#5  0x0000000000000000 in ?? ()

Thread 3 (LWP 3573883 "CPU 0/KVM"):
#0  0x00007fc460ec1afb in ?? ()
#1  0x00007fc400000010 in ?? ()
#2  0x00007fc45530e1d0 in ?? ()
#3  0x00007fc45530e190 in ?? ()
#4  0x93006b232bd9dc00 in ?? ()
#5  0x0000000000000000 in ?? ()

Thread 2 (LWP 3573655 "call_rcu"):
#0  0x00007fc460ec54f9 in ?? ()
#1  0x0000557cb2440d6a in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 1 (LWP 3573654 "kvm"):
#0  0x00007fc460ec00f6 in ?? ()
#1  0xffffffff64f2e239 in ?? ()
#2  0x0000557cb687b070 in ?? ()
#3  0x000000000000002e in ?? ()
#4  0x0000000000000000 in ?? ()
[Inferior 1 (process 3573654) detached]


Code:
root@pve:~# grep '' /proc/$(cat /var/run/qemu-server/903.pid)/task/*/stack
/proc/3573654/task/3573654/stack:[<0>] do_sys_poll+0x504/0x630
/proc/3573654/task/3573654/stack:[<0>] __x64_sys_ppoll+0xde/0x170
/proc/3573654/task/3573654/stack:[<0>] do_syscall_64+0x5b/0x90
/proc/3573654/task/3573654/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
/proc/3573654/task/3573655/stack:[<0>] futex_wait_queue+0x66/0xa0
/proc/3573654/task/3573655/stack:[<0>] futex_wait+0x177/0x270
/proc/3573654/task/3573655/stack:[<0>] do_futex+0x151/0x200
/proc/3573654/task/3573655/stack:[<0>] __x64_sys_futex+0x95/0x200
/proc/3573654/task/3573655/stack:[<0>] do_syscall_64+0x5b/0x90
/proc/3573654/task/3573655/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
/proc/3573654/task/3573886/stack:[<0>] do_sys_poll+0x504/0x630
/proc/3573654/task/3573886/stack:[<0>] __x64_sys_poll+0xc7/0x150
/proc/3573654/task/3573886/stack:[<0>] do_syscall_64+0x5b/0x90
/proc/3573654/task/3573886/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
/proc/3573654/task/3573888/stack:[<0>] futex_wait_queue+0x66/0xa0
/proc/3573654/task/3573888/stack:[<0>] futex_wait+0x177/0x270
/proc/3573654/task/3573888/stack:[<0>] do_futex+0x151/0x200
/proc/3573654/task/3573888/stack:[<0>] __x64_sys_futex+0x95/0x200
/proc/3573654/task/3573888/stack:[<0>] do_syscall_64+0x5b/0x90
/proc/3573654/task/3573888/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
/proc/3573654/task/617547/stack:[<0>] io_wqe_worker+0x1d8/0x390
/proc/3573654/task/617547/stack:[<0>] ret_from_fork+0x2c/0x50
/proc/3573654/task/649672/stack:[<0>] io_wqe_worker+0x1d8/0x390
/proc/3573654/task/649672/stack:[<0>] ret_from_fork+0x2c/0x50
 

Attachments

  • vcpu-stats-2.log (11.7 KB)
  • vcpu-stats-1.log (11.7 KB)
Hello. I was in the same situation as all of you. There was an update this year or something, and that is where all of this started.
I had to move the VM to another host to try all kinds of things.
Please enable NUMA and disable System Protection. For me that stopped the problem.
 
Hello. I was in the same situation as all of you. There was an update this year or something, and that is where all of this started.
I had to move the VM to another host to try all kinds of things.
Please enable NUMA and disable System Protection. For me that stopped the problem.
What do you mean by "disable System Protection"?
 
It's been a while since I posted an update, which is good news. We haven't had any random freezes since mid-July, so it seems we are stable again. I cannot really tell what solved the issue, but I can say that I now have:
  • Disabled ballooning on VMs that used to crash, but still using it on others
  • Switched to aio=threads on VMs that used to crash, but still use the default (io_uring) on others
  • Still using KSM on all hosts
  • Using Proxmox 7.4.1, kernel 6.2.11
The only real change since the last freeze is that I replaced two hard disks in the Ceph cluster that were reporting failing SMART checks. I think this is related to the issue, but there are other forces at play here. We never had similar issues on Proxmox 6, even when I had more severe disk failures, and I think there are people in this thread who are not using Ceph. But maybe this gives a clue to other people facing this issue.
 
Wow, that would be nice. Does this mean that we can expect an update coming soon that incorporates this bugfix, or better yet, is it already available?
It's not yet applied and packaged, but I would expect that to happen in the coming days. The fix is very straightforward after all; it's just not 100% clear that it's the issue, but the discussion on the upstream mailing list sounds very similar to the reports here. @fweber is currently testing with a VM, making ballooning changes to have the counter go up faster, to see if we can finally reproduce the issue like that (and then we'll need to do that again to verify the fix). Once the package has gone through internal testing and lands on the pvetest repository, I'll let you know.
 
@fweber is currently testing with a VM, making ballooning changes to have the counter go up faster, to see if we can finally reproduce the issue like that (and then we'll need to do that again to verify the fix).


IMHO, it would be very useful for everyone if someone could explain how to reproduce the issue (i.e. how to do those ballooning changes), so we can test the patched kernel in our testing labs too. Thanks, everyone, for your efforts!
 
IMHO, it would be very useful for everyone if someone could explain how to reproduce the issue (i.e. how to do those ballooning changes), so we can test the patched kernel in our testing labs too. Thanks, everyone, for your efforts!

According to the upstream patch [1], the freeze occurs when the mmu_invalidate_seq counter exceeds 2^31 = 2147483648. I use the following bpftrace script to keep track of the counter value per QEMU process ID:
Code:
kprobe:direct_page_fault {                                     // fires on KVM (direct MMU) page faults
    $ctr = ((struct kvm_vcpu*)arg0)->kvm->mmu_invalidate_seq;  // read the VM's invalidation counter
    @counts[pid] = $ctr;                                       // track the latest value per QEMU PID
}

interval:s:2 {
    print(@counts);
    print("---\n");
}
You need to install bpftrace; then you can run it with bpftrace SCRIPT. Note that I only tested this on Bookworm / PVE 8; the script might not work with the bpftrace version shipped with Bullseye / PVE 7.4.
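For example, on PVE 8 / Bookworm, something along these lines should work (the script filename here is arbitrary):
Code:
apt install bpftrace
bpftrace mmu-counter.bt
The script then prints the counter per QEMU process ID every 2 seconds, which you can compare against 2^31 = 2147483648.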

Now, to reproduce the freeze, it is necessary to get the counter above 2^31. Under normal operation, the counter seems to increase very slowly. KSM seems to accelerate it a bit, though I haven't fully tested this yet.
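If you want to check whether KSM is currently active on a host while watching the counter, a quick sketch using the standard sysfs interface:
Code:
cat /sys/kernel/mm/ksm/run           # current mode: 0 = stopped, 1 = running, 2 = stopped + unmerged
cat /sys/kernel/mm/ksm/pages_shared  # KSM pages currently shared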

One thing that seems to work is excessive ballooning. I set up a PVE VM with 60000 MiB RAM and used a script to alternate the balloon target between 4000 MiB and 60000 MiB. If you want to try this yourself, you can use the script below as a starting point, but please note I threw this together quite quickly, so use it at your own risk (if you do want to try it, replace the VMID accordingly). It might also need some modifications to work with Windows VMs. With this script, mmu_invalidate_seq reached 2^31 within a couple of hours and the VM froze with 100% CPU and 99% ppoll in the strace output. The increase speed seems proportional to the difference between HIGH and LOW, so you might be able to reproduce this faster if you assign more memory to the VM and adjust HIGH accordingly.

As I understand it, proxmox-kernel-6.2.16-12-pve should fix this reproducer (i.e., the VM should not freeze even when the counter exceeds 2^31).

If you try this, we'd be happy about any feedback, also to help find other factors that increase the speed at which the counter increments -- ballooning is probably not the only one.

Code:
#!/bin/bash
VMID=SET_VM_ID_HERE
HIGH=60000
LOW=4000
WAIT=1

# set the balloon target and poll "info balloon" until the guest reports actual=<target>
balloon () {
    amount=$1
    echo balloon to $amount ...
    pvesh create /nodes/localhost/qemu/$VMID/monitor -command "balloon $amount"
    echo wait until actual=$amount ...
    while true;
    do
        info=$(pvesh create /nodes/localhost/qemu/$VMID/monitor -command "info balloon")
        echo $info
        if [[ $info == *"actual=$amount"* ]]; then
            break
        fi
        sleep 2
    done
}


while true;
do
    balloon $HIGH
    sleep $WAIT
    balloon $LOW
done
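A possible way to run it (a sketch; the filename balloon-stress.sh is just a placeholder for the script above) is to leave it running in the background and watch the bpftrace counter in a second shell:
Code:
chmod +x balloon-stress.sh
nohup ./balloon-stress.sh > /tmp/balloon-stress.log 2>&1 &
tail -f /tmp/balloon-stress.log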

[1] https://lore.kernel.org/all/f023d92.../T/#maa94d78b12c00071029cc0c8ff2785d6d14bce2e
 
I'm testing with PVE 7.4 and kernel pve-kernel-6.2.16-4-bpo11-pve on a testing machine. I haven't upgraded any production cluster to PVE 8, mainly due to this bug, so I would like to test with PVE 7.4. Unfortunately, bpftrace fails:

Code:
./bpftrace_script_mmu_invalidate_seq.bp:2:12-32: ERROR: Unknown struct/union: 'struct kvm_vcpu'
    $ctr = ((struct kvm_vcpu*)arg0)->kvm->mmu_invalidate_seq;

The bpftrace script does work on PVE 8. I can upgrade the testing machine, but is there a way to make bpftrace work on PVE 7.4? I'm testing with a Windows VM and your balloon script. It's been running for an hour or so, but as I can't check the mmu_invalidate_seq value, I'm flying blind.
 
I'm testing with PVE 7.4 and kernel pve-kernel-6.2.16-4-bpo11-pve on a testing machine. I haven't upgraded any production cluster to PVE 8, mainly due to this bug, so I would like to test with PVE 7.4. Unfortunately, bpftrace fails:

Code:
./bpftrace_script_mmu_invalidate_seq.bp:2:12-32: ERROR: Unknown struct/union: 'struct kvm_vcpu'
    $ctr = ((struct kvm_vcpu*)arg0)->kvm->mmu_invalidate_seq;

The bpftrace script does work on PVE 8. I can upgrade the testing machine, but is there a way to make bpftrace work on PVE 7.4? I'm testing with a Windows VM and your balloon script. It's been running for an hour or so, but as I can't check the mmu_invalidate_seq value, I'm flying blind.
Make sure you have the pve-headers-6.2 package installed.

Then you can try the following version of the script:
Code:
#if defined(CONFIG_FUNCTION_TRACER)
#define CC_USING_FENTRY
#endif

#include <linux/kvm_host.h>

kprobe:direct_page_fault {
    $ctr = ((struct kvm_vcpu*)arg0)->kvm->mmu_invalidate_seq;
    @counts[pid] = $ctr;
}

interval:s:2 {
    print(@counts);
    print("---\n");
}
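On PVE 7.4 that would then look roughly like this (again, the script filename is arbitrary):
Code:
apt install pve-headers-6.2 bpftrace
bpftrace mmu-counter-pve7.bt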
 
I'm testing with PVE 7.4 and kernel pve-kernel-6.2.16-4-bpo11-pve on a testing machine. I haven't upgraded any production cluster to PVE 8, mainly due to this bug, so I would like to test with PVE 7.4. Unfortunately, bpftrace fails:

Code:
./bpftrace_script_mmu_invalidate_seq.bp:2:12-32: ERROR: Unknown struct/union: 'struct kvm_vcpu'
    $ctr = ((struct kvm_vcpu*)arg0)->kvm->mmu_invalidate_seq;

The bpftrace script does work on PVE 8. I can upgrade the testing machine, but is there a way to make bpftrace work on PVE 7.4? I'm testing with a Windows VM and your balloon script. It's been running for an hour or so, but as I can't check the mmu_invalidate_seq value, I'm flying blind.
Hi,
pve-kernel-6.2.16-4-bpo11-pve is affected by the issue; perhaps pve-kernel-6.2.16-11-bpo11-pve_6.2.16-11~bpo11+2_amd64.deb contains the fix for PVE 7?
Or must we wait for pve-kernel-6.2.16-12 on PVE 7?

Udo
 
Took a while, but I've been able to reproduce the issue every time I tried on PVE 7.4 + pve-kernel-6.2.16-4-bpo11-pve, using a version of the script proposed by @fweber, which modifies the balloon memory of the VM. A few interesting observations:

  • The mmu_invalidate_seq counter increases much faster during balloon deflate than during inflate, so my version simply uses the values HIGH=10000, LOW=4000. This test system seems to be slow to inflate the balloon (maybe I'm just impatient!). It would be nice if we could "inflate the balloon" faster :)

  • For some reason, just running a Windows OS (tested with 2k19 and 10) makes the mmu_invalidate_seq counter increase faster than a Linux OS (tested with Ubuntu 20.04 and Debian 11). This is *without* the balloon script running. This may explain why this issue seems more prevalent on Windows VMs.

  • Linux VMs seem to take longer than Windows VMs to reach the bug point, even with the exact same VM configuration and script parameters. In my tests the Linux VMs took a bit less than 7 hours to freeze, while the Windows ones took around 3 hours. I suppose this is related to how the balloon driver is implemented and how it interacts with the OS. This may also explain why this issue seems more prevalent on Windows VMs.

  • I haven't found any Linux/Windows workload that makes mmu_invalidate_seq increase noticeably faster, so what the OS/apps are doing does not seem to affect when it will freeze. Some workloads, like benchmarks, bump the counter by a few hundred from time to time, but that is nowhere near the increase seen when the balloon script is running or when KSM is merging pages, where I've seen increases of up to a million per second.

  • When KSM is running and merging pages, mmu_invalidate_seq increases quite fast. Stopping KSM and forcing an unmerge with echo 2 > /sys/kernel/mm/ksm/run does not seem to increase mmu_invalidate_seq by much. This may explain why some have had success disabling KSM, as this makes VMs take longer to reach the bug.

  • Hibernating a VM and resuming it later creates a new KVM process, so mmu_invalidate_seq is reset to zero. The same happens during a live migration. This is why the VM unfreezes after these actions (see the command sketch after this list).

  • For the purpose of testing this, running both the script and KSM doesn't help, as KSM takes much more time to notice and decide to merge pages than the script takes to inflate/deflate the balloon, so KSM ends up doing almost nothing.

  • The highest observed value for mmu_invalidate_seq was 2147722615.
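To make the hibernation/migration point concrete, a minimal recovery sketch for a frozen VM (123 and targetnode are placeholders; the exact commands are my sketch, the unfreeze effect is the observation above):
Code:
# hibernate to disk and start again: the fresh KVM process starts with mmu_invalidate_seq at zero
qm suspend 123 --todisk 1
qm start 123
# alternatively, live-migrate the VM to another node, which also creates a new KVM process
qm migrate 123 targetnode --online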

I'll install the updated kernel with the patch (pve-kernel-6.2.16-11-bpo11-pve) asap and run tests again.

@udo, that was the plan: reproduce the issue with a bad kernel so I can be sure that the patched kernel does in fact solve the problem.

PS: ballooning this much and this often takes a lot of host CPU!
 
@VictorSTS, thanks a lot for testing this and sharing your observations so far! Especially the observations regarding Windows VMs are interesting, and might provide an explanation for the bias towards Windows VMs we've seen in the reports in this thread.

I now tested the reproducer with the patched PVE 8 kernel, as well as the unpatched and patched PVE 7.4 kernels:
  • PVE 8: On kernel 6.2.16-10-pve, the ballooning reproducer provoked a freeze of a PVE VM yesterday. With kernel 6.2.16-12-pve, the reproducer did not freeze the VM, even though mmu_invalidate_seq > 4 * 10^9 > 2^31
  • PVE 7.4: On kernel 6.2.16-4-bpo11-pve, the ballooning reproducer provoked a freeze of a PVE VM within a couple of hours. With kernel 6.2.16-11-bpo11-pve, the reproducer did not freeze the VM, even though mmu_invalidate_seq > 7*10^9 > 2^31
So on my test systems, it seems like the patched kernels do prevent the freezes with 99% ppoll in the strace output. @VictorSTS (and others), if you manage to reproduce this, it would be great if you could share your results.
 
I've tried testing with kernel 5.15, which in theory is not affected, and with 5.19, which in theory is affected by this bug (given that some people have reported freezing issues). Neither seems to have the field mmu_invalidate_seq in its kernel headers.

bpftrace reports:
Code:
bpftrace_script_mmu_invalidate_seq_PVE7.bp:8:12-62: ERROR: Struct/union of type 'struct kvm' does not contain a field named 'mmu_invalidate_seq'
    $ctr = ((struct kvm_vcpu*)arg0)->kvm->mmu_invalidate_seq;


Maybe kernel 5.19 has some other issue that produces the freezes, but given that it's a no-longer-maintained version, the route would be to go back to 5.15 or try the patched 6.2 kernel.

I'm installing the patched 6.2 kernel now and starting tests.
 
I've tried testing with kernel 5.15, which in theory is not affected, and with 5.19, which in theory is affected by this bug (given that some people have reported freezing issues). Neither seems to have the field mmu_invalidate_seq in its kernel headers.
The upstream patch [1] notes that it fixes commit a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update") [2], which is not included in the PVE 5.15 kernel. If I'm not mistaken, it is included in the PVE 5.19 kernel. However, it seems like the counter was called mmu_notifier_seq [3] instead of mmu_invalidate_seq at that time, so the bpftrace script would need to be adjusted accordingly to test kernel 5.19. But I agree that testing kernel 6.2 is probably more interesting. Thanks again!
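For completeness, a quick and untested sketch of how a 5.19 variant could be derived from the PVE 7 script above, given that only the counter's name differs (the output filename is arbitrary):
Code:
sed 's/mmu_invalidate_seq/mmu_notifier_seq/' bpftrace_script_mmu_invalidate_seq_PVE7.bp \
    > bpftrace_script_mmu_notifier_seq.bp
bpftrace bpftrace_script_mmu_notifier_seq.bp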

[1] https://lore.kernel.org/all/f023d92.../T/#maa94d78b12c00071029cc0c8ff2785d6d14bce2e
[2] https://git.kernel.org/pub/scm/linu.../?id=a955cad84cdaffa282b3cf8f5ce69e9e5655e585
[3] https://git.kernel.org/pub/scm/linu...955cad84cdaffa282b3cf8f5ce69e9e5655e585#n4018
 
