Nested PVE (on PVE host): kernel panic "Host injected async #PF in kernel mode"

That’s your guest’s guest. Does your primary guest also have ballooning disabled?
That’s the nested PVE config, the L1 guest.
The L2 Ubuntu guest VM has ballooning enabled, yes.

My bad, I forgot to check whether it's already available in the pve-no-subscription repository.
Don't worry, not a big deal

Thanks @Neobin !!


Anyway, the change in the kernel that causes the crash was added in kernel 5.8 - see GitHub (for easier readability in the browser) and the Linux kernel mailing list. You'd thus need kernel 5.7 or older, and those are not available on PVE 7 either. Looking at the Proxmox VE Roadmap, you'd need to go back to the even older Proxmox VE 6.4 (download here) to see whether it works. The thing is, async page faults in kernel mode are disallowed for a good reason, so you'll have to see whether it actually works better. Last but not least, I just want to mention that the suggestion of trying out PVE 6.4 is for testing purposes only, since that version reached its end of life in September 2022.
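As a quick sanity check, you can verify which of your guests run an affected kernel. This is just an illustrative sketch; the 5.8 cutoff comes from the commit linked above:

```shell
# Run inside each guest: warn if the kernel is >= 5.8, i.e. post the change
# that panics on an async #PF injected in kernel mode.
kver=$(uname -r | cut -d. -f1,2)   # e.g. "6.14" from "6.14.8-2-pve"
major=${kver%%.*}
minor=${kver##*.}
if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 8 ]; }; then
    echo "kernel $kver: async #PF in kernel mode triggers a panic"
else
    echo "kernel $kver: predates the 5.8 change"
fi
```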

Also, would it be possible to temporarily disable swap on the host to check whether it improves the situation? Again, this is not a general recommendation, since using swap has its benefits, but at least this would confirm our current assumptions about your issue.
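For the swap test, something along these lines (a diagnostic sketch, run as root on the host; the `swapoff` itself is commented out, since you need enough free RAM to hold everything currently swapped):

```shell
# Check whether swap is in use and, only for testing, disable it temporarily.
swapcount=$(swapon --show --noheadings 2>/dev/null | wc -l)
if [ "$swapcount" -gt 0 ]; then
    echo "swap is active ($swapcount device(s)/file(s)):"
    swapon --show
    # swapoff -a   # disable all swap for the test; needs enough free RAM
else
    echo "no active swap"
fi
# To restore afterwards: swapon -a
```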
YOU FOUND IT!!!

Yeah, probably all my other VMs have kernels older than 5.8.
I will check later.

But yeah, it makes total sense to restrict the async #PF mechanism to user space, where it is actually useful.
I don't see it being as useful in kernel space.

The thing is, when async #PF was disabled in kernel space, were any changes made to QEMU/KVM so that it knows which faults should be handled with the async #PF approach and which with the halt approach?

Because the commit you mentioned implies that every distro running kernel 5.8+ onward is susceptible to the same problem as PVE 8, and that this problem has little to do with nested virtualization; it's all about memory management between host and guest.

And yes, I can totally try with PVE 6 to confirm it really isn't affected, but that's not really a solution to this problem. Neither would be disabling swap on the host, since swap is very useful: not all guest memory regions must be kept active and in RAM all the time...


I will also try running an Ubuntu VM on the host with the latest stable kernel to see whether it exhibits the same behavior as PVE.


So in which layer should the problem be tackled? QEMU, the Linux kernel on the host, or the Linux kernel in the guest?
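For what it's worth, one knob already exists at the guest-kernel layer: the kernel documents a `no-kvmapf` boot parameter that disables paravirtualized async page fault handling in the guest entirely. An untested sketch of enabling it in the affected (L1) guest, as a workaround rather than a fix:

```shell
# Untested sketch: opt the L1 guest kernel out of KVM async page faults.
# In /etc/default/grub inside the affected guest, append the parameter:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet no-kvmapf"
# then regenerate the grub config and reboot:
#   update-grub && reboot
```

Whether that merely masks the underlying host/guest memory-management issue is exactly the open question here.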


Again, thank you all so much for your time, I really appreciate it!
 
For the record: on a Debian 13 host (kernel 6.12.38) I ran a KVM/QEMU (10.0.2) VM with PVE 9:

Code:
qemu-system-x86_64 \
  -machine type=pc,accel=kvm -cpu host -smp 4 -m 8192 \
  -drive file=vm1.qcow2,format=qcow2,if=virtio \
  -k fr \
  -netdev user,id=net0,hostfwd=tcp:127.0.0.1:8066-:8006,hostfwd=tcp:127.0.0.1:8022-:22 \
  -device virtio-net-pci,netdev=net0,addr=0x08 \
  -serial stdio -vga none -display none \
  -cdrom pve9.iso

Then I ran a cloud-init Debian 13 nested VM inside the PVE, did some things inside it, and after a while I got the "injected async #PF" panic on the PVE 9 console.

The Debian host has swap and is doing lots of other things.

@l.leahu-vladucu any idea other than turning off swap (which I'd rather not do on this particular host)?
 
Code:
[32029.460800] Kernel panic - not syncing: Host injected async #PF in kernel mode
[32029.472728] CPU: 2 UID: 0 PID: 136167 Comm: pvestatd Tainted: P           O       6.14.8-2-pve #1
[32029.476915] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[32029.481049] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[32029.484024] Call Trace:
[32029.486574]  <TASK>
[32029.490233]  dump_stack_lvl+0x5f/0x90
[32029.490983]  dump_stack+0x10/0x18
[32029.493376]  panic+0x12b/0x2fa
[32029.493874]  ? early_xen_iret_patch+0xc/0xc
[32029.494539]  __kvm_handle_async_pf+0xc3/0xe0
[32029.500711]  exc_page_fault+0xb8/0x1e0
[32029.501528]  asm_exc_page_fault+0x27/0x30
[32029.510582] RIP: 0010:__put_user_4+0xd/0x20
[32029.511285] Code: 66 89 01 31 c9 0f 01 ca e9 90 a0 01 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 cb 48 c1 fb 3f 48 09 d9 0f 01 cb <89> 01 31 c9 0f 01 ca c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90
[32029.519047] RSP: 0018:ffffbf488aaabf00 EFLAGS: 00050206
[32029.522482] RAX: 00000000000213e7 RBX: 0000000000000000 RCX: 0000760692482e50
[32029.525880] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[32029.531179] RBP: ffffbf488aaabf10 R08: 0000000000000000 R09: 0000000000000000
[32029.535160] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[32029.537327] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[32029.538862]  ? schedule_tail+0x42/0x70
[32029.541446]  ret_from_fork+0x1c/0x70
[32029.543239]  ret_from_fork_asm+0x1a/0x30
[32029.544870] RIP: 0033:0x76069259d202
[32029.547517] Code: Unable to access opcode bytes at 0x76069259d1d8.
[32029.552400] RSP: 002b:00007ffe1c00b500 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[32029.559229] RAX: 0000000000000000 RBX: 00007ffe1c00b500 RCX: 000076069259d202
[32029.566832] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[32029.573968] RBP: 00006415fb85a450 R08: 0000000000000000 R09: 0000000000000000
[32029.580870] R10: 0000760692482e50 R11: 0000000000000246 R12: 00006415f63f22e8
[32029.586908] R13: 0000000000000002 R14: 0000000000000000 R15: 00006415d44df5b0
[32029.593439]  </TASK>
[32029.596630] Kernel Offset: 0x23800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[32029.602331] ---[ end Kernel panic - not syncing: Host injected async #PF in kernel mode ]---
 
I'm seeing the same problem running the PVE 9 beta in a nested VM on PVE 8.2.4.
I'm running a Debian 13 VM in the nested PVE 9, and after a while I get this crash:

Code:
pve-yvr login: [ 2089.981351] INFO: task cfs_loop:1122 blocked for more than 122 seconds.
[ 2089.981899]       Tainted: P           O       6.14.8-2-pve #1
[ 2089.982220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2089.982654] task:cfs_loop        state:D stack:0     pid:1122  tgid:1120  ppid:1      task_flags:0x400040 flags:0x00000002
[ 2089.983257] Call Trace:
[ 2089.983526]  <TASK>
[ 2089.983673]  __schedule+0x466/0x13f0
[ 2089.984215]  schedule+0x29/0x130
[ 2089.984427]  kvm_async_pf_task_wait_schedule+0x186/0x1c0
[ 2089.984876]  __kvm_handle_async_pf+0x5c/0xe0
[ 2089.985126]  exc_page_fault+0xb8/0x1e0
[ 2089.985359]  asm_exc_page_fault+0x27/0x30
[ 2089.985736] RIP: 0033:0x7f87864b8474
[ 2089.985960] RSP: 002b:00007f8784fefcf0 EFLAGS: 00010246
[ 2089.986310] RAX: 0000000000000001 RBX: 000000000000025c RCX: 00005b2085773f68
[ 2089.986697] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005b208578a938
[ 2089.987061] RBP: 0000000000000001 R08: 0000000000000001 R09: 00005b2085773fa8
[ 2089.987435] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f878673a064
[ 2089.987820] R13: 00005b2085789f28 R14: 000000000000025e R15: 00007f878673a000
[ 2089.988286]  </TASK>
[ 2107.954191] Kernel panic - not syncing: Host injected async #PF in kernel mode
[ 2107.956364] CPU: 1 UID: 109 PID: 23170 Comm: saunafs-uraft-h Tainted: P           O       6.14.8-2-pve #1
[ 2107.958006] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[ 2107.958408] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 4.2023.08-4 02/15/2024
[ 2107.958893] Call Trace:
[ 2107.959136]  <TASK>
[ 2107.959355]  dump_stack_lvl+0x5f/0x90
[ 2107.959720]  dump_stack+0x10/0x18
[ 2107.959990]  panic+0x12b/0x2fa
[ 2107.960317]  ? early_xen_iret_patch+0xc/0xc
[ 2107.960617]  __kvm_handle_async_pf+0xc3/0xe0
[ 2107.960934]  exc_page_fault+0xb8/0x1e0
[ 2107.961231]  asm_exc_page_fault+0x27/0x30
[ 2107.961529] RIP: 0010:__put_user_4+0xd/0x20
[ 2107.961833] Code: 66 89 01 31 c9 0f 01 ca c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 cb 48 c1 fb 3f 48 09 d9 0f 01 cb <89> 01 31 c9 0f 01 ca c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90
[ 2107.962880] RSP: 0018:ffffc0e1a4e1ff00 EFLAGS: 00050202
[ 2107.963233] RAX: 0000000000005a82 RBX: 0000000000000000 RCX: 00007816a5721a10
[ 2107.963660] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 2107.964100] RBP: ffffc0e1a4e1ff10 R08: 0000000000000000 R09: 0000000000000000
[ 2107.964530] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 2107.964960] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2107.965390]  ? schedule_tail+0x42/0x70
[ 2107.965690]  ret_from_fork+0x1c/0x70
[ 2107.966066]  ret_from_fork_asm+0x1a/0x30
[ 2107.966366] RIP: 0033:0x7816a5801202
[ 2107.966651] Code: Unable to access opcode bytes at 0x7816a58011d8.
[ 2107.967041] RSP: 002b:00007ffd950a56a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[ 2107.967487] RAX: 0000000000000000 RBX: 00007ffd950a56a0 RCX: 00007816a5801202
[ 2107.967913] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[ 2107.968350] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
[ 2107.968772] R10: 00007816a5721a10 R11: 0000000000000246 R12: 00007ffd950a5810
[ 2107.969194] R13: 0000000000000000 R14: 0000000000000000 R15: 00005f21d4d03c10
[ 2107.969611]  </TASK>
[ 2107.970131] Kernel Offset: 0x38800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2109.010662] Rebooting in 120 seconds..

in PVE 9. The host running PVE 8 is not affected, and it is running a bunch of other things, with swap enabled too.
 
Thanks for the reports, everyone. It would still be interesting to see whether temporarily disabling swap on the host improves the situation, if you would like to try. Again, this is not a general recommendation, since using swap has its benefits, but at least this would confirm our current assumptions about your issue.

I'm still in the process of investigating under which conditions this bug occurs, but I was not yet able to reproduce it myself. So if you have any information that might be useful, please let me know.