walk_pgd_range crash pve9.1 on 6.18+

tytanick

Recently I reported a slab memory leak and it was fixed.

I am having yet another issue and am wondering where to report it.
Could you tell me if this is the right place, or should I send it to someone else?
This issue also looks like a memory leak.

It happens on multiple servers (less often on 6.18.6, more often on 6.19-rc4+). I have not actually tested older kernels, as they do not work that well with passthrough and Blackwell GPUs.
All servers run KVM with VFIO GPU PCIe passthrough, and it happens when I am using 1 GB hugepages + QEMU.
Basically I am allocating 970 GB into hugepages, leaving 37 GB to KVM.
In normal operation I have about 20 GB free, but when this issue occurs all RAM is taken, and even the 100 GB of swap I added was consumed as well.
It can run for days or a week without issue.

I did not see that issue when I had hugepages disabled (with normal 4 KB page allocation in KVM).
And I am using hugepages because it is otherwise impossible to boot a VM with >200 GB of RAM.
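
For reference, the hugepage setup boils down to roughly this (a sketch only; the VMID and exact values are illustrative):

Code:
# kernel cmdline: reserve 970 x 1 GiB hugepages at boot
default_hugepagesz=1G hugepagesz=1G hugepages=970

# Proxmox VM config: back the guest RAM with 1 GiB hugepages
qm set <VMID> --hugepages 1024

# check hugepage usage on the host
grep Huge /proc/meminfo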

Linux pve14 6.18.6-pbk #1 SMP PREEMPT_DYNAMIC Mon Jan 19 20:59:46 UTC 2026 x86_64 GNU/Linux
pve9.1


Code:
[171053.341288] BUG: unable to handle page fault for address: ff469ae640000000
[171053.341310] #PF: supervisor read access in kernel mode
[171053.341319] #PF: error_code(0x0000) - not-present page
[171053.341328] PGD 4602067 P4D 0
[171053.341337] Oops: Oops: 0000 [#1] SMP NOPTI
[171053.341348] CPU: 16 UID: 0 PID: 3250869 Comm: qm Not tainted 6.18.6-pbk #1 PREEMPT(voluntary)
[171053.341362] Hardware name: TURIN2D24G-2L+/500W/TURIN2D24G-2L+/500W, BIOS 10.20 05/05/2025
[171053.341373] RIP: 0010:walk_pgd_range+0x6ff/0xbb0
[171053.341386] Code: 08 49 39 dd 0f 84 8c 01 00 00 49 89 de 49 8d 9e 00 00 20 00 48 8b 75 b8 48 81 e3 00 00 e0 ff 48 8d 43 ff 48 39 f0 49 0f 43 dd <49> f7 04 24 9f ff ff ff 0f 84 e2 fd ff ff 48 8b 45 c0 41 c7 47 20
[171053.341406] RSP: 0018:ff59d95d70e6b748 EFLAGS: 00010287
[171053.341416] RAX: 00007a22401fffff RBX: 00007a2240200000 RCX: 0000000000000000
[171053.341425] RDX: 0000000000000000 RSI: 00007a227fffffff RDI: 800008dfc00002b7
[171053.341435] RBP: ff59d95d70e6b828 R08: 0000000000000080 R09: 0000000000000000
[171053.341444] R10: ffffffff8de588c0 R11: 0000000000000000 R12: ff469ae640000000
[171053.341454] R13: 00007a2280000000 R14: 00007a2240000000 R15: ff59d95d70e6b8a8
[171053.341464] FS:  00007d4e8ec94b80(0000) GS:ff4692876ae7e000(0000) knlGS:0000000000000000
[171053.341476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[171053.341485] CR2: ff469ae640000000 CR3: 0000008241eed006 CR4: 0000000000f71ef0
[171053.341495] PKRU: 55555554
[171053.341501] Call Trace:
[171053.341508]  <TASK>
[171053.341518]  __walk_page_range+0x8e/0x220
[171053.341529] ? sysvec_apic_timer_interrupt+0x57/0xc0
[171053.341541]  walk_page_vma+0x92/0xe0
[171053.341551] smap_gather_stats.part.0+0x8c/0xd0
[171053.341563]  show_smaps_rollup+0x258/0x420
[171053.341577]  seq_read_iter+0x137/0x4c0
[171053.341588]  seq_read+0xf5/0x140
[171053.341596]  ? __x64_sys_move_mount+0x11/0x40
[171053.341607]  vfs_read+0xbb/0x350
[171053.341617]  ? do_syscall_64+0xb8/0xd00
[171053.341627] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.341637]  ? strncpy_from_user+0x27/0x130
[171053.341649]  ksys_read+0x69/0xf0
[171053.341658]  __x64_sys_read+0x19/0x30
[171053.341666]  x64_sys_call+0x2180/0x25a0
[171053.341855]  do_syscall_64+0x80/0xd00
[171053.342029] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.342198]  ? __x64_sys_ioctl+0x83/0x100
[171053.342367] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.342532] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.342696]  ? x64_sys_call+0xac0/0x25a0
[171053.342857] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.343019]  ? do_syscall_64+0xb8/0xd00
[171053.343181]  ? seq_read+0xf5/0x140
[171053.343341]  ? __x64_sys_move_mount+0x11/0x40
[171053.343504] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.343662]  ? vfs_read+0xbb/0x350
[171053.343819] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.343973]  ? ksys_read+0x69/0xf0
[171053.344126] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.344280]  ? generic_file_llseek+0x21/0x40
[171053.344432] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.344582]  ? kernfs_fop_llseek+0x7b/0xd0
[171053.344730] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.344873]  ? ksys_lseek+0x4f/0xd0
[171053.345010] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.345144]  ? __x64_sys_lseek+0x18/0x30
[171053.345275] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.345407]  ? x64_sys_call+0x1abe/0x25a0
[171053.345535] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.345665]  ? do_syscall_64+0xb8/0xd00
[171053.345792] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.345919]  ? irqentry_exit+0x43/0x50
[171053.346044] ? srso_alias_return_thunk+0x5/0xfbef5
[171053.346169] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[171053.346292] RIP: 0033:0x7d4e8ed61687
[171053.346417] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
[171053.346687] RSP: 002b:00007ffdd7c76000 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
[171053.346828] RAX: ffffffffffffffda RBX: 00007d4e8ec94b80 RCX: 00007d4e8ed61687
[171053.346969] RDX: 0000000000002000 RSI: 000061ff84297ce0 RDI: 0000000000000006
[171053.347111] RBP: 0000000000002000 R08: 0000000000000000 R09: 0000000000000000
[171053.347253] R10: 0000000000000000 R11: 0000000000000202 R12: 000061ff84297ce0
[171053.347394] R13: 000061ff7d3d62a0 R14: 0000000000000006 R15: 000061ff842478c0
[171053.347542]  </TASK>
[171053.347684] Modules linked in: sctp ip6_udp_tunnel udp_tunnel nf_tables bridge stp llc softdog bonding sunrpc binfmt_misc nfnetlink_log amd_atl intel_rapl_msr intel_rapl_common nls_iso8859_1 amd64_edac edac_mce_amd kvm_amd snd_pcm snd_timer dax_hmem ipmi_ssif kvm cxl_acpi snd polyval_clmulni ghash_clmulni_intel cxl_port soundcore aesni_intel rapl cxl_core acpi_ipmi einj pcspkr ast ipmi_si spd5118 ipmi_devintf k10temp ccp ipmi_msghandler joydev input_leds mac_hid sch_fq_codel msr vhost_net vhost vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd efi_pstore nfnetlink dmi_sysfs autofs4 btrfs blake2b_generic xor raid6_pq mlx5_ib ib_uverbs macsec ib_core dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio nvme mlx5_core nvme_core cdc_ether nvme_keyring igb mlxfw psample usbnet nvme_auth i2c_algo_bit usbkbd mii hid_generic hkdf tls dca ahci i2c_piix4 libahci i2c_smbus usbmouse usbhid hid
[171053.349092] CR2: ff469ae640000000
[171053.349269] ---[ end trace 0000000000000000 ]---
[171054.248409] RIP: 0010:walk_pgd_range+0x6ff/0xbb0
[171054.248750] Code: 08 49 39 dd 0f 84 8c 01 00 00 49 89 de 49 8d 9e 00 00 20 00 48 8b 75 b8 48 81 e3 00 00 e0 ff 48 8d 43 ff 48 39 f0 49 0f 43 dd <49> f7 04 24 9f ff ff ff 0f 84 e2 fd ff ff 48 8b 45 c0 41 c7 47 20
[171054.249177] RSP: 0018:ff59d95d70e6b748 EFLAGS: 00010287
[171054.249392] RAX: 00007a22401fffff RBX: 00007a2240200000 RCX: 0000000000000000
[171054.249820] RDX: 0000000000000000 RSI: 00007a227fffffff RDI: 800008dfc00002b7
[171054.250036] RBP: ff59d95d70e6b828 R08: 0000000000000080 R09: 0000000000000000
[171054.250253] R10: ffffffff8de588c0 R11: 0000000000000000 R12: ff469ae640000000
[171054.250471] R13: 00007a2280000000 R14: 00007a2240000000 R15: ff59d95d70e6b8a8
[171054.250691] FS:  00007d4e8ec94b80(0000) GS:ff4692876ae7e000(0000) knlGS:0000000000000000
[171054.250914] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[171054.251137] CR2: ff469ae640000000 CR3: 0000008241eed006 CR4: 0000000000f71ef0
[171054.251375] PKRU: 55555554
[171054.251601] note: qm[3250869] exited with irqs disabled
 
I’m curious about how you’re creating those kernels! Google seems to be having trouble finding “6.18.6-pbk,” so it looks like you might be using a custom build.

Proxmox typically uses a modified and patched version of Ubuntu kernels. I have https://github.com/jaminmc/pve-kernel, and it includes 6.18.0 based on Ubuntu-6.18.0-9.9 in the release section. However, I just checked https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/resolute?h=master-next, and it now has Ubuntu-6.19.0-3.3, which is based on v6.19-rc6.

I’m not sure if the Proxmox Team is currently testing 6.18 or even 6.19. They currently have 6.17.9-1-pve in their testing repository. I’ll update the submodules/ubuntu-kernel submodule to the latest commit on my machine and give it a try when I have some free time. If it works well for me, I’ll update my GitHub.
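
In case it helps, the rough flow for that is something like this (just a sketch; the exact tag to check out and the build target may differ from what the repo expects at any given time):

Code:
git clone https://github.com/jaminmc/pve-kernel.git
cd pve-kernel
git submodule update --init --recursive

# point the ubuntu-kernel submodule at a newer Ubuntu tag (tag name from Launchpad, illustrative)
cd submodules/ubuntu-kernel
git fetch origin
git checkout Ubuntu-6.19.0-3.3
cd ../..

# rebuild the patched kernel packages (build target may vary)
make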

EDIT: After looking at some of your other posts, it looks like you are using https://prebuiltkernels.com/ for the other kernels. Those are compiled for Ubuntu 24.04. While they will work on Proxmox, they don't have the patches and compile optimizations for virtualization or LXC containers, and they don't ship the firmware package that goes along with the kernel .debs. Try my 6.18 version and see if it works better: https://github.com/jaminmc/pve-kernel/releases. There is also a link on the release page to updated firmware for 6.18.
 
I’m curious about how you’re creating those kernels! …

Hehe, so Google does not see everything.
I am using this:
https://prebuiltkernels.com/

I made a small script to install kernels from there:

Code:
kernel="6.18.7-pbk"; deb="${kernel%-*}"; deb="${deb/-rc/~rc}-1"; rm -f ./linux-*.deb; wget "https://prebuiltkernels.com/download/linux-image-${kernel}_${deb}_amd64.deb" "https://prebuiltkernels.com/download/linux-headers-${kernel}_${deb}_amd64.deb" && apt install ./linux-hea* && apt install ./linux-ima* && proxmox-boot-tool kernel pin "$kernel"

Do you think yours has some patch that fixes this issue?
I really do not want to use 6.17 or 6.14, as my passthrough GPUs are only stable on 6.18.4+ kernels.
 
I’m not sure if your particular issue is fixed by the Ubuntu or Proxmox patches. It more than likely is. Try my kernel and see!

Ubuntu frequently updates and patches the Linux kernel, and Proxmox then applies its own patches to it for VM and LXC containers. https://prebuiltkernels.com/ uses the Linux mainline kernel, so it doesn’t include Ubuntu’s patches and optimizations, nor Proxmox’s. Since I use ZFS for my filesystem, I need ZFS included in the kernel.

I managed to make my own 6.19.0-jaminmc-pve kernel based on Ubuntu-6.19.0-3.3, which itself is based on Linux 6.19-rc6. With the Proxmox patches and configuration, I was also able to get OpenZFS 2.4.0 working with 6.19 using my own patches. So far, it’s been stable, and GPU passthrough is working well (Vega 64).

I had Grok create an explanation of the differences between the kernels, even though they’re the same version number.
https://grok.com/share/bGVnYWN5LWNvcHk_033a5261-0b4a-4e06-b9f7-ad608f1534a4

https://github.com/jaminmc/pve-kernel/releases/tag/v6.19.0

There are 2 versions: one where Proxmox rolled back some of the new TCP changes introduced in 6.17 that were causing problems with the PBS server, and one where those rollbacks were not applied, so the TCP stack is native 6.19.

Here is the one that has the patch from proxmox that rolls them back:
Code:
# Download Kernel & Headers
wget https://github.com/jaminmc/pve-kernel/releases/download/v6.19.0/proxmox-{kernel,headers}-6.19.0-1-jaminmc-tcp-pve_6.19.0-1_amd64.deb

# Download Firmware
wget https://github.com/jaminmc/pve-kernel/releases/download/v6.19.0/pve-firmware_3.19-1-jaminmc_all.deb

# Install Them
apt install ./proxmox-{kernel,headers}-6.19.0-1-jaminmc-tcp-pve_6.19.0-1_amd64.deb ./pve-firmware_3.19-1-jaminmc_all.deb

# Unpin your current kernel. Since it is the Newest and highest PVE kernel, it should be the default.
proxmox-boot-tool kernel unpin

And here is the one that doesn't:
Code:
# Download Kernel & Headers
wget https://github.com/jaminmc/pve-kernel/releases/download/v6.19.0-No-TCP-6.17-Regressions/proxmox-{kernel,headers}-6.19.0-1-jaminmc-pve_6.19.0-1_amd64.deb

# Download Firmware
wget https://github.com/jaminmc/pve-kernel/releases/download/v6.19.0/pve-firmware_3.19-1-jaminmc_all.deb

# Install Them
apt install ./proxmox-{kernel,headers}-6.19.0-1-jaminmc-pve_6.19.0-1_amd64.deb ./pve-firmware_3.19-1-jaminmc_all.deb

# Unpin your current kernel. Since it is the Newest and highest PVE kernel, it should be the default.
proxmox-boot-tool kernel unpin
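
After installing either variant and rebooting, you can double-check which kernel is pinned and which one actually booted:

Code:
# list known kernels and any pin
proxmox-boot-tool kernel list

# confirm the running kernel after reboot
uname -r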
 