[SOLVED] kvm_nx_huge_page_recovery_worker message in log...

AddiCgn

New Member
May 22, 2023
Hi,

We found this log entry on a host that was updated to the PVE 8 beta and now to the PVE 8 release:

Code:
------------[ cut here ]------------
WARNING: CPU: 13 PID: 2578 at arch/x86/kvm/mmu/mmu.c:6949 kvm_nx_huge_page_recovery_worker+0x3c4/0x410 [kvm]
Modules linked in: tcp_diag inet_diag veth rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs e>
drm_display_helper btusb snd_hda_codec crypto_simd btrtl cec mac80211 cryptd btbcm snd_hda_core rc_core bti>
wmi pinctrl_cannonlake
CPU: 13 PID: 2578 Comm: kvm-nx-lpage-re Tainted: P           O       6.2.16-3-pve #1
Hardware name: Intel(R) Client Systems NUC9i9QNX/NUC9i9QNB, BIOS QXCFL579.0072.2023.0418.1511 04/18/2023
RIP: 0010:kvm_nx_huge_page_recovery_worker+0x3c4/0x410 [kvm]
Code: ff 48 8b 45 c0 4c 39 e0 0f 85 e6 fd ff ff 48 89 df e8 e0 e7 f9 ff e9 ed fd ff ff 49 bc ff ff ff ff ff >
RSP: 0018:ffffad26c6297e40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffad26c68a9000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffad26c6297ec0 R08: 0000000000000000 R09: 0000000000000000
R10: ffff93b98c053e30 R11: 0000000000000000 R12: ffffad26c6297e80
R13: 0000000000000001 R14: 000000000000001d R15: ffff93b98c053ec0
FS:  0000000000000000(0000) GS:ffff93c120f40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd4f689e3e0 CR3: 000000035b210004 CR4: 00000000003726e0
Call Trace:
<TASK>
? __pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm]
kvm_vm_worker_thread+0x9d/0x1b0 [kvm]
? __pfx_kvm_vm_worker_thread+0x10/0x10 [kvm]
kthread+0xe6/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x29/0x50
</TASK>
 ---[ end trace 0000000000000000 ]---

I'm not sure whether we also had this message in PVE 7.4, but at least I had not seen it before.

Should we be concerned? Anything to try?

Thanks for any pointers.
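For anyone trying to read the trace above: the "Tainted: P O" field in its header is the kernel's taint summary, not part of the KVM problem itself. According to the kernel's tainted-kernels documentation, P (bit 0) means a proprietary-licensed module is loaded, O (bit 12) means an externally built (out-of-tree) module is loaded, and W (bit 9) is added once a WARNING like this one has been printed. A minimal sketch, where decode_taint is a hypothetical helper that only decodes these three bits of the value in /proc/sys/kernel/tainted:

```shell
# Hypothetical helper: decode a few flags from the numeric taint mask
# found in /proc/sys/kernel/tainted. Only bits 0, 9 and 12 are handled.
decode_taint() {
  mask=$1
  flags=""
  # Bit 0 (P): a proprietary-licensed module has been loaded.
  if [ $(( mask & (1 << 0) )) -ne 0 ]; then flags="$flags P"; fi
  # Bit 9 (W): the kernel has issued a warning.
  if [ $(( mask & (1 << 9) )) -ne 0 ]; then flags="$flags W"; fi
  # Bit 12 (O): an externally built (out-of-tree) module has been loaded.
  if [ $(( mask & (1 << 12) )) -ne 0 ]; then flags="$flags O"; fi
  # Drop the leading space before printing.
  printf '%s\n' "${flags# }"
}

# On a live host:
#   decode_taint "$(cat /proc/sys/kernel/tainted)"
```

For example, decode_taint 4609 (that is 1 + 512 + 4096) prints "P W O".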
 
Sorry, I'm not familiar with this topic. What are hugepages, and how/where are they controlled? (This is a fresh PVE install with no other modifications.)
 
Perhaps this?
Code:
cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
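In that output the kernel marks the currently active transparent hugepage mode with square brackets, so [madvise] means THP is only used for memory regions that explicitly request it via madvise(). A minimal sketch for extracting the active mode in a script; thp_active_mode is a hypothetical helper name, and it expects the file's single line on stdin:

```shell
# Hypothetical helper: print the bracketed (active) mode from a
# transparent_hugepage "enabled" line, e.g. "always [madvise] never" -> "madvise".
thp_active_mode() {
  # Keep only the word between "[" and "]"; print nothing if there is none.
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# On a live host:
#   thp_active_mode < /sys/kernel/mm/transparent_hugepage/enabled
```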
 
Thanks for all the feedback. So, either:
  • is it possible to use kernel 6.3 with PVE 8 somehow, or
  • is it possible to disable the feature in the kernel until it is fixed?
Thanks.
 
Regarding kernel 6.3 with PVE 8: if you find a .deb, it should work (maybe from the Ubuntu mainline builds?).
Regarding disabling the feature in the kernel until it is fixed: you would need to patch and rebuild the kernel ;)
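Before installing an extra kernel package, it can help to check which upstream version is actually running. A minimal sketch, assuming GNU sort (for its -V version ordering) as shipped on Debian-based PVE hosts; kernel_at_least is a hypothetical helper. It only compares the upstream base version: a distribution kernel such as 6.2.16-3-pve can carry a backported fix without ever reaching 6.3.

```shell
# Hypothetical helper: succeed if the kernel release in $2 is at least
# the version in $1. Requires GNU sort for -V (version ordering).
kernel_at_least() {
  want=$1
  # Strip the distribution suffix, e.g. "6.2.16-3-pve" -> "6.2.16".
  have=${2%%-*}
  # After a version sort the smaller value comes first; if that is
  # $want, then $have is greater than or equal to it.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# On a live host:
#   kernel_at_least 6.3 "$(uname -r)" && echo "6.3 or newer"
```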
 
So would echo never > /sys/kernel/mm/transparent_hugepage/enabled not work to disable the feature until it is fixed? (Or is this something different?)
 
We have now seen this on two nodes. Is this specific to our local setup, or is anyone else seeing it?

Since this is only a warning, is it safe to ignore? Thanks.
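To compare nodes, one option is to count the warning in each host's kernel log and note which kernel produced it. A minimal sketch: nx_warning_count and nx_warning_kernel are hypothetical helpers whose patterns are taken from the traces posted in this thread, and they expect the log text on stdin (for example from journalctl -k).

```shell
# Hypothetical helper: count kvm_nx_huge_page_recovery_worker WARNING lines
# in kernel log text read from stdin (prints 0 if there are none).
nx_warning_count() {
  grep -c 'WARNING: .* kvm_nx_huge_page_recovery_worker'
}

# Hypothetical helper: print the kernel release(s) reported on the
# "Comm: kvm-nx-lpage-re ..." line of the same trace, without duplicates.
nx_warning_kernel() {
  sed -n 's/.*Comm: kvm-nx-lpage-re .* \([0-9][^ ]*\) #.*/\1/p' | sort -u
}

# On a live host (current boot only):
#   journalctl -k -b | nx_warning_count
#   journalctl -k -b | nx_warning_kernel
```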
 
Thanks for the help, but while the flag is present on the command line, NX is still active.

Code:
kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.16-3-pve root=/dev/mapper/pve-root ro quiet noexec=off
...
kernel: NX (Execute Disable) protection: active

As we are not sure how serious the issue is, we will wait until a fix is out. Thanks for all the help again.
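Both points from the post above can be checked on a running host by looking at /proc/cmdline and the boot log. A minimal sketch; cmdline_has_param is a hypothetical helper that only matches whole space-separated parameters:

```shell
# Hypothetical helper: succeed if the kernel command line in $2 contains
# the parameter $1 as a whole word (e.g. "noexec=off").
cmdline_has_param() {
  param=$1
  cmdline=$2
  # Pad both ends with spaces so partial matches such as "noexec=offx"
  # or "xnoexec=off" are not accepted.
  case " $cmdline " in
    *" $param "*) return 0 ;;
    *) return 1 ;;
  esac
}

# On a live host:
#   cmdline_has_param noexec=off "$(cat /proc/cmdline)" && echo "flag present"
#   dmesg | grep 'NX (Execute Disable)'
```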
 
Has this caused any issues for you? Wondering if safe to ignore...
 
We did not put many VMs on the host that showed this message, and the system was not under high load. Based on these few days, I would say there was no issue (but I also cannot tell whether this RIP: 0010:kvm_nx_huge_page_recovery_worker means the thread died and is now causing a resource leak, etc.).

One test we never tried was running echo "never" > /sys/kernel/mm/transparent_hugepage/enabled right after a reboot, to disable transparent hugepages and hopefully the message (without knowing what the performance implications would be).
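A value written to /sys/kernel/mm/transparent_hugepage/enabled does not survive a reboot. The kernel also accepts transparent_hugepage=never on its command line; on a GRUB-booted PVE host that parameter goes into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub (hosts booting via systemd-boot use /etc/kernel/cmdline and proxmox-boot-tool refresh instead). Whether this has any effect on the kvm_nx_huge_page_recovery_worker warning was not tested in this thread. A minimal sketch of the text edit only; add_cmdline_param is a hypothetical helper that works on a single line and does not touch any file:

```shell
# Hypothetical helper: append parameter $2 inside the closing quote of a
# line like GRUB_CMDLINE_LINUX_DEFAULT="quiet", unless it is already there.
# Assumes $2 contains no "/" or "&" (they would confuse sed).
add_cmdline_param() {
  line=$1
  param=$2
  case "$line" in
    *"$param"*)
      # Already present: print the line unchanged.
      printf '%s\n' "$line" ;;
    *)
      # Replace the final double quote with " <param>".
      printf '%s\n' "$line" | sed "s/\"\$/ $param\"/" ;;
  esac
}

# Example:
#   add_cmdline_param 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"' transparent_hugepage=never
#   -> GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=never"
```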
 
I just upgraded my host from 7.4 to 8, and now I get this same error a few minutes after boot, presumably when the 2 VMs (plus 11 LXCs) start. I did not see this error on 7.4, even though I was already running the optional 6.2 kernel there.
 
Seeing the same behavior on 6.2.16-3.

Code:
Jul 02 14:34:07 maverick kernel: ------------[ cut here ]------------
Jul 02 14:34:07 maverick kernel: WARNING: CPU: 21 PID: 26736 at arch/x86/kvm/mmu/mmu.c:6949 kvm_nx_huge_page_recovery_worker+0x3c4/0x410 [kvm]
Jul 02 14:34:07 maverick kernel: Modules linked in: nf_tables veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables scsi_transport_iscsi iptable_filter bpfilter nvme_fabrics sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit x86_pkg_temp_thermal coretemp kvm_intel snd_hda_codec_hdmi kvm irqbypass crct10dif_pclmul snd_hda_intel polyval_clmulni snd_intel_dspcfg polyval_generic ghash_clmulni_intel snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm sha512_ssse3 aesni_intel crypto_simd cryptd cmdlinepart snd_timer rapl acpi_ipmi snd mei_me spi_nor intel_cstate pcspkr ast soundcore ipmi_si mtd drm_shmem_helper mei ioatdma intel_pch_thermal ipmi_devintf ipmi_msghandler joydev input_leds acpi_power_meter mac_hid vhost_net vhost vhost_iotlb tap nvidia_uvm(PO) nvidia_drm(PO) nvidia_modeset(PO) video wmi drm_kms_helper syscopyarea sysfillrect
Jul 02 14:34:07 maverick kernel:  sysimgblt nvidia(PO) efi_pstore drm dmi_sysfs ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor raid6_pq libcrc32c simplefb hid_generic ses enclosure usbmouse usbkbd usbhid hid mpt3sas xhci_pci xhci_pci_renesas raid_class igb nvme crc32_pclmul scsi_transport_sas spi_intel_pci nvme_core i2c_i801 i2c_algo_bit ahci spi_intel i2c_smbus lpc_ich nvme_common dca xhci_hcd libahci
Jul 02 14:34:07 maverick kernel: CPU: 21 PID: 26736 Comm: kvm-nx-lpage-re Tainted: P           O       6.2.16-3-pve #1
Jul 02 14:34:07 maverick kernel: Hardware name: Supermicro Super Server/X11SPL-F, BIOS 3.9 03/15/2023
Jul 02 14:34:07 maverick kernel: RIP: 0010:kvm_nx_huge_page_recovery_worker+0x3c4/0x410 [kvm]
Jul 02 14:34:07 maverick kernel: Code: ff 48 8b 45 c0 4c 39 e0 0f 85 e6 fd ff ff 48 89 df e8 e0 e7 f9 ff e9 ed fd ff ff 49 bc ff ff ff ff ff ff ff 7f e9 c6 fc ff ff <0f> 0b e9 01 ff ff ff 48 8b 45 d0 65 48 2b 04 25 28 00 00 00 75 27
Jul 02 14:34:07 maverick kernel: RSP: 0018:ffffb6b60ec53e40 EFLAGS: 00010246
Jul 02 14:34:07 maverick kernel: RAX: 0000000000000000 RBX: ffffb6b60ec3d000 RCX: 0000000000000000
Jul 02 14:34:07 maverick kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jul 02 14:34:07 maverick kernel: RBP: ffffb6b60ec53ec0 R08: 0000000000000000 R09: 0000000000000000
Jul 02 14:34:07 maverick kernel: R10: ffff9cf4717d7088 R11: 0000000000000000 R12: ffffb6b60ec53e80
Jul 02 14:34:07 maverick kernel: R13: 0000000000000001 R14: 0000000000000004 R15: ffff9cf4717d7118
Jul 02 14:34:07 maverick kernel: FS:  0000000000000000(0000) GS:ffff9d4e40b40000(0000) knlGS:0000000000000000
Jul 02 14:34:07 maverick kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 02 14:34:07 maverick kernel: CR2: 00007f407d94b8c8 CR3: 00000002565fa005 CR4: 00000000007726e0
Jul 02 14:34:07 maverick kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 02 14:34:07 maverick kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 02 14:34:07 maverick kernel: PKRU: 55555554
Jul 02 14:34:07 maverick kernel: Call Trace:
Jul 02 14:34:07 maverick kernel:  <TASK>
Jul 02 14:34:07 maverick kernel:  ? __pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm]
Jul 02 14:34:07 maverick kernel:  kvm_vm_worker_thread+0x9d/0x1b0 [kvm]
Jul 02 14:34:07 maverick kernel:  ? __pfx_kvm_vm_worker_thread+0x10/0x10 [kvm]
Jul 02 14:34:07 maverick kernel:  kthread+0xe6/0x110
Jul 02 14:34:07 maverick kernel:  ? __pfx_kthread+0x10/0x10
Jul 02 14:34:07 maverick kernel:  ret_from_fork+0x29/0x50
Jul 02 14:34:07 maverick kernel:  </TASK>
Jul 02 14:34:07 maverick kernel: ---[ end trace 0000000000000000 ]---
 
Hi,

Can you open a bug report on bugzilla.proxmox.com?

I can try to backport the small patch and build the kernel.

Would you be able to test it if I send a patched .deb? (I'm not sure whether this is a production system, but I can't give any guarantee.)

The patch only changes 4 lines, so I don't think it could break anything.
 
I've built the newest PVE kernel (6.2.16-3-pve) with the mentioned patch and can confirm that it works: the error message has disappeared.
 
Good news that the patch fixes it.

Is there any way to test/use this "fixed" kernel (as an opt-in?), or do I need to rebuild the kernel myself (which I have never done before)?

Thanks.
 
I have the same problem, and I'm also not sure how to obtain or apply the mentioned kernel patch. I opened a Bugzilla report: HERE