Opt-in Linux 6.11 Kernel for Proxmox VE 8 available on test & no-subscription

t.lamprecht · Nov 2, 2024

We recently uploaded a 6.11 kernel into our repositories. The current 6.8 kernel will stay the default on the Proxmox VE 8 series, the newly introduced 6.11 kernel is an option.
The 6.11 based kernel may be useful for some (especially newer) setups, for example if there is improved hardware support that has not yet been backported to 6.8.
This follows our tradition of upgrading the Proxmox VE kernel to match the current Ubuntu version until we reach an (Ubuntu) LTS release, like the 6.8 kernel is, and then provide newer kernels as opt-in. The 6.11 kernel is based on the Ubuntu 24.10 Oracular release.

We have run this kernel on some parts of our test setups over the last few days without any notable issues, for production setups we still recommend keeping the 6.8 based kernel, or test on similar hardware/setups before moving all your production nodes up to 6.11.

How to install:

Ensure that either the pve-no-subscription or pvetest repository is set up correctly.
You can do so via CLI text-editor or using the web UI under Node -> Repositories.
Open a shell as root, e.g. through SSH or using the integrated shell on the web UI.
apt update
apt install proxmox-kernel-6.11
reboot

Future updates to the 6.11 kernel will now be installed automatically when upgrading a node.

Please note:

The current 6.8 kernel is still supported and will stay the default kernel.
There were many changes, for improved hardware support and performance improvements all over the place.
For a good overview of prominent changes, we recommend checking out the kernel-newbies site for 6.9, 6.10, and 6.11 (in progress).
For those depending on Realtek's r8125 out-of-tree driver, we also uploaded a newer r8125-dkms package in version 9.013.02-1~bpo12+1 to fix support for that driver when used with 6.8+ kernels.
The kernel is also available on the test and no-subscription repositories of Proxmox Backup Server and Proxmox Mail Gateway.
If you're unsure, we recommend continuing to use the 6.8-based kernel for now.

Feedback about how the new kernel performs in any of your setups is welcome!
Please provide basic details like CPU model, storage types used, ZFS as root file system, and the like, for both positive feedback or if you ran into some issues, where using the opt-in 6.11 kernel seems to be the likely cause.

toe · Nov 2, 2024

Three machines upgraded

Zen 2 and 3 (One Epyc) - ZFS as root and datapools.

No problems so far. All went smoothly.

ram75 · Nov 2, 2024

hola, después de instalar el kernel 6.11 no inician los LXC, probé generar uno nuevo pero tuve el mismo error

cgfsng_setup_limits_legacy: 3442 No such file or directory - Failed to set "memory.limit_in_bytes" to "536870912"
lxc_spawn: 1802 Failed to setup cgroup limits for container "122"
TASK ERROR: startup for container '122' failed

t.lamprecht · Nov 2, 2024

ram75 said:
hola, después de instalar el kernel 6.11 no inician los LXC, probé generar uno nuevo pero tuve el mismo error

cgfsng_setup_limits_legacy: 3442 No such file or directory - Failed to set "memory.limit_in_bytes" to "536870912"
lxc_spawn: 1802 Failed to setup cgroup limits for container "122"
TASK ERROR: startup for container '122' failed

Can you please open a new thread and post the container config there (pct config VMID)?

vesalius · Nov 2, 2024

Small 3 node cluster. ZFS on root. Ceph 18.2. Have x553 SFP+ based networking and no issues so far.

bbgeek17 · Nov 3, 2024

We will get it into our QA pipeline asap!

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

Firebat · Nov 3, 2024

t.lamprecht said:
For those depending on Realtek's r8125 out-of-tree driver, we also uploaded a newer r8125-dkms package in version 9.013.02-1~bpo12+1 to fix support for that driver when used with 6.8+ kernels.

Does it support r8126, too?

Seed · Nov 3, 2024

I updated to the 6.11 kernel and my Intel® Killer™ E5000B 5G LAN r8126 isn't detected. I also installed the dkms package and no go as well. pve 8.2.7. I'm on a very new z890 chipset mother board with Intel Ultra 265k

Code:

        *-network UNCLAIMED
             description: Ethernet controller
             product: Realtek Semiconductor Co., Ltd.
             vendor: Realtek Semiconductor Co., Ltd.
             physical id: 0
             bus info: pci@0000:84:00.0
             version: 04
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi pciexpress msix vpd cap_list
             configuration: latency=0
             resources: ioport:5000(size=256) memory:b8800000-b880ffff memory:b8810000-b8813fff

I was able to get the NIC working, it's techinicall the: (Intel® Killer™ E5000B 5G LAN r8126) << for search doing the following:

grab the driver here:

https://www.realtek.com/Download/List?cate_id=584 (5G Ethernet LINUX driver r8126 for kernel up to 6.4)

or

https://www.realtek.com/Download/ToDownload?type=direct&downloadid=4445

move the source to your proxmox host, chmod executable the autorun.sh and let er rip. I did this on the newer 6.11 kernel and it works great.

Code:

       *-network DISABLED
             description: Ethernet interface
             product: Realtek Semiconductor Co., Ltd.
             vendor: Realtek Semiconductor Co., Ltd.
             physical id: 0
             bus info: pci@0000:84:00.0
             logical name: enp132s0
             version: 04
             serial: 34:5a:60:07:1d:ae
             capacity: 1Gbit/s
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi pciexpress msix vpd cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
             configuration: autonegotiation=on broadcast=yes driver=r8126 driverversion=10.014.01-NAPI latency=0 link=no multicast=yes port=twisted pair
             resources: irq:18 ioport:5000(size=256) memory:b8800000-b880ffff memory:b8810000-b8813fff

I have a feeling maybe, the module didn't load after installing the package noted here. Not sure though, as lsmod | grep r8126 was empty after installing the dkms driver, but is now showing when built from source.

Code:

lsmod | grep r8126
r8126                 204800  0

mac.linux.free · Nov 4, 2024

4 hosts upgraded mostly AMD Epyc. No problems so far.

fiona · Nov 4, 2024

ram75 said:
hola, después de instalar el kernel 6.11 no inician los LXC, probé generar uno nuevo pero tuve el mismo error

cgfsng_setup_limits_legacy: 3442 No such file or directory - Failed to set "memory.limit_in_bytes" to "536870912"
lxc_spawn: 1802 Failed to setup cgroup limits for container "122"
TASK ERROR: startup for container '122' failed

t.lamprecht said:
Can you please open a new thread and post the container config there (pct config VMID)?

For reference: https://forum.proxmox.com/threads/linux-6-11-kernel-failure-to-start-lxc.156830/

ram75 · Nov 4, 2024

kernel 6.11 OK on 6 IBM blade

proxmox-ve: 8.2.0 (running kernel: 6.11.0-1-pve)
pve-manager: 8.2.7 (running version: 8.2.7/3e0176e6bb2ade3b)

blade1: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz (2 Sockets)
blade2: Intel(R) Xeon(R) CPU E5540 @ 2.53GHz (2 Sockets)
blade3: Intel(R) Xeon(R) CPU E5540 @ 2.53GHz (2 Sockets)
blade4: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz (2 Sockets)
blade5: Intel(R) Xeon(R) CPU X5670 @ 2.93GHz (2 Sockets)

PBS 3.2-7 : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz (2 Sockets)

SCSI storage controller Broadcom/LSI SAS1064ET PCI-Express Fusion-MPT SAS (rev 10) -- fails in kernel-6.8
all hosts with LVM storage

pschneider1968 · Nov 5, 2024

With the new kernel 6.11.0-1-pve, I see hung tasks in the dmesg output of my Proxmox host, causing hangs and disconnected RDP sessions to my VMs.

E.g. like this one on the host:

Code:

[10201.836445] INFO: task kcompactd0:317 blocked for more than 122 seconds.
[10201.836455]       Tainted: P           O       6.11.0-1-pve #1
[10201.836457] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10201.836458] task:kcompactd0      state:D stack:0     pid:317   tgid:317   ppid:2      flags:0x00004000
[10201.836464] Call Trace:
[10201.836468]  <TASK>
[10201.836473]  __schedule+0x400/0x15d0
[10201.836489]  schedule+0x29/0x130
[10201.836492]  io_schedule+0x4c/0x80
[10201.836495]  folio_wait_bit_common+0x138/0x310
[10201.836504]  ? __pfx_wake_page_function+0x10/0x10
[10201.836508]  folio_wait_bit+0x18/0x30
[10201.836512]  folio_wait_writeback+0x2b/0xa0
[10201.836515]  nfs_wb_folio+0xa5/0x1f0 [nfs]
[10201.836579]  nfs_release_folio+0x75/0x140 [nfs]
[10201.836606]  filemap_release_folio+0x68/0xa0
[10201.836609]  split_huge_page_to_list_to_order+0x1f1/0xea0
[10201.836615]  migrate_pages_batch+0x580/0xce0
[10201.836619]  ? __pfx_compaction_alloc+0x10/0x10
[10201.836624]  ? __pfx_compaction_free+0x10/0x10
[10201.836627]  ? __mod_memcg_lruvec_state+0x9f/0x190
[10201.836631]  ? __pfx_compaction_free+0x10/0x10
[10201.836633]  migrate_pages+0xabb/0xd50
[10201.836636]  ? __pfx_compaction_free+0x10/0x10
[10201.836638]  ? __pfx_compaction_alloc+0x10/0x10
[10201.836642]  compact_zone+0xad3/0x1140
[10201.836646]  compact_node+0xa4/0x120
[10201.836651]  kcompactd+0x2cf/0x460
[10201.836654]  ? __pfx_autoremove_wake_function+0x10/0x10
[10201.836658]  ? __pfx_kcompactd+0x10/0x10
[10201.836661]  kthread+0xe4/0x110
[10201.836664]  ? __pfx_kthread+0x10/0x10
[10201.836666]  ret_from_fork+0x47/0x70
[10201.836677]  ? __pfx_kthread+0x10/0x10
[10201.836679]  ret_from_fork_asm+0x1a/0x30
[10201.836683]  </TASK>
[10201.836783] INFO: task task UPID:linus:96733 blocked for more than 122 seconds.
[10201.836785]       Tainted: P           O       6.11.0-1-pve #1
[10201.836786] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10201.836787] task:task UPID:linus state:D stack:0     pid:96733 tgid:96733 ppid:2110   flags:0x00000002
[10201.836790] Call Trace:
[10201.836791]  <TASK>
[10201.836792]  __schedule+0x400/0x15d0
[10201.836797]  schedule+0x29/0x130
[10201.836800]  io_schedule+0x4c/0x80
[10201.836803]  folio_wait_bit_common+0x138/0x310
[10201.836807]  ? __pfx_wake_page_function+0x10/0x10
[10201.836809]  __folio_lock+0x17/0x30
[10201.836813]  writeback_iter+0x1ee/0x2d0
[10201.836815]  ? __pfx_nfs_writepages_callback+0x10/0x10 [nfs]
[10201.836860]  write_cache_pages+0x4c/0xb0
[10201.836863]  nfs_writepages+0x17b/0x310 [nfs]
[10201.836890]  ? crypto_shash_update+0x19/0x30
[10201.836895]  ? ext4_inode_csum+0x1f8/0x270
[10201.836914]  do_writepages+0x7e/0x270
[10201.836917]  ? jbd2_journal_stop+0x155/0x2f0
[10201.836922]  filemap_fdatawrite_wbc+0x75/0xb0
[10201.836924]  __filemap_fdatawrite_range+0x6d/0xa0
[10201.836929]  filemap_write_and_wait_range+0x59/0xc0
[10201.836932]  nfs_wb_all+0x27/0x120 [nfs]
[10201.836960]  nfs4_file_flush+0x7b/0xd0 [nfsv4]
[10201.837018]  filp_flush+0x38/0x90
[10201.837021]  __x64_sys_close+0x33/0x90
[10201.837023]  x64_sys_call+0x1a84/0x24e0
[10201.837026]  do_syscall_64+0x7e/0x170
[10201.837032]  ? __f_unlock_pos+0x12/0x20
[10201.837035]  ? ksys_write+0xd9/0x100
[10201.837039]  ? syscall_exit_to_user_mode+0x4e/0x250
[10201.837043]  ? do_syscall_64+0x8a/0x170
[10201.837046]  ? syscall_exit_to_user_mode+0x4e/0x250
[10201.837049]  ? do_syscall_64+0x8a/0x170
[10201.837051]  ? ptep_set_access_flags+0x4a/0x70
[10201.837057]  ? wp_page_reuse+0x97/0xc0
[10201.837059]  ? do_wp_page+0x84b/0xb90
[10201.837062]  ? __pte_offset_map+0x1c/0x1b0
[10201.837067]  ? __handle_mm_fault+0xbdc/0x1120
[10201.837071]  ? __count_memcg_events+0x7d/0x130
[10201.837074]  ? count_memcg_events.constprop.0+0x2a/0x50
[10201.837077]  ? handle_mm_fault+0xaf/0x2e0
[10201.837080]  ? do_user_addr_fault+0x5ec/0x830
[10201.837083]  ? irqentry_exit_to_user_mode+0x43/0x250
[10201.837086]  ? irqentry_exit+0x43/0x50
[10201.837088]  ? exc_page_fault+0x96/0x1e0
[10201.837105]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[10201.837109] RIP: 0033:0x71103703a8e0
[10201.837122] RSP: 002b:00007fff544adda8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
[10201.837124] RAX: ffffffffffffffda RBX: 000059c36bd312a0 RCX: 000071103703a8e0
[10201.837125] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000013
[10201.837127] RBP: 0000000000000013 R08: 0000000000000000 R09: 0000000000000000
[10201.837128] R10: 0000000000000000 R11: 0000000000000202 R12: 000059c37401c890
[10201.837129] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[10201.837131]  </TASK>
[10201.837134] INFO: task zstd:98699 blocked for more than 122 seconds.

[...]

[12536.577551] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

causing something like this in e.g. a VM with Redhat Enterprise Linux:

Code:

[root@rh92 ~]# dnf update
Subscription Management Repositorys werden aktualisiert.

Message from syslogd@rh92 at Nov  5 09:51:51 ...
 kernel:Uhhuh. NMI received for unknown reason 30 on CPU 0.

Message from syslogd@rh92 at Nov  5 09:51:51 ...
 kernel:Do you have a strange power saving mode enabled?

Message from syslogd@rh92 at Nov  5 09:51:51 ...
 kernel:Dazed and confused, but trying to continue

Message from syslogd@rh92 at Nov  5 10:01:48 ...
 kernel:Uhhuh. NMI received for unknown reason 20 on CPU 1.

Message from syslogd@rh92 at Nov  5 10:01:48 ...
 kernel:Do you have a strange power saving mode enabled?

Message from syslogd@rh92 at Nov  5 10:01:48 ...
 kernel:Dazed and confused, but trying to continue

Message from syslogd@rh92 at Nov  5 10:02:09 ...
 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [in:imjournal:1051]

Message from syslogd@rh92 at Nov  5 10:02:29 ...
 kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [khugepaged:51]

Message from syslogd@rh92 at Nov  5 10:03:49 ...
 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 33s! [khugepaged:51]

Message from syslogd@rh92 at Nov  5 10:04:14 ...
 kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [khugepaged:51]

Message from syslogd@rh92 at Nov  5 10:04:42 ...
 kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [khugepaged:51]

Message from syslogd@rh92 at Nov  5 10:06:07 ...
 kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 28s! [khugepaged:51]
Letzte Prüfung auf abgelaufene Metadaten: vor 0:45:38 am Di 05 Nov 2024 09:28:25 CET.
Abhängigkeiten sind aufgelöst.
==========================================================================================================================================================================
 Paket                                   Architektur                Version                                    Paketquelle                                          Größe
==========================================================================================================================================================================
Installieren:
 kernel                                  x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                        4.6 M
 kernel-core                             x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                         19 M
 kernel-modules                          x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                         38 M
 kernel-modules-core                     x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                         33 M
Aktualisieren:
 bpftool                                 x86_64                     7.3.0-427.42.1.el9_4                       rhel-9-for-x86_64-baseos-rpms                        5.4 M
 firefox                                 x86_64                     128.4.0-1.el9_4                            rhel-9-for-x86_64-appstream-rpms                     123 M
 kernel-headers                          x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-appstream-rpms                     6.3 M
 kernel-tools                            x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                        4.8 M
 kernel-tools-libs                       x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                        4.6 M
 python3-perf                            x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                        4.7 M
 thunderbird                             x86_64                     128.4.0-1.el9_4                            rhel-9-for-x86_64-appstream-rpms                     118 M
 tzdata                                  noarch                     2024b-2.el9                                rhel-9-for-x86_64-baseos-rpms                        841 k
 tzdata-java                             noarch                     2024b-2.el9                                rhel-9-for-x86_64-appstream-rpms                     228 k
Entfernen:
 kernel                                  x86_64                     5.14.0-427.35.1.el9_4                      @rhel-9-for-x86_64-baseos-rpms                         0
 kernel-core                             x86_64                     5.14.0-427.35.1.el9_4                      @rhel-9-for-x86_64-baseos-rpms                        64 M
 kernel-modules                          x86_64                     5.14.0-427.35.1.el9_4                      @rhel-9-for-x86_64-baseos-rpms                        33 M
 kernel-modules-core                     x86_64                     5.14.0-427.35.1.el9_4                      @rhel-9-for-x86_64-baseos-rpms                        27 M

Transaktionszusammenfassung
==========================================================================================================================================================================
Installieren   4 Pakete
Aktualisieren  9 Pakete
Entfernen      4 Pakete

Gesamte Downloadgröße: 362 M
Ist dies in Ordnung? [j/N]:

My host is an older server machine: 2-socket Ivy Bridge Xeon E5-2697 v2 (24C/48T) in an Asus Z9PE-D16/2L motherboard (Intel C-602A chipset); BIOS patched to the latest available from Asus. All memory slots occupied, so 256 GB RAM in total.

At the time of this happening, my workload is 17 VMs running, and I am performing a Proxmox VM backup (to NFS mounted storage) of the Linux VMs of this server (hence the ZSTD process in the hung task report above).

Code:

root@linus:~# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 black                running    8192              50.00 98672
       101 w11                  running    16384            300.00 8361
       102 w10                  running    16384            300.00 8021
       103 srv19e               running    16384            256.00 2917
       104 srv22                running    16384            256.00 2866
       105 ucs50                running    16384             80.00 103582
       106 kali                 running    8192              50.00 116175
       107 fed41-xfce           running    16384             80.00 126410
       108 emcc                 running    24576             80.00 158724
       109 rh92                 running    16384             64.00 173195
       110 db1                  running    24576             80.00 177487
       111 db2                  running    24576             80.00 194532
       112 ora-appsrv           running    24576             80.00 4859
       113 ora-VM-tmpl          stopped    24576             80.00 0
       114 fed41-mate           running    16384             80.00 4623
       115 fed41-kde            running    16384             80.00 4206
       116 srv25                running    16384            256.00 3046
       117 osus-tumble          running    8192              64.00 3679
root@linus:~#

King Tiger · Nov 6, 2024

Testserver: CPU(s) 12 x Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz (1 Socket)
PVE 8.2.8
ZFS
One Windows 2012r2 VM running oke on kernel 6.11.0-1

jason.houston · Nov 11, 2024

I have gotten quite a few of the random lockups on 6.8.x kernel so I'm anxious to test this one. I have installed the 6.11 kernel on one host in my three-host cluster, it is stable so far other than one CT which will not properly start. I will look into that issue and share any further details.

Sniper86 · Nov 11, 2024

pschneider1968 said:

With the new kernel 6.11.0-1-pve, I see hung tasks in the dmesg output of my Proxmox host, causing hangs and disconnected RDP sessions to my VMs.

E.g. like this one on the host:

Code:

[10201.836445] INFO: task kcompactd0:317 blocked for more than 122 seconds.
[10201.836455]       Tainted: P           O       6.11.0-1-pve #1
[10201.836457] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10201.836458] task:kcompactd0      state:D stack:0     pid:317   tgid:317   ppid:2      flags:0x00004000
[10201.836464] Call Trace:
[10201.836468]  <TASK>
[10201.836473]  __schedule+0x400/0x15d0
[10201.836489]  schedule+0x29/0x130
[10201.836492]  io_schedule+0x4c/0x80
[10201.836495]  folio_wait_bit_common+0x138/0x310
[10201.836504]  ? __pfx_wake_page_function+0x10/0x10
[10201.836508]  folio_wait_bit+0x18/0x30
[10201.836512]  folio_wait_writeback+0x2b/0xa0
[10201.836515]  nfs_wb_folio+0xa5/0x1f0 [nfs]
[10201.836579]  nfs_release_folio+0x75/0x140 [nfs]
[10201.836606]  filemap_release_folio+0x68/0xa0
[10201.836609]  split_huge_page_to_list_to_order+0x1f1/0xea0
[10201.836615]  migrate_pages_batch+0x580/0xce0
[10201.836619]  ? __pfx_compaction_alloc+0x10/0x10
[10201.836624]  ? __pfx_compaction_free+0x10/0x10
[10201.836627]  ? __mod_memcg_lruvec_state+0x9f/0x190
[10201.836631]  ? __pfx_compaction_free+0x10/0x10
[10201.836633]  migrate_pages+0xabb/0xd50
[10201.836636]  ? __pfx_compaction_free+0x10/0x10
[10201.836638]  ? __pfx_compaction_alloc+0x10/0x10
[10201.836642]  compact_zone+0xad3/0x1140
[10201.836646]  compact_node+0xa4/0x120
[10201.836651]  kcompactd+0x2cf/0x460
[10201.836654]  ? __pfx_autoremove_wake_function+0x10/0x10
[10201.836658]  ? __pfx_kcompactd+0x10/0x10
[10201.836661]  kthread+0xe4/0x110
[10201.836664]  ? __pfx_kthread+0x10/0x10
[10201.836666]  ret_from_fork+0x47/0x70
[10201.836677]  ? __pfx_kthread+0x10/0x10
[10201.836679]  ret_from_fork_asm+0x1a/0x30
[10201.836683]  </TASK>
[10201.836783] INFO: task task UPID:linus:96733 blocked for more than 122 seconds.
[10201.836785]       Tainted: P           O       6.11.0-1-pve #1
[10201.836786] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10201.836787] task:task UPID:linus state:D stack:0     pid:96733 tgid:96733 ppid:2110   flags:0x00000002
[10201.836790] Call Trace:
[10201.836791]  <TASK>
[10201.836792]  __schedule+0x400/0x15d0
[10201.836797]  schedule+0x29/0x130
[10201.836800]  io_schedule+0x4c/0x80
[10201.836803]  folio_wait_bit_common+0x138/0x310
[10201.836807]  ? __pfx_wake_page_function+0x10/0x10
[10201.836809]  __folio_lock+0x17/0x30
[10201.836813]  writeback_iter+0x1ee/0x2d0
[10201.836815]  ? __pfx_nfs_writepages_callback+0x10/0x10 [nfs]
[10201.836860]  write_cache_pages+0x4c/0xb0
[10201.836863]  nfs_writepages+0x17b/0x310 [nfs]
[10201.836890]  ? crypto_shash_update+0x19/0x30
[10201.836895]  ? ext4_inode_csum+0x1f8/0x270
[10201.836914]  do_writepages+0x7e/0x270
[10201.836917]  ? jbd2_journal_stop+0x155/0x2f0
[10201.836922]  filemap_fdatawrite_wbc+0x75/0xb0
[10201.836924]  __filemap_fdatawrite_range+0x6d/0xa0
[10201.836929]  filemap_write_and_wait_range+0x59/0xc0
[10201.836932]  nfs_wb_all+0x27/0x120 [nfs]
[10201.836960]  nfs4_file_flush+0x7b/0xd0 [nfsv4]
[10201.837018]  filp_flush+0x38/0x90
[10201.837021]  __x64_sys_close+0x33/0x90
[10201.837023]  x64_sys_call+0x1a84/0x24e0
[10201.837026]  do_syscall_64+0x7e/0x170
[10201.837032]  ? __f_unlock_pos+0x12/0x20
[10201.837035]  ? ksys_write+0xd9/0x100
[10201.837039]  ? syscall_exit_to_user_mode+0x4e/0x250
[10201.837043]  ? do_syscall_64+0x8a/0x170
[10201.837046]  ? syscall_exit_to_user_mode+0x4e/0x250
[10201.837049]  ? do_syscall_64+0x8a/0x170
[10201.837051]  ? ptep_set_access_flags+0x4a/0x70
[10201.837057]  ? wp_page_reuse+0x97/0xc0
[10201.837059]  ? do_wp_page+0x84b/0xb90
[10201.837062]  ? __pte_offset_map+0x1c/0x1b0
[10201.837067]  ? __handle_mm_fault+0xbdc/0x1120
[10201.837071]  ? __count_memcg_events+0x7d/0x130
[10201.837074]  ? count_memcg_events.constprop.0+0x2a/0x50
[10201.837077]  ? handle_mm_fault+0xaf/0x2e0
[10201.837080]  ? do_user_addr_fault+0x5ec/0x830
[10201.837083]  ? irqentry_exit_to_user_mode+0x43/0x250
[10201.837086]  ? irqentry_exit+0x43/0x50
[10201.837088]  ? exc_page_fault+0x96/0x1e0
[10201.837105]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[10201.837109] RIP: 0033:0x71103703a8e0
[10201.837122] RSP: 002b:00007fff544adda8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
[10201.837124] RAX: ffffffffffffffda RBX: 000059c36bd312a0 RCX: 000071103703a8e0
[10201.837125] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000013
[10201.837127] RBP: 0000000000000013 R08: 0000000000000000 R09: 0000000000000000
[10201.837128] R10: 0000000000000000 R11: 0000000000000202 R12: 000059c37401c890
[10201.837129] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[10201.837131]  </TASK>
[10201.837134] INFO: task zstd:98699 blocked for more than 122 seconds.

[...]

[12536.577551] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

causing something like this in e.g. a VM with Redhat Enterprise Linux:

Code:

[root@rh92 ~]# dnf update
Subscription Management Repositorys werden aktualisiert.

Message from syslogd@rh92 at Nov  5 09:51:51 ...
 kernel:Uhhuh. NMI received for unknown reason 30 on CPU 0.

Message from syslogd@rh92 at Nov  5 09:51:51 ...
 kernel:Do you have a strange power saving mode enabled?

Message from syslogd@rh92 at Nov  5 09:51:51 ...
 kernel:Dazed and confused, but trying to continue

Message from syslogd@rh92 at Nov  5 10:01:48 ...
 kernel:Uhhuh. NMI received for unknown reason 20 on CPU 1.

Message from syslogd@rh92 at Nov  5 10:01:48 ...
 kernel:Do you have a strange power saving mode enabled?

Message from syslogd@rh92 at Nov  5 10:01:48 ...
 kernel:Dazed and confused, but trying to continue

Message from syslogd@rh92 at Nov  5 10:02:09 ...
 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [in:imjournal:1051]

Message from syslogd@rh92 at Nov  5 10:02:29 ...
 kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [khugepaged:51]

Message from syslogd@rh92 at Nov  5 10:03:49 ...
 kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 33s! [khugepaged:51]

Message from syslogd@rh92 at Nov  5 10:04:14 ...
 kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [khugepaged:51]

Message from syslogd@rh92 at Nov  5 10:04:42 ...
 kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [khugepaged:51]

Message from syslogd@rh92 at Nov  5 10:06:07 ...
 kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 28s! [khugepaged:51]
Letzte Prüfung auf abgelaufene Metadaten: vor 0:45:38 am Di 05 Nov 2024 09:28:25 CET.
Abhängigkeiten sind aufgelöst.
==========================================================================================================================================================================
 Paket                                   Architektur                Version                                    Paketquelle                                          Größe
==========================================================================================================================================================================
Installieren:
 kernel                                  x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                        4.6 M
 kernel-core                             x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                         19 M
 kernel-modules                          x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                         38 M
 kernel-modules-core                     x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                         33 M
Aktualisieren:
 bpftool                                 x86_64                     7.3.0-427.42.1.el9_4                       rhel-9-for-x86_64-baseos-rpms                        5.4 M
 firefox                                 x86_64                     128.4.0-1.el9_4                            rhel-9-for-x86_64-appstream-rpms                     123 M
 kernel-headers                          x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-appstream-rpms                     6.3 M
 kernel-tools                            x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                        4.8 M
 kernel-tools-libs                       x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                        4.6 M
 python3-perf                            x86_64                     5.14.0-427.42.1.el9_4                      rhel-9-for-x86_64-baseos-rpms                        4.7 M
 thunderbird                             x86_64                     128.4.0-1.el9_4                            rhel-9-for-x86_64-appstream-rpms                     118 M
 tzdata                                  noarch                     2024b-2.el9                                rhel-9-for-x86_64-baseos-rpms                        841 k
 tzdata-java                             noarch                     2024b-2.el9                                rhel-9-for-x86_64-appstream-rpms                     228 k
Entfernen:
 kernel                                  x86_64                     5.14.0-427.35.1.el9_4                      @rhel-9-for-x86_64-baseos-rpms                         0
 kernel-core                             x86_64                     5.14.0-427.35.1.el9_4                      @rhel-9-for-x86_64-baseos-rpms                        64 M
 kernel-modules                          x86_64                     5.14.0-427.35.1.el9_4                      @rhel-9-for-x86_64-baseos-rpms                        33 M
 kernel-modules-core                     x86_64                     5.14.0-427.35.1.el9_4                      @rhel-9-for-x86_64-baseos-rpms                        27 M

Transaktionszusammenfassung
==========================================================================================================================================================================
Installieren   4 Pakete
Aktualisieren  9 Pakete
Entfernen      4 Pakete

Gesamte Downloadgröße: 362 M
Ist dies in Ordnung? [j/N]:

My host is an older server machine: 2-socket Ivy Bridge Xeon E5-2697 v2 (24C/48T) in an Asus Z9PE-D16/2L motherboard (Intel C-602A chipset); BIOS patched to the latest available from Asus. All memory slots occupied, so 256 GB RAM in total.

At the time of this happening, my workload is 17 VMs running, and I am performing a Proxmox VM backup (to NFS mounted storage) of the Linux VMs of this server (hence the ZSTD process in the hung task report above).

Code:

root@linus:~# qm list
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       100 black                running    8192              50.00 98672
       101 w11                  running    16384            300.00 8361
       102 w10                  running    16384            300.00 8021
       103 srv19e               running    16384            256.00 2917
       104 srv22                running    16384            256.00 2866
       105 ucs50                running    16384             80.00 103582
       106 kali                 running    8192              50.00 116175
       107 fed41-xfce           running    16384             80.00 126410
       108 emcc                 running    24576             80.00 158724
       109 rh92                 running    16384             64.00 173195
       110 db1                  running    24576             80.00 177487
       111 db2                  running    24576             80.00 194532
       112 ora-appsrv           running    24576             80.00 4859
       113 ora-VM-tmpl          stopped    24576             80.00 0
       114 fed41-mate           running    16384             80.00 4623
       115 fed41-kde            running    16384             80.00 4206
       116 srv25                running    16384            256.00 3046
       117 osus-tumble          running    8192              64.00 3679
root@linus:~#

Same hungs here with a Windows 11 VM. After few days without resolving the issue I rolled back to 6.8.12-3.
I'll wait for a mature release.

My host is a Supermicro sys-6018u-tr4t+ with 2x E5-2697A v4, 128GB ram, Nvidia P100 16Gb. As storage I've a separate server running Truenas 24.10 with all vdisks on full NVME pool connected with NFS.
The W11 is using the vGPU with P100-2B profile. All the other VMs on the same host were running Debian without noticed problems.

bbgeek17 · Nov 11, 2024

bbgeek17 said:
We will get it into our QA pipeline asap!

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

We've had PVE8 with QEMU9 and Kernel 6.11 in our QA infrastructure for over a week and have not experienced any issues. Keep in mind, that our testing is focused on storage-related functions with iSCSI and NVMe/TCP.

At this time, we are running PVE6, PVE7, PVE8+QEMU8, PVE8+QEMU9, and PVE8+QEMU9+Kernel6.11. All without issues.

Thank you for the great work PVE team!

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

jason.houston · Nov 11, 2024

jason.houston said:
I have gotten quite a few of the random lockups on 6.8.x kernel so I'm anxious to test this one. I have installed the 6.11 kernel on one host in my three-host cluster, it is stable so far other than one CT which will not properly start. I will look into that issue and share any further details.

Any known issues with running a CT from NFS storage with the 6.11 kernel? It seems my CT issue is isolated to one that is running from NFS storage on Truenas.

ivenae · Nov 11, 2024

Intel i3-10110U with
USB-C Ethernet based on : 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
and ZFS

upgrades with no issues so far.

Uturn · Nov 12, 2024

Updated to 6.11 about four days ago and see zero issues here. Everything runs smooth.

Systems:

EPYC 7402p (Zen2) DDR4
EPYC 9474F (Zen4) DDR5

VMs:

mostly OpenBSD
Linux
Windows 10

Configuration:

HA
ZFS Pool
Backup via Proxmox Backup

funtowne · Nov 12, 2024

Installed kernel 6.11 across a few different hosts without issues, but I am not doing anything too fancy beyond the callouts below:

Compulab Fitlet2 J3455 -- running opnsense as a VM
Supermicro X10SDV-4C+-TLN4F -- SATA controller passththrough to Openmediavalut VM
Supermicro A2SDi-2C-HLN4F -- SATA controller passthrough to Openmediavault VM

The hosts seem a bit snappier, but that may just be the side effect of a reboot. Regardless, delightfully uneventful!

Opt-in Linux 6.11 Kernel for Proxmox VE 8 available on test & no-subscription

Proxmox Staff Member

Member

New Member

Proxmox Staff Member

Renowned Member

Distinguished Member

Active Member

Renowned Member

Renowned Member

Proxmox Staff Member

New Member

Member

New Member

Member

New Member

Distinguished Member

Member

Member

Member

Member

We value your privacy