Slow memory leak in 6.8.12-13-pve

Just to confirm what others are seeing: the 6.14.11-2-ice-fix-1-pve kernel shows a similar rate of buffer_head slab leakage to the other 6.14 kernels (with 9k MTU Ceph on an ice NIC).

Might be worth testing the regular 6.14.11-3-pve (2025-09-22T10:13Z) kernel instead of the -ice-fix builds, since it already carries relevant net/mm changes. I’ve since moved from Intel E810 to Mellanox, so can’t verify current ice behavior myself.
 
As I posted last Thursday, I tested that as well with no change. That kernel does *not* contain the "ice: fix Rx page leak on multi-buffer frames" patch (that's what the -ice-fix builds add), though of course those builds don't fix the issue either.
 
Apparently I spoke too soon on the -ice-fix builds. The buffer_head slab allocations do rise for the first few hours, but eventually seem to flatten out. I see something like 300 MB of used buffer_head slab allocations (per slabtop), which of course seems pretty excessive, but total system memory usage has been flat over the last six hours with the 6.14 ice-fix build.
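For anyone who wants to watch this the same way: slabtop reads /proc/slabinfo, so a simple loop (run as root; the 60-second interval is just an example) is enough to see whether the buffer_head count keeps climbing or flattens out:
Code:
watch -n 60 'grep buffer_head /proc/slabinfo; grep -E "^(Slab|MemAvailable)" /proc/meminfo'
If the buffer_head objects keep growing while MemAvailable drops, that matches the leak pattern; in my case both levelled off after a few hours.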
 
Hi,
I have been testing 6.14.11-3-pve for a week now on a 5-node cluster with jumbo frames (MTU 9000) on the Ceph interfaces, and I can still see a memory leak. The rate differs per node: one of them is eating its memory much faster than the others, and I can't explain why (they are all similar in hardware and configuration). But they are all leaking.
 

Attachments: five screenshots (Capture d’écran, 2025-10-08)
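To compare the nodes side by side, a quick loop over the cluster is enough to log MemAvailable per node (host names below are just placeholders for your nodes):
Code:
for n in node1 node2 node3 node4 node5; do echo -n "$n: "; ssh root@"$n" grep MemAvailable /proc/meminfo; done
Running that from cron every few minutes gives a rough leak rate per node without needing the graphs.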

On a 4-node Ceph cluster (dual 100 GbE Intel E810, MTU 9000) I ran ~16 Windows Server 2025 VMs under continuous diskspd.exe load for a full week (~1 % wear on PM9A3 drives). With kernels 6.8.12-11-pve and 6.14.11-3-pve I saw no memory leaks at all, while the “-icefix” builds still showed a slow increase. I eventually replaced all E810s with Mellanox, and since then memory usage has remained stable.
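For anyone correlating this with driver and firmware versions, ethtool shows what a given port is running (the interface name is just an example, use your Ceph-facing port):
Code:
ethtool -i enp1s0f0
On the E810 ports this reports driver: ice plus the firmware version, which is probably worth including in reports so the affected combinations can be narrowed down.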
 

I will replace 6.14.11-3-pve with 6.8.12-11-pve tomorrow morning on all nodes. Since I don't plan to add any new workload in the next few days, it will be a good way to compare.
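In case it helps anyone doing the same swap: once the older kernel package is installed, proxmox-boot-tool can pin it so all nodes boot it by default (the version string is just the one I'm switching to; check the list output for the exact name):
Code:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.8.12-11-pve
A reboot is needed afterwards, and proxmox-boot-tool kernel unpin reverts to the default once the comparison is done.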
 
The opt-in kernel 6.17 is now available for Proxmox VE 9.0 in the pve-test and pve-no-subscription repositories, if you would like to test it (the official announcement of kernel 6.17 will come soon. EDIT: added link to the announcement post). Currently, version 6.17.1-1-pve is available, which contains all ice driver fixes released upstream. If you have the possibility (especially on non-production and/or non-critical systems), feel free to try the latest kernel 6.17 and let us know whether it improves the situation in any way (or whether it's the same, or worse).

Since it's an opt-in kernel, you will need to install it manually:
Code:
apt update && apt install proxmox-kernel-6.17

If you have any DKMS modules (e.g. Nvidia driver or other external kernel modules), you will also need to install the headers for the new kernel:
Code:
apt update && apt install proxmox-kernel-6.17 proxmox-headers-6.17
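After rebooting into the new kernel, you can check that you are actually running 6.17 and that any DKMS modules were rebuilt for it:
Code:
uname -r
dkms status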

While kernels 6.8 and 6.14 also contain fixes backported from newer kernels (both by Ubuntu and by us), the newest kernel may include further fixes that have not yet been backported and that could improve the situation.
 