Slow memory leak in 6.8.12-13-pve

Hi,

Same thing here:
  • Ceph for VM disks and ISO images
  • Ethernet controller [0200]: Intel Corporation Ethernet Controller E810-XXV for SFP [8086:159b] (rev 02)
    Subsystem: Intel Corporation Device [8086:0000]
    Kernel driver in use: ice
    Kernel modules: ice
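
(For reference, the NIC and driver details above can be collected with something like the following; the interface name is only an example and needs to be adjusted:)
Code:
# list Ethernet controllers with PCI IDs, subsystem and the bound kernel driver
lspci -nnk | grep -A3 -i ethernet
# show driver, version and firmware for a specific interface (name is an example)
ethtool -i enp129s0f0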

Code:
root@pve3:~# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-6.14.11-2-pve root=UUID=d31b7e52-6351-474c-b8e8-e757557089ac ro nomodeset iommu=pt console=tty0 console=ttyS0,115200n8
 
Hi all,

Maybe late, but I tested our clusters - they are not affected.

Network cards:
Mellanox MT27800, Kernel module: mlx5_core
Intel LAN X722, Kernel module: i40e
No ZFS.
Tested versions:
PVE 8.4 - kernels: 6.8.12-11, 6.8.12-14, 6.8.12-15
PVE 9.0 - kernels: 6.14.11-1, 6.14.11-2
 
https://lore.kernel.org/all/20250825-jk-ice-fix-rx-mem-leak-v2-1-5afbb654aebb@intel.com/ seems like a likely fix for this memory leak. A test kernel is available here, and we'd appreciate feedback on whether it fixes your issues:

http://download.proxmox.com/temp/kernel-6.8-ice-memleak-fix-1/
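
If you are unsure how to try it, a rough sketch of installing such a test build (the exact .deb filename under that URL will differ; adjust it, and the version in the pin command, accordingly):
Code:
# download and install the test kernel package (replace with the actual filename from the URL above)
wget http://download.proxmox.com/temp/kernel-6.8-ice-memleak-fix-1/proxmox-kernel-<version>.deb
dpkg -i proxmox-kernel-<version>.deb
# optionally boot into it only once, so a crash falls back to the previous kernel
proxmox-boot-tool kernel pin <ABI-version> --next-boot
reboot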

thanks!

Thank you for providing the kernel build with the Intel E810 ICE driver fix. I deployed it on four nodes equipped with E810 NICs. Over the weekend the systems kept running, but I observed that memory usage on the E810 nodes was not stable. The curve was flatter compared to kernel 6.14.11-2, but memory consumption still increased steadily over time.

To compare more directly, I installed 6.8.12-15-ice-fix-1 on node1 and node2, while node3 and node4 stayed on the older 6.8.12-11 kernel which in my setup does not show memory leaks. After about one hour node1 crashed completely and required a cold reset.

Right before the crash the logs showed Ceph RBD lock messages (“no lock owners detected”, “breaking header lock”, “breaking object map lock”), indicating that the client session was interrupted. Simultaneously the Proxmox firewall bridge devices reported state changes (“fwbr entered blocking and forwarding state”, “fwln entered disabled state”, “fwtap entered promiscuous mode”). In my environment this sequence of Ceph lock takeovers combined with bridge port reinitialization is a reliable indicator of an E810 NIC reset: the interface drops link, the Linux bridge cycles its ports, and Ceph connectivity is lost, forcing RBD locks to be reassigned. This points to a driver-level instability rather than expected cluster behaviour.
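
For anyone who wants to check their own logs for this pattern, a minimal sketch (the search terms are taken from the messages above; adjust the time window to your incident):
Code:
# look for ice driver events, RBD lock takeovers and bridge port cycling around the incident
journalctl --since "2025-09-xx xx:xx" --until "2025-09-xx xx:xx" \
    | grep -Ei 'ice|lock owner|breaking (header|object map) lock|fwbr|fwln|fwtap'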

In summary:

  • Kernel 6.8.12-15-ice-fix-1 reduces but does not eliminate the memory leak.
  • With a mixed setup (two nodes on the patched kernel, two nodes on 6.8.12-11), node1 crashed hard within an hour — something I have not experienced with 6.8.12-11.

Has anyone else observed similar behaviour with this patched kernel?
 
For your information, I updated my previous post with the latest information we currently have. Thank you everyone for your valuable feedback! Thanks to this, we were able to identify that the issue is related to the ICE driver and/or Intel E810 NIC, possibly also related to Jumbo Frames / MTU 9000. I have therefore also updated my post asking for more information; it now also asks for the MTU used for Ceph.
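
For reference, the MTU currently in use can be read directly from the interfaces carrying Ceph traffic, for example (interface names here are just examples):
Code:
# show MTU for all interfaces
ip link show | grep -E '^[0-9]+:.*mtu'
# or for a specific interface, e.g. the Ceph bond or bridge
ip link show dev bond0 | head -n1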
 
After about one hour node1 crashed completely and required a cold reset.

Right before the crash the logs showed Ceph RBD lock messages (“no lock owners detected”, “breaking header lock”, “breaking object map lock”), indicating that the client session was interrupted. Simultaneously the Proxmox firewall bridge devices reported state changes (“fwbr entered blocking and forwarding state”, “fwln entered disabled state”, “fwtap entered promiscuous mode”).
Just to confirm, did you also see any "out of memory" messages before the crash? I would like to know whether you experienced the same issue, or a different one. If in doubt, feel free to share the journal of node1 with us by executing the following command and attaching the generated file (please adapt the date and time to the time around the beginning of the issues, and until after the crash):
Code:
journalctl --since="2025-09-xx xx:xx" --until="2025-09-xx xx:xx" | gzip > $(hostname)-$(date -Is)-journal.txt.gz
 
Just to confirm, did you also see any "out of memory" messages before the crash? I would like to know whether you experienced the same issue, or a different one. If in doubt, feel free to share the journal of node1 with us by executing the following command and attaching the generated file (please adapt the date and time to the time around the beginning of the issues, and until after the crash):
Code:
journalctl --since="2025-09-xx xx:xx" --until="2025-09-xx xx:xx" | gzip > $(hostname)-$(date -Is)-journal.txt.gz

I also always use MTU 9000. Before the crash I did not observe any OOM messages. Node1 completely froze and required a hard power cycle to recover.

Right before the crash I saw the Ceph RBD lock break messages and Proxmox firewall bridge state changes, as shown in the attached screenshots. I’m also attaching the journal log from node1 for the relevant period.

Please note that I don’t know the exact crash time on Monday morning, as I had too much work at once, so the log covers a broader interval around the incident.

Best regards, Martin
 


Two days ago, I installed kernel Linux 6.14.11-2-ice-fix-1-pve from http://download.proxmox.com/temp/kernel-6.14-ice-memleak-fix-1/ on one node of my 4-node cluster. After 36 hours, the memory usage of this node is much more stable compared to the three other nodes. Its memory still appears to be consumed, but very slowly.

I also use MTU 9000 for my Ceph network interfaces.

Sorry, I couldn't let my nodes go to OOM, so I can't say anything about that point.
 
Just to confirm, did you also see any "out of memory" messages before the crash? I would like to know whether you experienced the same issue, or a different one. If in doubt, feel free to share the journal of node1 with us by executing the following command and attaching the generated file (please adapt the date and time to the time around the beginning of the issues, and until after the crash):
Code:
journalctl --since="2025-09-xx xx:xx" --until="2025-09-xx xx:xx" | gzip > $(hostname)-$(date -Is)-journal.txt.gz

Tested both temporary kernels with ICE driver fix (6.8 and 6.14) on Intel E810 NICs (MTU 9000, Ceph backend). Heavy load: 10–15 Windows Server 2025 VMs running DiskSpd I/O stress.

Results:

  • 6.8.12-15-ice-fix-1 → node001 froze after ~1h, required cold reset. Logs showed Ceph RBD lock break + firewall bridge state changes → typical NIC reset pattern.
  • 6.14.11-2-ice-fix-1 → after longer runtime ended in kernel panic (penguin with exclamation mark).


MTU 1500 made the memory usage curve flatter (slower leak), but IOPS and throughput dropped sharply, so it is not a solution. Trend: the heavier the network traffic, the faster the leak.
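
For anyone reproducing the MTU comparison, this is roughly how the interface can be switched temporarily (the interface name is an example; bonds/bridges and switch ports on the same path have to match, and the change does not survive a reboot):
Code:
# temporarily switch the Ceph interface to standard frames
ip link set dev bond0 mtu 1500
# switch back to jumbo frames afterwards
ip link set dev bond0 mtu 9000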

On 6.8.12-11 the cluster runs stable without leaks. The July 2025 ICE driver from GitHub could help, but rebuilding it after every kernel update is impractical. For future hardware I'll use Mellanox ConnectX-5 Ex NICs.
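
For completeness, roughly what that rebuild involves (a sketch assuming the usual layout of Intel's out-of-tree driver repository; headers for the running kernel must be installed, and the whole procedure has to be repeated after every kernel update):
Code:
git clone https://github.com/intel/ethernet-linux-ice.git
cd ethernet-linux-ice/src
make install                 # builds against the headers of the currently running kernel
rmmod ice && modprobe ice    # reload the driver - this drops the link on the E810 ports!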

Thanks for providing the test kernels — all results may be influenced by my environment or by mistakes on my side.



 

Attachments

  • IMG_0208.jpeg (388.3 KB)
> Given the setup and the fact that it directly impacts the Intel E810 NIC, I also suspect it’s tied to the ice driver changes introduced in those newer kernels

yes, seems so:

https://securityvulnerability.io/vulnerability/CVE-2025-21981

https://securityvulnerability.io/vulnerability/CVE-2025-38417


maybe disabling aRFS via "ethtool -K <ethX> ntuple off" ( https://github.com/intel/ethernet-linux-ice ) is worth a try on affected systems!?
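
Something along these lines, for example (the interface name is an example; note this also disables any other ntuple/Flow Director filters and does not persist across reboots unless added to the network config):
Code:
# check whether ntuple filters (required for aRFS) are currently enabled
ethtool -k enp129s0f0 | grep ntuple
# disable them
ethtool -K enp129s0f0 ntuple off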

Thank you for the suggestion. I tried disabling ntuple on the Intel E810 NICs, but it did not help. For now, it seems that downgrading and pinning the kernel to 6.8.12-11 is the only workaround that works in my setup. I have also ordered a Mellanox ConnectX-5 Ex EN 100 GbE Dual-Port QSFP28 PCIe 4.0 x16 and will test replacing the E810 on one node to see if this changes the behavior. I will share the outcome once tested.
 
Is anyone successfully running Proxmox 9 with the 6.8.12-11 kernel? Is there a sanctioned way to get back to that kernel other than hand-jamming the .deb packages in from the 8.4 repo?

We started fresh into 9.0 in a pre-prod cluster so we don't have the old kernels lying around after an upgrade.
 
Is anyone successfully running Proxmox 9 with the 6.8.12-11 kernel? Is there a sanctioned way to get back to that kernel other than hand-jamming the .deb packages in from the 8.4 repo?

We started fresh into 9.0 in a pre-prod cluster so we don't have the old kernels lying around after an upgrade.

(screenshot attached)
 
Is anyone successfully running Proxmox 9 with the 6.8.12-11 kernel? Is there a sanctioned way to get back to that kernel other than hand-jamming the .deb packages in from the 8.4 repo?

We started fresh into 9.0 in a pre-prod cluster so we don't have the old kernels lying around after an upgrade.
Hi,

I just did it this morning on one node of my 5-node Proxmox 9.0.10 cluster.

This is how I did it:
Code:
curl -O http://download.proxmox.com/debian/pve/dists/bookworm/pve-no-subscription/binary-amd64/proxmox-kernel-6.8.12-11-pve-signed_6.8.12-11_amd64.deb
curl -O http://download.proxmox.com/debian/pve/dists/bookworm/pve-no-subscription/binary-amd64/proxmox-kernel-6.8_6.8.12-11_all.deb
dpkg -i proxmox-kernel-6.8.12-11-pve-signed_6.8.12-11_amd64.deb proxmox-kernel-6.8_6.8.12-11_all.deb
proxmox-boot-tool kernel pin 6.8.12-11-pve
shutdown -r now
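
After the reboot, the pin can be verified with, for example:
Code:
proxmox-boot-tool kernel list   # lists managed kernels (and the pin, if set)
uname -r                        # confirms the node is running 6.8.12-11-pve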

I only have 2 virtual machines on this node, to reduce the risk.
Attached is the graph of used memory; you can clearly see the difference before (6.14.11-2) and after (6.8.12-11) the reboot.
Memory still seems to be consumed slowly, but that may be normal. Too early to say.
 

Attachments

  • Capture d’écran 2025-09-26 à 17.25.46.png (101.2 KB)
Thank you very much, rolled one of our nodes as well. It's a big sucker with 128 cores, 1.1TB of RAM, and a ton of load. If it blows up I'll report back, if not I'll roll the others and still report back.

Thanks everyone for the information.

Update: everything is running great with hundreds of VMs on TBs of memory. I think it's safe to say that this kernel runs great on 9.0.

Thank you again for the info.
 
Thank you very much, rolled one of our nodes as well. It's a big sucker with 128 cores, 1.1TB of RAM, and a ton of load. If it blows up I'll report back, if not I'll roll the others and still report back.

Thanks everyone for the information.

Update: everything is running great with hundreds of VMs on TBs of memory. I think it's safe to say that this kernel runs great on 9.0.

Thank you again for the info.

I adjusted my cluster to test the memory leak issue under controlled conditions:

  • node001: Intel E810-CQDA2 (100 GbE) removed and replaced with Mellanox ConnectX-5 VPI MCX556A-EDAT (dual-port IB-EDR/100 GbE). Kernel upgraded to Linux 6.14.11-3-pve (2025-09-22T10:13Z).
  • node002: still equipped with Intel E810-CQDA2, kernel also upgraded to 6.14.11-3-pve.
  • node003 + node004: both still equipped with Intel E810-CQDA2 but deliberately pinned to the last verified stable kernel 6.8.12-11-pve (2025-05-22T09:39Z).

All nodes are running the latest Proxmox VE 9.0 and Ceph packages.

Workload: 16 Windows Server 2025 VMs executing sustained DiskSpd I/O tests (high IOPS and throughput) for 24+ hours with MTU 9000.

Expected behavior:

  • node003 + node004 should remain stable (pinned 6.8.12-11, known to be unaffected).
  • node001 (Mellanox + latest kernel) should also remain stable.
  • node002 (Intel E810 + latest kernel) was expected to show the known memory leak.

Actual behavior:

  • node003 + node004 behaved as expected (stable, no leaks).
  • node001 (Mellanox + 6.14.11-3) stable, no leaks.
  • node002 (Intel E810 + 6.14.11-3) also remained stable, with no slab growth, no abnormal SUnreclaim increase, and no OOM events.

This is unexpected. Until now, I have not seen any changelog or forum report indicating that the Intel E810/ICE memory leak has been fixed. It is therefore unclear whether the improvement is due to:

  • changes introduced in kernel 6.14.11-3-pve,
  • updates in pve-firmware,
  • or another component.

Conclusion: On my cluster, three Intel E810 nodes and one Mellanox node all show no memory leaks under 24+ hours of sustained heavy load. I will continue long-term stress testing for several more days to confirm.
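
For anyone who wants to track this themselves, a minimal sketch of logging slab growth over time (log path and interval are arbitrary examples):
Code:
# append unreclaimable slab memory and the largest slab caches to a log every 10 minutes
while true; do
    echo "=== $(date -Is) ===" >> /var/log/slab-watch.log
    grep -E 'SUnreclaim|MemAvailable' /proc/meminfo >> /var/log/slab-watch.log
    slabtop -o -s c | head -n 12 >> /var/log/slab-watch.log
    sleep 600
done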

Can other users running Intel E810 on 6.14.11-3-pve confirm whether they observe the same behavior?



 
yes, the 6.14.11-3-pve kernel updates to a new kernel base (Ubuntu-6.14.0-34.34), which contains a few ice-related fixes:

Code:
56b3db68fb111318fa34236523a042a2fa0fbc03 ice: fix eswitch code memory leak in reset scenario
3f250a7d1701b7f96a5592e2a551c626de66e3c9 net: ice: Perform accurate aRFS flow match
d0cfedc0a44126e8ffa3c2f947ad6107ecb707f5 ice: fix check for existing switch rule
7576066f2eebfed9f9b67dac3372b1bc3cc0dfff ice/ptp: fix crosstimestamp reporting
e8d5430b1ada0d4ce9c6947697a684c1aa366a2f ice: fix rebuilding the Tx scheduler tree for large queue counts
91fc42bd2bdb615524a61957d7d7233831a00c99 ice: create new Tx scheduler nodes for new queues only
ba262e2fd6db63069fb16457237b9bbd4ece9c4d ice: fix Tx scheduler error handling in XDP callback
c8a0748d2b88c02b73548687e5e53fe9f797ece2 ice: Fix LACP bonds without SRIOV environment
7fb05265383e0f6749adba44395bafc3d999df38 ice: fix vf->num_mac count with port representors
6d2cbb450c4fd0b06b963feccd0354d1db86fb23 ice: count combined queues using Rx/Tx count
bf9dfc985cf7fff593ac571fad74bf9f0550b5e0 ice: treat dyn_allowed only as suggestion
91cb76eab07de066c07bc2068ca3e334f5ab333d ice: init flow director before RDMA
 
yes, the 6.14.11-3-pve kernel updates to a new kernel base (Ubuntu-6.14.0-34.34), which contains a few ice-related fixes:

Code:
56b3db68fb111318fa34236523a042a2fa0fbc03 ice: fix eswitch code memory leak in reset scenario
3f250a7d1701b7f96a5592e2a551c626de66e3c9 net: ice: Perform accurate aRFS flow match
d0cfedc0a44126e8ffa3c2f947ad6107ecb707f5 ice: fix check for existing switch rule
7576066f2eebfed9f9b67dac3372b1bc3cc0dfff ice/ptp: fix crosstimestamp reporting
e8d5430b1ada0d4ce9c6947697a684c1aa366a2f ice: fix rebuilding the Tx scheduler tree for large queue counts
91fc42bd2bdb615524a61957d7d7233831a00c99 ice: create new Tx scheduler nodes for new queues only
ba262e2fd6db63069fb16457237b9bbd4ece9c4d ice: fix Tx scheduler error handling in XDP callback
c8a0748d2b88c02b73548687e5e53fe9f797ece2 ice: Fix LACP bonds without SRIOV environment
7fb05265383e0f6749adba44395bafc3d999df38 ice: fix vf->num_mac count with port representors
6d2cbb450c4fd0b06b963feccd0354d1db86fb23 ice: count combined queues using Rx/Tx count
bf9dfc985cf7fff593ac571fad74bf9f0550b5e0 ice: treat dyn_allowed only as suggestion
91cb76eab07de066c07bc2068ca3e334f5ab333d ice: init flow director before RDMA

Based on Fabian’s confirmation and my own stress tests (Intel E810 with MTU 9000 on kernel 6.14.11-3-pve, no leaks observed), can we assume that the memory leak issue is now resolved and mark this thread as [SOLVED]?

Best regards,
Martin
 
After upgrading to 6.14.11-3-pve I still see `buffer_head` slab usage increasing slowly over time. It may be obvious, but it is worth noting that the ice memory leak fix previously identified isn't in the above list.
 
After upgrading to 6.14.11-3-pve I still see `buffer_head` slab usage increasing slowly over time. It may be obvious, but it is worth noting that the ice memory leak fix previously identified isn't in the above list.
Thanks for the info, I'll wait a bit longer before doing anything. At the moment, I have to restart the nodes once a week. If necessary, I'll have to revert to the old kernel very soon.
 
Upgraded a 10-node cluster to the 6.8.12-15-ice-fix-1-pve ice-fix kernel. All nodes have E810 NICs (2 per node), and memory is still climbing over time. Going to continue to monitor.
 
Just to confirm what others are seeing: the 6.14.11-2-ice-fix-1-pve kernel shows a similar rate of buffer_head slab leaking as other 6.14 kernels (with MTU 9000 Ceph on an ice card).