Slow memory leak in 6.8.12-13-pve

Hi,

Same thing here:
  • Ceph for VM disks and ISO images
  • Ethernet controller [0200]: Intel Corporation Ethernet Controller E810-XXV for SFP [8086:159b] (rev 02)
    Subsystem: Intel Corporation Device [8086:0000]
    Kernel driver in use: ice
    Kernel modules: ice

Code:
root@pve3:~# cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-6.14.11-2-pve root=UUID=d31b7e52-6351-474c-b8e8-e757557089ac ro nomodeset iommu=pt console=tty0 console=ttyS0,115200n8
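
In case it helps others report comparable details, the controller, subsystem and driver lines above can be collected with lspci (a generic sketch, the grep pattern is just one way to filter):
Code:
# shows controller, subsystem and driver lines for Ethernet devices
lspci -nnk | grep -iA3 'ethernet controller'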
 
Hi all,

Maybe a bit late, but I tested our clusters - they are not affected...

Network cards:
  • Mellanox MT27800, Kernel module: mlx5_core
  • Intel LAN X722, Kernel module: i40e
No ZFS.
Tested versions:
  • PVE 8.4 - kernels: 6.8.12-11, 6.8.12-14, 6.8.12-15
  • PVE 9.0 - kernels: 6.14.11-1, 6.14.11-2
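
In case anyone wants to reproduce the comparison, this is roughly how the running and installed kernels can be listed on a PVE node (a small sketch):
Code:
# currently running kernel
uname -r
# kernels known to proxmox-boot-tool
proxmox-boot-tool kernel list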
 
https://lore.kernel.org/all/20250825-jk-ice-fix-rx-mem-leak-v2-1-5afbb654aebb@intel.com/ seems like a likely fix for this memory leak. A test kernel is available here, and we'd appreciate feedback on whether it fixes your issues:

http://download.proxmox.com/temp/kernel-6.8-ice-memleak-fix-1/

thanks!
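
If needed, installing such a test build usually boils down to fetching the kernel .deb from the directory above and installing it; a rough sketch (the package file name below is a placeholder, use the actual file listed in that directory):
Code:
# placeholder name - replace with the actual .deb from the directory above
wget http://download.proxmox.com/temp/kernel-6.8-ice-memleak-fix-1/<kernel-package>.deb
apt install ./<kernel-package>.deb
reboot
# afterwards, verify the running kernel with: uname -r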

Thank you for providing the kernel build with the Intel E810 ICE driver fix. I deployed it on four nodes equipped with E810 NICs. Over the weekend the systems kept running, but memory usage on the E810 nodes was not stable: the curve was flatter than with kernel 6.14.11-2, but memory consumption still increased steadily over time.
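
One simple way to make that trend visible is to log memory usage at intervals, e.g. (a minimal sketch; interval and file name are arbitrary):
Code:
# log memory and slab usage every 5 minutes
while true; do date; free -m; grep -E 'MemFree|Slab' /proc/meminfo; sleep 300; done >> /root/memlog.txt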

To compare more directly, I installed 6.8.12-15-ice-fix-1 on node1 and node2, while node3 and node4 stayed on the older 6.8.12-11 kernel, which in my setup does not show the memory leak. After about one hour, node1 crashed completely and required a cold reset.

Right before the crash the logs showed Ceph RBD lock messages (“no lock owners detected”, “breaking header lock”, “breaking object map lock”), indicating that the client session was interrupted. Simultaneously the Proxmox firewall bridge devices reported state changes (“fwbr entered blocking and forwarding state”, “fwln entered disabled state”, “fwtap entered promiscuous mode”). In my environment this sequence of Ceph lock takeovers combined with bridge port reinitialization is a reliable indicator of an E810 NIC reset: the interface drops link, the Linux bridge cycles its ports, and Ceph connectivity is lost, forcing RBD locks to be reassigned. This points to a driver-level instability rather than expected cluster behaviour.
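
To confirm whether the E810 actually reset around such an event, one can filter the kernel log for ice and link-state messages, e.g. (a sketch; adjust the time window to the suspected reset):
Code:
journalctl -k --since "2025-09-xx xx:xx" | grep -iE '\bice\b|link is (up|down)'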

In summary:

  • Kernel 6.8.12-15-ice-fix-1 reduces but does not eliminate the memory leak.
  • With a mixed setup (two nodes on the patched kernel, two nodes on 6.8.12-11), node1 crashed hard within an hour — something I have not experienced with 6.8.12-11.

Has anyone else observed similar behaviour with this patched kernel?
 
For your information, I have updated my previous post with the latest information we currently have. Thank you everyone for your valuable feedback! Thanks to it, we were able to identify that the issue is related to the ICE driver and/or the Intel E810 NIC, possibly also to Jumbo Frames / MTU 9000. I have therefore also updated my post asking for more information, and am now asking for the MTU used for Ceph as well.
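
If you are unsure which MTU your Ceph network uses, it can be checked for example with (a small sketch that prints the interface name and MTU for every link):
Code:
ip -o link show | awk '{print $2, $4, $5}'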
 
After about one hour node1 crashed completely and required a cold reset.

Right before the crash the logs showed Ceph RBD lock messages (“no lock owners detected”, “breaking header lock”, “breaking object map lock”), indicating that the client session was interrupted. Simultaneously the Proxmox firewall bridge devices reported state changes (“fwbr entered blocking and forwarding state”, “fwln entered disabled state”, “fwtap entered promiscuous mode”).
Just to confirm, did you also see any "out of memory" messages before the crash? I would like to know whether you experienced the same issue, or a different one. If in doubt, feel free to share the journal of node1 with us by executing the following command and attaching the generated file (please adapt the date and time to the time around the beginning of the issues, and until after the crash):
Code:
journalctl --since="2025-09-xx xx:xx" --until="2025-09-xx xx:xx" | gzip > $(hostname)-$(date -Is)-journal.txt.gz
 
Just to confirm, did you also see any "out of memory" messages before the crash? I would like to know whether you experienced the same issue, or a different one. If in doubt, feel free to share the journal of node1 with us by executing the following command and attaching the generated file (please adapt the date and time to the time around the beginning of the issues, and until after the crash):
Code:
journalctl --since="2025-09-xx xx:xx" --until="2025-09-xx xx:xx" | gzip > $(hostname)-$(date -Is)-journal.txt.gz

I also always use MTU 9000. Before the crash I did not observe any OOM messages. Node1 completely froze and required a hard power cycle to recover.
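
For reference, the kind of check one can use to look for OOM kills from before a hard reset is something like this (a sketch; it needs a persistent journal to see the previous boot):
Code:
# kernel messages from the previous boot
journalctl -k -b -1 | grep -iE 'out of memory|oom-killer'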

Right before the crash I saw the Ceph RBD lock break messages and Proxmox firewall bridge state changes, as shown in the attached screenshots. I’m also attaching the journal log from node1 for the relevant period.

Please note that I don't know the exact crash time on Monday morning, as I had too many things going on at once, so the log covers a broader interval around the incident.

Best regards, Martin
 