Many thanks, the patches appear to have worked for us!
We last rebooted the host on Sunday night, so the memory utilisation graph doesn't show a massive dip, although it is still approximately 8GB during the day when running pve-kernel-4.13.8-2-pve:
Note that max and last available memory were reduced by 2723.84 MB.
When running pve-kernel-4.13.8-3-pve:
Note that max and last available memory were only reduced by 348.16 MB.
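For anyone wanting to track the same "available memory" figure from the shell rather than from a graph, here is a rough sketch. It assumes a kernel new enough to report `MemAvailable` in `/proc/meminfo` (3.14 and later, in kB) and a `date` that supports `+%s` (GNU or busybox); the function names are purely illustrative:

```shell
#!/bin/sh
# Rough sketch: print the kernel's MemAvailable figure in MB so it can be
# logged over time and compared between kernel versions.
# Assumes MemAvailable is present in /proc/meminfo (kernel >= 3.14, value in kB).

mem_available_mb() {
    # Optional $1: alternative meminfo-format file (defaults to /proc/meminfo).
    awk '$1 == "MemAvailable:" { printf "%.2f\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

log_mem_available() {
    # Print "<epoch seconds> <MemAvailable MB>" $2 times, $1 seconds apart.
    interval=${1:-300}
    count=${2:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%s)" "$(mem_available_mb)"
        i=$((i + 1))
        # Only sleep between samples, not after the last one.
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
}

# Example: one sample every 5 minutes for 24 hours, appended to a log file.
#   log_mem_available 300 288 >> /var/log/mem_available.log
```

Running it once per kernel over a comparable period should give numbers that can be compared directly, similar to the 2723.84 MB vs 348.16 MB figures above.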
We previously had an issue with the Intel ixgbe driver on PVE 4.4 when running two 10GbE ports as an active/backup bond, with the bond connected to a legacy Linux bridge (i.e. not OVS). The ports would flap until we disabled the offload acceleration settings on the physical ports themselves:
Code:
/etc/rc.local:
# Buggy Intel network card drivers: disable TSO, GSO and GRO on both bond members
ethtool -K eth0 tso off gso off gro off
ethtool -K eth1 tso off gso off gro off
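To check whether that workaround is still in effect, or to confirm the offloads are back on after removing it, `ethtool -k` (lower-case: show only) lists the current feature state, whereas `-K` (upper-case) changes it. A minimal sketch that filters out just the three features in question; the `offload_states` helper name is made up for illustration:

```shell
#!/bin/sh
# Sketch: report the three offload features that the rc.local workaround
# disabled. Input is the output of `ethtool -k <iface>` on stdin.

offload_states() {
    # Print "<feature> <on|off>" for TSO, GSO and GRO. Only top-level lines
    # match (indented sub-features like "tx-tcp-segmentation" are skipped),
    # and any "[fixed]" suffix that ethtool appends is dropped.
    awk -F': ' '
        $1 == "tcp-segmentation-offload" ||
        $1 == "generic-segmentation-offload" ||
        $1 == "generic-receive-offload" { split($2, v, " "); print $1, v[1] }
    '
}

# Usage on a real host (interface names as in the rc.local above):
#   for iface in eth0 eth1; do
#       echo "== $iface"; ethtool -k "$iface" | offload_states
#   done
```

Re-enabling the features without a reboot is then just the rc.local lines with `off` swapped for `on`, e.g. `ethtool -K eth0 tso on gso on gro on`.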
I assume that the updated ixgbe driver (v5.3.3) in 4.13.8-27 and later, and/or other kernel changes, fixed the issue, so we no longer have to disable the 'tcp-segmentation-offload', 'generic-segmentation-offload' or 'generic-receive-offload' acceleration features. This in turn results in fewer buffer overruns, and hence less memory leaking, than when we were running 4.13.4-26.
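To confirm which ixgbe driver version is actually loaded, `ethtool -i <iface>` prints the driver name and version (`modinfo -F version ixgbe` instead shows the version of the module file on disk for the running kernel). A rough sketch; `driver_info` and `version_ge` are hypothetical helper names, and `version_ge` assumes GNU `sort -V` for version ordering:

```shell
#!/bin/sh
# Sketch: extract driver name and version from `ethtool -i <iface>` output
# and compare the version against the 5.3.3 release mentioned above.

driver_info() {
    # Reads `ethtool -i` output on stdin, prints "<driver> <version>".
    awk -F': ' '$1 == "driver" { d = $2 } $1 == "version" { v = $2 } END { print d, v }'
}

version_ge() {
    # True if version $1 >= version $2: after a version sort, the smaller
    # of the two comes first, so $1 >= $2 exactly when $2 sorts first.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a host (eth0 as in the rc.local above):
#   set -- $(ethtool -i eth0 | driver_info)
#   if [ "$1" = ixgbe ] && version_ge "$2" 5.3.3; then
#       echo "ixgbe $2 (5.3.3 or later)"
#   fi
```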
To summarise:
- 4.13.8-27 includes updated ixgbe drivers, which make the underlying memory leak problem occur less often
- 4.13.8-29 includes patches which address a memory leak in vhost network traffic when buffers overrun
Again, many thanks for the prompt attention and for building a kernel with the patches!
PS: The memory leaks were relatively gradual, so in a couple of days I'll try to remember to post a memory/network utilisation graph that demonstrates the problem more clearly.