Negative number of orphaned TCP sockets

psokolovas · Dec 21, 2010

Hi

I've started using VE checkpoint/restore (chkpnt/restore), but after several days I noticed that sometimes, in random VEs after restoring, the number of orphaned TCP sockets becomes negative (e.g. -1 or -4). This appears to be system-wide, because dmesg on the HN shows:

TCP: too many of orphaned sockets (-1 in CT1060)
printk: 52 messages suppressed.
TCP: too many of orphaned sockets (-3 in CT1105)
printk: 37 messages suppressed.

vzctl restart 1105 does not help: the count still comes back to a negative value. This causes a lot of trouble for VE users, because TCP connections start to drop, resulting in images that fail to load, truncated HTML, etc.

I use kernel 2.6.18-4-pve. No beancounters are over their limits.

As far as I know programming, the kernel code should check whether the number is negative and, if so, count it as 0. But if the count is treated as unsigned, -4 becomes 65532 (with a 16-bit integer) or far more with a wider integer, and I think that is where the problem is.
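To illustrate the arithmetic in the paragraph above, here is a small userspace sketch in C. It is not kernel code, and the helper names (as_u16, as_u32, clamp_non_negative, self_check) are made up for this example. It only shows what a negative count looks like once it is converted to an unsigned type, and what "count it as 0" would mean.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these exist in the kernel. */

/* A negative count converted to an unsigned 16-bit value wraps modulo 2^16. */
static uint16_t as_u16(int count)
{
    return (uint16_t)count;
}

/* The same count converted to an unsigned 32-bit value wraps modulo 2^32. */
static uint32_t as_u32(int count)
{
    return (uint32_t)count;
}

/* The behaviour proposed in the post: treat any negative count as 0. */
static int clamp_non_negative(int count)
{
    return count < 0 ? 0 : count;
}

/* Checks the values quoted in the post. */
static void self_check(void)
{
    assert(as_u16(-4) == 65532u);        /* 65536 - 4 */
    assert(as_u32(-4) == 4294967292u);   /* 4294967296 - 4 */
    assert(clamp_non_negative(-4) == 0);
    assert(clamp_non_negative(7) == 7);
}
```

Calling self_check() from a test program passes silently if the wrap-around values match the ones described above.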

All my attempts to solve this problem without rebooting the HN have failed. Only a reboot clears these counters, but that is not an acceptable solution.
EDIT: I just found one more workaround:

1. vzctl stop 1105
2. wait for dmesg on HN: Ub 1105 helds 31192 in tcpsndbuf on put
3. vzctl start 1105

I waited about 30 seconds between steps 1 and 3. Cool. But this is still not an acceptable solution.

Questions:

1. Has anyone experienced the same problem, and if so, what was the solution? E.g. maybe it is possible to reset all open/orphaned sockets and their counters by issuing some kind of cat something > /proc/somewhere?

2. Maybe it is possible to patch the kernel to behave as I described above when the number becomes negative? If so, maybe we should patch the PVE kernel?

Thanks!

dietmar · Dec 21, 2010

Can you please report a bug to the OpenVZ bug tracker, including detailed instructions on how to reproduce it?

psokolovas · Jan 4, 2011

I've reported this to the OpenVZ kernel maintainers:
http://bugzilla.openvz.org/show_bug.cgi?id=1735

Also, I believe I have found a workaround. Can you ask your kernel maintainers to include this fix in the current 2.6.18 kernel release? I tried to do this myself, but the kernel is hard to compile: gcc 4.1.2 fails, newer versions also fail, and it is hard to get a working 4.1.3 compiler together with quilt etc. I think it would be better if you recompiled it and shipped it as an update. Thanks!

Workaround:
In net/ipv4/tcp.c, replace:

    if (ub_too_many_orphans(sk, orphans)) {

with:

    if ((ub_too_many_orphans(sk, orphans)) && (orphans > 0)) {

This eliminates false TCP resets when inaccurate counting pushes the orphan count below zero. I have never seen this number go below -7, so an error of up to 7 orphans will not matter for UB accounting, but the extra check will stop the false TCP resets.
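The effect of the extra condition can be sketched in userspace C. This is a model, not the kernel code: model_too_many_orphans and guarded_too_many_orphans are hypothetical names, and the model assumes (this is not confirmed anywhere in this thread) that the original check compares the signed count against an unsigned limit, which is one way a negative count could be reported as "too many".

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the unpatched check.
 * ASSUMPTION: the count is compared against an unsigned limit, so a
 * negative count is converted to a huge unsigned value first.
 */
static int model_too_many_orphans(int orphans, unsigned long limit)
{
    return (unsigned long)orphans >= limit;
}

/* The same check with the proposed "orphans > 0" guard added. */
static int guarded_too_many_orphans(int orphans, unsigned long limit)
{
    assert(limit > 0); /* the model only makes sense with a non-zero limit */
    return model_too_many_orphans(orphans, limit) && orphans > 0;
}
```

Under that assumption, a count of -1 with a limit of 100 trips the unguarded model but not the guarded one, while a genuinely excessive count such as 150 still trips both. Note that the guard only hides the symptom; the miscounting after restore remains.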

If you want, I can help with testing. Please compile 2.6.18-4-pve with my patch and send it to me, or give me its FTP location, and I will test it on my production HNs (because I am 99% sure it will help, and 99.99% sure it won't hurt).

After that, you can release it as an update.

Thanks again!

dietmar · Jan 5, 2011

psokolovas said:
I've reported this to the OpenVZ kernel maintainers:
http://bugzilla.openvz.org/show_bug.cgi?id=1735

I will include it once the OpenVZ team confirms the bug and includes a fix in their official release.
