psokolovas
Guest
Hi
I've started using VE checkpoint/restore (chkpnt/restore), but after several days I noticed that sometimes, in random VEs after a restore, the number of TCP orphaned sockets becomes negative (e.g. -1 or -4). This is system-wide, because dmesg on the HN shows:
TCP: too many of orphaned sockets (-1 in CT1060)
printk: 52 messages suppressed.
TCP: too many of orphaned sockets (-3 in CT1105)
printk: 37 messages suppressed.
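To see which containers are affected, the dmesg lines above can be filtered with a small script. This is only a sketch: the function name is made up for illustration, and the pattern assumes the message format is exactly the one shown in the log above.

```python
import re

# Pattern taken from the dmesg lines quoted in this post.
ORPHAN_RE = re.compile(r"TCP: too many of orphaned sockets \((-?\d+) in CT(\d+)\)")


def negative_orphan_counts(dmesg_text):
    """Map container ID -> most recent negative orphan count found in the text."""
    result = {}
    for count, ctid in ORPHAN_RE.findall(dmesg_text):
        if int(count) < 0:
            result[int(ctid)] = int(count)
    return result


sample = """TCP: too many of orphaned sockets (-1 in CT1060)
printk: 52 messages suppressed.
TCP: too many of orphaned sockets (-3 in CT1105)
printk: 37 messages suppressed."""

print(negative_orphan_counts(sample))  # {1060: -1, 1105: -3}
```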
vzctl restart 1105 does not help; the count still goes back to a negative value. This causes a lot of trouble for VE users, because TCP connections start to drop, resulting in pictures that do not load, truncated HTML, etc.
I use kernel 2.6.18-4-pve. No beancounters are over their limits.
As far as I understand programming, the kernel code should check whether the number is negative and, if so, treat it as 0. But since -4 is 65532 when read as an unsigned 16-bit value, or even more if a wider integer type is used, I think that is where the problem lies.
All my attempts to solve this problem without rebooting the HN have failed. Only a reboot clears these counters, but that is not an acceptable solution.
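To illustrate the arithmetic behind my guess, here is how a negative counter looks when it is read as an unsigned value, and what clamping it to 0 would do. This is plain Python, not kernel code; the helper names are made up, and I have not checked how the kernel actually stores or compares this counter.

```python
def as_unsigned(value, bits):
    """Reinterpret a two's-complement integer as an unsigned value of the given width."""
    return value & ((1 << bits) - 1)


def clamp_non_negative(value):
    """Treat a negative counter as 0, which is the behaviour I would expect."""
    return max(value, 0)


print(as_unsigned(-4, 16))     # 65532
print(as_unsigned(-4, 32))     # 4294967292
print(clamp_non_negative(-4))  # 0
```

If the counter really is compared as unsigned somewhere, even -1 would look like a huge number of orphans, which would explain the "too many" message.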
EDIT: Just found one more solution:
1. vzctl stop 1105
2. Wait for this message in dmesg on the HN: Ub 1105 helds 31192 in tcpsndbuf on put
3. vzctl start 1105
I waited about 30 seconds. Cool, but still not an acceptable solution.
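The three steps above could be scripted roughly like this. It is a sketch under assumptions: the function names, the timeout and the polling interval are made up, it assumes the "Ub <ID> helds ... on put" message always has the form quoted in step 2, and it would miss the message if the kernel log buffer wraps in the meantime. It has to run as root on the HN.

```python
import re
import subprocess
import time


def count_release_messages(dmesg_text, ctid):
    """Count 'Ub <ctid> helds ... on put' lines for one container."""
    return len(re.findall(rf"\bUb {ctid} helds \d+ in \w+ on put", dmesg_text))


def read_dmesg():
    """Return the current kernel log as text."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout


def restart_when_released(ctid, run=subprocess.run, dmesg=read_dmesg,
                          timeout=120.0, poll=2.0, sleep=time.sleep):
    """Stop the VE, wait for a new release message in dmesg, then start it again.

    Returns True if the message appeared before the timeout, False otherwise.
    The VE is started again in both cases.
    """
    before = count_release_messages(dmesg(), ctid)
    run(["vzctl", "stop", str(ctid)], check=True)
    waited = 0.0
    seen = False
    while waited < timeout:
        if count_release_messages(dmesg(), ctid) > before:
            seen = True
            break
        sleep(poll)
        waited += poll
    run(["vzctl", "start", str(ctid)], check=True)
    return seen
```

Calling restart_when_released(1105) would then do the same thing I did by hand. The run, dmesg and sleep parameters are only there so the logic can be tried out without touching a real container.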
Questions:
1. Has anyone experienced the same problem, and if yes, what was the solution? E.g. maybe it is possible to reset all open/orphaned sockets and their counters with something like cat smth > /proc/somewhere?
2. Maybe it is possible to patch the kernel to behave as I described above when the number becomes negative? If yes, maybe we should patch the PVE kernel?
Thanks!