Something bad is happening with OpenVZ!

SamTzu

Ok. Something bad seems to be happening with OpenVZ.
I noticed error messages on one of our Proxmox servers.

Oct 10 22:28:54 vip1 kernel: Neighbour table overflow.
Oct 10 22:28:54 vip1 kernel: Neighbour table overflow.
Oct 10 22:28:54 vip1 kernel: Neighbour table overflow.
Oct 10 22:28:54 vip1 kernel: Neighbour table overflow.
Oct 10 22:28:54 vip1 kernel: Neighbour table overflow.
Oct 10 22:28:58 vip1 corosync[2580]: [TOTEM ] A processor failed, forming new configuration.
Oct 10 22:28:58 vip1 corosync[2580]: [CLM ] CLM CONFIGURATION CHANGE
Oct 10 22:28:58 vip1 corosync[2580]: [CLM ] New Configuration:
Oct 10 22:28:58 vip1 corosync[2580]: [CLM ] Members Left:
Oct 10 22:28:58 vip1 corosync[2580]: [CLM ] Members Joined:
Oct 10 22:28:58 vip1 corosync[2580]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 10 22:28:59 vip1 corosync[2580]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 10 22:29:17 vip1 kernel: __ratelimit: 391 callbacks suppressed
Oct 10 22:29:17 vip1 kernel: Neighbour table overflow.
Oct 10 22:29:17 vip1 kernel: Neighbour table overflow.

etc...

The problem became clear when I tried to work out how there could be so many connections on the server without me normally seeing them.
(That is what "Neighbour table overflow" usually means: lots of connections, or IPv6 problems.)
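
For reference, the kernel's neighbour tables can be inspected directly with iproute2 (standard on Debian, so this should work on a Proxmox host as-is):

# ip -4 neigh show
# ip -6 neigh show

The first lists the IPv4 ARP entries, incomplete ones included; the second lists the IPv6 neighbour entries, in case the overflow is on the IPv6 side.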

Running
arp -a
revealed many ghost ARP entries from the internal network, for addresses that simply do not exist.

? (10.9.141.139) at <incomplete> on vmbr0
? (10.9.129.77) at <incomplete> on vmbr0
? (10.9.143.131) at <incomplete> on vmbr0
? (10.9.143.73) at <incomplete> on vmbr0
? (10.9.143.209) at <incomplete> on vmbr0
? (10.9.142.213) at <incomplete> on vmbr0
? (10.9.143.111) at <incomplete> on vmbr0
? (10.9.140.48) at <incomplete> on vmbr0
? (10.9.139.207) at <incomplete> on vmbr0
? (10.9.139.224) at <incomplete> on vmbr0
? (10.9.131.219) at <incomplete> on vmbr0
? (10.9.129.33) at <incomplete> on vmbr0
? (10.9.143.89) at <incomplete> on vmbr0

There are no such IPs in use on our network.
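
As a side note, stale entries like these can in principle be flushed with iproute2 without restarting the network (just a sketch, not something I had tried at this point):

# ip neigh flush dev vmbr0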

When I tried to restart the network with
/etc/init.d/networking restart
I got an even weirder error.

root@vip1:/var/log# /etc/init.d/networking restart
Running /etc/init.d/networking restart is deprecated because it may not re-enable some interfaces ... (warning).
Reconfiguring network interfaces...
Waiting for vmbr0 to get ready (MAXWAIT is 2 seconds).
grep: unrecognized option '--all'
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
done.

I'm at a loss here. Who is doing what here?
 
How many entries do you have in your ARP cache? See the example below.
Have you checked the network settings on your host and VMs? Maybe your broadcast domain is too large.
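
For example, something like this should show the entry count and the thresholds it is compared against (assuming iproute2 and sysctl are available):

# ip -4 neigh | wc -l
# sysctl net.ipv4.neigh.default.gc_thresh1 net.ipv4.neigh.default.gc_thresh2 net.ipv4.neigh.default.gc_thresh3

If the count is near or above gc_thresh3, new entries cannot be added and you get exactly this overflow message.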
 
It's hard to say anything without details about your network environment, but I would run tcpdumps on your switch/host/VM interfaces to find out which direction they are coming from...
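
For example, something like this on the host bridge would show the source MACs behind the ARP traffic for one of the ghost addresses (interface and IP taken from the output above, adjust to taste):

# tcpdump -eni vmbr0 arp and host 10.9.141.139

The -e flag prints the link-level header, so you can see which MAC the requests are coming from.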

I had a similar case where a customer had a layer-2 connection between two sites, plugged into one of the switches on each site. The problem turned out to be caused by the ISP: they had not isolated this layer-2 connection, so the customer had the ISP's backbone connected on both sites...
 
That particular host had one Bind DNS server, two Zimbra VMs and one ZenOss VM running on it. They were all behind an external firewall. I fail to see how this could happen unless something was seriously wrong.
 
A "neighbour table overflow" kernel message is a common problem on large networks.

If you run:
# sysctl net.ipv4.neigh.default.gc_thresh1
# sysctl net.ipv4.neigh.default.gc_thresh2
# sysctl net.ipv4.neigh.default.gc_thresh3

You will see the default values, respectively:
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024

For a large network with table overflow issues, the following entries in /etc/sysctl.conf can prevent the errors:
# Run the ARP cache garbage collector every hour
net.ipv4.neigh.default.gc_interval = 3600

# Consider ARP cache entries stale after an hour
net.ipv4.neigh.default.gc_stale_time = 3600

# Raise the ARP table size thresholds
net.ipv4.neigh.default.gc_thresh3 = 4096
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh1 = 1024

To apply the new settings without a restart:
# sysctl -p

The issue can also occur on a network with few hosts if the subnet mask is very short: a /16, for example, spans 65,534 addresses, so scans or broadcast chatter across that range can easily exceed the default gc_thresh3 of 1024.

 
