Bridge freezes and odd arping answers

dfateyev

May 25, 2015
I have several Proxmox nodes connected via OpenVPN with TAP interfaces. On each node, the TAP interfaces are bridged into the local `vmbr1` bridge, which also contains the VETH interfaces of the node's OpenVZ VMs. Together this forms a distributed L2 network spanning all the VMs.

At first glance everything works like a charm, but some time ago, as the number of servers grew, I noticed occasional freezes in the bridge. They happen once or twice a day and last 10-120 seconds. During a freeze it looks as if pings and TCP/UDP connections cannot get through between VMs, whether they are on the same node or on different nodes. Meanwhile, the VMs still see each other via ARP (`ip neighbor` reports them as 'reachable').

I enabled STP on `vmbr1` just to check whether it would fix anything, but instead the ping results changed:
Code:
root@ovz1:~# ping 172.16.7.2
PING 172.16.7.2 (172.16.7.2) 56(84) bytes of data.
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=0.024 ms
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=20.8 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=20.8 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=41.7 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=0.040 ms
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=20.9 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=20.9 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=41.7 ms (DUP!)
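For anyone trying to reproduce this: on Proxmox 3.x, bridges are handled by bridge-utils, so STP on `vmbr1` can be toggled and inspected at runtime roughly as below. The post does not say exactly how STP was enabled, so treat this as a sketch rather than the commands actually used.

```sh
brctl stp vmbr1 on     # enable STP on the bridge (runtime only, lost on reboot)
brctl showstp vmbr1    # per-port STP state: forwarding, blocking, ...
brctl stp vmbr1 off    # switch it off again
```

A persistent setting lives in the `bridge_stp on|off` option of the `vmbr1` stanza in `/etc/network/interfaces`; bridges created by Proxmox use `bridge_stp off`.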
I also checked with arping:
Code:
root@ovz1:~# arping -I eth0 -c 4 172.16.7.2
ARPING 172.16.7.2 from 172.16.7.1 eth0
Unicast reply from 172.16.7.2 [C6:76:4B:05:3E:33]  0.534ms
Unicast reply from 172.16.7.2 [C6:76:4B:05:3E:33]  21.677ms
Unicast reply from 172.16.7.2 [8E:3A:1A:68:77:24]  40.754ms
Unicast reply from 172.16.7.2 [CA:45:11:D4:38:03]  85.768ms
Unicast reply from 172.16.7.2 [6A:74:CF:8C:08:22]  101.030ms
Unicast reply from 172.16.7.2 [76:1B:99:84:9F:1A]  104.720ms
Unicast reply from 172.16.7.2 [76:1B:99:84:9F:1A]  125.585ms
Unicast reply from 172.16.7.2 [F6:C1:90:7B:73:34]  451.747ms
Unicast reply from 172.16.7.2 [B6:22:8A:69:E9:72]  604.148ms
Unicast reply from 172.16.7.2 [5E:5B:49:85:3C:A3]  716.400ms
Unicast reply from 172.16.7.2 [4E:0F:18:93:5B:11]  811.795ms
Unicast reply from 172.16.7.2 [4E:0F:18:93:5B:11]  21.896ms
Unicast reply from 172.16.7.2 [4E:0F:18:93:5B:11]  21.866ms
Unicast reply from 172.16.7.2 [4E:0F:18:93:5B:11]  21.836ms
Sent 4 probes (1 broadcast(s))
Received 14 response(s)
The first two replies are valid; all the others seem to come from random VMs on other nodes in my network.
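A quick way to summarise output like this is to count the replies per MAC address. The sketch below assumes the iputils `arping` output format shown above; `count_arp_replies` is just an illustrative name. Note that iputils `arping` starts with a broadcast and switches to unicast once it receives a reply (unless `-b` is given), which fits the "1 broadcast(s)" line and the last few replies all coming from a single MAC.

```sh
# Count how many replies each MAC address sent in iputils arping output.
# Usage: arping -I eth0 -c 4 172.16.7.2 | count_arp_replies
count_arp_replies() {
    # Reply lines look like:
    #   Unicast reply from 172.16.7.2 [C6:76:4B:05:3E:33]  0.534ms
    # Field 5 is the MAC in brackets; substr() strips the brackets.
    awk '/reply from/ { print substr($5, 2, length($5) - 2) }' |
        sort | uniq -c | sort -rn    # most frequent replier first
}
```

On the output above this reports nine distinct MACs for 14 replies, with `4E:0F:18:93:5B:11` answering most often.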

As expected, when I disconnect the node from the VPN, everything goes back to normal:
Code:
root@ovz1:~# ping 172.16.7.2
PING 172.16.7.2 (172.16.7.2) 56(84) bytes of data.
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=0.020 ms
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=0.031 ms
64 bytes from 172.16.7.2: icmp_req=3 ttl=64 time=0.039 ms
64 bytes from 172.16.7.2: icmp_req=4 ttl=64 time=0.035 ms

There are no duplicate MAC or IP addresses on the nodes, and the VMs that answer the ARP requests all have distinct, unique IP addresses.
What could be causing the bridge freezes and the strange arping results?

P.S. The nodes run Proxmox versions ranging from 3.2-4 to 3.3-5.
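To double-check the "no duplicate MACs" statement across nodes, one option is to collect `ip -o link` output from every node and look for Ethernet addresses that appear on more than one node. The helper names below are illustrative, and the usage loop assumes root SSH access between the nodes. Addresses are de-duplicated per node first, because a Linux bridge normally shares its MAC with one of its ports, so `vmbr0` and `eth0` carrying the same address on one node is expected.

```sh
# Print the Ethernet MACs from "ip -o link" output, upper-cased, once each.
node_macs() {
    awk '{
        # In one-line mode the address follows the "link/ether" token.
        for (i = 1; i < NF; i++)
            if ($i == "link/ether") print toupper($(i + 1))
    }' | sort -u
}

# Given the concatenated per-node lists, print MACs present on more than one node.
cross_node_dups() {
    sort | uniq -d
}

# Usage (node names taken from the prompts in this thread; add the rest of your nodes):
#   for h in ovz1 ovz2; do ssh "root@$h" ip -o link | node_macs; done | cross_node_dups
```

Empty output means no Ethernet address is shared between nodes.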
 
Well, I reverted the bridge settings to their defaults so as not to confuse the other Proxmox nodes and to get rid of the DUP ping results.
However, I still get these arping answers:
Code:
root@ovz2:/# arping -I eth0 -c 4 172.16.7.1
ARPING 172.16.7.1 from 172.16.7.3 eth0
Unicast reply from 172.16.7.1 [EE:18:2F:D7:D2:63]  0.540ms
Unicast reply from 172.16.7.1 [B6:22:8A:69:E9:72]  42.296ms
Unicast reply from 172.16.7.1 [4E:0F:18:93:5B:11]  50.368ms
Unicast reply from 172.16.7.1 [8E:3A:1A:68:77:24]  166.388ms
Unicast reply from 172.16.7.1 [6A:74:CF:8C:08:22]  172.183ms
Unicast reply from 172.16.7.1 [CA:45:11:D4:38:03]  207.392ms
Unicast reply from 172.16.7.1 [F6:C1:90:7B:73:34]  271.387ms
Unicast reply from 172.16.7.1 [5E:5B:49:85:3C:A3]  303.805ms
Unicast reply from 172.16.7.1 [76:1B:99:84:9F:1A]  407.756ms
Unicast reply from 172.16.7.1 [76:1B:99:84:9F:1A]  0.534ms
Unicast reply from 172.16.7.1 [76:1B:99:84:9F:1A]  0.545ms
Unicast reply from 172.16.7.1 [76:1B:99:84:9F:1A]  0.534ms
Sent 4 probes (1 broadcast(s))
Received 12 response(s)
Does anybody have an idea why I'm seeing these results?
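Comparing the two arping runs is telling: apart from the first (valid) replier in each run, the same eight MAC addresses answered for both 172.16.7.1 and 172.16.7.2. Since every VM has its own unique IP, one interface claiming both addresses suggests that something is answering ARP on behalf of other hosts. A sketch for listing the MACs common to two saved runs (function and file names are illustrative):

```sh
# Print the MACs (one per line, unique) that replied in a saved arping output file.
reply_macs() {
    awk '/reply from/ { print substr($5, 2, length($5) - 2) }' "$1" | sort -u
}

# Print MACs that replied in both files. Each list is already unique, so a MAC
# that appears twice in the combined stream is present in both runs.
common_repliers() {
    { reply_macs "$1"; reply_macs "$2"; } | sort | uniq -d
}

# Usage: run1.txt and run2.txt hold two saved arping outputs for different target IPs.
#   common_repliers run1.txt run2.txt
```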
 
Check your proxy ARP settings.
I do indeed use proxy ARP:
Code:
net.ipv4.conf.all.proxy_arp = 1
net.ipv4.conf.default.proxy_arp = 1

# Source route verification (reverse-path filtering); 0 disables it
net.ipv4.conf.all.rp_filter = 0
It seems to be needed, though, for VMs located on different nodes to interact with each other.
Anyway, I'll try disabling ARP proxying on the node's `vmbr1`, which is linked to the VPN TAP, and see if it helps.
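Before disabling proxy ARP only on `vmbr1`, it is worth checking how the kernel combines these settings. According to the kernel's ip-sysctl documentation, proxy ARP is active on an interface if at least one of `conf/all/proxy_arp` or `conf/<iface>/proxy_arp` is set, and for `rp_filter` the larger of the `all` and per-interface values is used (very old kernels may behave differently). With `net.ipv4.conf.all.proxy_arp = 1`, setting `net.ipv4.conf.vmbr1.proxy_arp = 0` on its own would therefore change nothing. The sketch below prints the effective values per interface; `effective_arp_settings` is an illustrative name, and the optional directory argument exists only so it can be tried on a copy of the tree.

```sh
# Print the effective proxy_arp and rp_filter values for each interface,
# combining conf/all with the per-interface value as ip-sysctl documents:
#   proxy_arp: enabled if conf/all OR conf/<iface> is non-zero
#   rp_filter: the maximum of conf/all and conf/<iface> is used
effective_arp_settings() {
    conf=${1:-/proc/sys/net/ipv4/conf}    # optional root, for testing
    all_pa=$(cat "$conf/all/proxy_arp")
    all_rp=$(cat "$conf/all/rp_filter")
    for dir in "$conf"/*/; do
        iface=$(basename "$dir")
        case $iface in all|default) continue ;; esac   # not real interfaces
        pa=$(cat "$dir/proxy_arp")
        rp=$(cat "$dir/rp_filter")
        if [ "$all_pa" -ne 0 ] || [ "$pa" -ne 0 ]; then eff_pa=1; else eff_pa=0; fi
        if [ "$all_rp" -gt "$rp" ]; then eff_rp=$all_rp; else eff_rp=$rp; fi
        printf '%s proxy_arp=%s rp_filter=%s\n' "$iface" "$eff_pa" "$eff_rp"
    done
}
```

For example, `effective_arp_settings | grep vmbr1` on a node: if it still reports `proxy_arp=1` after the per-interface value was set to 0, the global `all` switch is the reason.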

UPD: Fixed by setting more precise `proxy_arp` and `rp_filter` values.
 