I have several Proxmox nodes connected via OpenVPN with TAP interfaces. On each node, the TAP interfaces are bridged into the `vmbr1` bridge, which also holds the veth interfaces of the OpenVZ VMs. Together this forms a distributed L2 network among the VMs.
At first glance everything works like a charm, but some time ago, as the number of servers grew, I noticed occasional freezes on the bridge. They happen once or twice a day and last 10-120 seconds. During a freeze, pings and TCP/UDP connections between VMs fail, whether the VMs are on the same node or on different nodes. Meanwhile, the VMs still see each other via ARP (`ip neighbor` says 'reachable').
I enabled STP on `vmbr1` just to check whether it fixes anything, and noticed that the ping results changed:
Code:
root@ovz1:~# ping 172.16.7.2
PING 172.16.7.2 (172.16.7.2) 56(84) bytes of data.
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=0.024 ms
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=20.8 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=20.8 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=41.7 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=0.040 ms
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=20.9 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=20.9 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=41.7 ms (DUP!)
Checked with arping:
Code:
root@ovz1:~# arping -I eth0 -c 4 172.16.7.2
ARPING 172.16.7.2 from 172.16.7.1 eth0
Unicast reply from 172.16.7.2 [C6:76:4B:05:3E:33] 0.534ms
Unicast reply from 172.16.7.2 [C6:76:4B:05:3E:33] 21.677ms
Unicast reply from 172.16.7.2 [8E:3A:1A:68:77:24] 40.754ms
Unicast reply from 172.16.7.2 [CA:45:11:D4:38:03] 85.768ms
Unicast reply from 172.16.7.2 [6A:74:CF:8C:08:22] 101.030ms
Unicast reply from 172.16.7.2 [76:1B:99:84:9F:1A] 104.720ms
Unicast reply from 172.16.7.2 [76:1B:99:84:9F:1A] 125.585ms
Unicast reply from 172.16.7.2 [F6:C1:90:7B:73:34] 451.747ms
Unicast reply from 172.16.7.2 [B6:22:8A:69:E9:72] 604.148ms
Unicast reply from 172.16.7.2 [5E:5B:49:85:3C:A3] 716.400ms
Unicast reply from 172.16.7.2 [4E:0F:18:93:5B:11] 811.795ms
Unicast reply from 172.16.7.2 [4E:0F:18:93:5B:11] 21.896ms
Unicast reply from 172.16.7.2 [4E:0F:18:93:5B:11] 21.866ms
Unicast reply from 172.16.7.2 [4E:0F:18:93:5B:11] 21.836ms
Sent 4 probes (1 broadcast(s))
Received 14 response(s)
The first two replies are valid; all the others seem to come from random VMs on other nodes in my network.
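To make the pattern in the arping replies easier to see, they can be grouped by the MAC address that answered. Below is a minimal Python sketch of that idea; it is not part of my setup, and the function name `replies_by_mac`, the regular expression and the shortened `SAMPLE` text are only illustrative assumptions based on the output format shown above.

```python
import re
from collections import Counter

# Matches reply lines in the format shown above, e.g.
#   Unicast reply from 172.16.7.2 [C6:76:4B:05:3E:33] 0.534ms
REPLY_RE = re.compile(
    r"^Unicast reply from (\S+) \[([0-9A-Fa-f:]{17})\]\s+([\d.]+)ms"
)


def replies_by_mac(arping_output: str) -> Counter:
    """Count how many ARP replies each MAC address sent."""
    counts = Counter()
    for line in arping_output.splitlines():
        match = REPLY_RE.match(line.strip())
        if match:
            counts[match.group(2).upper()] += 1
    return counts


# Shortened sample: the first three reply lines from the output above.
SAMPLE = """\
ARPING 172.16.7.2 from 172.16.7.1 eth0
Unicast reply from 172.16.7.2 [C6:76:4B:05:3E:33] 0.534ms
Unicast reply from 172.16.7.2 [C6:76:4B:05:3E:33] 21.677ms
Unicast reply from 172.16.7.2 [8E:3A:1A:68:77:24] 40.754ms
"""

if __name__ == "__main__":
    for mac, count in replies_by_mac(SAMPLE).most_common():
        print(f"{mac}  {count} reply(ies)")
```

More than one distinct MAC answering for a single IP address is exactly the symptom described here, so the number of keys in the returned counter is the value to watch.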
As expected, when I disconnect the VPN connection on the node, everything goes back to normal:
Code:
root@ovz1:~# ping 172.16.7.2
PING 172.16.7.2 (172.16.7.2) 56(84) bytes of data.
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=0.020 ms
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=0.031 ms
64 bytes from 172.16.7.2: icmp_req=3 ttl=64 time=0.039 ms
64 bytes from 172.16.7.2: icmp_req=4 ttl=64 time=0.035 ms
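To compare the two ping runs in numbers rather than by eye, the duplicate replies can be counted per request. This is only a rough Python sketch under the assumption that the output looks like the lines above; the helper name `duplicates_per_request` and the sample strings are made up for illustration.

```python
import re

# Picks out the sequence number and an optional "(DUP!)" marker from lines like
#   64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=20.8 ms (DUP!)
PING_RE = re.compile(r"icmp_(?:req|seq)=(\d+) .*?time=[\d.]+ ms( \(DUP!\))?")


def duplicates_per_request(ping_output: str) -> dict:
    """Return {sequence number: count of replies marked (DUP!)}."""
    dups = {}
    for line in ping_output.splitlines():
        match = PING_RE.search(line)
        if not match:
            continue
        seq = int(match.group(1))
        # Every sequence number gets an entry, even when it has no duplicates.
        dups[seq] = dups.get(seq, 0) + (1 if match.group(2) else 0)
    return dups


# Shortened samples taken from the two ping runs shown above.
WITH_VPN = """\
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=0.024 ms
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=20.8 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=20.8 ms (DUP!)
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=41.7 ms (DUP!)
"""
WITHOUT_VPN = """\
64 bytes from 172.16.7.2: icmp_req=1 ttl=64 time=0.020 ms
64 bytes from 172.16.7.2: icmp_req=2 ttl=64 time=0.031 ms
"""

if __name__ == "__main__":
    print("with VPN:   ", duplicates_per_request(WITH_VPN))
    print("without VPN:", duplicates_per_request(WITHOUT_VPN))
```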
There are no duplicate MAC or IP addresses on the nodes. The VMs that answer the ARP requests all have different, unique IP addresses.
What could be causing the bridge freezes and the strange arping results?
P.S. The nodes run Proxmox versions from 3.2-4 to 3.3-5.