OVS intport - You can't ping me unless I ping you first

virtualbitz

Nov 6, 2020
This is the strangest networking problem I've had in a long time.

I have a single host with an OVS bridge and a couple of interfaces, one 10G, one 1G. I had a single IP bound directly to the bridge. This problem started after I deleted that IP and created a new IP in a different subnet on an OVSIntPort, this time tagged with a VLAN ID rather than untagged. I created the OVSIntPort, moved the gateway over to it, validated that I could reach the internet from the host, then deleted the old IP from the bridge. VM/CT networking is great, nothing wrong there.

At first everything seemed okay. Internet access from the host was fine. Then I started to notice that several client machines couldn't get to the host. They were on the same subnet, all the correct MAC addresses were in the tables on the physical switches and OVS. Then I tried pinging a client machine from the host. Once I did that the client was instantly able to access the host.

I checked over my switches thoroughly for spanning tree, or any other problems (I'm running RSTP on all switches, including the OVS bridge). I couldn't find anything wrong.
 
Bump, does anyone know how I should go about troubleshooting this? I noticed that the OVSIntPort IP becomes inaccessible again (no ARP) after rebooting the host.
 
It might be helpful if you could post some of your config files, e.g. /etc/network/interfaces.
 
Hint: NDP/ARP
I mean, ARP certainly seems to be a factor. There's no ARP entry in OVS before I try pinging out, and after I do there is one. That doesn't explain why OVS doesn't seem to be replying to ARP broadcasts.

I'm not following where you're going with NDP.
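
For anyone wanting to narrow this down, one way to check the "no ARP entry until I ping out" symptom is to look at the host's neighbour table before and after the outbound ping. Below is a minimal sketch, not something taken from the posts above: `has_neigh_entry` is a hypothetical helper name, and the interface `vlan11` and client address 192.168.11.50 in the comments are placeholders. It reads `ip neigh show` output on stdin and succeeds only if the given IP has an entry that is not FAILED or INCOMPLETE.

```shell
# Hypothetical helper: succeed if the IP in $1 has a usable neighbour entry.
# Expects the output of `ip neigh show` on stdin.
has_neigh_entry() {
    awk -v ip="$1" '
        # The first field is the neighbour IP, the last field is the state.
        $1 == ip && $NF != "FAILED" && $NF != "INCOMPLETE" { found = 1 }
        END { exit !found }
    '
}

# Example use on the host (placeholder interface and address):
#   ip neigh show dev vlan11 | has_neigh_entry 192.168.11.50 \
#       && echo "entry present" || echo "no usable entry"
```

Running `tcpdump -e -n -i vlan11 arp` on the host at the same time should show whether the client's ARP requests arrive on the internal port at all, and whether any reply leaves.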
 
I'm facing this issue too: I cannot connect to host1 from host2 over vlan20 until I ping host2 from host1 first.

Any solution yet?

Some links with same question without answers other than "this is something wrong with ARP":
/etc/network/interfaces
Code:
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr1
    ovs_mtu 9000
    ovs_options other_config:rstp-path-cost=40000 other_config:rstp-port-admin-edge=false other_config:rstp-port-mcheck=true other_config:rstp-enable=true other_config:rstp-port-auto-edge=false
#1G

auto enp1s0
iface enp1s0 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr1
    ovs_mtu 9000
    ovs_options other_config:rstp-port-admin-edge=false other_config:rstp-path-cost=2000 other_config:rstp-enable=true other_config:rstp-port-mcheck=true other_config:rstp-port-auto-edge=false
#10G

iface eth1 inet manual

iface eth2 inet manual

iface eth3 inet manual

auto vlan11
iface vlan11 inet static
    address 192.168.11.3/24
    gateway 192.168.11.1
    ovs_type OVSIntPort
    ovs_bridge vmbr1
    ovs_mtu 1500
    ovs_options vlan_mode=access tag=11
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
#Mgmt

auto vlan20
iface vlan20 inet static
    address 10.10.20.3/24
    ovs_type OVSIntPort
    ovs_bridge vmbr1
    ovs_mtu 9000
    ovs_options tag=20
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif
#Data

auto vmbr1
iface vmbr1 inet manual
    ovs_type OVSBridge
    ovs_ports eth0 enp1s0 vlan11 vlan20
    ovs_mtu 9000
    up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=32768 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
    post-up sleep 10
#LAN

auto vmbr2
iface vmbr2 inet manual
    ovs_type OVSBridge
    ovs_mtu 9000
#Internal

pveversion --verbose
Code:
proxmox-ve: 6.4-1 (running kernel: 5.4.189-1-pve)
pve-manager: 6.4-15 (running version: 6.4-15/af7986e6)
pve-kernel-5.4: 6.4-18
pve-kernel-helper: 6.4-18
pve-kernel-5.4.189-2-pve: 5.4.189-2
pve-kernel-5.4.189-1-pve: 5.4.189-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-4
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.14-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-2
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.7-pve1
 
I'm still having the problem. I gave up and kept the host interface / gateway on the native VLAN.

Good to know I'm not the only one. I'm doing some rather esoteric stuff in this environment, it's a lab.

Ooooh, it's tied to RSTP. That explains why a nearly identical environment didn't have this problem, the only difference being RSTP on the OVS bridge.
 
I'm still not sure what this bug is exactly, but I was thinking about it the other day and I'm pretty sure that, whatever the issue was, it ended up crashing our entire L2 infrastructure for a whole datacenter a while back. I was running Proxmox with OVS on a VFIO workstation / server at my desk and set it up to participate in RSTP with 2 different switches. The issue happened twice, seemingly at random. Hundreds of customers were down for almost an hour. All L2 communication was out.

The first time it cleared up on its own after 30 minutes or so, and we didn't end up changing anything. The second time, about an hour in, I unplugged the box in question out of pure suspicion and the issue cleared up instantly. All L2 forwarding resumed. There were no topology changes and I couldn't see any loops. VLANs were pruned to a handful on the upstream switch, but VLAN 1 was allowed for RSTP. I disabled all spanning tree on OVS and fell back to using a single cable, plugged the host back in and everything has been fine since.

I know this is extremely anecdotal, but this issue that I've been having with a totally separate environment made me reflect back on that other incident.
 
I have a similar problem. The OVSIntPort is not replying to ARP requests. As a workaround, I added static ARP entries on the required devices. But is there a good solution for this?
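
On a Linux client, a static entry like the one described can be added with `ip neigh replace`. The sketch below only prints the command rather than running it, so it can be reviewed first. `static_arp_cmd` is a hypothetical helper, and the IP, MAC and interface in the usage comment are placeholders, not values from this thread.

```shell
# Hypothetical helper: print the command that pins an IP to a MAC address.
#   $1 = IP of the OVSIntPort, $2 = its MAC address, $3 = local interface
static_arp_cmd() {
    # Refuse anything that does not look like a colon-separated MAC address.
    if ! printf '%s\n' "$2" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo "invalid MAC address: $2" >&2
        return 1
    fi
    printf 'ip neigh replace %s lladdr %s dev %s nud permanent\n' "$1" "$2" "$3"
}

# Example (placeholder values); run the printed command as root on the client:
#   static_arp_cmd 192.168.11.3 aa:bb:cc:dd:ee:ff eth0
```

An entry added this way does not survive a reboot of the client, so it would have to be reapplied from the client's own network configuration.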
 
I was able to replicate this specific issue in a completely separate environment on brand new hardware with different NICs, running a very current version of Proxmox. It only affects OVSIntPorts when RSTP is enabled.


Code:
root@J4125-01:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto enp9s0
iface enp9s0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0

auto enp1s0
iface enp1s0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr1
        ovs_mtu 9000

auto enp2s0
iface enp2s0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr1
        ovs_mtu 9000

iface eno1 inet manual

iface enp7s0 inet manual

auto enp8s0
iface enp8s0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0

auto management
iface management inet static
        address 192.168.15.71/24
        gateway 192.168.15.1
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=15

auto ceph
iface ceph inet static
        address 192.168.11.71/24
        ovs_type OVSIntPort
        ovs_bridge vmbr1
        ovs_mtu 9000

auto vmbr1
iface vmbr1 inet manual
        ovs_type OVSBridge
        ovs_ports enp1s0 enp2s0 ceph
        ovs_mtu 9000
        ovs_options rstp_enable=true
#Ceph

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports enp8s0 enp9s0 management
        ovs_options rstp_enable=true

root@J4125-01:~#
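
To see at a glance which stanzas in a config like the one above turn RSTP on, something along these lines could be used. It is only a rough sketch: `rstp_stanzas` is a made-up helper name, and it simply matches the strings `rstp_enable=true` and `rstp-enable=true` as they appear in the configs posted in this thread.

```shell
# Hypothetical helper: read an interfaces-style file on stdin and print the
# name of every "iface" stanza that enables RSTP, once each, in file order.
rstp_stanzas() {
    awk '
        $1 ~ /^#/ { next }                 # skip comment lines such as "#Ceph"
        $1 == "iface" { name = $2 }        # remember the current stanza
        name != "" && /rstp[_-]enable=true/ && !(name in seen) {
            seen[name] = 1
            print name
        }
    '
}

# Example use:
#   rstp_stanzas < /etc/network/interfaces
```

This only inspects the file. The running state of a bridge can be read with `ovs-vsctl get Bridge vmbr1 rstp_enable`.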
 
I'm facing the same issue.
The VLAN interface is not responding to ARP requests. RSTP is enabled.

Has anybody found a solution for this?

André

Code:
auto lo
iface lo inet loopback

auto enp0s31f6
iface enp0s31f6 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr0
    ovs_mtu 9000
    ovs_options vlan_mode=trunk other_config:rstp-port-mcheck=true other_config:rstp-port-auto-edge=false other_config:rstp-path-cost=40001 other_config:rstp-port-admin-edge=false other_config:rstp-enable=true
#Eth0

iface enp2s0 inet manual
#Mellanox Port 1

auto enp2s0d1
iface enp2s0d1 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr0
    ovs_mtu 9000
    ovs_options vlan_mode=trunk other_config:rstp-port-mcheck=true other_config:rstp-path-cost=2001 other_config:rstp-port-auto-edge=false other_config:rstp-port-admin-edge=false other_config:rstp-enable=true
#Mellanox Port 2

auto vlan10
iface vlan10 inet static
    address 10.10.10.12/24
    gateway 10.10.10.1
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_mtu 9000
    ovs_options tag=10
    ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)
#Server VLAN

auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports enp0s31f6 enp2s0d1 vlan10
    ovs_mtu 9000
    up ovs-vsctl set Bridge ${IFACE} rstp_enable=true other_config:rstp-priority=32768 other_config:rstp-forward-delay=4 other_config:rstp-max-age=6
    post-up sleep 10
 
Another one with the same problem. I can confirm that it's ARP requests that are getting ignored. I can also confirm that this is specifically related to the combination of Proxmox + Open vSwitch + RSTP + ifupdown2.
To dig into this further (I really need to have this work) I set up several tests.

The test setup consists of:
2 'vanilla' Debian 11 machines, on top of which I added openvswitch, ifupdown2 and lldpd (all latest as of writing)
2 Proxmox machines, not configured, but with openvswitch, ifupdown2 and lldpd installed (all latest, from the no-support repo, as of writing)
2 switches, Arista in the current test, but the switch layer does not matter in this case as long as it supports RSTP.

The 2 Debian machines have no issues at all: they can communicate with each other, with the switch management IP, and with the Proxmox machines.

The 2 Proxmox machines cannot communicate with each other. They can, however, communicate with anything else on the network when reaching out, and after initial contact other machines can reach them.

If I either disable RSTP (and 1 redundant interface) or install 'ifupdown' rather than 'ifupdown2', everything starts working normally.

I have left the test setup intact so I can run more tests if wanted. At this moment, though, everything points to a Proxmox patch on the Debian packages of either the kernel, OVS, or ifupdown2.

Kind regards.
 
Hi Michiel,

Any updates on this?
Are all the versions the same, or are there any version differences?

I'm really looking forward to a solution for this.

Best regards
André
 
The issue still exists in PVE 8.1 with kernel 6.5. However, I also tested this setup with only Debian-provided packages (ifupdown2, OVS, kernel), and the issue also exists on a clean Debian 12 install without the Proxmox repositories.
 
Hello, the above issue has still not been fixed. I did find out from the OVS devs that this is a problem in OVS which occurs when you have multiple instances (switches) running on the same machine with RSTP. This should just work, but RSTP is an 'unsupported' feature in OVS, so I doubt anything will be done to fix it.

In the meantime we have switched to using LACP for the hosts, either through OVS or with the Linux network options, which works without problems. On the switch side this is configured as an MLAG to keep the high availability. Probably not what you want to hear as a solution to your issue, but it is an alternative.
 
