Linux Bridge (vmbr0) silently drops unicast traffic to specific TAP interfaces after corosync-qdevice setup incident — PVE 9.2.3, kernel 7.0.6-2-pve

Apr 29, 2024

Summary


Since installing corosync-qdevice on our 4-node cluster (an installation that involved several hard VM stops/restarts due to a botched netplan config), the Linux bridge vmbr0 on multiple nodes repeatedly fails to forward unicast traffic to specific VM TAP interfaces. The failure comes and goes but keeps recurring, even though:


  • the destination MAC address in the incoming Ethernet frame is correct
  • the bridge's FDB (bridge fdb show) has a permanent entry mapping that MAC to the correct TAP device
  • the VM itself is healthy, has the correct MAC configured in QEMU, and is actively sending ARP requests
  • arping from the host to the VM IP succeeds (Layer 2 reachable from the host's own stack)
  • there is no firewall (Proxmox firewall fully disabled, ebtables/nftables empty)
  • proxy_arp is disabled everywhere
  • the problem is not VM-specific: it follows whichever VM is "unlucky," moves between nodes after migration, and affects both Windows (virtio) and Linux (e1000) guests
  • it affects multiple nodes in the same cluster independently

Environment


  • Proxmox VE 9.2.3, kernel 7.0.6-2-pve
  • pve-qemu-kvm 11.0.0-4
  • 4-node cluster (wecon-cluster), Ceph storage, bond0 (active-backup) → vmbr0 for LAN, separate vmbr1 for Corosync/Ceph
  • /etc/network/interfaces excerpt:


auto vmbr0
iface vmbr0 inet static
address 192.168.77.5/24
gateway 192.168.77.1
bridge-ports bond0
bridge-stp off
bridge-fd 0

  • No VLANs on vmbr0 itself (vlan-aware not set), no SDN zones in use for this bridge
  • No Proxmox firewall (pve-firewall status → disabled/running, /etc/pve/firewall/cluster.fw doesn't exist)

Timeline / Trigger


  1. Built a small dedicated VM for corosync-qdevice with two NICs (vmbr1 for corosync net, vmbr0 for LAN/updates).
  2. During Ubuntu Server install, the install process hung after a DHCP failure on the LAN NIC; the VM was hard-stopped (qm stop) and the install repeated from scratch on the same VMID.
  3. Shortly after, multiple unrelated VMs (Windows 11 guests, Linux guests) on other nodes started showing the Windows "no internet" globe icon / general unreachability over LAN.
  4. Two days of troubleshooting later, root cause was narrowed down to: the Linux bridge on the affected node(s) stops delivering unicast frames to specific TAP ports, despite a 100% correct, permanent FDB entry.

Diagnostic evidence


1. TAP MAC never matches the VM's configured MAC



# qm config 2406 | grep net0
net0: virtio=BC:24:11:E9:D2:24,bridge=vmbr0,mtu=1500

# ip link show tap2406i0
57: tap2406i0: ...
link/ether be:b4:f2:c2:d2:d6 brd ff:ff:ff:ff:ff:ff

This appears to be normal/expected Proxmox behavior (confirmed via code review of pve-bridge / PVE::Network::tap_plug — the MAC is never explicitly set on the host-side TAP via ip link set ... address; QEMU is expected to "own" the MAC inside the guest only). Under normal circumstances the bridge's MAC-learning is supposed to make this irrelevant. In our case, it doesn't.
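
The manual comparison above can be scripted so it is easy to repeat for every VM on a node. This is a minimal sketch, not part of the original troubleshooting: the helper names are made up for illustration, and the parsing assumes the `qm config` and `ip link show` output formats shown above. As described in the text, a difference between the two MACs is expected on a stock setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not Proxmox tooling).

# Print the guest MAC (lowercase) from a `qm config` netN line read on stdin.
# Example input: net0: virtio=BC:24:11:E9:D2:24,bridge=vmbr0,mtu=1500
guest_mac_from_netline() {
    grep -oiE '([0-9a-f]{2}:){5}[0-9a-f]{2}' | head -n1 | tr 'A-F' 'a-f'
}

# Print the link-layer address from `ip link show <dev>` output read on stdin.
tap_mac_from_iplink() {
    awk '$1 == "link/ether" { print tolower($2); exit }'
}

# Report whether two MACs (arguments 1 and 2) are equal.
compare_macs() {
    if [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "differ"
    fi
}

# Usage on a PVE host (VMID and TAP name taken from the example above):
#   guest=$(qm config 2406 | grep '^net0:' | guest_mac_from_netline)
#   tap=$(ip link show tap2406i0 | tap_mac_from_iplink)
#   compare_macs "$guest" "$tap"
```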


2. FDB has the correct, permanent entry — but the bridge ignores it



# bridge fdb show | grep bc:24:11:e9:d2:24
bc:24:11:e9:d2:24 dev tap2406i0 vlan 1 master vmbr0 permanent
bc:24:11:e9:d2:24 dev tap2406i0 master vmbr0 permanent
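
When reading this output it may help to separate entries by state. The bridge(8) man page describes `permanent` as a synonym for `local`, meaning an entry the bridge terminates locally rather than forwards, while `static` marks a non-local fixed entry, so the distinction could matter here. The sketch below is only an illustration: the function name is hypothetical, and it assumes the `bridge fdb show` line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `bridge fdb show` output on stdin and print
# "<dev> <state>" for each entry of the given MAC. The state is
# "permanent" or "static" when that keyword is present, else "learned".
fdb_state_for_mac() {
    local mac
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v mac="$mac" '
        tolower($1) == mac {
            dev = ""; state = "learned"
            for (i = 2; i <= NF; i++) {
                if ($i == "dev") dev = $(i + 1)
                if ($i == "permanent" || $i == "static") state = $i
            }
            print dev, state
        }'
}

# Usage on a PVE host:
#   bridge fdb show br vmbr0 | fdb_state_for_mac bc:24:11:e9:d2:24
```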

3. Frames arrive at the bond/bridge with the correct destination MAC, but never reach the TAP


From pfSense (gateway, 192.168.77.1), pinging the affected VM (192.168.77.106):



# tcpdump -i bond0 -n -e 'icmp and host 192.168.77.106'
a8:b8:e0:02:e8:51 > bc:24:11:e9:d2:24, ethertype IPv4 ...: 192.168.77.1 > 192.168.77.106: ICMP echo request
(identical capture confirmed on vmbr0 itself)

# tcpdump -i tap2406i0 -n icmp
(nothing — zero packets, ever)

So: frame enters via bond0 → vmbr0 with the exact MAC that the FDB says belongs to tap2406i0 → frame is simply never forwarded to that port. No drops are logged anywhere (ebtables counters all zero, no nft rules, no dmesg errors).
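
To make the bond0-versus-TAP comparison repeatable, the captured text can be counted per destination MAC. This is a rough sketch under the assumption that the input is `tcpdump -e` text output in the form shown above; the function name is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count lines of `tcpdump -e` text output (stdin)
# whose Ethernet destination MAC equals the argument.
# tcpdump -e prints "<src MAC> > <dst MAC>, ethertype ...", so the token
# after the first ">" on each line is taken as the destination.
count_frames_to_mac() {
    local mac
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v mac="$mac" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == ">") {
                    dst = tolower($(i + 1))
                    sub(/,$/, "", dst)      # strip the trailing comma
                    if (dst == mac) n++
                    break                   # only the first ">" is the L2 one
                }
            }
        }
        END { print n + 0 }'
}

# Usage on a PVE host (run once per interface and compare the counts):
#   timeout 10 tcpdump -i bond0 -n -e -l icmp 2>/dev/null \
#       | count_frames_to_mac bc:24:11:e9:d2:24
#   timeout 10 tcpdump -i tap2406i0 -n -e -l icmp 2>/dev/null \
#       | count_frames_to_mac bc:24:11:e9:d2:24
```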


4. arping from the host succeeds (host's own ARP stack can reach the VM), but the VM never receives unicast replies from the gateway



# arping -I vmbr0 -c3 192.168.77.106
Unicast reply from 192.168.77.106 [BC:24:11:E9:D2:24] 0.7ms (x3)

But ip neigh show for that IP shows STALE, and the VM's own outgoing ARP requests for the gateway are visible on the TAP, yet replies from the gateway never arrive at the TAP (same symptom as #3).


5. Workaround found: tearing down and rebuilding the bridge


After ifdown vmbr0 && ifup vmbr0, all FDB entries are cleared. Restarting the affected VMs causes the bridge to relearn the MACs and traffic flows correctly — for a while. The problem reappears later (sometimes within hours, sometimes after a day) for a subset of VMs again, without any further configuration changes.


6. Not a Proxmox firewall / ebtables / nftables / proxy_arp / STP issue



# ebtables -L
Bridge table: filter
Bridge chain: INPUT, entries: 0, policy: ACCEPT
Bridge chain: FORWARD, entries: 0, policy: ACCEPT
Bridge chain: OUTPUT, entries: 0, policy: ACCEPT

# nft list ruleset
(empty)

# sysctl net.ipv4.conf.vmbr0.proxy_arp
net.ipv4.conf.vmbr0.proxy_arp = 0

# cat /sys/class/net/vmbr0/bridge/stp_state
0
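
The same kind of check can be extended to the other per-bridge switches that influence forwarding, read straight from sysfs. A minimal sketch: the function name is hypothetical, the second argument exists only so the function can be pointed at a test directory, and the `nf_call_*` files are assumed to be present only when bridge netfilter support is available (they are reported as `unavailable` otherwise).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print bridge settings that can affect forwarding,
# one "name=value" per line.
#   $1 = bridge name
#   $2 = sysfs root (defaults to /sys; overridable for testing)
bridge_filter_settings() {
    local br="$1" root="${2:-/sys}" f
    for f in stp_state vlan_filtering multicast_snooping \
             nf_call_iptables nf_call_ip6tables nf_call_arptables; do
        if [ -r "$root/class/net/$br/bridge/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$root/class/net/$br/bridge/$f")"
        else
            printf '%s=unavailable\n' "$f"
        fi
    done
}

# Usage on a PVE host:
#   bridge_filter_settings vmbr0
```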

7. Cluster-wide, not single-node


The same symptom (host cannot ping its own locally-running VMs/CTs over vmbr0, despite correct FDB) was independently reproduced on a second node in the same cluster, with completely different VMs/MACs.


What we've already tried (no lasting effect)


  • Manually correcting TAP MAC via ip link set tapXi0 address <correct-mac> (works for ~minutes, doesn't fix the actual forwarding issue, which is bridge-internal, not MAC-mismatch related)
  • A post-start hookscript that sets TAP/fwln/fwpr interface MACs to match the VM's configured MAC immediately after start (helps temporarily, same as above)
  • Manually adding static FDB entries (bridge fdb add ... master static) — no effect, bridge still doesn't forward
  • Disabling Proxmox firewall entirely on affected VMs and cluster-wide
  • ip link set tapXi0 nomaster && ip link set tapXi0 master vmbr0 — no effect
  • Full ifdown vmbr0 && ifup vmbr0 — works temporarily, not a real fix, and momentarily drops the node off the network
  • apt full-upgrade to latest 9.2.3 / pve-qemu-kvm 11.0.0-4 — no change
  • Ruled out: STP, proxy_arp, ebtables, nftables, rp_filter (set to 2/loose), VLAN filtering (off), bridge ageing_time (default 300s), Ceph/Corosync network is unaffected and stable throughout (separate vmbr1, separate physical NIC)

Ruled out (additional)


  • bridge kernel module is built-in (modinfo bridge → filename: (builtin)), confirmed present in modules.builtin along with br_netfilter — not a missing-module issue.

Questions for the community / devs


  1. Is there a known kernel bug (possibly related to bridge FDB / br_fdb_update or netfilter bridge hooks) in kernel 7.0.6-2-pve that can cause the bridge to silently stop forwarding to a port despite a correct, permanent FDB entry?
  2. Could the qdevice VM's second NIC (also on vmbr0, used only for LAN/internet access for apt) have somehow corrupted the bridge's internal port/FDB state during the botched install (multiple hard qm stop cycles while the TAP was mid-negotiation)? We have since stopped that VM/NIC entirely and the issue persists on completely unrelated VMs/ports, so this seems unlikely to be the direct ongoing cause, but might have been the trigger.
  3. Anyone else seen a Linux bridge "lose" a working FDB entry's actual forwarding behavior (entry stays in bridge fdb show, but frames to that MAC are silently dropped) under PVE 9.x / kernel 7.0.x?
  4. Is the tx-scatter-gather-fraglist offload on bond0 (our ethtool -k output shows it as [requested on] but effectively off/[fixed]), or any other offload setting on the bonded NICs (Intel, mixed 10G/1G slaves in active-backup mode), known to interact badly with bridging in this kernel?

Happy to provide full qm config, ethtool -i/-k for both bond members, dmesg, and pveversion -v if useful. This has been ongoing for over a week and is affecting production VMs across the cluster intermittently in a way that is very hard to pin down because the only reliable "fix" (bridge teardown/rebuild) is itself disruptive.

Important addendum:

Re-reading our timeline more carefully, I want to flag a possible earlier trigger:

A few days before the corosync-qdevice installation, we performed a full cluster upgrade from PVE 9.x to PVE 9.2.3 / kernel 7.0.6-2-pve. In retrospect, the Windows guest VMs had already begun showing intermittent network issues (the "no internet" globe icon) shortly after that upgrade — but at the time we attributed it to other causes and it seemed minor.

The corosync-qdevice installation (which involved several hard qm stop cycles on a VM with a dual-NIC config on vmbr0) may have merely accelerated or made fully visible a pre-existing bridge forwarding regression introduced by the kernel/package upgrade. Key observations supporting this:

  • The Linux VMs were initially not affected at all — only Windows guests showed the globe icon at first
  • Over the following days, the problem spread progressively to Linux VMs as well (Wazuh, Mailrelay/Monitoring, CheckMK)
  • A "creeping" failure pattern like this is more consistent with a kernel regression triggered under certain traffic conditions than with a one-time configuration corruption event
So the actual root cause may be a bug in kernel 7.0.6-2-pve (or one of the updated pve packages) rather than — or in addition to — the qdevice incident. The qdevice VM's hard stops may have been the straw that broke the camel's back, not the root cause itself.

Has anyone else observed bridge forwarding degradation after upgrading to kernel 7.0.6-2-pve?
 

Update (July 2nd, 2026) — Problem further narrowed down


After extensive additional troubleshooting, we have isolated the issue to a single, clear symptom:


The Linux bridge silently drops incoming unicast frames to TAP interfaces — despite a correct, permanent FDB entry. Outgoing traffic from VMs works correctly.


Concrete evidence:



Wazuh VM (192.168.77.99, MAC bc:24:11:59:93:1d) running on PVE3:



# tcpdump -i tap1920i0 -n arp -c 10
ARP, Request who-has 192.168.77.99 tell 192.168.77.106 ← another host looking for Wazuh
ARP, Reply 192.168.77.99 is-at bc:24:11:59:93:1d ← Wazuh replies correctly ✓
ARP, Request who-has 192.168.77.1 tell 192.168.77.99 ← Wazuh asks for gateway
(no ARP reply from gateway arrives at the TAP) ← bridge does not forward reply ✗

FDB is correct:



# bridge fdb show | grep bc:24:11:59:93:1d
bc:24:11:59:93:1d dev tap1920i0 vlan 1 master vmbr0 permanent
bc:24:11:59:93:1d dev tap1920i0 master vmbr0 permanent

Frames arrive at bond0/vmbr0 with the correct destination MAC — but never reach tap1920i0:



# tcpdump -i bond0 -n -e 'arp host 192.168.77.99'
a8:b8:e0:02:e8:51 > bc:24:11:59:93:1d ARP Reply 192.168.77.1 is-at a8:b8:e0:02:e8:51

# tcpdump -i tap1920i0 -n 'arp and src host 192.168.77.1'
(nothing — zero packets)

Summary:


  • Outgoing traffic (VM → bridge → physical network): works ✓
  • Incoming unicast (physical network → bridge → TAP): does NOT work ✗
  • Incoming broadcasts (e.g. foreign ARP requests): do reach the TAP ✓
  • Only unicast frames addressed to the VM's MAC are not forwarded ✗
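
The broadcast/unicast split in this summary follows from the destination MAC itself: the least significant bit of the first octet is the group bit. A small sketch for labelling captured destination addresses (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a destination MAC as broadcast,
# multicast, or unicast. The group bit is the least significant bit
# of the first octet; all-ones is the broadcast address.
classify_dst_mac() {
    local mac first
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    if [ "$mac" = "ff:ff:ff:ff:ff:ff" ]; then
        echo broadcast
        return
    fi
    first=${mac%%:*}
    if [ $(( 0x$first & 1 )) -eq 1 ]; then
        echo multicast
    else
        echo unicast
    fi
}

# Usage:
#   classify_dst_mac bc:24:11:59:93:1d
#   classify_dst_mac ff:ff:ff:ff:ff:ff
```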

Affects: all VMs on both remaining nodes (PVE0 and PVE3), Windows and Linux guests alike.


Temporary workaround: ifdown vmbr0 && ifup vmbr0 followed by VM restarts restores connectivity briefly (minutes to hours), then the problem recurs.


Key observation: Broadcast frames reach the TAP correctly, but unicast frames do not — even though the FDB has a correct permanent entry for that MAC on that TAP port. This suggests the bridge is either flooding to the wrong port or silently dropping unicast frames despite a valid FDB lookup.


Does anyone have a pointer to a known kernel bug in 7.0.6-2-pve that produces exactly this behavior? Or a more durable workaround than a full bridge teardown/rebuild?
 
I don't recall any issues with the 7.0 kernel and bridge forwarding. You could try out the old 6.17 kernel and see if it works there (https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_kernel_pin).

Otherwise please post:

Code:
ip -d link show vmbr0
bridge -d link show master vmbr0
bridge fdb show br vmbr0
bridge vlan show
qm config <broken-VMID>
 
Reply to Gabriel's request — diagnostic output + important observation


Thank you Gabriel, here is the requested output from both affected nodes.


PVE0 (VM 1405 / Win11-BH as broken example):
ip -d link show vmbr0
bridge -d link show master vmbr0
bridge fdb show br vmbr0
bridge vlan show
qm config 1405
91: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether a0:ad:9f:78:76:87 brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
bridge forward_delay 0 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.a0:ad:9f:78:76:87 designated_root 8000.a0:ad:9f:78:76:87 root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer 0.00 tcn_timer 0.00 topology_change_timer 0.00 gc_timer 97.98 fdb_n_learned 38 fdb_max_learned 0 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 no_linklocal_learn 0 mcast_vlan_snooping 0 mst_enabled 0 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_querier_ipv4_addr 192.168.77.200 mcast_querier_ipv4_port 8 mcast_querier_ipv4_other_timer 15.7us mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3125 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536
# bridge -d link show master vmbr0
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 master vmbr0 state forwarding priority 32 cost 2
hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on bcast_flood on mcast_router 1 mcast_to_unicast off neigh_suppress off neigh_vlan_suppress off vlan_tunnel off isolated off locked off mab off mcast_n_groups 3 mcast_max_groups 0
92: tap1405i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 master vmbr0 state forwarding priority 32 cost 2
hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on bcast_flood on mcast_router 1 mcast_to_unicast off neigh_suppress off neigh_vlan_suppress off vlan_tunnel off isolated off locked off mab off mcast_n_groups 0 mcast_max_groups 0
93: tap2406i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 master vmbr0 state forwarding priority 32 cost 2
hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on bcast_flood on mcast_router 1 mcast_to_unicast off neigh_suppress off neigh_vlan_suppress off vlan_tunnel off isolated off locked off mab off mcast_n_groups 0 mcast_max_groups 0
94: tap3407i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 master vmbr0 state forwarding priority 32 cost 2
hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on bcast_flood on mcast_router 1 mcast_to_unicast off neigh_suppress off neigh_vlan_suppress off vlan_tunnel off isolated off locked off mab off mcast_n_groups 0 mcast_max_groups 0
# bridge fdb show br vmbr0
ec:a9:07:15:61:3f dev bond0 master vmbr0
94:83:c4:af:9c:6b dev bond0 master vmbr0
b2:bc:ad:59:91:70 dev bond0 master vmbr0
38:ca:84:a5:a1:db dev bond0 master vmbr0
fe:25:f2:f4:26:56 dev bond0 master vmbr0
22:f9:6d:25:67:46 dev bond0 master vmbr0
76:25:aa:9b:c6:d2 dev bond0 master vmbr0
d0:11:e5:f4:09:b0 dev bond0 master vmbr0
fe:ca:c5:f6:00:d9 dev bond0 master vmbr0
36:08:8f:72:55:98 dev bond0 master vmbr0
10:e1:8e:00:b6:84 dev bond0 master vmbr0
f6:5e:10:e4:aa:a9 dev bond0 master vmbr0
5c:31:92:a2:2d:70 dev bond0 master vmbr0
cc:3b:fb:e4:ce:2e dev bond0 master vmbr0
00:4b:12:ec:83:d7 dev bond0 master vmbr0
04:7b:cb:67:ee:30 dev bond0 master vmbr0
bc:24:11:7b:41:db dev bond0 master vmbr0
bc:24:11:59:93:1d dev bond0 master vmbr0
cc:3b:fb:5f:05:16 dev bond0 master vmbr0
8c:94:61:53:88:62 dev bond0 master vmbr0
78:9a:18:28:aa:2c dev bond0 master vmbr0
90:bf:d9:63:e5:29 dev bond0 master vmbr0
bc:d7:a5:7d:af:40 dev bond0 master vmbr0
a4:0e:75:fb:f4:49 dev bond0 master vmbr0
90:09:d0:4c:26:29 dev bond0 master vmbr0
88:25:10:2e:d9:80 dev bond0 master vmbr0
7c:a8:ec:91:8a:1c dev bond0 master vmbr0
00:c8:4e:f7:bd:58 dev bond0 master vmbr0
ec:50:aa:47:fd:65 dev bond0 master vmbr0
04:f4:1c:ec:c4:66 dev bond0 master vmbr0
64:e8:81:34:97:8f dev bond0 master vmbr0
50:41:1c:8b:84:74 dev bond0 master vmbr0
8c:f8:c5:ce:a4:6e dev bond0 master vmbr0
bc:d7:a5:7d:af:42 dev bond0 master vmbr0
a8:b8:e0:02:e8:51 dev bond0 master vmbr0
a0:ad:9f:78:78:12 dev bond0 master vmbr0
04:7c:16:9d:91:06 dev bond0 master vmbr0
10:7c:61:2f:56:29 dev bond0 master vmbr0
a0:ad:9f:78:76:87 dev bond0 vlan 1 master vmbr0 permanent
a0:ad:9f:78:76:87 dev bond0 master vmbr0 permanent
01:00:5e:00:00:01 dev bond0 self permanent
33:33:00:00:00:01 dev vmbr0 self permanent
33:33:00:00:00:02 dev vmbr0 self permanent
01:00:5e:00:00:6a dev vmbr0 self permanent
33:33:00:00:00:6a dev vmbr0 self permanent
01:00:5e:00:00:01 dev vmbr0 self permanent
33:33:ff:78:76:87 dev vmbr0 self permanent
33:33:ff:00:00:00 dev vmbr0 self permanent
bc:24:11:e2:2e:ec dev tap1405i0 vlan 1 master vmbr0 permanent
bc:24:11:e2:2e:ec dev tap1405i0 master vmbr0 permanent
33:33:00:00:00:01 dev tap1405i0 self permanent
01:00:5e:00:00:01 dev tap1405i0 self permanent
01:80:c2:00:00:0e dev tap1405i0 self permanent
01:80:c2:00:00:03 dev tap1405i0 self permanent
01:80:c2:00:00:00 dev tap1405i0 self permanent
bc:24:11:e9:d2:24 dev tap2406i0 vlan 1 master vmbr0 permanent
bc:24:11:e9:d2:24 dev tap2406i0 master vmbr0 permanent
33:33:00:00:00:01 dev tap2406i0 self permanent
01:00:5e:00:00:01 dev tap2406i0 self permanent
01:80:c2:00:00:0e dev tap2406i0 self permanent
01:80:c2:00:00:03 dev tap2406i0 self permanent
01:80:c2:00:00:00 dev tap2406i0 self permanent
bc:24:11:2e:61:ec dev tap3407i0 vlan 1 master vmbr0 permanent
bc:24:11:2e:61:ec dev tap3407i0 master vmbr0 permanent
33:33:00:00:00:01 dev tap3407i0 self permanent
01:00:5e:00:00:01 dev tap3407i0 self permanent
01:80:c2:00:00:0e dev tap3407i0 self permanent
01:80:c2:00:00:03 dev tap3407i0 self permanent
01:80:c2:00:00:00 dev tap3407i0 self permanent
# bridge vlan show
port vlan-id
eno2 1 PVID Egress Untagged
vmbr1 1 PVID Egress Untagged
bond0 1 PVID Egress Untagged
tap1950i0 1 PVID Egress Untagged
vmbr0 1 PVID Egress Untagged
tap1405i0 1 PVID Egress Untagged
tap2406i0 1 PVID Egress Untagged
tap3407i0 1 PVID Egress Untagged
# qm config 1405
agent: 1
balloon: 12288
bios: ovmf
boot: order=scsi0;net0
cores: 4
cpu: host
cpuunits: 1024
efidisk0: ceph-pool:vm-1405-disk-0,efitype=4m,ms-cert=2023k,pre-enrolled-keys=1,size=528K
hookscript: local:snippets/fix-tap-mac.pl
machine: pc-q35-8.1
memory: 16384
meta: creation-qemu=8.1.5,ctime=1714039752
name: Win11-BH
net0: virtio=BC:24:11:E2:2E:EC,bridge=vmbr0,mtu=1500
numa: 0
onboot: 1
ostype: win11
scsi0: ceph-pool:vm-1405-disk-1,cache=writeback,discard=on,iothread=1,size=256G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=7458cec0-ac35-41f1-a9e1-9f9d18aed061
sockets: 1
tpmstate0: ceph-pool:vm-1405-disk-2,size=4M,version=v2.0
vmgenid: 3869a96d-a7b9-46b8-a8cf-df38a2732999
root@pve0:~#



PVE3 (VM 1920 / Wazuh as broken example):

ssh root@192.168.77.8 "
ip -d link show vmbr0
bridge -d link show master vmbr0
bridge fdb show br vmbr0
bridge vlan show
qm config 1920
"
58: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether a0:ad:9f:78:78:12 brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
bridge forward_delay 0 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.a0:ad:9f:78:78:12 designated_root 8000.a0:ad:9f:78:78:12 root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer 0.00 tcn_timer 0.00 topology_change_timer 0.00 gc_timer 17.77 fdb_n_learned 40 fdb_max_learned 0 vlan_default_pvid 1 vlan_stats_enabled 0 vlan_stats_per_port 0 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 no_linklocal_learn 0 mcast_vlan_snooping 0 mst_enabled 0 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_querier_ipv4_addr 192.168.77.200 mcast_querier_ipv4_port 6 mcast_querier_ipv4_other_timer 20.1us mcast_hash_elasticity 16 mcast_hash_max 4096 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3125 mcast_stats_enabled 0 mcast_igmp_version 2 mcast_mld_version 1 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536 gso_ipv4_max_size 65536 gro_ipv4_max_size 65536
# bridge -d link show master vmbr0
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 master vmbr0 state forwarding priority 32 cost 2
hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on bcast_flood on mcast_router 1 mcast_to_unicast off neigh_suppress off neigh_vlan_suppress off vlan_tunnel off isolated off locked off mab off mcast_n_groups 3 mcast_max_groups 0
59: tap1920i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 master vmbr0 state forwarding priority 32 cost 2
hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on bcast_flood on mcast_router 1 mcast_to_unicast off neigh_suppress off neigh_vlan_suppress off vlan_tunnel off isolated off locked off mab off mcast_n_groups 0 mcast_max_groups 0
60: tap1960i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 master vmbr0 state forwarding priority 32 cost 2
hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on bcast_flood on mcast_router 1 mcast_to_unicast off neigh_suppress off neigh_vlan_suppress off vlan_tunnel off isolated off locked off mab off mcast_n_groups 0 mcast_max_groups 0
61: tap1940i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 master vmbr0 state forwarding priority 32 cost 2
hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on bcast_flood on mcast_router 1 mcast_to_unicast off neigh_suppress off neigh_vlan_suppress off vlan_tunnel off isolated off locked off mab off mcast_n_groups 0 mcast_max_groups 0
# bridge fdb show br vmbr0
9c:2d:cd:ad:ce:e8 dev bond0 master vmbr0
ec:a9:07:15:61:3f dev bond0 master vmbr0
94:83:c4:af:9c:6b dev bond0 master vmbr0
b2:bc:ad:59:91:70 dev bond0 master vmbr0
38:ca:84:a5:a1:db dev bond0 master vmbr0
fe:25:f2:f4:26:56 dev bond0 master vmbr0
22:f9:6d:25:67:46 dev bond0 master vmbr0
76:25:aa:9b:c6:d2 dev bond0 master vmbr0
d0:11:e5:f4:09:b0 dev bond0 master vmbr0
fe:ca:c5:f6:00:d9 dev bond0 master vmbr0
36:08:8f:72:55:98 dev bond0 master vmbr0
cc:3b:fb:e4:ce:2e dev bond0 master vmbr0
f6:5e:10:e4:aa:a9 dev bond0 master vmbr0
00:4b:12:ec:83:d7 dev bond0 master vmbr0
04:f4:1c:ec:c4:66 dev bond0 master vmbr0
5c:31:92:a2:2d:70 dev bond0 master vmbr0
04:7b:cb:67:ee:30 dev bond0 master vmbr0
cc:3b:fb:5f:05:16 dev bond0 master vmbr0
bc:24:11:e2:2e:ec dev bond0 master vmbr0
90:09:d0:4c:26:29 dev bond0 master vmbr0
8c:94:61:53:88:62 dev bond0 master vmbr0
8c:f8:c5:ce:a4:6e dev bond0 master vmbr0
bc:d7:a5:7d:af:40 dev bond0 master vmbr0
a4:0e:75:fb:f4:49 dev bond0 master vmbr0
7c:a8:ec:91:8a:1c dev bond0 master vmbr0
88:25:10:2e:d9:80 dev bond0 master vmbr0
00:c8:4e:f7:bd:58 dev bond0 master vmbr0
ec:50:aa:47:fd:65 dev bond0 master vmbr0
64:e8:81:34:97:8f dev bond0 master vmbr0
78:9a:18:28:aa:2c dev bond0 master vmbr0
04:7c:16:9d:91:06 dev bond0 master vmbr0
10:e1:8e:00:b6:84 dev bond0 master vmbr0
90:bf:d9:63:e5:29 dev bond0 master vmbr0
10:7c:61:2f:56:29 dev bond0 master vmbr0
bc:d7:a5:7d:af:44 dev bond0 master vmbr0
50:41:1c:8b:84:74 dev bond0 master vmbr0
bc:24:11:2e:61:ec dev bond0 master vmbr0
bc:24:11:e9:d2:24 dev bond0 master vmbr0
a8:b8:e0:02:e8:51 dev bond0 master vmbr0
a0:ad:9f:78:76:87 dev bond0 master vmbr0
a0:ad:9f:78:78:12 dev bond0 vlan 1 master vmbr0 permanent
a0:ad:9f:78:78:12 dev bond0 master vmbr0 permanent
01:00:5e:00:00:01 dev bond0 self permanent
33:33:00:00:00:01 dev vmbr0 self permanent
33:33:00:00:00:02 dev vmbr0 self permanent
01:00:5e:00:00:6a dev vmbr0 self permanent
33:33:00:00:00:6a dev vmbr0 self permanent
01:00:5e:00:00:01 dev vmbr0 self permanent
33:33:ff:78:78:12 dev vmbr0 self permanent
33:33:ff:00:00:00 dev vmbr0 self permanent
bc:24:11:59:93:1d dev tap1920i0 vlan 1 master vmbr0 permanent
bc:24:11:59:93:1d dev tap1920i0 master vmbr0 permanent
33:33:00:00:00:01 dev tap1920i0 self permanent
01:00:5e:00:00:01 dev tap1920i0 self permanent
bc:24:11:7b:41:db dev tap1960i0 vlan 1 master vmbr0 permanent
bc:24:11:7b:41:db dev tap1960i0 master vmbr0 permanent
33:33:00:00:00:01 dev tap1960i0 self permanent
01:00:5e:00:00:01 dev tap1960i0 self permanent
bc:24:11:9a:d1:fb dev tap1940i0 vlan 1 master vmbr0 permanent
bc:24:11:9a:d1:fb dev tap1940i0 master vmbr0 permanent
33:33:00:00:00:01 dev tap1940i0 self permanent
01:00:5e:00:00:01 dev tap1940i0 self permanent
# bridge vlan show
port vlan-id
nic2 1 PVID Egress Untagged
bond0 1 PVID Egress Untagged
vmbr1 1 PVID Egress Untagged
vmbr0 1 PVID Egress Untagged
tap1920i0 1 PVID Egress Untagged
tap1960i0 1 PVID Egress Untagged
tap1940i0 1 PVID Egress Untagged
# qm config 1920
agent: 1
balloon: 0
boot: order=scsi0;net0
cores: 4
cpu: x86-64-v2-AES
hookscript: local:snippets/fix-tap-mac.pl
memory: 12288
meta: creation-qemu=9.2.0,ctime=1746704188
name: Wazuh
net0: virtio=BC:24:11:59:93:1D,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
parent: pre-wazuh-update-4145
scsi0: ceph-pool:vm-1920-disk-0,iothread=1,size=256G
scsihw: virtio-scsi-single
smbios1: uuid=2b0509d8-fa45-449b-bf96-6b51ec105a4a
sockets: 2
vmgenid: 88fff15e-9535-45b6-a4dc-2af6b7ba4acf
root@pve0:~#





Important observation from analyzing the output:


Looking at the FDB of PVE3's vmbr0, I notice that the MACs of VMs running on PVE0 appear learned on bond0 of PVE3:



bc:24:11:e2:2e:ec dev bond0 master vmbr0 ← MAC of VM 1405, running on PVE0
bc:24:11:2e:61:ec dev bond0 master vmbr0 ← MAC of VM 3407, running on PVE0
bc:24:11:e9:d2:24 dev bond0 master vmbr0 ← MAC of VM 2406, running on PVE0

I would not have expected these MACs on PVE3's bond0, since they belong to VMs that are local to PVE0. If the bridge on PVE3 has learned them on the wrong port (the uplink to the physical switch), return traffic for these VMs could be misdirected, which might explain why unicast frames never reach the correct TAP.


Similarly, on PVE0's FDB we see:
bc:24:11:59:93:1d dev bond0 master vmbr0 ← MAC of Wazuh VM, running on PVE3
bc:24:11:7b:41:db dev bond0 master vmbr0 ← MAC of CheckMK VM, running on PVE3

Again, these are MACs of VMs local to PVE3 but learned on PVE0's bond0.
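
To check whether any MAC is actually known on more than one port of the same bridge, a single node's FDB dump can be scanned for duplicates. This is a sketch only: the function name is made up, it assumes the `bridge fdb show br <bridge>` line format shown in the output above, and it skips `self` entries that have no `master` keyword.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `bridge fdb show br <bridge>` output on stdin
# and print every MAC that has `master` entries on more than one port,
# followed by the comma-separated ports in the order they were seen.
macs_on_multiple_ports() {
    awk '
        {
            is_master = 0; dev = ""
            for (i = 2; i <= NF; i++) {
                if ($i == "dev") dev = $(i + 1)
                if ($i == "master") is_master = 1
            }
            if (!is_master || dev == "") next
            key = $1 SUBSEP dev
            if (!(key in seen)) {
                seen[key] = 1
                ports[$1] = (ports[$1] == "" ? dev : ports[$1] "," dev)
                count[$1]++
            }
        }
        END { for (m in count) if (count[m] > 1) print m, ports[m] }' | sort
}

# Usage on a PVE host (empty output means no MAC is known on two ports):
#   bridge fdb show br vmbr0 | macs_on_multiple_ports
```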


Possible cause — IGMP/Multicast Snooping interaction:


Both bridges show mcast_snooping 1 and both report an external IGMP querier at 192.168.77.200 (our HPE Aruba Instant On 1960 core switch):

mcast_querier_ipv4_addr 192.168.77.200

Could the interaction between the bridge's multicast snooping and the external IGMP querier on the physical switch cause MAC learning to be confused across nodes — leading to VM MACs being learned on the wrong bridge port (bond0/uplink instead of the local TAP)?


Disabling multicast snooping on vmbr0 as a test:



echo 0 > /sys/class/net/vmbr0/bridge/multicast_snooping

Would this be a safe/recommended test to try? And is there a persistent way to disable it in /etc/network/interfaces?