Proxmox datacenter and Cisco nexus cluster issue

tiwang

New Member
May 20, 2025
Hi out there,
We have an annoying issue with a 4-node Proxmox cluster that spans 2 sites connected by an L2 10G dark fiber. We use Cisco Nexus 9k switches at both sites. Site #1 has no problems: there the 2 nodes are attached to a single ToR switch. At the other site the nodes are attached to 2 Nexus 9k switches, one node on each switch, and the two switches act as a pair through a vPC. The Nexus 9k switches at that site are causing us problems; my guess is that we have misconfigured something on the VNET. We have 2 VLANs there: VLAN 115 for "management" and VLAN 116 for VMs.

As the log below shows, something appears to be causing a loop:

2025 May 20 08:20:53 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_RE_ENABLE_LEARN: Re-enabling learning in vlan 115
2025 May 20 08:21:02 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_DISABLE_LEARN: Disabling learning in vlan 115 for 120s due to too many mac moves
2025 May 20 08:23:02 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_RE_ENABLE_LEARN: Re-enabling learning in vlan 115
2025 May 20 08:23:12 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_DISABLE_LEARN: Disabling learning in vlan 115 for 120s due to too many mac moves
2025 May 20 08:25:12 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_RE_ENABLE_LEARN: Re-enabling learning in vlan 115
2025 May 20 11:22:01 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_DISABLE_LEARN: Disabling learning in vlan 115 for 120s due to too many mac moves
2025 May 20 11:24:01 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_RE_ENABLE_LEARN: Re-enabling learning in vlan 115
2025 May 20 11:33:52 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_DISABLE_LEARN: Disabling learning in vlan 115 for 120s due to too many mac moves
2025 May 20 11:35:52 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_RE_ENABLE_LEARN: Re-enabling learning in vlan 115
2025 May 20 11:36:01 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_DISABLE_LEARN: Disabling learning in vlan 115 for 120s due to too many mac moves
2025 May 20 11:38:01 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_RE_ENABLE_LEARN: Re-enabling learning in vlan 115
2025 May 20 11:38:11 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_DISABLE_LEARN: Disabling learning in vlan 115 for 120s due to too many mac moves
2025 May 20 11:40:11 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_RE_ENABLE_LEARN: Re-enabling learning in vlan 115
2025 May 20 11:40:19 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_DISABLE_LEARN: Disabling learning in vlan 115 for 120s due to too many mac moves
2025 May 20 11:42:19 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_RE_ENABLE_LEARN: Re-enabling learning in vlan 115
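
To make the pattern in the log easier to see, here is a minimal Python sketch that measures how quickly MAC flapping resumes after learning is re-enabled. The function names `parse_events` and `seconds_until_relapse` are hypothetical helpers, not part of any Cisco or Proxmox tooling, and the regular expression assumes the exact syslog layout shown above. Month names are parsed with `%b`, which assumes an English locale.

```python
import re
from datetime import datetime

# Three sample lines copied from the log above.
LOG = """\
2025 May 20 08:21:02 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_DISABLE_LEARN: Disabling learning in vlan 115 for 120s due to too many mac moves
2025 May 20 08:23:02 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_RE_ENABLE_LEARN: Re-enabling learning in vlan 115
2025 May 20 08:23:12 DKTONDAC-- %L2FM-2-L2FM_MAC_FLAP_DISABLE_LEARN: Disabling learning in vlan 115 for 120s due to too many mac moves
"""

# Assumed layout: "<YYYY Mon DD HH:MM:SS> <host> %L2FM-2-L2FM_MAC_FLAP_<EVENT>: ... vlan <id>"
PATTERN = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"%L2FM-2-L2FM_MAC_FLAP_(?P<event>DISABLE_LEARN|RE_ENABLE_LEARN): "
    r".*?vlan (?P<vlan>\d+)"
)


def parse_events(text):
    """Return a list of (timestamp, event, vlan) tuples for L2FM MAC flap lines."""
    events = []
    for line in text.splitlines():
        m = PATTERN.match(line.strip())
        if m:
            # Collapse repeated spaces so single-digit days parse as well.
            ts = datetime.strptime(" ".join(m["ts"].split()), "%Y %b %d %H:%M:%S")
            events.append((ts, m["event"], int(m["vlan"])))
    return events


def seconds_until_relapse(events):
    """For each re-enable followed by a disable on the same VLAN, return the gap in seconds."""
    gaps = []
    last_enable = {}
    for ts, event, vlan in events:
        if event == "RE_ENABLE_LEARN":
            last_enable[vlan] = ts
        elif vlan in last_enable:
            delta = ts - last_enable.pop(vlan)
            gaps.append((vlan, int(delta.total_seconds())))
    return gaps


if __name__ == "__main__":
    for vlan, gap in seconds_until_relapse(parse_events(LOG)):
        print(f"vlan {vlan}: learning disabled again {gap}s after re-enable")
```

In the full log above, most relapses happen within about 10 seconds of learning being re-enabled, which points to a persistent cause rather than an occasional burst of MAC moves.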

Has anyone seen similar behaviour with Nexus switches?
 
OK, problem solved. It was not an issue in Proxmox but a missing vPC configuration on the switches; the problem went away after adding the missing vPC definition.
 
Just keep in mind that if you lose the link between the sites, for either physical or software reasons, neither side will have quorum, and the nodes can reboot and remain in a degraded state at both locations.

This point is quite important. @tiwang Consider setting up a Linux device as a quorum node: https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support
An old desktop PC or Raspberry Pi would be enough, as long as it can run Debian Bookworm.
 
It also needs to be at a 3rd site to be fair to any DC failure. Otherwise the site hosting the quorum device can fail, and the surviving 2-node site would be left in the minority.
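
The vote arithmetic behind these warnings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every node and the quorum device carry exactly one vote, and quorum requires a strict majority of all expected votes. The function `has_quorum` and the scenario labels are hypothetical and are not part of any Proxmox or corosync API.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Strict majority: at least floor(total / 2) + 1 votes must be reachable."""
    return votes_present >= total_votes // 2 + 1


# Each entry: (votes a surviving partition can still see, total expected votes).
# Assumes 4 nodes split 2 + 2 across the sites, one vote each.
SCENARIOS = {
    "no quorum device, inter-site link down (either side)": (2, 4),
    "quorum device at 3rd site, link down, side that reaches the device": (3, 5),
    "quorum device at 3rd site, link down, side that cannot reach it": (2, 5),
    "quorum device at site A, site A fails, surviving site B": (2, 5),
    "quorum device at 3rd site, site A fails, surviving site B": (3, 5),
}

if __name__ == "__main__":
    for label, (present, total) in SCENARIOS.items():
        state = "quorate" if has_quorum(present, total) else "NOT quorate"
        print(f"{label}: {present}/{total} votes -> {state}")
```

Under these assumptions, only the layouts where the quorum device sits at a third site leave one side quorate after a link or site failure.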