change slave on vmbr0 -> lost connection ??

ThierryIT69

Mar 28, 2024
Hello,
I have a cluster consisting of 3 nodes. Each node is equipped with 4 network interfaces (NICs):
  • 2 × 1G Ethernet ports
  • 2 × SFP+ ports
Current Network Usage:
  • vmbr0/nic0 (Ethernet): Used for internet access and management traffic.
  • nic3 (SFP+): Dedicated solely for migration traffic.
Objective:
I want to move the management interface from the Ethernet port (nic0) to the other SFP+ port (nic2), with the ultimate goal of using only the two SFP+ ports (nic2 and nic3) for all traffic on each node.

Steps Taken So Far:
  • Modified the vmbr0 bridge configuration on each node, changing the bridge port (slave interface) from nic0 to nic2.
  • Applied the changes and restarted the network configuration.
  • Result: Lost internet connectivity. Had to manually revert the configuration to restore access.
Question:
Are there any cluster-specific considerations or steps I need to take when reconfiguring network interfaces in this way? For example, do I need to update the cluster configuration, ensure interface bonding, or handle failover settings differently?

Thanks a lot
 
Hi @ThierryIT69

Could you please send the content of /etc/network/interfaces here for both the old configuration (working) and the new configuration (not working)?

Best regards,
NT
 
Old config working:

Code:
auto lo
iface lo inet loopback

iface nic0 inet manual

iface nic1 inet manual

auto nic2
iface nic2 inet manual

auto nic3
iface nic3 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.206.32/26
        gateway 192.168.206.11
        bridge-ports nic0
        bridge-stp off
        bridge-fd 0

auto vmbr101
iface vmbr101 inet static
        address 192.168.101.10/29
        bridge-ports nic3
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

source /etc/network/interfaces.d/*

new config not working:

Code:
auto lo
iface lo inet loopback

iface nic0 inet manual

iface nic1 inet manual

auto nic2
iface nic2 inet manual

auto nic3
iface nic3 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.206.32/26
        gateway 192.168.206.11
        bridge-ports nic2
        bridge-stp off
        bridge-fd 0

auto vmbr101
iface vmbr101 inet static
        address 192.168.101.10/29
        bridge-ports nic3
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

source /etc/network/interfaces.d/*
 
Although the configurations seem correct, I have had cases where the network apply/reload/restart doesn't remove the old configuration... it just applies the new one. This could lead to strange situations like two default gateways, or the same routing through different networks and so on.
IMHO, the first step to validate that the network changes are applied correctly is to check and verify the outputs of:

Code:
# ip r; echo; brctl show vmbr0

I would run this both in a working and non-working state to look for anything suspicious. Depending on the output of these commands in both states, we could decide what the next check could be. Please paste the outputs here.
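
To make the comparison between the two states easier, the outputs can be saved to files and diffed. This is only a minimal sketch: `record` and `snapshot_net` are hypothetical helper names, `/tmp` is an arbitrary location, and `bridge link show` (from iproute2) is used here as an assumed equivalent of `brctl show`.

```shell
# Print a header, then the command's output; note a failure instead of aborting.
record() {
    echo "== $* =="
    "$@" 2>&1 || echo "(failed: $*)"
}

# Save routes, addresses and bridge membership to <dir>/net-<label>.txt
# and print the path of the file that was written.
snapshot_net() {
    label="$1"
    dir="${2:-/tmp}"
    out="$dir/net-$label.txt"
    {
        record ip r
        record ip -br addr
        record bridge link show
    } > "$out"
    echo "$out"
}

# Usage (run once in each state, then compare):
#   snapshot_net working
#   snapshot_net broken
#   diff /tmp/net-working.txt /tmp/net-broken.txt
```

Anything that differs apart from the bridge port and bridge ID (a second default route, a leftover address) would be worth a closer look.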

Best regards,
NT
 
working one:

Code:
root@prx-node3:~# ip r; echo; brctl show vmbr0
default via 192.168.206.11 dev vmbr0 proto kernel onlink
192.168.101.8/29 dev vmbr101 proto kernel scope link src 192.168.101.13
192.168.206.0/26 dev vmbr0 proto kernel scope link src 192.168.206.32

bridge name     bridge id               STP enabled     interfaces
vmbr0           8000.4c52622b23cf       no              nic0
 
In the non-working state, only this part has changed:

Code:
bridge name     bridge id               STP enabled     interfaces
vmbr0           8000.f8f21e4d7c68       no              nic2

Code:
nslookup www.google.com
communications error to 8.8.8.8#53: host unreachable
 
OK, nice... that means that network changes are applied properly. One reason excluded :)

One clarification here:
- The "working" configuration is with IP 192.168.206.32, while the "non-working" configuration is with IP 192.168.206.55. Is this the same physical node, or are these two different nodes?

Next, I would ping from the host (192.168.206.55) to the gateway (192.168.206.11), and in different consoles, I would check with tcpdump what happens.

Are the packets generated and sent properly through eno2?
Are the ping requests received by the gateway?
Does the gateway server generate and send responses?
Are these responses reaching the host?

Probably, somewhere the connectivity is broken, so we need to figure out where.
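
As a rough sketch of how these checks could be run from the node's local console: first confirm that the port attached to vmbr0 has a physical link at all, then watch ARP and ICMP on it while pinging the gateway from a second console. `has_carrier` and `capture_ping` are hypothetical helper names, and the interface name (nic2) and gateway address are assumptions taken from the configuration posted above.

```shell
# Succeeds if `ip -br link show <nic>` output on stdin carries the LOWER_UP
# flag, i.e. the port reports a physical link (carrier).
has_carrier() {
    grep -q 'LOWER_UP'
}

# Capture ARP and ICMP on the physical port, printing MAC addresses (-e)
# and skipping name resolution (-n). Needs root.
capture_ping() {
    nic="${1:-nic2}"
    tcpdump -e -n -i "$nic" 'arp or icmp'
}

# Usage:
#   ip -br link show nic2 | has_carrier && echo "link up" || echo "no carrier"
#   capture_ping nic2             # console 1
#   ping -c 3 192.168.206.11      # console 2
```

If there is no carrier, nothing further up the stack can work, and the SFP+ module, cable, or switch port would be the first suspect. If ARP requests leave the port but no replies come back, the problem is more likely on the switch side (for example, the port being in a different VLAN).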

One more thing. Could you please verify that with the eno2 active interface, you have the proper MAC address of the gateway interface? It could be checked by:

Code:
# arp -an
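
If `arp` is not available (it belongs to the net-tools package, which may not be installed), `ip neigh` from iproute2 shows the same neighbour table. A small sketch for pulling out just the gateway's MAC address; `mac_for` is a hypothetical helper name, and the gateway address is the one from the posted configuration.

```shell
# Read `ip neigh` output on stdin and print the MAC (the field after
# "lladdr") for the given IP. Prints nothing if the entry is unresolved
# (e.g. FAILED or INCOMPLETE) or missing.
mac_for() {
    awk -v ip="$1" '$1 == ip {
        for (i = 2; i < NF; i++)
            if ($i == "lladdr") print $(i + 1)
    }'
}

# Usage:
#   ip neigh show dev vmbr0 | mac_for 192.168.206.11
```

An empty result after pinging the gateway would suggest that ARP resolution itself is failing on the new port.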

Best regards,
NT
 
eno2? Did you mean nic2?

prx-node1 (206.55 - ethernet) ping to the GW: OK
prx-node5 (206.58 - ethernet) ping to the GW: OK

There is no "arp" command on any of my nodes.

How can you run tcpdump on a node that is offline due to an SFP module issue?
 
> eno2? Did you mean nic2?
Yes, I meant nic2. There are so many network naming conventions that I got confused. Sorry, my mistake :)

And things get more and more confusing :)
  • First you show an example from prx-node3
  • Then you talk about pings from prx-node1 and prx-node5
  • In the last reply, you talked about 206.55 and 206.58, which look like VLANs, but on vmbr0, in the shared configuration, there are no VLANs
A lot of details are mixed up here, and until we clear them up, it will be hard to move forward :)

To avoid flooding this thread with clarifications of small details, I propose that you send me a direct message where we can clear up every detail. Once we find the cause and the solution, we can post here what it was and how it was fixed.

Best regards,
NT