No ping on the second network bridge

mindio · May 8, 2024

Dear All,

I have been stuck for several days, so I am seeking some wisdom. Long story short: I had a 2-node cluster, then removed the second node and upgraded the NIC on the primary node (HP DL380 Gen9). I am now ready to add a new node back to the cluster, but the secondary network link on the primary node, intended for corosync, is not working.
I have vlan10 on a MikroTik CRS309 switch for the corosync network, and on the second node both links (the main untagged subnet and vlan10) work perfectly fine. But I can't ping vlan10 IP on primary node.
My network settings on the primary HP node:

Code:

auto vmbr0
iface vmbr0 inet static
        address 192.168.88.8/24
        gateway 192.168.88.1
        bridge-ports eno49np0
        bridge-stp off
        bridge-fd 0
#bridge-private

auto vmbr1
iface vmbr1 inet static
        address 10.10.10.8/24
        bridge-ports eno50np1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#vlan-proxmox

All the links are UP.

From the switch side, all the links are fine and negotiated at 10 Gbit, and the ports are recognized on the VLAN bridge. I have tried swapping ports, cables, SFP modules, and DACs, but still nothing.

ethtool:

Code:

Settings for eno50np1:
        Supported ports: [ FIBRE         Backplane ]
        Supported link modes:   1000baseKX/Full
                                10000baseKR/Full
                                25000baseCR/Full
                                25000baseKR/Full
                                25000baseSR/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: None        RS      BASER
        Advertised link modes:  1000baseKX/Full
                                10000baseKR/Full
                                25000baseCR/Full
                                25000baseKR/Full
                                25000baseSR/Full
        Advertised pause frame use: Symmetric
        Advertised auto-negotiation: Yes
        Advertised FEC modes: None
        Speed: 10000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Port: FIBRE
        PHYAD: 0
        Transceiver: internal
        Supports Wake-on: g
        Wake-on: d
        Link detected: yes

lshw:

Code:

  *-network:1
       description: Ethernet interface
       product: MT27710 Family [ConnectX-4 Lx]
       vendor: Mellanox Technologies
       physical id: 0.1
       bus info: pci@0000:04:00.1
       logical name: eno50np1
       version: 00
       serial: 04:09:73:dd:62:b1
       size: 10Gbit/s
       capacity: 25Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical fibre 1000bt-fd 10000bt-fd 25000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=mlx5_core driverversion=6.8.4-2-pve duplex=full firmware=14.18.2030 (HP_2690110034) latency=0 link=yes multicast=yes port=fibre speed=10Gbit/s

I doubt this is related to the network hardware, but I may be running in circles and missing something obvious. So please help me debug this, clear my mind, and get a ping on 10.10.10.8.

Thanks.

bbgeek17 · May 8, 2024

Basic stuff looks ok at first glance. I'd say move on to network traces, look at ARPs, check MTUs
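
The ARP and MTU checks could be sketched roughly as follows. This is a minimal sketch, not a definitive procedure: the helper names `df_payload` and `check_path` are hypothetical, and `vmbr1` and `10.10.10.7` are simply the bridge and peer address mentioned in this thread.

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header, no options) minus 8 bytes (ICMP header).
df_payload() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper: show the neighbour (ARP) table for an interface, then
# send a few don't-fragment pings sized to that interface's MTU.
# Arguments: interface (assumed vmbr1 here), peer IP (assumed 10.10.10.7).
check_path() {
    _iface="$1"
    _peer="$2"
    _mtu=$(cat "/sys/class/net/$_iface/mtu") || return 1
    ip neigh show dev "$_iface"
    ping -c 3 -M do -s "$(df_payload "$_mtu")" -I "$_iface" "$_peer"
}

# Usage, as root on the primary node:
#   check_path vmbr1 10.10.10.7
# In a second terminal, watch ARP with link-level headers (-e) and without
# name resolution (-n). The -p flag asks tcpdump not to enable promiscuous
# mode, so the capture itself does not change what the NIC accepts:
#   tcpdump -p -e -n -i vmbr1 arp
```

With the default MTU of 1500, the don't-fragment ping uses a 1472-byte payload. If small pings work but this one fails, an MTU mismatch somewhere on the path would be a likely suspect.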

Good luck

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

gfngfn256 · May 8, 2024

Maybe a conflict with the IP 10.10.10.8 since it was used before on different HW/NIC maybe?
I imagine you tried a complete reboot of Router(s)/Switch(s) etc?

mindio · May 8, 2024

gfngfn256 said:
Maybe a conflict with the IP 10.10.10.8 since it was used before on different HW/NIC maybe?
I imagine you tried a complete reboot of Router(s)/Switch(s) etc?

Yes, sure. I also checked and cleared the ARP table, and MTUs are at the default of 1500 on both ends.

gfngfn256 · May 8, 2024

mindio said:
But I can't ping vlan10 IP on primary node.

I imagine you tried that from secondary PVE node. Can you try it from a different device on the 10.10.10.0 VLAN? Then try it also from a different device to the secondary PVE node on that VLAN.

How about pinging from the Primary PVE node to other node / device on the 10.10.10.0 VLAN?

mindio · May 9, 2024

gfngfn256 said:
I imagine you tried that from secondary PVE node. Can you try it from a different device on the 10.10.10.0 VLAN? Then try it also from a different device to the secondary PVE node on that VLAN.

How about pinging from the Primary PVE node to other node / device on the 10.10.10.0 VLAN?

I can ping both NIC ports of the secondary PVE node (192.168.88.7 and 10.10.10.7) from my PC, which is on the 192.168.88.0 network, and also from the router. Currently I don't have any more devices on the 10.10.10.0 network (except both PVE nodes), but I will configure something on the weekend.

I also just connected the second NIC ports of both PVE nodes directly with a DAC to eliminate the switch. There is a link but no ping. It might need a reboot, but I currently can't shut down the VMs, so I will try tomorrow.

mindio · Friday at 12:59

So I tried connecting the second NIC ports on both machines directly, but the result is the same: there is a link, but no ping.

Now, after some more debugging, I have arrived at this weird situation. If I run tcpdump on the 10.10.10.8 port, ping from 10.10.10.7 immediately starts working. But as soon as I kill tcpdump, ping also dies. WTF?

Code:

root@galadriel:~# tcpdump -i eno50np1
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eno50np1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:59:06.333134 STP 802.1w, Rapid STP, Flags [Proposal, Learn, Forward], bridge-id 8000.08:55:31:fb:f5:d8.8008, length 36
11:59:06.519528 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 62:81:d4:b8:96:bb (oui Unknown), length 300
11:59:07.521967 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 62:81:d4:b8:96:bb (oui Unknown), length 300
11:59:08.335515 STP 802.1w, Rapid STP, Flags [Proposal, Learn, Forward], bridge-id 8000.08:55:31:fb:f5:d8.8008, length 36
11:59:08.525072 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 62:81:d4:b8:96:bb (oui Unknown), length 300
11:59:09.523939 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 62:81:d4:b8:96:bb (oui Unknown), length 300
11:59:10.338023 STP 802.1w, Rapid STP, Flags [Proposal, Learn, Forward], bridge-id 8000.08:55:31:fb:f5:d8.8008, length 36
11:59:10.518294 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 62:81:d4:b8:96:bb (oui Unknown), length 300
11:59:11.517465 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 62:81:d4:b8:96:bb (oui Unknown), length 300
11:59:12.340534 STP 802.1w, Rapid STP, Flags [Proposal, Learn, Forward], bridge-id 8000.08:55:31:fb:f5:d8.8008, length 36
11:59:12.517277 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 62:81:d4:b8:96:bb (oui Unknown), length 300
11:59:12.677822 IP 10.10.10.7 > 10.10.10.8: ICMP echo request, id 6022, seq 1, length 64
11:59:12.677912 IP 10.10.10.8 > 10.10.10.7: ICMP echo reply, id 6022, seq 1, length 64
11:59:13.517399 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 62:81:d4:b8:96:bb (oui Unknown), length 300
11:59:13.702866 IP 10.10.10.7 > 10.10.10.8: ICMP echo request, id 6022, seq 2, length 64
11:59:13.702907 IP 10.10.10.8 > 10.10.10.7: ICMP echo reply, id 6022, seq 2, length 64
11:59:14.343045 STP 802.1w, Rapid STP, Flags [Proposal, Learn, Forward], bridge-id 8000.08:55:31:fb:f5:d8.8008, length 36
11:59:14.525971 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 62:81:d4:b8:96:bb (oui Unknown), length 300
11:59:14.726903 IP 10.10.10.7 > 10.10.10.8: ICMP echo request, id 6022, seq 3, length 64
11:59:14.726951 IP 10.10.10.8 > 10.10.10.7: ICMP echo reply, id 6022, seq 3, length 64
11:59:15.528842 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 62:81:d4:b8:96:bb (oui Unknown), length 300
11:59:15.750858 IP 10.10.10.7 > 10.10.10.8: ICMP echo request, id 6022, seq 4, length 64
11:59:15.750890 IP 10.10.10.8 > 10.10.10.7: ICMP echo reply, id 6022, seq 4, length 64

gfngfn256 · Friday at 13:46

I remember once reading about a similar situation. My theory is that the pings fail because the interface is not in promiscuous mode. As far as I know, tcpdump turns promiscuous mode on by default, so your ping packets reach their destination while it is running.
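
One way to test this theory might look like the sketch below. It assumes the interface name `eno50np1` from this thread. The helper `is_promisc` is hypothetical; it reads the interface flags from sysfs and checks the IFF_PROMISC bit (0x100).

```shell
# Assumed interface name from this thread; adjust for your system.
IFACE="${IFACE:-eno50np1}"

# Hypothetical helper: returns 0 if the IFF_PROMISC bit (0x100) is set in the
# interface flags. The optional second argument overrides the flags file path.
is_promisc() {
    _flags_file="${2:-/sys/class/net/$1/flags}"
    _flags=$(cat "$_flags_file") || return 2
    [ $(( $_flags & 0x100 )) -ne 0 ]
}

# Usage, as root on the affected node:
#   is_promisc "$IFACE" && echo "promisc on" || echo "promisc off"
#   ip -d link show dev "$IFACE" | grep -o 'promiscuity [0-9]*'
# Capture WITHOUT enabling promiscuous mode (-p). If ping stays dead while
# this runs but works with a plain tcpdump, promiscuous mode is the difference:
#   tcpdump -p -n -i "$IFACE" icmp
```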

mindio · Friday at 14:12

gfngfn256 said:
I remember once reading about a similar situation. My theory is that the pings fail because the interface is not in promiscuous mode. As far as I know, tcpdump turns promiscuous mode on by default, so your ping packets reach their destination while it is running.

This theory looks promising.

What's the best way to enable this?

mindio · Friday at 14:20

So I did ip link set eno50np1 promisc on and bingo!

Now how to make this persistent?

gfngfn256 · Friday at 15:14

mindio said:
Now how to make this persistent?

I believe you can add this:

Code:

nano /etc/network/interfaces

# add inside the matching iface stanza (eth0 is a placeholder; use your NIC name, e.g. eno50np1)
up ip link set eth0 promisc on
down ip link set eth0 promisc off

I hope this still works - give it a try!

bbgeek17 · Friday at 15:35

What you are doing is not normal. Promiscuous mode means that the card will accept all traffic, even traffic that is not destined for it. This mode is normally used for sniffers.
It is also used for bridges, as the NIC has to accept IP traffic destined for VMs. The PVE journal is filled with interfaces entering/leaving promiscuous mode during normal operation.

I'd recommend that you remove the vmbr config and test with just the bare hardware NIC. And I would start with a direct connection.

Good luck

I'd examine the network trace with tcpdump details or wireshark, paying particular attention to MAC addresses.
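
One thing such a trace can show is which MAC addresses the frames carry, and whether they match the address of the bridge or of the physical port. A rough sketch, assuming the interface names from this thread (`vmbr1`, `eno50np1`). The helper `same_mac` is hypothetical and only normalizes case before comparing.

```shell
# Hypothetical helper: compare two MAC addresses case-insensitively.
same_mac() {
    _a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    _b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$_a" = "$_b" ]
}

# Usage, as root on the primary node:
#   bridge_mac=$(cat /sys/class/net/vmbr1/address)
#   port_mac=$(cat /sys/class/net/eno50np1/address)
#   same_mac "$bridge_mac" "$port_mac" && echo "same MAC" || echo "different MACs"
# Then capture with link-level headers (-e), no name resolution (-n), and
# without enabling promiscuous mode (-p), and compare the MAC addresses in
# the ARP and ICMP frames with the two addresses above:
#   tcpdump -e -n -p -i eno50np1 'arp or icmp'
```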

