Hi,
I added a fifth node to my cluster. All nodes have three network adapters, one per VLAN, all connected to the same switch (with identical port configuration on the switches), and all nodes are configured in the same manner.
Network config template:
iface lo inet loopback

iface enp12s0 inet manual

iface enp11s0f0 inet manual

iface enp11s0f1 inet manual

auto vmbr0
iface vmbr0 inet static
    address ABC.DEF.XX.YY/24
    gateway ABC.DEF.XX.AB
    bridge-ports enp11s0f0
    bridge-stp off
    bridge-fd 0

auto vmbr1
iface vmbr1 inet static
    address AA.BB.CC.DD/24
    bridge-ports enp12s0
    bridge-stp off
    bridge-fd 0

#Ceph1
auto vmbr2
iface vmbr2 inet static
    address AA.EE.FF.GG/24
    bridge-ports enp11s0f1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 110
The fifth node has newer hardware and uses the r8169 driver (RTL8125B with the RTL8226B_RTL8221B PHY) and the igb driver for its network adapters.
All links are up:
dmesg | grep -E "(igb|8169)"
[ 0.920483] r8169 0000:0c:00.0 eth0: RTL8125B, d8:43:ae:b8:21:88, XID 641, IRQ 48
[ 0.920487] r8169 0000:0c:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[ 0.929552] r8169 0000:0c:00.0 enp12s0: renamed from eth0
[ 0.934898] igb: Intel(R) Gigabit Ethernet Network Driver
[ 0.934900] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 0.934922] igb 0000:0b:00.0: enabling device (0000 -> 0002)
[ 1.008169] hub 4-0:1.0: 2 ports detected
[ 1.278090] igb 0000:0b:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 1.278095] igb 0000:0b:00.0: eth0: (PCIe:2.5Gb/s:Width x1) 1c:86:0b:2b:6d:d4
[ 1.278110] igb 0000:0b:00.0: eth0: PBA No: Unknown
[ 1.278112] igb 0000:0b:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[ 1.278158] igb 0000:0b:00.1: enabling device (0000 -> 0002)
[ 1.654243] igb 0000:0b:00.1: Intel(R) Gigabit Ethernet Network Connection
[ 1.654247] igb 0000:0b:00.1: eth1: (PCIe:2.5Gb/s:Width x1) 1c:86:0b:2b:6d:d5
[ 1.654261] igb 0000:0b:00.1: eth1: PBA No: Unknown
[ 1.654263] igb 0000:0b:00.1: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[ 2.755522] igb 0000:0b:00.0 enp11s0f0: renamed from eth0
[ 2.773517] igb 0000:0b:00.1 enp11s0f1: renamed from eth1
[ 10.151592] igb 0000:0b:00.0 enp11s0f0: entered allmulticast mode
[ 10.151625] igb 0000:0b:00.0 enp11s0f0: entered promiscuous mode
[ 10.765837] r8169 0000:0c:00.0 enp12s0: entered allmulticast mode
[ 10.765882] r8169 0000:0c:00.0 enp12s0: entered promiscuous mode
[ 10.793591] RTL8226B_RTL8221B 2.5Gbps PHY r8169-0-c00:00: attached PHY driver (mii_bus:phy_addr=r8169-0-c00:00, irq=MAC)
[ 10.960935] r8169 0000:0c:00.0 enp12s0: Link is Down
[ 10.992939] igb 0000:0b:00.1 enp11s0f1: entered allmulticast mode
[ 12.905582] igb 0000:0b:00.0 enp11s0f0: igb: enp11s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 14.130407] r8169 0000:0c:00.0 enp12s0: Link is Up - 2.5Gbps/Full - flow control off
[ 14.145317] igb 0000:0b:00.1 enp11s0f1: igb: enp11s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 1420.278387] r8169 0000:0c:00.0 enp12s0: Link is Down
[ 1481.845729] r8169 0000:0c:00.0 enp12s0: Link is Up - 2.5Gbps/Full - flow control off
[ 1521.661091] igb 0000:0b:00.1 enp11s0f1: igb: enp11s0f1 NIC Link is Down
[ 1604.449504] igb 0000:0b:00.1 enp11s0f1: igb: enp11s0f1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
The links on the switches are also up.
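Note that the log above also shows enp12s0 and enp11s0f1 going down and coming back up well after boot (at roughly 1420 s and 1521 s). A minimal sketch for pulling only the link state changes out of the dmesg output, assuming the interface names start with "en" as they do here; `link_events` is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: reads dmesg-style text on stdin and prints
# "<timestamp> <interface> <Up|Down>" for every link state change.
link_events() {
  grep -E 'Link is (Up|Down)' |
    sed -E 's/^\[ *([0-9.]+)\].* (en[a-z0-9]+): .*Link is (Up|Down).*/\1 \2 \3/'
}
```

It can be fed directly from the kernel log, for example with `dmesg | link_events`.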
The first four nodes reach each other via ping on every network. A ping from one of the first four nodes to the fifth hangs, as with a routing problem. A ping from the fifth node to one of the first four on the Ceph1 network returns "Destination Host Unreachable", and a ping on the corosync network shows similar routing-problem behaviour (it hangs).
PVE version: pve-manager/8.3.1
I tried a lot of different configuration variations (with and without VLAN tagging), but without success. At the moment I'm out of ideas.
Does anybody have further suggestions for troubleshooting the problem?
Thanks in advance.