Let me preface this by saying that I'm not a networking engineer. I have a technical background, but in this area I'm a hobbyist. This is my first post here, so please tell me if I'm doing anything wrong. English is not my first language, so forgive me if something is unclear or poorly phrased.
At some point I successfully followed this guide:
https://gist.github.com/Drallas/96fa494b84af7e30b68e1dc0d177812f
All three of my nodes were working great and could communicate with each other over direct links, thanks to some USB Ethernet interfaces I added to them. A few days ago I decided to upgrade one of my nodes. It isn't the first node I've upgraded in this cluster, and it had gone smoothly in the past. This upgrade involved changing the motherboard and CPU and, consequently, at least one of the Ethernet interfaces of that node (the other two were unchanged; they are the same USB interfaces I had connected to the previous motherboard). After I booted and changed the interface name in /etc/network/interfaces and in my FRR configuration file, I was able to connect to my Proxmox instance through my management interface without much trouble. Sadly, this node could not see the other two, and the other two could not see the upgraded node. After extensive tinkering, I have been unable to solve this. Thinking I must have made a mistake at some point, I did a fresh install and restored corosync once the server was installed. This has not fixed anything.
Allow me to provide some commands and their output:
My IPs and interfaces:
Code:
root@mother:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 fc00::2/128 scope global
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host noprefixroute
valid_lft forever preferred_lft forever
2: enp8s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
link/ether 0e:e6:3c:36:04:fe brd ff:ff:ff:ff:ff:ff
3: enx207bd51b0ea5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 20:7b:d5:1b:0e:a5 brd ff:ff:ff:ff:ff:ff
inet6 fe80::227b:d5ff:fe1b:ea5/64 scope link
valid_lft forever preferred_lft forever
4: enx3c18a0d4cb91: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 3c:18:a0:d4:cb:91 brd ff:ff:ff:ff:ff:ff
inet6 fe80::3e18:a0ff:fed4:cb91/64 scope link
valid_lft forever preferred_lft forever
5: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 0e:e6:3c:36:04:fe brd ff:ff:ff:ff:ff:ff
inet 10.0.0.52/24 scope global vmbr0
valid_lft forever preferred_lft forever
inet6 fe80::ce6:3cff:fe36:4fe/64 scope link
valid_lft forever preferred_lft forever
Interfaces file:
Code:
root@mother:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!
auto lo
iface lo inet loopback
auto lo:0
iface lo:0 inet static
address fc00::2/128
iface enp8s0 inet manual
auto enx207bd51b0ea5
iface enx207bd51b0ea5 inet manual
auto enx3c18a0d4cb91
iface enx3c18a0d4cb91 inet manual
auto vmbr0
iface vmbr0 inet static
address 10.0.0.52/24
gateway 10.0.0.254
bridge-ports enp8s0
bridge-stp off
bridge-fd 0
source /etc/network/interfaces.d/*
FRR config:
Code:
root@mother:~# vtysh -c 'show running-config'
Building configuration...
Current configuration:
!
frr version 8.5.2
frr defaults traditional
hostname mother
log syslog informational
no ip forwarding
service integrated-vtysh-config
!
interface enx3c18a0d4cb91
ipv6 ospf6 area 0
ipv6 ospf6 network point-to-point
exit
!
interface enx207bd51b0ea5
ipv6 ospf6 area 0
ipv6 ospf6 network point-to-point
exit
!
interface lo
ipv6 ospf6 area 0
exit
!
router ospf6
ospf6 router-id 0.0.0.2
log-adjacency-changes
exit
!
end
In case someone is wondering, yes, I have enabled IPv6 forwarding:
Code:
root@mother:~# sysctl net.ipv6.conf.all.forwarding
net.ipv6.conf.all.forwarding = 1
This is what happens when I restart FRR:
Code:
Feb 04 01:56:25 mother systemd[1]: frr.service: Deactivated successfully.
Feb 04 01:56:25 mother systemd[1]: Stopped frr.service - FRRouting.
Feb 04 01:56:25 mother systemd[1]: frr.service: Consumed 1.091s CPU time.
Feb 04 01:56:25 mother systemd[1]: Starting frr.service - FRRouting...
Feb 04 01:56:25 mother frrinit.sh[6614]: Starting watchfrr with command: ' /usr/lib/frr/watchfrr -d -F traditional zebra bgpd ospf6d staticd bfdd'.
Feb 04 01:56:26 mother watchfrr[6624]: [T83RR-8SM5G] watchfrr 8.5.2 starting: vty@0
Feb 04 01:56:26 mother watchfrr[6624]: [ZCJ3S-SPH5S] zebra state -> down : initial connection attempt failed
Feb 04 01:56:26 mother watchfrr[6624]: [ZCJ3S-SPH5S] bgpd state -> down : initial connection attempt failed
Feb 04 01:56:26 mother watchfrr[6624]: [ZCJ3S-SPH5S] ospf6d state -> down : initial connection attempt failed
Feb 04 01:56:26 mother watchfrr[6624]: [ZCJ3S-SPH5S] staticd state -> down : initial connection attempt failed
Feb 04 01:56:26 mother watchfrr[6624]: [ZCJ3S-SPH5S] bfdd state -> down : initial connection attempt failed
Feb 04 01:56:26 mother watchfrr[6624]: [YFT0P-5Q5YX] Forked background command [pid 6625]: /usr/lib/frr/watchfrr.sh restart all
Feb 04 01:56:26 mother ospf6d[6653]: [RB0PM-VM0Y3] interface_up: Not scheduling Hello for enx3c18a0d4cb91 as there is no area assigned yet
Feb 04 01:56:26 mother ospf6d[6653]: [RB0PM-VM0Y3] interface_up: Not scheduling Hello for enx207bd51b0ea5 as there is no area assigned yet
Feb 04 01:56:26 mother ospf6d[6653]: [TGY3Y-TNWHZ] Higher order sequence number 0 read for default process Success
Feb 04 01:56:26 mother ospf6d[6653]: [H12G7-14H22] Higher order sequence number 1 update for default process
Feb 04 01:56:26 mother ospf6d[6653]: [YWKRY-MSCC3] ospf6_spf_calculation: No router LSA for area 0
Feb 04 01:56:26 mother zebra[6641]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Feb 04 01:56:26 mother ospf6d[6653]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Feb 04 01:56:26 mother bgpd[6646]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Feb 04 01:56:26 mother watchfrr[6624]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Feb 04 01:56:26 mother staticd[6656]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Feb 04 01:56:26 mother bfdd[6659]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Feb 04 01:56:26 mother watchfrr[6624]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Feb 04 01:56:26 mother watchfrr[6624]: [QDG3Y-BY5TN] bgpd state -> up : connect succeeded
Feb 04 01:56:26 mother watchfrr[6624]: [QDG3Y-BY5TN] ospf6d state -> up : connect succeeded
Feb 04 01:56:26 mother watchfrr[6624]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Feb 04 01:56:26 mother watchfrr[6624]: [QDG3Y-BY5TN] bfdd state -> up : connect succeeded
Feb 04 01:56:26 mother watchfrr[6624]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Feb 04 01:56:26 mother frrinit.sh[6614]: Started watchfrr.
Feb 04 01:56:26 mother systemd[1]: Started frr.service - FRRouting.
This is what I see with lldp:
Code:
root@mother:~# lldpctl
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface: enp8s0, via: LLDP, RID: 1, Time: 0 day, 00:31:53
Chassis:
ChassisID: mac a8:5e:45:5e:18:39
SysName: maiden.grcarmenaty.home
SysDescr: Debian GNU/Linux 12 (bookworm) Linux 6.8.12-5-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-5 (2024-12-03T10:26Z) x86_64
MgmtIP: 10.0.0.54
MgmtIface: 5
MgmtIP: fc00::3
MgmtIface: 1
Capability: Bridge, on
Capability: Router, on
Capability: Wlan, off
Capability: Station, off
Port:
PortID: mac 00:e0:4c:68:03:0a
PortDescr: enx00e04c68030a
TTL: 120
PMD autoneg: supported: yes, enabled: yes
Adv: 10Base-T, HD: yes, FD: yes
Adv: 100Base-TX, HD: yes, FD: yes
Adv: 1000Base-T, HD: no, FD: yes
MAU oper type: 1000BaseTFD - Four-pair Category 5 UTP, full duplex mode
-------------------------------------------------------------------------------
Interface: enp8s0, via: LLDP, RID: 2, Time: 0 day, 00:31:53
Chassis:
ChassisID: mac d8:5e:d3:81:d3:d2
SysName: crone.grcarmenaty.home
SysDescr: Debian GNU/Linux 12 (bookworm) Linux 6.8.12-5-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-5 (2024-12-03T10:26Z) x86_64
MgmtIP: 10.0.0.50
MgmtIface: 5
MgmtIP: fc00::1
MgmtIface: 1
Capability: Bridge, on
Capability: Router, on
Capability: Wlan, on
Capability: Station, off
Port:
PortID: mac 3c:18:a0:d4:db:7c
PortDescr: enx3c18a0d4db7c
TTL: 120
PMD autoneg: supported: yes, enabled: yes
Adv: 10Base-T, HD: yes, FD: yes
Adv: 100Base-TX, HD: yes, FD: yes
Adv: 1000Base-T, HD: no, FD: yes
MAU oper type: 1000BaseTFD - Four-pair Category 5 UTP, full duplex mode
-------------------------------------------------------------------------------
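Since the full LLDP dump is long, a condensed view of which local interface each neighbor was seen on may be easier to read. The following is only a sketch: `lldp_summary` is a hypothetical helper name, and it assumes the plain-text `lldpctl` layout shown above (`Interface:`, `SysName:` and `PortDescr:` fields, in that order for each neighbor).

```shell
# Condense plain-text `lldpctl` output (read from stdin) to one line per
# neighbor: "<local interface> <neighbor name> <neighbor port>".
# lldp_summary is a hypothetical helper, not part of lldpd.
lldp_summary() {
    awk '
        # "Interface: enp8s0, via: LLDP, ..." -> remember the local interface
        $1 == "Interface:" { sub(/,$/, "", $2); iface = $2 }
        # "SysName: maiden.grcarmenaty.home"  -> remember the neighbor name
        $1 == "SysName:"   { name = $2 }
        # "PortDescr: enx00e04c68030a"        -> last field needed, print the row
        $1 == "PortDescr:" { print iface, name, $2 }
    '
}
```

Used as `lldpctl | lldp_summary`, the output above reduces to `enp8s0 maiden.grcarmenaty.home enx00e04c68030a` and `enp8s0 crone.grcarmenaty.home enx3c18a0d4db7c`.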
Despite this, I see no OSPF neighbors:
Code:
root@mother:~# vtysh -c 'show ipv6 ospf6 neighbor'
Neighbor ID Pri DeadTime State/IfState Duration I/F[State]
Furthermore, routing seems to be wrong:
Code:
root@mother:~# vtysh -c 'show ipv6 route'
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
O fc00::2/128 [110/10] is directly connected, lo, weight 1, 00:06:49
C>* fc00::2/128 is directly connected, lo, 00:06:50
C * fe80::/64 is directly connected, vmbr0, 00:06:50
C * fe80::/64 is directly connected, enx3c18a0d4cb91, 00:06:50
C>* fe80::/64 is directly connected, enx207bd51b0ea5, 00:06:50
root@mother:~# ip -6 r
fc00::2 dev lo proto kernel metric 256 pref medium
fe80::/64 dev enx3c18a0d4cb91 proto kernel metric 256 pref medium
fe80::/64 dev enx207bd51b0ea5 proto kernel metric 256 pref medium
fe80::/64 dev vmbr0 proto kernel metric 256 pref medium
Naturally, pinging doesn't work:
Code:
root@mother:~# ping fc00::1
ping: connect: Network is unreachable
root@mother:~# ping fc00::3
ping: connect: Network is unreachable
If I add the routes myself I get a different but no less frustrating response:
Code:
root@mother:~# ip -6 r a fc00::3 via fe80::2e0:4cff:fe68:3b3c dev enx3c18a0d4cb91 proto ospf metric 20 pref medium
ip -6 r a fc00::1 via fe80::da5e:d3ff:fe81:d3d2 dev enx207bd51b0ea5 proto ospf metric 20 pref medium
root@mother:~# ip -6 r
fc00::1 via fe80::da5e:d3ff:fe81:d3d2 dev enx207bd51b0ea5 proto ospf metric 20 pref medium
fc00::2 dev lo proto kernel metric 256 pref medium
fc00::3 via fe80::2e0:4cff:fe68:3b3c dev enx3c18a0d4cb91 proto ospf metric 20 pref medium
fe80::/64 dev enx3c18a0d4cb91 proto kernel metric 256 pref medium
fe80::/64 dev enx207bd51b0ea5 proto kernel metric 256 pref medium
fe80::/64 dev vmbr0 proto kernel metric 256 pref medium
root@mother:~# ping fc00::1
PING fc00::1(fc00::1) 56 data bytes
From fc00::2 icmp_seq=1 Destination unreachable: Address unreachable
From fc00::2 icmp_seq=2 Destination unreachable: Address unreachable
From fc00::2 icmp_seq=3 Destination unreachable: Address unreachable
^C
--- fc00::1 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4127ms
root@mother:~# ping fc00::3
PING fc00::3(fc00::3) 56 data bytes
From fc00::2 icmp_seq=10 Destination unreachable: Address unreachable
From fc00::2 icmp_seq=11 Destination unreachable: Address unreachable
From fc00::2 icmp_seq=12 Destination unreachable: Address unreachable
From fc00::2 icmp_seq=13 Destination unreachable: Address unreachable
^C
--- fc00::3 ping statistics ---
13 packets transmitted, 0 received, +4 errors, 100% packet loss, time 16273ms
I've tried adding similar routes to the other two nodes to no avail.
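The next hops in the manual routes above are link-local addresses. For anyone who wants to cross-check such an address against a MAC address, here is a minimal sketch of the modified EUI-64 derivation. It assumes the interface uses plain EUI-64 addressing rather than stable-privacy addresses, and `mac_to_link_local` is a hypothetical helper name.

```shell
# Derive the EUI-64 based IPv6 link-local address from a MAC address.
# mac_to_link_local is a hypothetical helper; the body runs in a subshell
# so the IFS change and positional parameters do not leak to the caller.
# Note: groups are printed without leading zeros, but an all-zero group is
# printed as "0" rather than being folded into "::".
mac_to_link_local() (
    IFS=:
    set -- $1                       # split the MAC into six octets: $1..$6
    first=$(( 0x$1 ^ 0x02 ))        # flip the universal/local bit
    # Insert ff:fe between the third and fourth octet and print the
    # resulting four 16-bit groups after the fe80:: prefix.
    printf 'fe80::%x:%x:%x:%x\n' \
        $(( (first << 8) | 0x$2 )) \
        $(( (0x$3 << 8) | 0xff )) \
        $(( 0xfe00 | 0x$4 )) \
        $(( (0x$5 << 8) | 0x$6 ))
)
```

For example, `mac_to_link_local 20:7b:d5:1b:0e:a5` prints `fe80::227b:d5ff:fe1b:ea5`, which matches the address shown for enx207bd51b0ea5 in the `ip a` output.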
I would provide traceroutes, tcpdumps and so on, but they show nothing (no OSPF traffic; packets just don't leave the local host).
I have checked all the cables: they work, and they are connected the way I expect them to be.
Can someone please help me understand what I'm doing wrong? If there is any information anyone needs to help with this, I will do my best to provide it. My earnest hope is that I've done something idiotically simple and this can all be easily solved. Thanks in advance to any kind soul that takes a look at it.