Intel Nuc 13 Pro Thunderbolt Ring Network Ceph Cluster

scyto · Aug 18, 2023

Same here you have been super helpful. I even tried looking at the kernel docs and driver code. Not much that is obvious to me about possible issues.

Thunderbolt-net seems cutting edge despite being in the kernel for a few years. For example thunderbolt should be using predictable network names, but isn’t. I am hoping the increase in USB4 and TB4 renews interest. For now I have an acceptable workaround in using IPv4 and I sure learnt about a bunch of things in Linux I had never really looked at before (udev, iptables, frr, fabricd).

This has taken much of my vacation week when I should have been playing games. Will do that today / tomorrow, then may try to repro on Sunday in pure vanilla Debian.

scyto · Aug 19, 2023

@ualex @l-koeln how have you chosen to divide up ceph private, ceph public, corosync (cluster) network, and migration network across your 3 networks?

ualex · Aug 19, 2023

@scyto I do not use a cluster YET, I am still in the process of getting the network stable.

I also tested with a vanilla Debian 12, and it shows the same problem as you had before in Proxmox 8. IPv4 works perfectly, IPv6 does not work (only ping works). It seems IPv6 over thunderbolt-net is just NOT working, so the only way forward is to use IPv4 only.

scyto · Aug 19, 2023

Wow, thanks for doing the testing. I guess the Debian maintainers have no idea thunderbolt is broken.

I got my routed network working with IPv4 last night using openfabric. It has some issues with start sequence I still need to solve (it doesn't work unless the FRR service is restarted after full boot has occurred). Will try OSPF next to see if it is daemon specific or not.

scyto · Aug 19, 2023

Current Results - based on two different tests:

1. I have fabricd based routing with IPv4 running. Only issue is I have to manually restart the FRR services after any node boot once all the connections are up (this seems less than ideal for a network for ceph)

2. I have failed to get OSPF (IPv4) working and I can't see why. OSPF never detects neighbors. (I have no issues with the adapters coming up and having the static IP addresses applied.)

ualex · Aug 19, 2023

@scyto are you sure you should configure a unique /30 for each en05/en06 interface? Should it not be 3 subnets only (each host in 2 subnets)?

scyto · Aug 19, 2023

ualex said:
@scyto are you sure you should configure a unique /30 for each en05/en06 interface? Should it not be 3 subnets only (each host in 2 subnets)?

I think so: each en05 and en06 on each node is in a unique /30 subnet, assuming I used my calculator correctly?
Should I have done something else? (Each interface on each node got the first usable IP of a unique range.)

specifically

Code:

Node 1:
lo:0 = 10.0.0.81/32
en05 = 10.0.0.5/30
en06 = 10.0.0.9/30
ospf router-id = 0.0.0.1

Node 2:
lo:0 = 10.0.0.82/32
en05 = 10.0.0.13/30
en06 = 10.0.0.17/30
ospf router-id = 0.0.0.2

Node 3:
lo:0 = 10.0.0.82/32
en05 = 10.0.0.21/30
en06 = 10.0.0.25/30
ospf router-id = 0.0.0.3

ualex · Aug 19, 2023

I believe you need to configure them as:

Node1 en05: 10.0.0.5
Node1 en06: 10.0.0.9
Node2 en05: 10.0.0.10
Node2 en06: 10.0.0.13
Node3 en05: 10.0.0.14
Node3 en06: 10.0.0.6
All /30 of course … 3 subnets, instead of 6 you have now.
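The difference between the two layouts can be checked with a small script. This is only a sketch using Python's standard `ipaddress` module: the addresses are copied from the posts above, the last proposed entry is taken to be Node3 en06 (which is what the later OSPF neighbor output shows), and the helper name `group_by_subnet` is made up for illustration.

```python
import ipaddress
from collections import defaultdict

# Addresses copied from the two layouts in this thread; all are /30.
original = {
    ("node1", "en05"): "10.0.0.5/30",
    ("node1", "en06"): "10.0.0.9/30",
    ("node2", "en05"): "10.0.0.13/30",
    ("node2", "en06"): "10.0.0.17/30",
    ("node3", "en05"): "10.0.0.21/30",
    ("node3", "en06"): "10.0.0.25/30",
}
proposed = {
    ("node1", "en05"): "10.0.0.5/30",
    ("node1", "en06"): "10.0.0.9/30",
    ("node2", "en05"): "10.0.0.10/30",
    ("node2", "en06"): "10.0.0.13/30",
    ("node3", "en05"): "10.0.0.14/30",
    ("node3", "en06"): "10.0.0.6/30",  # assumed en06, see note above
}

def group_by_subnet(layout):
    """Map each /30 network to the interfaces whose address falls in it."""
    groups = defaultdict(list)
    for (node, iface), addr in layout.items():
        net = ipaddress.ip_interface(addr).network
        groups[str(net)].append(f"{node}:{iface}")
    return dict(groups)

if __name__ == "__main__":
    for name, layout in (("original", original), ("proposed", proposed)):
        print(name)
        for net, members in sorted(group_by_subnet(layout).items()):
            print(f"  {net}: {', '.join(members)}")
```

With the original layout every interface sits alone in its own /30, so no two ends of a cable share a subnet. With the proposed layout each /30 holds exactly the two interfaces at either end of one cable, which is what OSPF needs in order to form an adjacency on that link.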

scyto · Aug 19, 2023

Ah ok, I see: opposing nodes' interfaces need to be on the same subnet. Doesn't that create an issue if someone changes the cable order?

scyto · Aug 20, 2023

OK, that gets me further, but it is still not a complete mesh with fallback.

vtysh -c "show ip ospf neighbor" on node 3 shows:

Code:

Neighbor ID     Pri State           Up Time         Dead Time Address         Interface                        RXmtL RqstL DBsmL
0.0.0.2           1 Full/-          5m58s             31.979s 10.0.0.13       en05:10.0.0.14                       0     0     0
0.0.0.1           1 Full/-          3m42s             37.861s 10.0.0.5        en06:10.0.0.6                        0     0     0

But if I pull en05 on node 2 (the port marked 2 on the case), routing breaks when pinging from node 3 to 10.0.0.10 and 10.0.0.82. In other words, full routing breaks, and given the point is to bind Ceph to the loopbacks, this would not seem to meet the need.

Also it means even if we use the host interface IPs one of the subnets will break... i have some wild theories to try....
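For what it's worth, the ring itself should survive one pulled cable, which suggests the breakage is a routing-protocol matter rather than a cabling one. The sketch below is only a toy reachability check over node pairs, not a model of OSPF; the `reachable` helper is hypothetical, and the assumption that node 2's en05 faces node 1 comes from the /30 plan above (10.0.0.9 and 10.0.0.10 share a subnet).

```python
from collections import deque

# The three cables of the ring, as node pairs.
ring = [("node1", "node2"), ("node2", "node3"), ("node3", "node1")]

def reachable(links, start):
    """Return the set of nodes reachable from start over the given links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in adj.get(cur, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Pulling node 2's en05 cable removes the node1-node2 link (assumption above).
degraded = [link for link in ring if link != ("node1", "node2")]

if __name__ == "__main__":
    print(sorted(reachable(degraded, "node3")))
```

Node 3 can still reach both other nodes over the two remaining cables, so a loopback such as node 2's should stay reachable once the routing daemon reconverges.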

scyto · Aug 20, 2023

I think i fixed it...

In the FRR configs, changing `ip ospf network point-to-point` to `ip ospf network point-to-multipoint` seems to fix the issue of routing breaking when a cable is pulled from the ring.

ualex · Aug 20, 2023

Good to hear you got a step further and fixed it! Also, in your list of IP addresses you used the same "lo:0 = 10.0.0.82/32" for nodes 2 and 3.

ualex · Aug 20, 2023

@scyto - question: how do you get around the en05/en06 naming, depending on who connects first?

So if I have connected:
nuc1 thunderbolt0 -> nuc2 thunderbolt1

Then on both I see them getting en05, whereas I expect nuc1 to get en05 and nuc2 to get en06.

The dmesg output shows it sees thunderbolt0 on nuc2, even though the cable is clearly connected to the SECOND thunderbolt port:

Code:

[Sun Aug 20 09:42:53 2023] ACPI: bus type thunderbolt registered
[Sun Aug 20 09:43:01 2023] thunderbolt 0-0:1.1: new retimer found, vendor=0x8087 device=0x15ee
[Sun Aug 20 09:43:14 2023] thunderbolt 0-1: new host found, vendor=0x8086 device=0x1
[Sun Aug 20 09:43:24 2023] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0

I got the following configured to rename:

Code:

root@pve2:/etc/systemd/network# cat 10-thunderbolt0.link

[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en05

root@pve2:/etc/systemd/network# cat 11-thunderbolt1.link
[Match]
Path=pci-0000:00:0d.3
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en06

scyto · Aug 20, 2023

ualex said:
In your list of IP addresses you used the same "lo:0 = 10.0.0.82/32" for nodes 2 and 3.

Thanks, corrected.

scyto · Aug 20, 2023

ualex said:
how do you get around the en05/en06 naming, depending on who connects first?

I don't see any such thing on my machines, but maybe we are testing differently...

I just went to the running cluster and used udevadm monitor to watch the connections (it is udev that processes the link files).
On node 1 I pulled both thunderbolt connections.

Then I took one cable and plugged it into each port in turn:

Port 1 (as marked on the case) always came up as en06 /devices/pci0000:00/0000:00:0d.3/domain1/1-0/1-1/1-1.0/net/en06 (net)
Port 2 (as marked on the case) always came up as en05 /devices/pci0000:00/0000:00:0d.2/domain0/0-0/0-1/0-1.0/net/en05 (net)
(the OCD in me wants to change my port naming to something like entb1 and entb2 to match the case marking)

AFAIK I have also not had any issues with numbers shifting during reboots.

Code:

root@pve1:~# dmesg | grep thunderbolt-net
[   10.921947] thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
[   11.303792] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
[ 1133.531973] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
[ 1971.123133] thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
[ 2057.808336] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
[39829.992621] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
[39851.752589] thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
[39870.440922] thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
[39918.855795] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
[39940.423592] thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
[40328.162886] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
[40351.813223] thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
[40367.109435] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0

tl;dr: on my machine, while which PCI device gets called thunderbolt0/thunderbolt1 changes, the PCI device ID seems to stay consistent, and so does which port is en05 vs en06. (Why whoever wrote the driver code thought naming the interfaces in this inconsistently dynamic way was a good idea is beyond me, given predictable interface names have been a thing for a long while now.)
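The same consistency check can be done mechanically on a longer log. This is a rough sketch that parses "renamed from" lines in the format shown above and collects which final names each Thunderbolt device was given; the sample lines are copied from the output above, and the function name `names_per_device` is invented for illustration.

```python
import re
from collections import defaultdict

# A few lines copied from the dmesg output above.
DMESG = """\
[   10.921947] thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
[   11.303792] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
[ 1133.531973] thunderbolt-net 0-1.0 en05: renamed from thunderbolt0
[ 1971.123133] thunderbolt-net 1-1.0 en06: renamed from thunderbolt0
"""

# Captures: device ID as printed by the kernel, new name, old name.
PATTERN = re.compile(r"thunderbolt-net (\S+) (\S+): renamed from (\S+)")

def names_per_device(log):
    """Collect every interface name each thunderbolt-net device ended up with."""
    seen = defaultdict(set)
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            device, new_name, _old_name = match.groups()
            seen[device].add(new_name)
    return dict(seen)

if __name__ == "__main__":
    for device, names in sorted(names_per_device(DMESG).items()):
        status = "stable" if len(names) == 1 else "UNSTABLE"
        print(f"{device}: {sorted(names)} ({status})")
```

If a device ever shows more than one name, the .link matching is not pinning the port the way it should. Feeding it the real output of `dmesg | grep thunderbolt-net` instead of the sample string would be the obvious next step.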

scyto · Aug 20, 2023

My nodes are connected as follows, FWIW (using the numbers printed on the case):

Code:

node 1 port 1 > node 2 port 2
node 2 port 1 > node 3 port 2
node 3 port 1 > node 1 port 2

I even tried connecting node 1 port 1 > node 2 port 1 to see if that changed the PCI device numbers. It didn't.

scyto · Aug 20, 2023

Oh, one thing I notice about your link files: they start with different indexes, whereas mine have the same index, so:

00-thunderbolt0.link
00-thunderbolt1.link

I don't know how / if that would affect processing at boot time...

ualex · Aug 20, 2023

Good that it works fine for you, so I most likely have a small typo somewhere, and good that you spotted the naming of the files. My original files were named 00-thunderboltX.link, but when I copied them I changed the prefix to 10/11, I do not know why.

More testing is required on my side.

scyto · Aug 20, 2023

Another note about the FRR OSPF setup:

`ip ospf network broadcast` works too. I am not sure which is 'better', broadcast or point-to-multipoint.

scyto · Aug 20, 2023

ualex said:
Good that it works fine for you, so I most likely have a small typo somewhere, and good that you spotted the naming of the files. My original files were named 00-thunderboltX.link, but when I copied them I changed the prefix to 10/11, I do not know why. More testing is required on my side.

Well, what you posted above looks perfect to me... so maybe it's the filenames? I really don't know! Let me know how you get on.

BTW if you have the same model as I do, I wonder if the BIOS changes behaviour? If you do, I am on BIOS `ANRPL357.0026.2023.0314.1458`.
FYI one machine's BIOS got updated to `ANRPL357.0027.2023.0607.1754` by a Windows 11 test install I had done. I reverted to the 0026 variant when I was having issues with the add-in Ethernet board... they seemed to go away...
