Minor Change - Thunderbolt Networking

B-C

Sep 24, 2023
I reviewed a bit and went with the Thunderbolt networking setup provided here for my MS-01 cluster.

https://gist.github.com/scyto/67fdc9a517faefa68f730f82d7fa3570
&
https://gist.github.com/scyto/4c664734535da122f4ab2951b22b2085

The difference that's giving me grief is that I already had the base cluster up and running on the latest version over the 2.5G interfaces - no ring, just a plain network with Ceph on the same subnet (to get things up and running).

The isolated network uses the 10.99.99.21-23 IPs.

However, if I change the migration network in Datacenter > Options to the 10.99.99.0/24 network, I get these errors on normal migrations.
98% sure I missed something! ;p
Code:
could not get migration ip: no IP address configured on local node for network '10.99.99.21/32'
TASK ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve03' -o 'UserKnownHostsFile=/etc/pve/nodes/pve03/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@10.22.20.23 pvecm mtunnel -migration_network 10.99.99.21/32 -get_migration_ip' failed: exit code 255


Switching the migration network back to 10.22.20.0/24 does work as a temporary fallback to the slower network.
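
For reference, the error shows a single-host /32 being passed as the migration network, while what I meant to select is the whole mesh subnet. A sketch of the datacenter.cfg line I'd expect (type defaults to secure if omitted):

Code:
# /etc/pve/datacenter.cfg - migration pinned to the Thunderbolt mesh subnet
migration: network=10.99.99.0/24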

I ran these from each host to each of the others:
ssh -o 'HostKeyAlias=pve01' root@10.99.99.21
ssh -o 'HostKeyAlias=pve02' root@10.99.99.22
ssh -o 'HostKeyAlias=pve03' root@10.99.99.23

and they connect without issue using keys (I just have to accept the fingerprint once for each).
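
If it saves anyone the interactive step, the fingerprints can be pre-seeded with ssh-keyscan - a rough sketch (note the migration tunnel itself uses the cluster-managed files under /etc/pve/nodes/*/ssh_known_hosts, so this only covers the manual tests above):

Code:
# accept the mesh hosts' keys once, non-interactively, on each node
for ip in 10.99.99.21 10.99.99.22 10.99.99.23; do
    ssh-keyscan -H "$ip" >> /root/.ssh/known_hosts
done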

Just wondering what I need to correct so it actually uses that network for migrations?

Additionally, it looks like I'd follow these steps to move the Ceph network over to that same 10G+ network as smoothly as possible with limited outage; doing it one node at a time seems possible if I'm reading it correctly:
https://forum.proxmox.com/threads/ceph-changing-public-network.119116/
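
If I follow that thread correctly, the core of it is changing the network lines in /etc/pve/ceph.conf and then re-creating each monitor one node at a time so it binds to the new subnet, plus restarting the OSDs afterwards. Very roughly this shape (untested sketch with my addresses):

Code:
# /etc/pve/ceph.conf (excerpt) - point Ceph at the Thunderbolt mesh
public_network = 10.99.99.0/24
cluster_network = 10.99.99.0/24

# then, one node at a time (pve01 shown), move its monitor to the new network
pveceph mon destroy pve01
pveceph mon create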
 
Did you get this resolved in the end - has it been reliable for you? I'm just about to set up 2x MS-01 in a cluster and have a Thunderbolt cable between them, ready to try this setup.
 
Nope - it ended up not being good for me...
I would have needed to edit/modify a bunch of configs that I didn't want to touch on a production cluster.

TL;DR:
If I were starting from scratch or in full development, I'd love to test this more.

When I implemented the mesh, the cluster was already set up and running, and I would have had to migrate machines off Ceph and rebuild the OSDs on the correct network, so I abandoned it.

--- Long version ---

Subsequent updates to the MS-01s seem to have broken the USB networking, and I never went back to review it.
I was unable to ping across those interfaces, at least.

So I'm holding on my 2.5G interfaces, and it's been stable with no MEBx issues - using an IP KVM v4 mini to get to them if they fail.

All 3 are i9-13900H on 0x4121 microcode.
One is still on firmware 1.22 vs 1.24.
Two of the three MS-01s are running with reduced RAM speed (4400 vs the max 5200/5600) via test firmware 1.24, and the cluster has held well past 30 days without failures. (Currently 28 days since the last power issue at the site that outlasted the UPS, but everything came back up on its own without issue.)

-----

Apologies I couldn't be of more help!
 
@B-C thanks for the update, that's really detailed. I'm setting these up basically fresh, so I'm tempted to have a play before putting them into production. I've updated both to the latest 1.26 firmware and done the microcode updates (via the tteck scripts). I've not tried Proxmox with Ceph before, but it would be useful for a couple of VMs where I could do with HA - what's the performance like?
 
Performance is good - small office, nothing major.
Those little MS-01s scream pretty darn well - I haven't done any benchmarks though.

A 2-node cluster is a bit of a "hack" since quorum really wants 3 nodes, but like you saw, his scripts might make it doable!
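
For what it's worth, the usual way to keep a 2-node cluster quorate is a QDevice on any third box (a Pi, a NAS VM, whatever). A minimal sketch - the 10.10.10.5 address is just a placeholder:

Code:
# on the third (non-cluster) machine that provides the extra vote
apt install corosync-qnetd

# on both cluster nodes
apt install corosync-qdevice

# on one cluster node, register the QDevice with the cluster
pvecm qdevice setup 10.10.10.5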

Beyond that, nice that you're on 1.26 - I'll probably need to get myself updated to that; I haven't been monitoring much the last couple of months!

--- One huge favor ---
Document everything you can.
I might buy another set of 3 just for testing here in a few months, but I'm interested in how the 2-node cluster goes.
Especially with the Thunderbolt mesh - I'd really like to re-attempt that.

My test cluster is OptiPlex 3040 i5s on 1G NICs only.

They have 2TB M.2 SATA drives, and I ended up putting the Ceph OSDs on partitions (also not recommended), but HA and everything is holding well and healthy, just not "optimal".
(These only have a single SATA connection, so I had to use a SATA-to-M.2 adapter - it's a dual M.2 board but it only presents as a single drive, so I ended up populating only one M.2.)
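
I don't have my exact commands handy, but the general shape of putting an OSD on a partition is to call ceph-volume directly, since the GUI wants whole disks - a sketch, with /dev/sda4 as a placeholder partition:

Code:
# create an OSD directly on an existing partition (not recommended, as noted above)
ceph-volume lvm create --data /dev/sda4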

I also have a Micro 7060 with an M.2 as node #4 - I loaded ESXi on that one to test migrations from VMware... it worked well enough.
 
Hi folks,

Same setup here: Thunderbolt mesh with 3x MS-01 12900H.
The Thunderbolt network mesh is pretty stable and I could easily join the nodes to my cluster.

SSH between nodes? Yes
Ping between nodes? Yes
Migration? Not working:

Code:
Task viewer: CT 123 - Migrate (node01 ---> node02)
could not get migration ip: no IP address configured on local node for network 'fc00::81/128'
TASK ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=node02' -o
'UserKnownHostsFile=/etc/pve/nodes/node02/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@fc00::82 pvecm mtunnel -migration_network fc00::81/128 -get_migration_ip' failed: exit code 255

Any idea how to get this up and running?

By the way, here is the content of my /etc/pve/datacenter.cfg:


Code:
keyboard: fr-ch
migration: network=fc00::81/128,type=insecure
 
Hello,
I have two MS-01s, but no matter what I do, I can't get the interfaces to come up. I have followed the gists from scyto to set up the configuration.
This is my current config:

Code:
=========================================
Thunderbolt Mesh Network Config Info Tool
=========================================

-----------------------------------------
Kernel Version
-----------------------------------------
Linux zep-pve-02 6.8.12-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-4 (2024-11-06T15:04Z) x86_64 GNU/Linux

-----------------------------------------
File: /etc/network/interfaces
-----------------------------------------
auto lo
iface lo inet loopback
iface en05 inet manual
iface en06 inet manual
iface enp87s0 inet manual
auto enp89s0
iface enp89s0 inet manual
iface enp2s0f0np0 inet manual
iface enp2s0f1np1 inet manual
auto vmbr0
iface vmbr0 inet static
        address 10.10.10.21/24
        gateway 10.10.10.1
        bridge-ports enp2s0f1np1
        bridge-stp off
        bridge-fd 0
iface wlp90s0 inet manual
post-up /usr/bin/systemctl reset-failed frr.service
post-up /usr/bin/systemctl restart frr.service
source /etc/network/interfaces.d/*

-----------------------------------------
File: /etc/network/interfaces.d/thunderbolt
-----------------------------------------
      
auto lo:6
iface lo:6 inet static
        address fc00::82/128
        
allow-hotplug en05
iface en05 inet manual
        mtu 65520

allow-hotplug en06
iface en06 inet manual
        mtu 65520

----------------------------------------
File: /usr/local/bin/pve-en05.sh
-rwxr-xr-x 1 root root 154 Dec  4 22:15 /usr/local/bin/pve-en05.sh
-----------------------------------------
/usr/sbin/ifup en05

----------------------------------------
File: /usr/local/bin/pve-en06.sh
-rwxr-xr-x 1 root root 154 Dec  4 22:15 /usr/local/bin/pve-en06.sh
-----------------------------------------
/usr/sbin/ifup en06

-----------------------------------------
File: /etc/modules
-----------------------------------------
vfio
vfio_iommu_type1
vfio_pci
thunderbolt
thunderbolt-net

-----------------------------------------
File: /etc/systemd/network/00-thunderbolt0.link
-----------------------------------------
Path=pci-0000:00:0d.2
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en05

-----------------------------------------
File: /etc/systemd/network/00-thunderbolt1.link
-----------------------------------------
Path=pci-0000:00:0d.3
Driver=thunderbolt-net
[Link]
MACAddressPolicy=none
Name=en06

-----------------------------------------
File: /etc/sysctl.conf
-----------------------------------------
net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1

-----------------------------------------
File: /etc/udev/rules.d/10-tb-en.rules
-----------------------------------------
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
ACTION=="move", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"

-----------------------------------------
File: /etc/frr/frr.conf
-----------------------------------------
frr version 8.5.2
frr defaults traditional
hostname pve-02
log syslog informational
service integrated-vtysh-config
!
interface en05
 ipv6 router openfabric 1
exit
!
interface en06
 ipv6 router openfabric 1
exit
!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0002.00
exit
!

-----------------------------------------
File: /etc/frr/daemons
-----------------------------------------
bgpd=yes
ospfd=no
ospf6d=no
ripd=no
ripngd=no
isisd=no
pimd=no
pim6d=no
ldpd=no
nhrpd=no
eigrpd=no
babeld=no
sharpd=no
pbrd=no
bfdd=yes
fabricd=yes
vrrpd=no
pathd=no
vtysh_enable=yes
zebra_options="  -A 127.0.0.1 -s 90000000"
bgpd_options="   -A 127.0.0.1"
ospfd_options="  -A 127.0.0.1"
ospf6d_options=" -A ::1"
ripd_options="   -A 127.0.0.1"
ripngd_options=" -A ::1"
isisd_options="  -A 127.0.0.1"
pimd_options="   -A 127.0.0.1"
pim6d_options="  -A ::1"
ldpd_options="   -A 127.0.0.1"
nhrpd_options="  -A 127.0.0.1"
eigrpd_options=" -A 127.0.0.1"
babeld_options=" -A 127.0.0.1"
sharpd_options=" -A 127.0.0.1"
pbrd_options="   -A 127.0.0.1"
staticd_options="-A 127.0.0.1"
bfdd_options="   -A 127.0.0.1"
fabricd_options="-A 127.0.0.1"
vrrpd_options="  -A 127.0.0.1"
pathd_options="  -A 127.0.0.1"

-----------------------------------------
Command: vtysh -c "show openfabric topology"
-----------------------------------------
Area 1:
IS-IS paths to level-2 routers that speak IP
Vertex               Type         Metric Next-Hop             Interface Parent
pve-02                                                           

IS-IS paths to level-2 routers that speak IPv6
Vertex               Type         Metric Next-Hop             Interface Parent
pve-02                                                           
fc00::82/128         IP6 internal 0                                     pve-02(4)

IS-IS paths to level-2 routers with hop-by-hop metric
Vertex               Type         Metric Next-Hop             Interface Parent



-----------------------------------------
Command: vtysh -c "show running-config"
-----------------------------------------
Building configuration...

Current configuration:
!
frr version 8.5.2
frr defaults traditional
hostname pve-02
log syslog informational
service integrated-vtysh-config
!
interface lo
 ip router openfabric 1
 ipv6 router openfabric 1
 openfabric passive
exit
!
router openfabric 1
 net 49.0000.0000.0002.00
exit
!
end

----------------------------------------
File: /etc/network/if-up.d/thunderbolt-affinity
ls: cannot access '/etc/network/if-up.d/thunderbolt-affinity': No such file or directory
-----------------------------------------
grep: /etc/network/if-up.d/thunderbolt-affinity: No such file or directory

en06 doesn't seem to show up; instead I have thunderbolt0:

Code:
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 fc00::82/128 scope global
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp87s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 58:47:ca:77:1d:68 brd ff:ff:ff:ff:ff:ff
3: enp89s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 58:47:ca:77:1d:69 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5a47:caff:fe77:1d69/64 scope link
       valid_lft forever preferred_lft forever
4: enp2s0f0np0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 58:47:ca:77:1d:66 brd ff:ff:ff:ff:ff:ff
5: enp2s0f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 58:47:ca:77:1d:67 brd ff:ff:ff:ff:ff:ff
6: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 58:47:ca:77:1d:67 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.21/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::5a47:caff:fe77:1d67/64 scope link
       valid_lft forever preferred_lft forever
7: thunderbolt0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:3a:0d:8e:59:2e brd ff:ff:ff:ff:ff:ff
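
One thing I notice in the dump above: the two .link files are printed without a [Match] section header. systemd only applies a .link file when its [Match] section matches the device, so if that header is genuinely missing (and not just trimmed by the info tool), the rename to en05/en06 never happens and the port stays thunderbolt0. Per scyto's gist they should look roughly like this (en05 shown; en06 is the same with the 0d.3 path), followed by a reboot or a cable re-plug:

Code:
# /etc/systemd/network/00-thunderbolt0.link
[Match]
Path=pci-0000:00:0d.2
Driver=thunderbolt-net

[Link]
MACAddressPolicy=none
Name=en05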
 
Try modifying /etc/pve/datacenter.cfg:
migration: network=fc00::81/112,type=insecure

I'm using IPv4; I changed mine to /24 instead of /32 to make migration work.
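
To spell that out: the migration network needs a prefix that covers all of the nodes' mesh addresses, not a single host. A sketch of both variants from this thread (addresses as posted above; the /112 is only my suggestion, not something I've verified):

Code:
# /etc/pve/datacenter.cfg - IPv4 mesh (nodes on 10.99.99.21-23)
migration: network=10.99.99.0/24,type=insecure

# /etc/pve/datacenter.cfg - IPv6 mesh (nodes on fc00::81 / fc00::82)
migration: network=fc00::81/112,type=insecure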