I never did figure out what the FRR routing performance issue was when I simulated a failure of one of the direct connections, so I got tired of it all, wiped my nodes, and started fresh.
For what it's worth, and unrelated to that routing issue: I was running Kingston SEDC600M/960G SATA SSDs as the Ceph OSD drives and 1TB 980 Pro NVMe drives as the boot drives.
This is what I was getting over Thunderbolt with those Kingston enterprise SATA SSDs. I didn't make any cache modifications, which probably could have been done since they have PLP (a sketch of what I mean follows the results below).
Code:
rados bench -p ceph-vm 120 write -b 4M -t 16 --run-name pve02 --no-cleanup
Total time run: 120.017
Total writes made: 13831
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 460.969 (Kingston advertises 530MB/s)
Stddev Bandwidth: 40.2623
Max bandwidth (MB/sec): 512
Min bandwidth (MB/sec): 316
Average IOPS: 115 (?!? Not even close to the advertised tens of thousands of IOPS, though those are 4K random figures; at this bench's 4M object size, 115 IOPS works out to the ~460MB/s above)
Stddev IOPS: 10.0656
Max IOPS: 128
Min IOPS: 79
Average Latency(s): 0.138835
Stddev Latency(s): 0.0341182
Max latency(s): 0.588993
Min latency(s): 0.0137876
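If I do put these drives back in as Ceph OSDs, the cache change I skipped would look something like this: turn off the volatile write cache so the drives lean on their PLP-protected cache instead, which often helps sync write latency. The device name is a placeholder, and the setting doesn't survive a reboot, so it would need a udev rule or similar to persist.
Code:
# check the current write-cache state (placeholder device)
hdparm -W /dev/sdX
# disable the volatile write cache
hdparm -W 0 /dev/sdX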
Since wiping, I've reversed the drive roles, so the SATA SSDs are now the boot drives. I was going to set it all up again and rerun the tests.
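When I do rerun them, the read-side passes to pair with that write test would be roughly the following (the --no-cleanup above is what leaves the benchmark objects in place for them):
Code:
# sequential and random reads against the objects left by the write run
rados bench -p ceph-vm 120 seq -t 16 --run-name pve02
rados bench -p ceph-vm 120 rand -t 16 --run-name pve02
# delete the benchmark objects afterwards
rados -p ceph-vm cleanup --run-name pve02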
BUT I suffer from analysis paralysis. Debating doing Ceph at all. Debating using FRR and Thunderbolt. Debating ditching IPv6 and just doing a simple OSPF implementation in FRR (something like the sketch below). Debating pinning the PVE kernel to a lower version and virtualizing my iGPUs with SR-IOV. Debating burning it all down to the ground and just enjoying the summer.
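For the simple-OSPF option, what I have in mind is roughly the minimal IPv4-only frr.conf below. The en05/en06 interface names and the 10.0.0.81 router-id are placeholders for whatever the Thunderbolt links and node loopbacks end up being, the link addresses themselves would still live in /etc/network/interfaces, and ospfd has to be enabled in /etc/frr/daemons.
Code:
# /etc/frr/frr.conf (per node): plain OSPF over the Thunderbolt point-to-point links
frr defaults traditional
hostname pve01
!
interface lo
 ip ospf area 0
!
interface en05
 ip ospf network point-to-point
 ip ospf area 0
!
interface en06
 ip ospf network point-to-point
 ip ospf area 0
!
router ospf
 ospf router-id 10.0.0.81
!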
I'm also waiting to see how Jim's Garage gets on with his MS-01 and Thunderbolt implementation on YouTube.