I have reconfigured a small homelab from ZFS to Ceph, just out of interest, running a handful of services.
All has been running pretty smoothly over the last few days.
I have 3 nodes, each with a Samsung 980 Pro 2TB (50GB root etc, balance of 1.8TB used for Ceph). I know, not ideal, but ok for this purpose.
Running Thunderbolt networking between each node, two TB3 ports in each, each connected to the next.
management: 192.168.20.0/24
tb0/tb1 on each node: 192.168.245.x/30
I have set up OpenFabric via the Proxmox GUI on 192.168.248.0/24. Each node is able to communicate with the next. Overall, seeing ~25Gb/s between nodes with iperf3.
Config based on this: https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Routed_Setup_(with_Fallback)
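For anyone wanting to reproduce the check: this is roughly how I'd confirm the mesh is up and measure throughput between the OpenFabric router IPs (just a sketch using my addresses; start iperf3 -s on the peer first):
Code:
# from proxmox-nuc01: confirm the other two nodes are reachable over the mesh
ping -c 3 192.168.248.13
ping -c 3 192.168.248.14

# throughput over the fabric (run "iperf3 -s" on 192.168.248.13 beforehand)
iperf3 -c 192.168.248.13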
I have one issue I haven't been able to resolve: when I click either Host > Ceph > OSD > Manage Global Flags or Datacenter > Ceph, I see the spinning wheel followed by a "got timeout (500)" error.
I have disabled the firewall on the datacenter and the nodes to no avail. For some reason, this feels OpenFabric-related, but I am just guessing.
Any insight is appreciated.
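In case it helps narrow things down: as I understand it, the Manage Global Flags dialog only reads and sets the global OSD flags, so I can compare against the CLI (a sketch only; noout is just an example flag):
Code:
# what the GUI dialog displays
ceph osd dump | grep flags

# set / clear a flag from the CLI (noout purely as an example)
ceph osd set noout
ceph osd unset noout

# Proxmox's own status wrapper, run on each node
pveceph status
If all of this works instantly from every node while the GUI still times out, that would point at the pveproxy/pvedaemon path to the mons rather than Ceph itself.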
ceph -s
Code:
  cluster:
    id:     dd1020e0-2316-436a-abd8-45ffe33aa28c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum proxmox-nuc01,proxmox-nuc02,proxmox-nuc03 (age 15m)
    mgr: proxmox-nuc01(active, since 24h), standbys: proxmox-nuc02, proxmox-nuc03
    osd: 3 osds: 3 up (since 10m), 3 in (since 25h)

  data:
    pools:   2 pools, 33 pgs
    objects: 2.59M objects, 666 GiB
    usage:   1.6 TiB used, 3.7 TiB / 5.3 TiB avail
    pgs:     33 active+clean

  io:
    client:  2.0 KiB/s rd, 2.3 MiB/s wr, 0 op/s rd, 315 op/s wr
ip route
Code:
default via 192.168.20.1 dev vmbr0 proto kernel onlink
192.168.20.0/24 dev vmbr0 proto kernel scope link src 192.168.20.12
192.168.245.0/30 dev tb0 proto kernel scope link src 192.168.245.1
192.168.245.8/30 dev tb1 proto kernel scope link src 192.168.245.10
192.168.248.13 nhid 26 via 192.168.245.2 dev tb0 proto openfabric src 192.168.248.12 metric 20 onlink
192.168.248.14 nhid 27 via 192.168.245.9 dev tb1 proto openfabric src 192.168.248.12 metric 20 onlink
192.168.250.0/24 dev vmbr0.250 proto kernel scope link src 192.168.250.12
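Since the mon IPs (192.168.248.x) live on the dummy interface, one thing I keep double-checking on each node (sketch; interface name as per the frr.conf below):
Code:
# confirm the router ID address is actually bound to the dummy interface
ip -br addr show dummy_tb99

# confirm the mon is listening on it (msgr2 on 3300, msgr1 on 6789)
ss -tlnp | grep -E '3300|6789'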
/etc/frr/frr.conf from one host (others the same)
Code:
frr version 10.3.1
frr defaults datacenter
hostname proxmox-nuc01
log syslog informational
service integrated-vtysh-config
!
router openfabric tb99
net 49.0001.1921.6824.8012.00
exit
!
interface dummy_tb99
ip router openfabric tb99
openfabric passive
exit
!
interface tb0
ip router openfabric tb99
openfabric hello-interval 1
openfabric csnp-interval 2
exit
!
interface tb1
ip router openfabric tb99
openfabric hello-interval 1
openfabric csnp-interval 2
exit
!
access-list pve_openfabric_tb99_ips permit 192.168.248.0/24
!
route-map pve_openfabric permit 100
match ip address pve_openfabric_tb99_ips
set src 192.168.248.12
exit
!
ip protocol openfabric route-map pve_openfabric
!
!
line vty
!
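To see whether FRR has formed adjacencies on both Thunderbolt links, vtysh can be queried directly (a sketch; command names per the FRR fabricd docs as I understand them, and fabricd has to be enabled in /etc/frr/daemons):
Code:
# adjacency state on tb0/tb1
vtysh -c "show openfabric neighbor"

# learned topology, i.e. what ends up in "ip route" as proto openfabric
vtysh -c "show openfabric topology"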
ceph config
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.248.0/24
fsid = dd1020e0-2316-436a-abd8-45ffe33aa28c
mon_allow_pool_delete = true
mon_host = 192.168.248.12 192.168.248.13 192.168.248.14
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 2
public_network = 192.168.248.0/24
EDIT:
Interestingly... just reading the output above, I notice osd_pool_default_min_size and osd_pool_default_size are both set to 2. The pool was switched to 3/2 via the GUI; it started at 2/2 while I was migrating data from the last host into the pool, before moving the 3rd host over to Ceph. I'll look at that separately.
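If I understand the defaults correctly, osd_pool_default_size/min_size only apply at pool-creation time, so the existing pool should still carry the 3/2 set in the GUI; that can be confirmed per pool (sketch; replace <pool> with the actual pool name):
Code:
# size/min_size as actually applied to each pool (the defaults above only matter for new pools)
ceph osd pool ls detail

# or for a single pool
ceph osd pool get <pool> size
ceph osd pool get <pool> min_size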