hello
I have 6 hosts and every host has 4 NICs: 2x10G and 2x40G.
On the 10G ports there is a bond with LACP and a bridge on top of bond0.
On the 40G ports there is a bond in active/backup mode and a bridge on top of bond1. This network is for Ceph. On the 40G interfaces I set MTU 9000.
auto lo
iface lo inet loopback

auto ens6f0
iface ens6f0 inet manual

auto ens6f1
iface ens6f1 inet manual

auto ens1f0
iface ens1f0 inet manual
    mtu 9000

auto ens1f1
iface ens1f1 inet manual
    mtu 9000

auto bond0
iface bond0 inet manual
    bond-slaves ens6f0 ens6f1
    bond-miimon 100
    bond-mode 802.3ad

auto bond1
iface bond1 inet manual
    bond-slaves ens1f0 ens1f1
    bond-miimon 100
    bond-mode active-backup
    bond-primary ens1f0
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 192.168.100.10/24
    gateway 192.168.100.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

auto vmbr1
iface vmbr1 inet static
    address 10.15.10.10/24
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
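
For reference, this is roughly how I sanity-check the 40G bond and the jumbo frames between two nodes (standard Linux tools; 10.15.10.11 is simply the next node's Ceph address from the list below):

# show which slave is currently active and the link state of bond1
cat /proc/net/bonding/bond1

# confirm bond1 and vmbr1 really run with MTU 9000
ip link show bond1 | grep mtu
ip link show vmbr1 | grep mtu

# test jumbo frames end to end without fragmentation
# (8972 = 9000 MTU minus 20 bytes IP header and 8 bytes ICMP header)
ping -M do -s 8972 -c 3 10.15.10.11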
When I created Ceph I chose the 40G network for the public network (10.15.10.10, the address from the master node) and 192.168.100.10/24 (address from the master node, i.e. the 10G network) for the cluster network.
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.100.10/24
fsid = fcb7cb0b-0444-4006-b65e-3c1bc7910b68
mon_allow_pool_delete = true
mon_host = 10.15.10.10 10.15.10.11 10.15.10.12 10.15.10.13 10.15.10.14 10.15.10.15
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.15.10.10/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mon.master]
public_addr = 10.15.10.10
[mon.slave01]
public_addr = 10.15.10.11
[mon.slave02]
public_addr = 10.15.10.12
[mon.slave03]
public_addr = 10.15.10.13
[mon.slave04]
public_addr = 10.15.10.14
[mon.slave05]
public_addr = 10.15.10.15
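
To double-check which network the OSDs really use for client traffic vs. replication, I look at the front/back addresses they registered (osd.0 is just an example OSD ID):

# front_addr = public network (clients/mons), back_addr = cluster network (replication)
ceph osd metadata 0 | grep -E 'front_addr|back_addr'

# the same information, one line per OSD, including both addresses
ceph osd dump | grep '^osd'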
My disks for Ceph are Intel SSD DC P4510 4.0TB, 2.5in PCIe 3.1 x4, 3D2, TLC (SSDPE2KX040T8), and like I said, every node has 2 of them.
I am just curious because today we ran a test: we removed one disk and put in another one to see how long the rebuild of the data would take. It took more than 2 hours.
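
While it was rebuilding I watched the progress with the usual status commands (nothing cluster-specific here):

# overall health plus recovery/backfill progress and throughput
ceph -s

# follow the recovery rate (objects/s, MB/s) live
ceph -w

# how many PGs are still degraded or backfilling
ceph pg stat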
Do I need to change something? Every proposal is welcome.
Maybe I should remove the cluster network and leave only the public network, or should I keep this kind of configuration?
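
Just to make that second option concrete, by removing the cluster network I mean something like this in the [global] section (only a sketch of the idea, not tested), so that both client and replication traffic run over the 40G network:

[global]
# public network stays on 40G, cluster_network line removed entirely
public_network = 10.15.10.10/24
# cluster_network = 192.168.100.10/24   <- this line would be dropped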