Hello everyone!
I'm not new to Ceph config/deploy, but this setup has a number of new variables compared to my previous experience setting up Ceph.
Here's the lay of the land to get started:
We have 8 m610 blades in an m1000e chassis. The chassis has a 1GbE passthrough module in fabric A1, a 1GbE switch module in A2, an InfiniBand switch in B1/C1, and an SFP+ switch module in B2; C2 has a blank. The m610s all have an InfiniBand mezzanine card installed. Only some of the blades have 10GbE mezzanine cards, so for the purposes of this deployment we are ignoring any and all SFP+ interfaces.
Proxmox is installed on USB keys in each blade's internal USB port (yes, I'm aware this is not preferred due to the possibility of device burnout, and we made this decision regardless), and each blade has at least one 840GiB 10K SAS drive (some have two). Each node has been configured with the appropriate InfiniBand support, and IPoIB has been configured and confirmed to work.
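For reference, this is roughly how we confirmed IPoIB on each node (the interface name ib0 and the peer address are just examples from one pair of blades):
Code:
# Check that the IPoIB interface is up and carries its cluster-network address
ip addr show ib0
# Ping a neighbouring blade over the IPoIB link
ping -c 3 10.3.5.14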
Public network is sitting on 10.3.2.0/8
Cluster network is sitting on 10.3.5.0/24
Core Proxmox functions all seem healthy. All nodes successfully joined the cluster, and we are able to manage all of the nodes through the unified cluster interface without issue.
Ceph was installed on all of the units through the GUI; however, initial configuration was not successful because the GUI detected multiple possible options for the cluster network, so we had to initialize the Ceph cluster manually on the CLI and configure the monitors with --mon-address.
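Reconstructed from shell history, this is roughly what the manual initialization looked like (exact flags and values may not be verbatim):
Code:
# Initialize Ceph with explicit networks, since the GUI wizard offered several candidates
pveceph init --network 10.3.2.0/8 --cluster-network 10.3.5.0/24
# Create each monitor with an explicit address (repeated on each monitor node)
pveceph mon create --mon-address 10.3.5.13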
Of the eight nodes we currently have four monitors configured and two managers. We initially had a manager running on node1 (pxmox-s02), but that node was behaving somewhat poorly, so we created managers on two other nodes (pxmox-s03 and pxmox-s04) and destroyed the manager on pxmox-s02. Here's our first issue: that manager continues to show in the GUI and is currently showing as 'active' while the other two are in standby. Attempts to destroy it through the GUI produce an 'entry has no host' error, and no entry for this manager exists in /var/lib/ceph/mgr.
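For completeness, this is roughly how we have been inspecting the stale manager from the CLI (pxmox-s02 would be the manager ID; the pveceph call is what we understand the GUI destroy button maps to):
Code:
# Full manager map - which mgr is active and which are standby
ceph mgr dump
# Configured sections as Proxmox sees them
cat /etc/pve/ceph.conf
# No directory for the old manager exists here
ls /var/lib/ceph/mgr
# CLI equivalent of the GUI destroy, run on the node that hosted the manager
pveceph mgr destroy pxmox-s02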
Here is the configuration displayed in the ceph panel:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.3.5.13/24
fsid = ae835579-4b40-47e3-a215-9ce2a2edc319
mon_allow_pool_delete = true
mon_host = 10.3.5.13 10.3.5.20 10.3.5.21 10.3.5.14
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.3.2.13/8
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.pxmox-s05]
host = pxmox-s05
mds standby for name = pve
[mds.pxmox-s10]
host = pxmox-s10
mds_standby_for_name = pve
[mon.pxmox-s03]
public_addr = 10.3.5.13
[mon.pxmox-s04]
public_addr = 10.3.5.14
[mon.pxmox-s10]
public_addr = 10.3.5.20
[mon.pxmox-s11]
public_addr = 10.3.5.21
We are frequently seeing timeouts. Oftentimes commands must be rerun because the first attempt results in a "get timeout" error.
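For reference, these are the sort of commands we end up rerunning; the first attempt frequently hangs until the timeout:
Code:
# Overall cluster status
ceph -s
# Monitor quorum, to check whether all four mons are actually in
ceph quorum_status --format json-pretty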
OSDs have been created, via the GUI, for each drive on each blade. The operation completes successfully, but once it finishes the OSDs do not show up in the OSD subsection of the Ceph group in the GUI. Additionally, Ceph throws a warning that OSD count 0 < osd_pool_default_size 3. The OSDs do show in the crushmap, included below; the CLI checks we have been using are sketched after the map.
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 6 osd.6 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host pxmox-s03 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 1.637
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.819
item osd.1 weight 0.819
}
host pxmox-s04 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 1.637
alg straw2
hash 0 # rjenkins1
item osd.2 weight 0.819
item osd.3 weight 0.819
}
host pxmox-s02 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 0.819
alg straw2
hash 0 # rjenkins1
item osd.4 weight 0.819
}
host pxmox-s13 {
id -9 # do not change unnecessarily
id -10 class hdd # do not change unnecessarily
# weight 0.819
alg straw2
hash 0 # rjenkins1
item osd.6 weight 0.819
}
host pxmox-s10 {
id -11 # do not change unnecessarily
id -12 class hdd # do not change unnecessarily
# weight 0.819
alg straw2
hash 0 # rjenkins1
item osd.9 weight 0.819
}
host pxmox-s12 {
id -13 # do not change unnecessarily
id -14 class hdd # do not change unnecessarily
# weight 0.819
alg straw2
hash 0 # rjenkins1
item osd.11 weight 0.819
}
host pxmox-s11 {
id -15 # do not change unnecessarily
id -16 class hdd # do not change unnecessarily
# weight 0.819
alg straw2
hash 0 # rjenkins1
item osd.10 weight 0.819
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 7.368
alg straw2
hash 0 # rjenkins1
item pxmox-s03 weight 1.637
item pxmox-s04 weight 1.637
item pxmox-s02 weight 0.819
item pxmox-s13 weight 0.819
item pxmox-s10 weight 0.819
item pxmox-s12 weight 0.819
item pxmox-s11 weight 0.819
}
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
# end crush map
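These are the CLI checks we have been using to look at the OSD state beyond the CRUSH map (osd.0 is just an example ID):
Code:
# Logical CRUSH view of hosts and OSDs
ceph osd tree
# Per-OSD utilisation, to cross-check the 'OSD count 0' warning
ceph osd df
# Service state of an individual OSD daemon on its node
systemctl status ceph-osd@0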
At this point we're stuck on how to proceed. We were able to create a Ceph pool and a CephFS MDS, but cannot create a CephFS pool because the process times out.
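In case it helps, this is the CLI route we were planning to try instead of the GUI for CephFS (pool names and PG counts are just placeholders):
Code:
# Create the data and metadata pools by hand, then the filesystem itself
ceph osd pool create cephfs_data 64
ceph osd pool create cephfs_metadata 32
ceph fs new cephfs cephfs_metadata cephfs_data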
Any assistance and guidance would be highly appreciated!