Currently, I have a two-node PVE cluster, and one of those two nodes (srv00) has 5 HDDs devoted to Ceph, backing an RBD pool and CephFS. The second node (srv01) now has 5 identical disks that I'd like to add to the cluster. By sometime next week (barring any shipping delays), I'll have a third node to add to the PVE cluster, with another 5 disks just like the first two sets.
The current ceph.conf:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.200.254/24
fsid = b2ad983c-5b4d-443d-93f8-a4be22300341
mon_allow_pool_delete = true
mon_host = 192.168.3.254 192.168.3.253
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 2
public_network = 192.168.3.254/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.srv00]
host = srv00
mds_standby_for_name = pve
[mds.srv01]
host = srv01
mds_standby_for_name = pve
[mon.srv00]
public_addr = 192.168.3.254
[mon.srv01]
public_addr = 192.168.3.253
And the current crush map:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host srv00 {
	id -3		# do not change unnecessarily
	id -4 class hdd		# do not change unnecessarily
	# weight 36.38280
	alg straw2
	hash 0	# rjenkins1
	item osd.0 weight 10.91408
	item osd.1 weight 10.91408
	item osd.2 weight 5.45798
	item osd.3 weight 5.45798
	item osd.4 weight 3.63869
}
root default {
	id -1		# do not change unnecessarily
	id -2 class hdd		# do not change unnecessarily
	# weight 36.38286
	alg straw2
	hash 0	# rjenkins1
	item srv00 weight 36.38286
}
# rules
rule replicated_rule {
	id 0
	type replicated
	step take default
	step chooseleaf firstn 0 type osd
	step emit
}
# end crush map
What I'd like to do is the following, in two stages:
Right now, I'd like to create OSDs on the disks in srv01 and add them to the pools, while switching from osd-level to host-level replication and changing to a default of 2 / minimum of 1 replica. I'm aware that, long-term, this is a "Very Bad Idea" -- but it's temporary, and the data that lives in those pools is backed up (though recovering from a catastrophic failure would be a major PITA). To achieve this, how should my crush map and/or config file be changed, assuming my resiliency target is that one of the two hosts can be down at any time (maintenance, or whatever) AND one of the OSDs on the surviving host can be offline? If I'm understanding Ceph replication correctly, this should give me the same amount of available storage I have currently, but with somewhat better resilience.
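In case it helps frame the answer, here's my rough sketch of what I think stage 1 looks like from the CLI -- device and pool names are placeholders (mine are the PVE-created RBD pool plus the two CephFS pools), so please correct anything that's wrong:
Code:
# create an OSD on each of the 5 new disks in srv01 (device names are just examples)
pveceph osd create /dev/sdb
pveceph osd create /dev/sdc
# ...and so on for the remaining disks

# new CRUSH rule that replicates across hosts instead of OSDs
# (which I assume ends up as "step chooseleaf firstn 0 type host" in the decompiled map)
ceph osd crush rule create-replicated replicated_host default host hdd

# point each pool at the new rule, then set size 2 / min_size 1
ceph osd pool set <pool> crush_rule replicated_host
ceph osd pool set <pool> size 2
ceph osd pool set <pool> min_size 1

# (and, I assume, adjust osd_pool_default_min_size in ceph.conf so future pools match)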
When the remaining parts of the third host arrive, srv02 will be added to the PVE cluster, and 5 identical disks will then be available to add as OSDs to the pools. At that point, I would change my minimum replicas to 2, leaving the default at 2 -- which (again, if I'm understanding Ceph replication correctly) should double my available storage, keeping my resiliency target at one host down AND one OSD offline on each surviving host. What crush map and/or config changes would need to be made at this stage?
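And my guess at stage 2, once srv02 and its disks are in place (again, placeholders -- happy to be corrected):
Code:
# create the OSDs on srv02's 5 disks, same as before
pveceph osd create /dev/sdb
# ...repeat for the remaining disks on srv02

# then raise the minimum replica count back to 2 on each pool
ceph osd pool set <pool> min_size 2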
And finally, can changes to the crush map and/or configuration files be made while the pools are in use, or should I expect some downtime?
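For what it's worth, my plan would be to leave everything online and just watch the recovery/backfill after each change with the usual status commands:
Code:
# watch rebalancing progress while the pools stay in use
ceph -s
ceph osd pool stats
ceph health detail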