Degraded write performance after upgrade to Ceph Octopus

ksainc · Dec 30, 2020

Hi Everyone,

We have a small 3 nodes PVE/Ceph cluster that has been running great for over a year, up until this week. We recently upgraded from Nautilus to Octopus, using the Proxmox Guide, the upgrade went of without any issues, but soon after we noticed very poor write performance compared to Nautilus. Before the upgrade we wear getting a reasonable 60-80MB/s, but since the upgrade that number has fallen drastically down to 0.6MB/s to 1MB/s, also this is only affecting the writing of new data, reading, backfill and recovery all work at full speeds.

Any help would be greatly appreciated.

Here are is the specs and configs:

Hardware x4 (One hot spare currently idel)
Intel Xeon D-1541 Processors
2x 250GB System SDD's
4x 12TB or 14TB Storage HDD's
Gigabit NIC's

Ceph Configuration

[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
fsid = 5a421fb2-778e-4148-947f-e95f97d2be68
mon_allow_pool_delete = true
mon_host = (redacted)
osd_memory_target = 1073741824
osd_pool_default_min_size = 2
osd_pool_default_size = 3

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.burton]
host = burton
mds standby for name = pve

[mds.kamal]
host = kamal
mds_standby_for_name = pve

[mds.nagata]
host = nagata
mds_standby_for_name = pve

[mon.burton]
public_addr = (redacted)

[mon.holden]
public_addr = (redacted)

[mon.kamal]
public_addr = (redacted)

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host holden {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 43.652
alg straw2
hash 0 # rjenkins1
item osd.0 weight 10.913
item osd.1 weight 10.913
item osd.2 weight 10.913
item osd.3 weight 10.913
}
host kamal {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 43.652
alg straw2
hash 0 # rjenkins1
item osd.7 weight 10.913
item osd.5 weight 10.913
item osd.6 weight 10.913
item osd.4 weight 10.913
}
host nagata {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 50.934
alg straw2
hash 0 # rjenkins1
item osd.8 weight 12.733
item osd.9 weight 12.733
item osd.10 weight 12.733
item osd.11 weight 12.733
}
host burton {
id -9 # do not change unnecessarily
id -10 class hdd # do not change unnecessarily
# weight 38.200
alg straw2
hash 0 # rjenkins1
item osd.12 weight 12.733
item osd.13 weight 12.733
item osd.14 weight 12.733
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 176.438
alg straw2
hash 0 # rjenkins1
item holden weight 43.652
item kamal weight 43.652
item nagata weight 50.934
item burton weight 38.200
}

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map

Thanks again
Sebastian

roms2000 · Jan 18, 2021

We did have the same problem.
We had to recreate all osd, node by node, to have a correct write performance 50-60Mb/s only for a 3 nodes cluster with gigabit network and hdd only.
Also, we changed snap trimming variables to get a reasonnable read/recovery while osd get recovered.

spirit · Jan 18, 2021

could you try to add "bluefs_buffered_io=true" in ceph.conf , and restart osd ?

(this have changed between nautilus && octopus because of a potentiial bug with swap.

Search

Search

Degraded write performance after upgrade to Ceph Octopus

ksainc

Member

roms2000

Active Member

spirit

Distinguished Member

We value your privacy