Ceph conf

Sep 24, 2016
Hello

I'm creating a Ceph cluster and would like to know how to configure it in Proxmox (size, min_size, pg_num, CRUSH).
I want a single level of replication (I want to use as little space as possible while still having redundancy, a bit like RAID 5?).
For now I have 3 servers, each with 12 × 4 TB SAS OSDs (36 in total), all on 10 Gbps; a rough sketch of what I have in mind is below.
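To make the question concrete, here is roughly what I was thinking of; the pool name is just an example and the pg_num comes from the usual ~100 PGs per OSD rule of thumb, so please correct me:
Code:
# replicated pool for RBD; total PGs ~ (36 OSDs x 100) / replica count,
# rounded to a power of two (1024 or 2048 for this cluster)
ceph osd pool create vmpool 1024 1024
# 2 copies of each object, writes still allowed with only 1 copy left
ceph osd pool set vmpool size 2
ceph osd pool set vmpool min_size 1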

Thank you very much

Guillaume
 
Hi,
you can tune some parameters in ceph.conf.
Do you use "write back" as cache-setting for VM-rdb-volumes?

Do you use "the right" mount options?
For good write speed, a journal SSD (not a consumer model) is a good choice.
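For example, with ceph-disk the journal can be put on a separate SSD when an OSD is prepared; the device names below are placeholders only:
Code:
# data disk /dev/sdd, journal on the SSD /dev/sdb (ceph-disk creates a journal partition there)
ceph-disk prepare --fs-type xfs /dev/sdd /dev/sdb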

Here are some settings under [osd] which have a performance impact:
Code:
osd_mkfs_type = xfs
osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
osd mkfs options xfs = "-f -i size=2048"
osd_scrub_load_threshold = 2.5
filestore_max_sync_interval = 10
osd_max_backfills = 1
osd recovery max active = 1
osd_op_threads = 4
osd_disk_threads = 1 #disk threads, which are used to perform background disk intensive OSD operations such as scrubbing

filestore_op_threads = 4
osd_enable_op_tracker = false

osd_op_num_threads_per_shard = 1  # default 2
osd_op_num_shards = 10  # default 5

osd_disk_thread_ioprio_class  = idle
osd_disk_thread_ioprio_priority = 7

debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
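The mkfs/mount options above only take effect when an OSD is created or remounted, and the rest need an OSD restart; if I remember right, most of the runtime values can also be pushed to the running daemons, e.g.:
Code:
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'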
Udo
 
Hi,

My ceph.conf is
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
filestore xattr use omap = true
fsid = 26f11204-d540-455e-869c-0d43ab729d6b
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.10.10.0/24


[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

osd_mkfs_type = xfs
osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
osd mkfs options xfs = "-f -i size=2048"

osd_mount_options_ext4 = "user_xattr,rw,noatime,nodiratime"
osd_mkfs_type = ext4
osd_mkfs_options_ext4 = -J size=1024 -E lazy_itable_init=0,lazy_journal_init=0

osd_scrub_load_threshold = 2.5
filestore_max_sync_interval = 10
osd_max_backfills = 1
osd recovery max active = 1
osd_op_threads = 4
osd_disk_threads = 1

filestore_op_threads = 4
osd_enable_op_tracker = false

osd_op_num_threads_per_shard = 1
osd_op_num_shards = 10 # default 5

osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7

debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0

[mon.2]
host = CEPH-02
mon addr = 10.10.10.2:6789

[mon.1]
host = CEPH-03
mon addr = 10.10.10.3:6789

[mon.0]
host = CEPH-01
mon addr = 10.10.10.1:6789


None of the LXC containers have any problems; performance is good.
It's only with KVM.
When I activate writeback on the VM's disk, there is no difference. Same results.
 
Hi

root@CEPH-01:~/cephdeploy# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep rbd_cache
"rbd_cache": "true",
"rbd_cache_writethrough_until_flush": "true",
"rbd_cache_size": "33554432",
"rbd_cache_max_dirty": "25165824",
"rbd_cache_target_dirty": "16777216",
"rbd_cache_max_dirty_age": "1",
"rbd_cache_max_dirty_object": "0",
"rbd_cache_block_writes_upfront": "false",


With writeback or writethrough on disk, no changes
Read speed is perfect
root@debian:/home# cat /dev/sda | pv > /dev/null
2.81GiB 0:00:11 [ 693MiB/s] [ <=>

But write...

dd if=/dev/zero of=/root/testfoim bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 8.74746 s, 12.0 MB/s

Just tested with "echo 4096 > /sys/block/sda/queue/read_ahead_kb"
Same results.
 
Udo, if you switch off the logs, aren't you afraid that something will happen and you won't know what?

http://docs.ceph.com/docs/jewel/rados/troubleshooting/log-and-debug/

In general, the logs in-memory are not sent to the output log unless:
  • a fatal signal is raised or
  • an assert in source code is triggered or
  • upon requested. Please consult document on admin socket for more details.
Hi,
AFAIK this is not a problem. I have been running with debug off for years on an 8-node cluster (100+ OSDs) without trouble. The hint came from the ceph-users mailing list.
If something goes wrong, you will still get information about it ;-)

Of course, monitoring of Ceph health is mandatory (with or without debug logs).

Udo
 
Hi

root@CEPH-01:~/cephdeploy# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep rbd_cache
"rbd_cache": "true",
"rbd_cache_writethrough_until_flush": "true",
"rbd_cache_size": "33554432",
"rbd_cache_max_dirty": "25165824",
"rbd_cache_target_dirty": "16777216",
"rbd_cache_max_dirty_age": "1",
"rbd_cache_max_dirty_object": "0",
"rbd_cache_block_writes_upfront": "false",


With writeback or writethrough on disk, no changes
Read speed is perfect
root@debian:/home# cat /dev/sda | pv > /dev/null
2.81GiB 0:00:11 [ 693MiB/s] [ <=>
Hi,
It looks to me like you are measuring caching. If the data you read inside the VM is cached on the OSD nodes (or even better on the host), the results are fast. If the data isn't in cache, reading is much slower, and a bigger read_ahead (up to 16 MB) helps a little bit, but not by more than 50%.
If you clear the cache on all nodes (OSD nodes, PVE host + VM) with "echo 3 > /proc/sys/vm/drop_caches", does your read speed change?
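Something like this, run from one node, would clear it on the OSD nodes; the hostnames are the ones from your ceph.conf, and passwordless root ssh between the nodes is assumed (run the same echo on the PVE host and inside the VM as well):
Code:
for h in CEPH-01 CEPH-02 CEPH-03; do
    ssh root@$h 'sync; echo 3 > /proc/sys/vm/drop_caches'
done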
But write...

dd if=/dev/zero of=/root/testfoim bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 8.74746 s, 12.0 MB/s

Just tested with "echo 4096 > /sys/block/sda/queue/read_ahead_kb"
Same results.
read_ahead has nothing to do with writes...
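To see whether it is really the per-write latency that limits you (one 1 MB direct write at a time), you could also try a test with several writes in flight, for example with fio - only a sketch, the file name and sizes are arbitrary:
Code:
fio --name=rbdwrite --filename=/root/fiotest --size=1G \
    --rw=write --bs=1M --direct=1 --ioengine=libaio --iodepth=16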

About your OSDs... in your ceph.conf you have osd_mkfs_type twice, as xfs and ext4!
Only one should be there (it is the filesystem format used when creating a new OSD).
Do you run your OSDs with ext4 or xfs? I had good experiences with ext4, but ext4 support has now been dropped by Ceph, so you should go with xfs (until BlueStore is finished).
What are your current mount parameters ("mount | grep ceph")? The values in ceph.conf only apply to the next OSD mount.
How does the write performance look with one thread from the host?
Code:
rados bench -p rbd -t 1 60 write --no-cleanup
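And, for comparison, with more parallel writes plus a sequential-read run against the objects that --no-cleanup leaves behind (if I remember the cleanup command right):
Code:
rados bench -p rbd -t 16 60 write --no-cleanup
rados bench -p rbd -t 1 60 seq
rados -p rbd cleanup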
Udo