Ceph conf

Sep 24, 2016
Hello

I'm creating a Ceph cluster and would like to know which configuration to set up in Proxmox (size, min_size, pg_num, crush).
I want to have a single replica (I want to consume the least amount of space while still having redundancy, like RAID 5?).
For now I have 3 servers, each with 12 OSDs on 4 TB SAS disks (36 in total), all connected at 10 Gbps.
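For context, these are the kind of settings in question; the values below are only illustrative placeholders (pool name "vm-pool", size 2, and a pg_num of 2048 from the rough (36 OSDs x 100) / size estimate rounded to a power of two), not a recommendation:
Code:
# illustrative only - size/min_size/pg_num have to be chosen for the real cluster
ceph osd pool create vm-pool 2048 2048      # pg_num / pgp_num
ceph osd pool set vm-pool size 2            # number of replicas
ceph osd pool set vm-pool min_size 1        # keep serving I/O with one copy left
ceph osd pool get vm-pool pg_num            # verify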

Thank you very much

Guillaume
 
Hi,
you can tune some parameters in ceph.conf.
Do you use "write back" as cache-setting for VM-rdb-volumes?

Do you use "the right" mount options?
For good write speed, a journal SSD (not a consumer model) is a good choice.
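For reference, when an OSD is created with a separate journal, the filestore-era ceph-disk tooling takes the journal device as the second argument; the device names below are placeholders:
Code:
# data device first, journal device (SSD or an SSD partition) second - names are examples
ceph-disk prepare /dev/sdb /dev/sdm
ceph-disk activate /dev/sdb1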

Here are some settings below [osd] which have a performance impact:
Code:
osd_mkfs_type = xfs
osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
osd mkfs options xfs = "-f -i size=2048"
osd_scrub_load_threshold = 2.5
filestore_max_sync_interval = 10
osd_max_backfills = 1
osd recovery max active = 1
osd_op_threads = 4
osd_disk_threads = 1 #disk threads, which are used to perform background disk intensive OSD operations such as scrubbing

filestore_op_threads = 4
osd_enable_op_tracker = false

osd_op_num_threads_per_shard = 1  # default 2
osd_op_num_shards = 10  # default 5

osd_disk_thread_ioprio_class  = idle
osd_disk_thread_ioprio_priority = 7

debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
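Whether a running OSD has actually picked these values up can be checked over the admin socket, e.g. for osd.0 on the local node (the grep pattern is just an example):
Code:
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'osd_op_num_shards|osd_max_backfills'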
Udo
 
Hi,

My ceph.conf is:
Code:
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
filestore xattr use omap = true
fsid = 26f11204-d540-455e-869c-0d43ab729d6b
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.10.10.0/24


[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

osd_mkfs_type = xfs
osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
osd mkfs options xfs = "-f -i size=2048"

osd_mount_options_ext4 = "user_xattr,rw,noatime,nodiratime"
osd_mkfs_type = ext4
osd_mkfs_options_ext4 = -J size=1024 -E lazy_itable_init=0,lazy_journal_init=0

osd_scrub_load_threshold = 2.5
filestore_max_sync_interval = 10
osd_max_backfills = 1
osd recovery max active = 1
osd_op_threads = 4
osd_disk_threads = 1

filestore_op_threads = 4
osd_enable_op_tracker = false

osd_op_num_threads_per_shard = 1
osd_op_num_shards = 10 # default 5

osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7

debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0

[mon.2]
host = CEPH-02
mon addr = 10.10.10.2:6789

[mon.1]
host = CEPH-03
mon addr = 10.10.10.3:6789

[mon.0]
host = CEPH-01
mon addr = 10.10.10.1:6789


None of the LXC containers have any problem; performance is good.
It's only with KVM.
When I activate writeback on the VM's disk, there is no difference - same results.
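One quick check (the VM ID 100 is an assumption) is whether the cache mode really made it into the VM config; note that a changed cache setting only takes effect after the guest is fully stopped and started again:
Code:
qm config 100 | grep -i cache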
 
Hi

root@CEPH-01:~/cephdeploy# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep rbd_cache
"rbd_cache": "true",
"rbd_cache_writethrough_until_flush": "true",
"rbd_cache_size": "33554432",
"rbd_cache_max_dirty": "25165824",
"rbd_cache_target_dirty": "16777216",
"rbd_cache_max_dirty_age": "1",
"rbd_cache_max_dirty_object": "0",
"rbd_cache_block_writes_upfront": "false",


With writeback or writethrough on the disk, there is no change.
Read speed is perfect:
root@debian:/home# cat /dev/sda | pv > /dev/null
2.81GiB 0:00:11 [ 693MiB/s] [ <=>

But write...

dd if=/dev/zero of=/root/testfoim bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 8.74746 s, 12.0 MB/s

Just tested with "echo 4096 > /sys/block/sda/queue/read_ahead_kb"
Same results.
 
Udo, if you switch off the logs, aren't you afraid that something will happen and you won't know what?

http://docs.ceph.com/docs/jewel/rados/troubleshooting/log-and-debug/

In general, the logs in-memory are not sent to the output log unless:
  • a fatal signal is raised or
  • an assert in source code is triggered or
  • upon requested. Please consult document on admin socket for more details.
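For reference, the troubleshooting page linked above also describes changing debug levels at runtime via injectargs, so logging can be re-enabled temporarily without restarting the daemons (the subsystems and levels below are just an example):
Code:
# temporarily raise OSD logging, then switch it back off
ceph tell osd.* injectargs '--debug-osd 5/5 --debug-ms 1/1'
ceph tell osd.* injectargs '--debug-osd 0/0 --debug-ms 0/0'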
Hi,
AFAIK this is not a problem. I have run with debug off for years on an 8-node cluster (100+ OSDs) without trouble. The hint came from the ceph-users mailing list.
If something goes wrong, you will still get information about it ;-)

Of course, monitoring of Ceph health is mandatory (with or without debug logs).

Udo
 
Hi,
regarding your read test (cat /dev/sda | pv at ~693 MiB/s): it looks to me like you are measuring caching. If the data you read inside the VM is cached on the OSD nodes (or, even better, on the host), the result is fast. If the data isn't in the cache, reading is much slower, and a bigger read_ahead (up to 16 MB) will help a little bit - but not by more than 50%.
If you clean the cache on all nodes (OSD nodes, PVE host + VM) with "echo 3 > /proc/sys/vm/drop_caches", does your read speed change?
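A minimal sketch of that test, reusing the read command from the post above:
Code:
# run on every OSD node, on the PVE host and inside the VM
sync; echo 3 > /proc/sys/vm/drop_caches
# then repeat the read test inside the VM
cat /dev/sda | pv > /dev/null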
Regarding the write test (dd with oflag=direct at ~12 MB/s) and the read_ahead tweak: read_ahead has nothing to do with writes...

About your OSDs: in your ceph.conf you have osd_mkfs_type twice - as xfs and as ext4!
Only one should be there (it is the filesystem format used when creating a new OSD).
Do you run your OSDs with ext4 or xfs? I had good experiences with ext4, but ext4 support has now been dropped by Ceph, so you should go with xfs (until BlueStore is finished).
What are your actual mount parameters ("mount | grep ceph")? The parameters in ceph.conf only apply to the next OSD mount.
How does the write performance look with one thread from the host?
Code:
rados bench -p rbd -t 1 60 write --no-cleanup
Udo
 
