Ceph conf

Sep 24, 2016
Hello

I'm creating a Ceph cluster and would like to know how to configure it in Proxmox (size, min_size, pg_num, CRUSH).
I want a single level of replication (I want to use as little space as possible while still having redundancy, a bit like RAID 5?).
For now I have 3 servers, each with 12 × 4 TB SAS OSDs (36 in total), all on 10 Gbps; a rough sketch of what I have in mind is below.
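To make the question concrete, here is roughly what I was thinking of; the pool name is just an example and the pg_num comes from the usual ~100 PGs per OSD rule of thumb, so please correct me:
Code:
# replicated pool for RBD; total PGs ~ (36 OSDs x 100) / replica count,
# rounded to a power of two (1024 or 2048 for this cluster)
ceph osd pool create vmpool 1024 1024
# 2 copies of each object, writes still allowed with only 1 copy left
ceph osd pool set vmpool size 2
ceph osd pool set vmpool min_size 1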

Thank you very much

Guillaume
 
Hi,
you can tune some parameters in ceph.conf.
Do you use "write back" as cache-setting for VM-rdb-volumes?

Do you use "the right" mount options?
For good write speed, a journal SSD (not a consumer model) is a good choice.
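For example, with ceph-disk the journal can be put on a separate SSD when an OSD is prepared; the device names below are placeholders only:
Code:
# data disk /dev/sdd, journal on the SSD /dev/sdb (ceph-disk creates a journal partition there)
ceph-disk prepare --fs-type xfs /dev/sdd /dev/sdb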

Here are some settings under [osd] which have a performance impact:
Code:
osd_mkfs_type = xfs
osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
osd mkfs options xfs = "-f -i size=2048"
osd_scrub_load_threshold = 2.5
filestore_max_sync_interval = 10
osd_max_backfills = 1
osd recovery max active = 1
osd_op_threads = 4
osd_disk_threads = 1 #disk threads, which are used to perform background disk intensive OSD operations such as scrubbing

filestore_op_threads = 4
osd_enable_op_tracker = false

osd_op_num_threads_per_shard = 1  # default 2
osd_op_num_shards = 10  # default 5

osd_disk_thread_ioprio_class  = idle
osd_disk_thread_ioprio_priority = 7

debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
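The mkfs/mount options above only take effect when an OSD is created or remounted, and the rest need an OSD restart; if I remember right, most of the runtime values can also be pushed to the running daemons, e.g.:
Code:
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'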
Udo
 
Hi,

My ceph.conf is
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
filestore xattr use omap = true
fsid = 26f11204-d540-455e-869c-0d43ab729d6b
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.10.10.0/24


[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

osd_mkfs_type = xfs
osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k,allocsize=4M"
osd mkfs options xfs = "-f -i size=2048"

osd_mount_options_ext4 = "user_xattr,rw,noatime,nodiratime"
osd_mkfs_type = ext4
osd_mkfs_options_ext4 = -J size=1024 -E lazy_itable_init=0,lazy_journal_init=0

osd_scrub_load_threshold = 2.5
filestore_max_sync_interval = 10
osd_max_backfills = 1
osd recovery max active = 1
osd_op_threads = 4
osd_disk_threads = 1

filestore_op_threads = 4
osd_enable_op_tracker = false

osd_op_num_threads_per_shard = 1
osd_op_num_shards = 10 # default 5

osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7

debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0

[mon.2]
host = CEPH-02
mon addr = 10.10.10.2:6789

[mon.1]
host = CEPH-03
mon addr = 10.10.10.3:6789

[mon.0]
host = CEPH-01
mon addr = 10.10.10.1:6789


None of the LXC containers have any problems; performance is good.
It's only with KVM.
When I activate writeback on the VM's disk, there is no difference. Same results.
 
Hi

root@CEPH-01:~/cephdeploy# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep rbd_cache
"rbd_cache": "true",
"rbd_cache_writethrough_until_flush": "true",
"rbd_cache_size": "33554432",
"rbd_cache_max_dirty": "25165824",
"rbd_cache_target_dirty": "16777216",
"rbd_cache_max_dirty_age": "1",
"rbd_cache_max_dirty_object": "0",
"rbd_cache_block_writes_upfront": "false",


With writeback or writethrough on disk, no changes
Read speed is perfect
root@debian:/home# cat /dev/sda | pv > /dev/null
2.81GiB 0:00:11 [ 693MiB/s] [ <=>

But write...

dd if=/dev/zero of=/root/testfoim bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 8.74746 s, 12.0 MB/s

Just tested with "echo 4096 > /sys/block/sda/queue/read_ahead_kb"
Same results.
 
Udo, if you switch off the logs, aren't you afraid that something will happen and you won't know what?

http://docs.ceph.com/docs/jewel/rados/troubleshooting/log-and-debug/

In general, the logs in-memory are not sent to the output log unless:
  • a fatal signal is raised or
  • an assert in source code is triggered or
  • upon requested. Please consult document on admin socket for more details.
Hi,
AFAIK this is not a problem. I have been running with debug off for years on an 8-node cluster (100+ OSDs) without trouble. The hint came from the ceph-users mailing list.
If something goes wrong, you will still get information about it ;-)

Of course, monitoring of Ceph health is mandatory (with or without debug logs).

Udo
 
Hi

root@CEPH-01:~/cephdeploy# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep rbd_cache
"rbd_cache": "true",
"rbd_cache_writethrough_until_flush": "true",
"rbd_cache_size": "33554432",
"rbd_cache_max_dirty": "25165824",
"rbd_cache_target_dirty": "16777216",
"rbd_cache_max_dirty_age": "1",
"rbd_cache_max_dirty_object": "0",
"rbd_cache_block_writes_upfront": "false",


With writeback or writethrough on disk, no changes
Read speed is perfect
root@debian:/home# cat /dev/sda | pv > /dev/null
2.81GiB 0:00:11 [ 693MiB/s] [ <=>
Hi,
It looks to me like you are measuring caching. If the data you read inside the VM is cached on the OSD nodes (or even better on the host), the results are fast. If the data isn't in cache, reading is much slower, and a bigger read_ahead (up to 16 MB) helps a little bit, but not by more than 50%.
If you clear the cache on all nodes (OSD nodes, PVE host + VM) with "echo 3 > /proc/sys/vm/drop_caches", does your read speed change?
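Something like this, run from one node, would clear it on the OSD nodes; the hostnames are the ones from your ceph.conf, and passwordless root ssh between the nodes is assumed (run the same echo on the PVE host and inside the VM as well):
Code:
for h in CEPH-01 CEPH-02 CEPH-03; do
    ssh root@$h 'sync; echo 3 > /proc/sys/vm/drop_caches'
done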
But write...

dd if=/dev/zero of=/root/testfoim bs=1M count=100 oflag=direct
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 8.74746 s, 12.0 MB/s

Just tested with "echo 4096 > /sys/block/sda/queue/read_ahead_kb"
Same results.
read_ahead has nothing to do with writes...
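To see whether it is really the per-write latency that limits you (one 1 MB direct write at a time), you could also try a test with several writes in flight, for example with fio - only a sketch, the file name and sizes are arbitrary:
Code:
fio --name=rbdwrite --filename=/root/fiotest --size=1G \
    --rw=write --bs=1M --direct=1 --ioengine=libaio --iodepth=16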

About your OSDs... in your ceph.conf you have osd_mkfs_type twice, as xfs and ext4!
Only one should be there (it is the filesystem format used when creating a new OSD).
Do you run your OSDs with ext4 or xfs? I had good experiences with ext4, but ext4 support has now been dropped by Ceph, so you should go with xfs (until BlueStore is finished).
What are your current mount parameters ("mount | grep ceph")? The values in ceph.conf only apply to the next OSD mount.
How does the write performance look with one thread from the host?
Code:
rados bench -p rbd -t 1 60 write --no-cleanup
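And, for comparison, with more parallel writes plus a sequential-read run against the objects that --no-cleanup leaves behind (if I remember the cleanup command right):
Code:
rados bench -p rbd -t 16 60 write --no-cleanup
rados bench -p rbd -t 1 60 seq
rados -p rbd cleanup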
Udo