from that link / very interesting.
Quick guide for optimizing Ceph for random reads/writes:
- Only use SSDs and NVMe with supercaps. A hint: 99 % of desktop SSDs/NVMe don’t have supercaps.
Yes, sure. Don't use consumer SSDs for Ceph (or ZFS); you need fast sync writes for the BlueStore journal/metadata.
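A quick way to check whether a drive handles sync writes fast enough is a single-job, queue-depth-1 fio sync-write test (the device path is a placeholder, and this destroys data on the target, so only run it against an empty disk):
Code:
fio --name=synctest --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=30 --time_based
Datacenter SSDs with supercaps typically sustain tens of thousands of IOPS here, while consumer drives often collapse to a few hundred.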
- Disable their cache with hdparm -W 0.
It depends on the model; you need to test it.
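For example (sdX is a placeholder for your OSD disk):
Code:
hdparm -W 0 /dev/sdX    # disable the volatile write cache
hdparm -W /dev/sdX      # verify: should report write-caching = 0 (off)
Then re-run your sync-write benchmark with the cache off and compare, since the effect differs per model.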
- Disable powersave: governor=performance, cpupower idle-set -D 0
Definitely.
I'm using:
idle=poll intel_idle.max_cstate=0 processor.max_cstate=1
in my GRUB config (works for Intel and AMD).
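To apply the governor and C-state settings at runtime without a reboot (the cpupower tool ships in a distro-specific package, e.g. linux-cpupower on Debian):
Code:
cpupower frequency-set -g performance    # set governor on all cores
cpupower idle-set -D 0                   # disable all idle states
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # verify: performance
Note that runtime settings are lost on reboot; the GRUB parameters above make them permanent.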
- Disable signatures: cephx_require_signatures = false cephx_cluster_require_signatures = false cephx_sign_messages = false (and use -o nocephx_require_signatures,nocephx_sign_messages for rbd map and cephfs kernel mounts)
It's better than in the past, but you can still gain a small percentage. (Be careful: you lose authentication, which matters if you share CephFS, for example.)
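In ceph.conf that looks like this (a sketch; only do this on a trusted network):
Code:
[global]
cephx_require_signatures = false
cephx_cluster_require_signatures = false
cephx_sign_messages = false
And for a kernel rbd map (pool/image names are placeholders):
Code:
rbd map mypool/myimage -o nocephx_require_signatures,nocephx_sign_messages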
- For good SSDs and NVMes: set min_alloc_size=4096, prefer_deferred_size_ssd=0 (BEFORE deploying OSDs)
That's been the default for the last 2-3 releases.
- At least until Nautilus: [global] debug objecter = 0/0 (there is a big client-side slowdown)
I'm still doing it, disabling all debug options to be sure.
- Try to disable rbd cache in the userspace driver (QEMU options cache=none)
I disagree. In the past (Nautilus?), it was slower for reads because of a global lock in the cache layer.
Since Octopus there is a new implementation (writearound), and reads are now as fast as with cache=none.
(And writeback really helps for buffered writes.)
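To be explicit about the cache policy on the client side (writearound is already the default since Octopus, so this is mostly documentation of what you get):
Code:
[client]
rbd_cache = true
rbd_cache_policy = writearound
With this, QEMU's cache=writeback maps to the new writearound behavior instead of the old lock-heavy writeback path.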
If you want a good boost, here is some BlueStore tuning (it'll be the default in the next Ceph Reef release):
Code:
[osd]
bluestore rocksdb options = compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB
bluestore_allocator = bitmap
and
Code:
debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
I'm also currently adding a patch to Proxmox to re-enable the tcmalloc memory allocator for QEMU; it gives me a 30% performance gain (60k IOPS -> 90k IOPS) on 4k randread with a single virtio disk.
I have sent the patch to the pve-devel mailing list; I'm waiting for the merge.
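Until it's merged, a way to try the same idea manually is to preload tcmalloc when launching QEMU (library path and package name are distro-dependent; on Debian/Proxmox the library comes from libtcmalloc-minimal4):
Code:
apt install libtcmalloc-minimal4
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 qemu-system-x86_64 <your usual VM arguments>
This is just a sketch for testing; the proper fix is the pve-devel patch so Proxmox sets it for every VM automatically.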