CEPH Tuning

KLifeCorp

I am building a new Ceph PMX cluster for someone and I am not seeing the performance I would expect. Any thoughts would be welcome. I am pushing for a Ceph setup over them building 6 individual nodes, since the app needs HA capability. Worst case we will fall back to ZFS replication, but it seems a shame to waste such a good opportunity.


6 Nodes: 2x EPYC 7452, 512GB RAM, 2x 10Gbps PMX network, 2x 40Gbps storage network, 2x NVMe OS drives, 6x Intel SSDPE2KX040T8 NVMe for Ceph
They want to run VMs on the nodes with HA/replication, which I know will degrade performance somewhat, but I am not seeing better than SSD performance at this time.
proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
ceph version 15.2.8 (8b89984e92223ec320fb4c70589c39f384c86985) octopus (stable)
NICs: MTU 9000, currently bonded as active-backup
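For reference, the storage bond is configured roughly like this (a sketch only; interface names and addresses are placeholders, not my exact config):
Code:
auto bond1
iface bond1 inet static
        address 10.10.10.11/24
        bond-slaves enp65s0f0 enp65s0f1
        bond-miimon 100
        bond-mode active-backup
        mtu 9000
#Ceph storage network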
dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync
989 MB/s - 1 GB/s

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

read: IOPS=24.1k, BW=94.1MiB/s (98.7MB/s)(3070MiB/32613msec)
bw ( KiB/s): min=85008, max=125832, per=100.00%, avg=96473.95, stdev=11087.47, samples=65
iops : min=21252, max=31458, avg=24118.42, stdev=2771.86, samples=65
write: IOPS=8053, BW=31.5MiB/s (32.0MB/s)(1026MiB/32613msec); 0 zone resets
bw ( KiB/s): min=27553, max=41384, per=100.00%, avg=32244.46, stdev=3712.44, samples=65
iops : min= 6888, max=10346, avg=8061.05, stdev=928.11, samples=65
 
Is your fio test done inside a VM?

For a VM, speed will be limited by core frequency. By default, QEMU uses only one core for all disk I/O.

Inside a VM I'm able to reach around 80,000 4k read IOPS and 20,000 4k write IOPS, with 3 GHz CPUs on both the client and the servers.

You can get more IOPS inside a single VM by using the virtio-scsi-single controller plus the iothread option on the disk. Then, with multiple disks inside the VM, it will scale.

Also, enabling writeback cache on the VM disk should help with sequential writes.
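For example, something along these lines (just a sketch; VM ID 100, the pool name and the disk name are placeholders, adjust to your setup):
Code:
# switch the VM to the virtio-scsi-single controller
qm set 100 --scsihw virtio-scsi-single
# enable iothread and writeback cache on the Ceph-backed disk
qm set 100 --scsi0 ceph-vm:vm-100-disk-0,iothread=1,cache=writeback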


And of course, this will scale with multiple VMs.


You can also gain a little more IOPS by disabling debug logging in ceph.conf:
Code:
[global]
 debug asok = 0/0
 debug auth = 0/0
 debug buffer = 0/0
 debug client = 0/0
 debug context = 0/0
 debug crush = 0/0
 debug filer = 0/0
 debug filestore = 0/0
 debug finisher = 0/0
 debug heartbeatmap = 0/0
 debug journal = 0/0
 debug journaler = 0/0
 debug lockdep = 0/0
 debug mds = 0/0
 debug mds balancer = 0/0
 debug mds locker = 0/0
 debug mds log = 0/0
 debug mds log expire = 0/0
 debug mds migrator = 0/0
 debug mon = 0/0
 debug monc = 0/0
 debug ms = 0/0
 debug objclass = 0/0
 debug objectcacher = 0/0
 debug objecter = 0/0
 debug optracker = 0/0
 debug osd = 0/0
 debug paxos = 0/0
 debug perfcounter = 0/0
 debug rados = 0/0
 debug rbd = 0/0
 debug rgw = 0/0
 debug throttle = 0/0
 debug timer = 0/0
 debug tp = 0/0
 

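If I remember right, you can also inject these at runtime without restarting the daemons, something like this (double-check the option names against your Ceph version):
Code:
ceph tell osd.* injectargs '--debug_osd 0/0 --debug_ms 0/0'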