Well, kinda. I'm assuming e100 has less than 32 GB of RAM per node (those benchmark examples are from my 3-node cluster with <= 32 GB). On top of that, the read values are what I'd expect with 7 HDD-based OSDs on 3 nodes, where the single-OSD benchmarks are what he posted.
Hi Q-wulf,
are you sure? I also started with read performance this bad on Ceph, and a lot of work on the config / systems / upgrades helped me reach better (though not perfect) values.
Sure, you can probably get another 2-3% out of the Ceph subsystem by fine-tuning your PGs per OSD with a read-speed/capacity formula via primary affinities (since he has different speeds and different capacities, much less so than on my cluster). That does not, however, change the large difference between the synthetic benchmark and the VM-based bench results; we are talking about a 2-3x discrepancy here.
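For reference, a minimal sketch of how primary affinity can be steered per OSD (the OSD IDs and values below are placeholders, not numbers from this thread; on older releases you may first have to allow primary affinity in the monitor config):
ceph osd primary-affinity osd.0 1.0
ceph osd primary-affinity osd.5 0.25
OSDs with a lower primary affinity are less likely to be chosen as the primary of a PG, so reads get steered towards the faster disks.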
I cannot speak to 3.x clients (I only have 4.x Hammer-based clients at work and at home).
To be 100% sure you can use bigger datasets.
E.g. 300, 450, 700 or 1500 second long writes/reads. That should give you very realistic results.
Example:
On my 3-node cluster I have 16, 24 and 32 GB of RAM on the nodes, and a 450 second read does not differ from a 900 second read, whereas a 300 second read gives me 1.6 GB/s reads.
I have a bench on a single stock Debian VM in KVM; using virtio with iothread=on and no cache I was able to do 147 MB/s of sustained reads (155 MB/s in the synthetic benchmark) on a 30 GB dataset, where the VM has only 2 GB of RAM assigned. That's using my 3-node cluster described above.
My experience so far is that you ought to be able to get around 90% of your synthetic benchmark with a virtio iothread=on VM.
Can the iothread option be enabled on 3.x by editing the config, or does it only work in 4.x?
Can't help you there; I never used Ceph before Proxmox 4.x.
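For what it's worth, on 4.x this ends up as a disk option in the VM config under /etc/pve/qemu-server/, something like the line below (the storage and volume names are just examples):
virtio0: ceph-rbd:vm-100-disk-1,cache=none,iothread=1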
This whole cluster was built mostly from decommissioned production stuff, so it's older.
The three Ceph nodes are:
That might explain a lot.
- I am assuming here that, without your SSD journals, your writes would be in the ballpark of (read speed / 2).
- I am also assuming that your journals are big enough to serve a couple of disks each.
The same goes for reading.
Let's assume you have 16 GB of RAM on that node; you probably want to do a read benchmark with 32 GB (twice the RAM) to ensure you are producing reads outside of your potential cache range.
This goes back to what udo asked about earlier.
You probably need to run a rados bench with a bigger set, and you will see your read/write numbers drop to what you get outside the cache range.
I'd try a rados bench with a 900 second write + read in that case.
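Something along these lines (the pool name rbd is just an example, adjust it to your pool):
rados bench -p rbd 900 write --no-cleanup
rados bench -p rbd 900 seq
rados -p rbd cleanup
The --no-cleanup flag keeps the written objects around so the seq read pass afterwards has data to read; the cleanup call removes them once you are done.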
dd if=/dev/vda bs=1M
3384803328 bytes (3.4 GB) copied, 46.031 s, 73.5 MB/s
Same thing really: try reading/writing data that is (VM RAM x 2), so you do reads/writes outside of the cache of your VM's OS.
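For a VM with 2 GB of RAM, something like the following reads a 4 GB chunk straight off the virtual disk and throws it away (the device and count are just examples; iflag=direct additionally bypasses the guest page cache):
dd if=/dev/vda of=/dev/null bs=1M count=4096 iflag=direct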