Performance: Ceph vs. Linux VM

pille99

Active Member
Sep 14, 2022
hello guys
I followed the video tutorial on how to install Ceph from the Proxmox site. Everything went smoothly, except for one thing.

the command
rados bench -p ceph-test 100 write --no-cleanup
gives me the following result:
Total time run: 101.331
Total writes made: 4030
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 159.082
Stddev Bandwidth: 25.6711
Max bandwidth (MB/sec): 236
Min bandwidth (MB/sec): 40
Average IOPS: 39
Stddev IOPS: 6.41777
Max IOPS: 59
Min IOPS: 10
Average Latency(s): 0.399168
Stddev Latency(s): 0.509006
Max latency(s): 3.98511
Min latency(s): 0.0101132
Cleaning up (deleting benchmark objects)
Removed 4030 objects
Clean up completed and total clean up time :3.35888
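
As a side note, the same pool can also be read-benchmarked afterwards. A minimal sketch, assuming the pool is still called ceph-test and the write run was started with --no-cleanup so its objects are still there to read back:

rados bench -p ceph-test 100 seq    # sequential read of the objects left by the write run
rados -p ceph-test cleanup          # delete the benchmark objects when done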

159 MB/s write for an SSD?????

the strange thing is what I see when benchmarking inside Linux (see below)

the servers each contain the following SSDs (2x) - MZQL23T8HCLS
https://semiconductor.samsung.com/ssd/datacenter-ssd/pm9a3/mzql23t8hcls-00a07/

in Linux, as shown in the video, the results are the following:
read: 9.5 GB/s
write: 22.2 GB/s
that would mean roughly 9,000 MB/s read and 22,000 MB/s write (isn't read normally faster than write???)

I have the impression that the performance on Proxmox is not good and much too slow.


How can that be explained?

What can I do to get the maximum out of it? (I followed the video exactly, so it's pretty much all default settings.)


Thanks guys for your input
 
How fast is the network between the nodes used for Ceph and how high is the latency between them? For every write operation in Ceph you have a few trips across the network!
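
One way to check both numbers directly between two nodes is iperf3 plus ping (a minimal sketch; the IP address is a placeholder and iperf3 may need to be installed first):

iperf3 -s                    # on node A: start the iperf3 server
iperf3 -c 192.168.1.11       # on node B: measure throughput towards node A
ping -c 100 192.168.1.11     # on node B: round-trip latency towards node A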

How many nodes do you have? If it is a small 3-node cluster, please consider not having only 2 OSDs per node but a few more. The issue you could run into is that if one of the (current) 2 OSDs fails, Ceph will try to recover that data onto the remaining OSD in the node. Unless the OSDs are filled to no more than a bit over 40% prior to the loss of one OSD, the remaining OSD will be too full!
If you have more OSDs per node, then Ceph can spread the recovered data better, reducing the chance of a single OSD getting too full.
The other option is to have more than 3 nodes so that the data can be recovered to other nodes as well, while still adhering to the rule that there never should be 2 replicas on the same host.

While Ceph can recover from a lot, running out of space is one of the things you have to avoid at all costs.
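
To see how full each OSD currently is relative to that roughly 40% mark, the standard Ceph commands can be used (a minimal sketch):

ceph osd df tree    # per-OSD usage (%USE column), grouped by host
ceph df             # overall cluster and per-pool usage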
 
in Linux, as shown in the video, the results are the following:
read: 9.5 GB/s
write: 22.2 GB/s
that would mean roughly 9,000 MB/s read and 22,000 MB/s write (isn't read normally faster than write???)
Sounds like there was some cache involved in the benchmark.
 
Sounds like there was some cache involved in the benchmark.
Maybe it's the write cache from the virtual machine creation; I chose the "write back" option.
But it's pretty close to what the NVMe can do and what I would expect to see.
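
One way to take the guest's write-back cache out of the picture would be a direct-I/O run inside the VM, for example with fio (a sketch only; file name, size and runtime are placeholders):

fio --name=seqwrite --filename=/root/fio-testfile --size=10G \
    --rw=write --bs=4M --direct=1 --ioengine=libaio --iodepth=16 \
    --runtime=60 --time_based --group_reporting

With --direct=1 the guest's page cache is bypassed; the host-side write-back cache can still buffer, so repeating the test with the disk cache mode set to "none" is another option.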
 
How fast is the network between the nodes used for Ceph and how high is the latency between them? For every write operation in Ceph you have a few trips across the network!

How many nodes do you have? If it is a small 3-node cluster, please consider not having only 2 OSDs per node but a few more. The issue you could run into is that if one of the (current) 2 OSDs fails, Ceph will try to recover that data onto the remaining OSD in the node. Unless the OSDs are filled to no more than a bit over 40% prior to the loss of one OSD, the remaining OSD will be too full!
If you have more OSDs per node, then Ceph can spread the recovered data better, reducing the chance of a single OSD getting too full.
The other option is to have more than 3 nodes so that the data can be recovered to other nodes as well, while still adhering to the rule that there never should be 2 replicas on the same host.

While Ceph can recover from a lot, running out of space is one of the things you have to avoid at all costs.
Sorry for the late response.

the Ceph private network is 10 Gbit
the Ceph public network is 1 Gbit

3 nodes
I have 2 OSDs per node, 6 OSDs in total, all 4 TB NVMe drives

That doesn't explain why the read and write are so bad: 400 MB/s read (the drive can do something like 6 GB/s) and 140 MB/s write (something like 900 MB/s should be possible).
I am happy to hear your input.
 
One question for you:
How many bytes per second can you send over a 1 gigabit/s network card?
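
For reference, the raw arithmetic (before any TCP or Ceph overhead):

echo $(( 1000000000 / 8 / 1000000 ))    # 1 Gbit/s divided by 8 bits per byte = ~125 MB/s at best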
 
Any input???
If I copy from one drive to another in the same server, it does 8.4 GB per minute, which is about 140 MB per second. That's not acceptable for drives that can do 900 MB/s writes.
 
the Ceph private network is 10 Gbit
the Ceph public network is 1 Gbit
Do you mean, the Ceph Cluster network is 10 Gbit and the Ceph Public network is using a 1 Gbit link?
That is not ideal. Please have a look at the schemata in the Ceph docs. The Ceph Public network is mandatory and used for the overall Ceph communication as well as communication with the clients (VMs in this case). The Ceph Cluster network is optional and can be used to move the traffic between OSDs (replication, heartbeat) to a different network.

Both need to be fast networks. Reconfigure your network so that the Ceph Public network is also using at least a 10 Gbit link or faster. You could have both using the same physical network by separating the subnets into VLANs for example. If the physical network becomes the bottleneck, you can move one of the networks to additional NICs to spread the load.
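
For orientation, both networks end up as simple settings in the Ceph config. A minimal sketch of the relevant part of /etc/pve/ceph.conf (the subnets are placeholders for this example):

[global]
    public_network  = 10.10.10.0/24    # fast link (10 Gbit or more): monitors, clients/VMs, OSDs
    cluster_network = 10.10.20.0/24    # optional: OSD replication and heartbeat traffic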
 
