CEPH WAL/DB monitoring/measurements

Jun 19, 2018
Hi,

We already have a 4-node Proxmox cluster running Ceph and are thinking about expanding it.
We are trying to reevaluate our hardware choices by observing the performance of our current cluster, and are now trying to find out how heavily the WAL and DB are used on our system.
Each node has one Intel Optane 900P (280 GB) SSD, and each of the 12 OSDs (8 TB HDD) gets a 20 GB partition from that SSD for WAL/DB (we weren't aware of the RocksDB level sizes back then...). But no matter what benchmark we run against Ceph, we never get above ~1.5k IOPS on that SSD, despite it housing the WAL/DB of 12 HDDs.
So now to my actual question: is there a way to monitor the WAL and DB usage of an OSD live?
We want to see that it is actually used and then be able to see the usage for different workloads.

This cluster is mostly meant as a data graveyard where we can just move data when we run out of storage somewhere else, so performance isn't that important. We don't want to waste money on Optane if other SSDs are good enough (especially since size matters more to us than performance).
 
So now to my actual question: is there a way to monitor the WAL and DB usage of an OSD live?
Yes, but usage is not performance. To check the perf counters directly, run ceph daemon osd.<ID> perf dump on the node hosting the OSD, or deploy monitoring through the MGR.
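
For a quick live view of how full the DB and WAL are, you can pull the bluefs counters out of the perf dump. A minimal sketch, assuming osd.0 as an example ID, that the command is run on the node hosting that OSD, and that jq is installed (counter names can differ slightly between Ceph releases):

# BlueFS space usage for osd.0: bytes used/total on the DB and WAL devices
ceph daemon osd.0 perf dump | jq '.bluefs | {db_used_bytes, db_total_bytes, wal_used_bytes, wal_total_bytes, slow_used_bytes}'

# Refresh every 2 seconds to watch the values change while a benchmark is running
watch -n 2 "ceph daemon osd.0 perf dump | jq '.bluefs.db_used_bytes, .bluefs.wal_used_bytes, .bluefs.slow_used_bytes'"

If slow_used_bytes is greater than zero, the DB has spilled over from the 20 GB SSD partition onto the HDD, which is exactly the effect of the RocksDB level sizing you mentioned.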
 
